python - Getting a MemoryError because list/array is too large
Problem

I have to download object_x. For simplicity's sake, object_x comprises a series of integers adding up to 1000. The download is irregular: I receive groups or chunks of integers in seemingly random order, and I need to keep track of them until I have all 1000 to make the final object_x.

The incoming chunks can overlap, for instance:

Chunk 1: integers 0-500
Chunk 2: integers 600-1000
Chunk 3: integers 400-700
Current method

Create object_x as a list containing all of its comprising integers, 0-1000. When a chunk is downloaded, remove all of the integers that make up that chunk from object_x. Keep doing this until object_x is empty (it is then known to be complete).
    object_x = list(range(0, 1000))  # the integers we still need

    # download chunk 1
    chunk = range(0, 500)

    for number in chunk:
        if number in object_x:
            object_x.remove(number)

    # repeat for every downloaded chunk
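Note that with this approach, every "number in object_x" test and every remove call scans the list. Below is a minimal sketch of the same bookkeeping done with a set (the apply_chunk helper is purely illustrative, not from the original code); the data still lives entirely in memory, so it speeds things up but does not by itself avoid the MemoryError.

    # Same "remove what has arrived" bookkeeping, but with a set:
    # membership tests and removals are O(1) instead of scanning a list.
    missing = set(range(0, 1000))      # integers not received yet

    def apply_chunk(chunk):
        """Discard every integer in the chunk from the set of missing numbers."""
        missing.difference_update(chunk)

    apply_chunk(range(0, 500))     # chunk 1
    apply_chunk(range(600, 1000))  # chunk 2
    apply_chunk(range(400, 700))   # chunk 3 (overlaps are harmless)

    if not missing:
        print("object_x is complete")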
Conclusion

This method is memory intensive. The script throws a MemoryError if object_x or the chunk is too large.
I'm searching for a better way to keep track of the chunks while building object_x. Any ideas? I'm using Python, but I guess the language doesn't matter.
Answer

This is the kind of scenario where streaming is important. Doing it all in memory is a bad idea, because you might not have enough memory (as in your case). You should save the chunks to disk, keep track of how many you have downloaded, and when you reach 1000, process them on disk (or load them into memory one by one to process them).
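Here is a minimal sketch of that idea, assuming the chunks arrive as (start, end) integer ranges; the file name and the merge helper are illustrative, not part of the original question. Each chunk is appended to a file as it arrives, and only a small list of merged intervals is kept in memory to know when the whole range 0-1000 has been covered.

    # Minimal sketch of the streaming approach: chunks go straight to disk,
    # only a tiny list of merged (start, end) intervals stays in memory.

    def merge(intervals, start, end):
        """Insert (start, end) and merge overlapping or adjacent intervals."""
        intervals = sorted(intervals + [(start, end)])
        merged = []
        for s, e in intervals:
            if merged and s <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        return merged

    coverage = []  # merged intervals received so far
    with open("chunks.txt", "a") as out:
        for start, end in [(0, 500), (600, 1000), (400, 700)]:  # downloaded chunks
            out.write("%d %d\n" % (start, end))                 # stream to disk
            coverage = merge(coverage, start, end)

    # Complete once a single interval spans the whole range 0-1000.
    if coverage == [(0, 1000)]:
        with open("chunks.txt") as f:
            for line in f:                       # process one chunk at a time
                start, end = map(int, line.split())
                # ... handle this chunk without loading everything at once ...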
"c# security: computing file hashes" recent article wrote - it's different subject, illustrate importance of streaming towards end.