python - Getting a MemoryError because list/array is too large -


problem

I am downloading object_x. For simplicity's sake, say object_x comprises the series of integers 0-1000. The download is irregular: I receive groups or chunks of integers in seemingly random order, and I need to keep track of them until I have all 1000 to make the final object_x.

The incoming chunks can overlap, for instance:

chunk 1: integers 0-500
chunk 2: integers 600-1000
chunk 3: integers 400-700
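Because the chunks overlap, their raw lengths add up to more than 1000; a quick sketch (using the three example ranges above) shows the distinct coverage versus the raw count:

```python
# The three example chunks, as half-open Python ranges.
chunks = [range(0, 500), range(600, 1000), range(400, 700)]

# Distinct integers covered by the union of all chunks.
covered = set().union(*chunks)
print(len(covered))                  # 1000 distinct integers -> complete
print(sum(len(c) for c in chunks))   # 1200 raw values, so 200 duplicates
```

This is why simply counting received values does not work: duplicates from overlapping chunks must be deduplicated before deciding the download is complete.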

current method

I create object_x as a list containing all of its comprising integers, 0-1000. When a chunk is downloaded, I remove all of the integers that make up that chunk from object_x. I keep doing this until object_x is empty (the download is then known to be complete).

object_x = list(range(0, 1000))

# download chunk 1
chunk = range(0, 500)
for number in chunk:
    if number in object_x:
        object_x.remove(number)

# repeat for every downloaded chunk
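A leaner variant of the same bookkeeping (a sketch, not the code above): track only the integers still missing, in a set. Removing from a set is O(1), while `list.remove()` scans the whole list on every call, and the `in` membership test on a list is also O(n).

```python
# Track only what is still missing; the set shrinks as chunks arrive.
missing = set(range(0, 1000))

def apply_chunk(chunk):
    # difference_update silently skips integers already removed,
    # so overlapping chunks are handled for free.
    missing.difference_update(chunk)

apply_chunk(range(0, 500))     # chunk 1
apply_chunk(range(600, 1000))  # chunk 2
apply_chunk(range(400, 700))   # chunk 3

print(len(missing))  # 0 -> object_x is complete
```

This reduces CPU cost and avoids duplicate checks, though the set of missing integers still lives in memory, so it only delays, not solves, the MemoryError for very large objects.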

conclusion

This method is memory intensive. The script throws a MemoryError if object_x or a chunk is large.

I'm searching for a better way to keep track of the chunks while building object_x. Any ideas? I'm using Python, but the language doesn't matter, I guess.

This is the kind of scenario where streaming is important. Doing it all in memory is a bad idea, because you might not have enough memory (as in your case). You should save the chunks to disk, keep track of how many integers have been downloaded, and when you reach 1000, process them on disk (or load them into memory one by one to process them).

"C# Security: Computing File Hashes" is a recent article I wrote - it's on a different subject, but it illustrates the importance of streaming towards the end.

