I'm using Python to process large time-series datasets. The data are processed frame by frame, and consecutive frames partially overlap. Currently I process them like this: read -> process -> write -> read -> process ....
read_data_into_datastore(...)            # read data into an ndarray, limited by RAM
while datastore_is_not_empty:
    if few_data_in_store:                # refill when the store is running low
        read_successive_data_into_datastore(...)
    frames = pop_from_datastore(...)     # fetch frames from the datastore
    results = process(frames)
    write_results_to_disk(results)
The writing is not a big problem as long as I don't force a flush, but the time-consuming reads frequently block the loop. I want to speed it up by using two threads (or two processes): one, called cargo, in charge of monitoring and refilling the datastore; the other, called processor, in charge of the processing.

My difficulty is that both cargo and processor modify the data ndarray (cargo by filling it, processor by cutting from it). It would be a mess if processor cut some data from the datastore while, at the same time, cargo was filling it. How can I "lock" the datastore while one operation is ongoing?
question from:
https://stackoverflow.com/questions/65922280/speed-up-python-codes-by-separating-io-from-cpu-calculation