I'm currently working on a project that I need help with. I am working with some large graphs and need to compute some of their properties over the years. I was thinking of using the multiprocessing or threading package from Python. I have a for loop that goes through each year and produces a CSV. I'm not sure how I can parallelize this; can you help me?
Here is my code:
import gc

import networkx as nx
import pandas as pd
from networkx import degree_centrality, in_degree_centrality
from tqdm import tqdm

# df (edge list with a `label` column), id_df and years are defined elsewhere.
for year in tqdm(years):
    # Keep only the edges that exist up to and including this year.
    temp_df = df[df.label <= year]
    processed_df = id_df.copy()
    G = nx.DiGraph()
    G.add_edges_from(temp_df.iloc[:, :2].values.tolist())

    # Degree Centrality
    DegreeCentrality = degree_centrality(G)
    DegreeCentrality_df = pd.DataFrame(DegreeCentrality.items(), columns=['id', 'DegreeCentrality'])
    processed_df = pd.merge(processed_df, DegreeCentrality_df, how='left', on='id').fillna(0)
    del DegreeCentrality
    del DegreeCentrality_df
    gc.collect()

    # In Degree Centrality
    InDegreeCentrality = in_degree_centrality(G)
    InDegreeCentrality_df = pd.DataFrame(InDegreeCentrality.items(), columns=['id', 'InDegreeCentrality'])
    processed_df = pd.merge(processed_df, InDegreeCentrality_df, how='left', on='id').fillna(0)
    del InDegreeCentrality
    del InDegreeCentrality_df
    gc.collect()

    # One output file per year.
    processed_df.to_csv('properties_{}'.format(year), index=False)
My guess is that I should turn everything inside the for loop into a function and call it from different threads. Any help would be appreciated, thank you!
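To make that guess concrete, here is a rough sketch of the shape I have in mind, not a tested solution for my real data. Everything in it is a stand-in: `EDGES`, `process_year` and `run_all` are hypothetical names, the edge list is toy data, and the in-degree centrality is computed by hand with the standard library instead of networkx so the sketch runs without extra packages. The pool comes from `concurrent.futures`, which I am assuming is an acceptable alternative to using `multiprocessing` directly.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

# Hypothetical toy data: (source, target, year) edges standing in for df.
EDGES = [(1, 2, 2000), (2, 3, 2001), (3, 1, 2002), (1, 3, 2002)]


def process_year(year, edges):
    """Stand-in for the loop body: in-degree centrality of the graph up to `year`."""
    # A set drops duplicate edges, as a simple directed graph would.
    kept = {(u, v) for u, v, label in edges if label <= year}
    nodes = {n for edge in kept for n in edge}
    in_degree = {n: 0 for n in nodes}
    for _, target in kept:
        in_degree[target] += 1
    # Normalise by (number of nodes - 1); guard against tiny graphs.
    scale = 1.0 / max(len(nodes) - 1, 1)
    return year, {n: d * scale for n, d in in_degree.items()}


def run_all(years, edges, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Call process_year once per year, spread over the pool's workers."""
    worker = partial(process_year, edges=edges)
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(worker, years))


if __name__ == "__main__":
    # The guard matters when worker processes are started by re-importing this file.
    results = run_all([2000, 2001, 2002], EDGES)
    for year, centrality in sorted(results.items()):
        print(year, centrality)
```

In my real code the body of `process_year` would be the loop body above, including the `to_csv` call, so each worker writes its own file. Passing `executor_cls=ThreadPoolExecutor` would switch to threads, but since the centrality work is CPU-bound pure Python I suspect processes are the better fit; I have not measured this.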
question from:
https://stackoverflow.com/questions/65915926/multiprocessing-or-threading-a-for-loop-in-python