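I have a fairly large pandas DataFrame, and I want to select some of its rows based on a condition.
The problem is that saving the remaining rows to CSV is separate from the main flow of the program and takes a considerable amount of time.
Is it possible to split off a thread, so that the main thread carries on with the selected rows while the unselected rows are saved to CSV in another thread?
For example: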
# This is pseudocode
import pandas as pd

df = pd.DataFrame({"col1": [x for x in range(10000)], "col2": [x**2 for x in range(0, 10000)]})
df_selected = df[df.apply(lambda x: x.col1 % 3 == 0, axis=1)]
df_unselected = df[df.apply(lambda x: x.col1 % 3 != 0, axis=1)]

def Other_thread_save_to_csv(df: pd.DataFrame):
    # this function is the last function to use df_unselected
    ...

def all_other_handlings(df: pd.DataFrame):
    # placeholder for the rest of the pipeline
    ...

Other_thread_save_to_csv(df_unselected)
all_other_handlings(df_selected)
Try this:
import threading
import pandas as pd

df = pd.DataFrame({"col1": range(10000), "col2": [x**2 for x in range(10000)]})
# A vectorized boolean mask does the same selection as the row-wise apply, much faster.
mask = df["col1"] % 3 == 0
df_selected = df[mask]
df_unselected = df[~mask]

def other_thread_save_to_csv(df_unselected):
    # Runs in the worker thread.
    df_unselected.to_csv('unselected_data.csv', index=False)

def all_other_handling(df_selected):
    ...  # the rest of your processing goes here

save_csv_thread = threading.Thread(target=other_thread_save_to_csv, args=(df_unselected,))
save_csv_thread.start()
all_other_handling(df_selected)  # runs while the CSV is being written
save_csv_thread.join()  # wait for the write to finish
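A plain threading.Thread does not report back if the write fails. If the main thread should find out about errors, one option is the standard library's concurrent.futures.ThreadPoolExecutor. The sketch below is only an illustration: split_and_process, save_to_csv and handle_selected are hypothetical names, and handle_selected stands in for whatever the real processing is.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def save_to_csv(frame: pd.DataFrame, path: str) -> str:
    """Write the frame to path and return the path (runs in the worker thread)."""
    frame.to_csv(path, index=False)
    return path


def handle_selected(frame: pd.DataFrame) -> int:
    """Hypothetical stand-in for the main-thread processing of the selected rows."""
    return int(frame["col2"].sum())


def split_and_process(df: pd.DataFrame, out_path: str):
    """Save the unselected rows in a worker thread while processing the selected ones."""
    mask = df["col1"] % 3 == 0
    df_selected = df[mask]
    df_unselected = df[~mask]

    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the write in the background.
        future = pool.submit(save_to_csv, df_unselected, out_path)
        # Meanwhile, keep working on the selected rows in the main thread.
        result = handle_selected(df_selected)
        # Wait for the write; an exception raised in the worker is re-raised here.
        saved_path = future.result()
    return result, saved_path


if __name__ == "__main__":
    df = pd.DataFrame({"col1": range(10000), "col2": [x**2 for x in range(10000)]})
    out_path = os.path.join(tempfile.gettempdir(), "unselected_data.csv")
    total, saved = split_and_process(df, out_path)
    print(total, saved)
```

How much wall-clock time this saves depends on how much of the write runs without holding the GIL, so it is worth timing both versions on your real data.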