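I am running an analysis along a pipeline, modeling points along each segment with a few physical parameters. I need to track progress so that when the model hangs, for example because of a connection problem, I can resume without starting from scratch.
I store the parameters of each completed run in a dictionary, collect the dictionaries in a list, and save the list to a CSV. At the start of the script, I load all runs completed to date.
My current implementation does not scale well: I convert the values of the completed-runs DataFrame to a list of runs, then compare the list of header values against it to see whether any row matches. I am looking for a solution that scales better.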
import pandas as pd

# at the top of the script, pull in the runs completed to date
completed_runs_df = pd.read_csv('completed_runs.csv')
# plain list of row lists, so the `in` test below compares whole rows
completed_runs_list_of_values = completed_runs_df.values.tolist()
completed_runs_list_of_dicts = completed_runs_df.to_dict(orient='records')

for model in models:
    header = {
        'lat': lat,
        'long': long,
        'press_psig': press_psig,
        'diam_in': diam_in,
    }
    # need a better way to check this
    header_vals = list(header.values())
    if header_vals in completed_runs_list_of_values:
        continue
    # modeling goes here
    completed_runs_list_of_dicts.append(header)
    completed_runs_df = pd.DataFrame(completed_runs_list_of_dicts)
    completed_runs_df.to_csv('completed_runs.csv', index=False)
I am looking for a direct way to take the header dictionary and immediately test whether completed_runs_df contains a row with matching values.
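For reference, the kind of direct check described here could look like the sketch below. The helper name `header_in_frame` and the sample values are hypothetical, and the sketch assumes the header keys are all columns of the DataFrame. It is still a linear scan of the frame on every call, so it shows the check itself rather than a scalable solution.

```python
import pandas as pd

def header_in_frame(header: dict, df: pd.DataFrame) -> bool:
    """Return True if some row of df matches every key/value pair in header."""
    if df.empty:
        return False
    cols = list(header)
    # Compare the selected columns to the header values, aligned by column name,
    # then require that all columns match within a single row.
    mask = df[cols].eq(pd.Series(header), axis='columns').all(axis=1)
    return bool(mask.any())

# Hypothetical sample data standing in for completed_runs.csv
completed_runs_df = pd.DataFrame([
    {'lat': 29.5, 'long': -95.25, 'press_psig': 800, 'diam_in': 12},
    {'lat': 29.75, 'long': -95.5, 'press_psig': 650, 'diam_in': 8},
])

done = header_in_frame(
    {'lat': 29.5, 'long': -95.25, 'press_psig': 800, 'diam_in': 12},
    completed_runs_df,
)
not_done = header_in_frame(
    {'lat': 30.0, 'long': -95.25, 'press_psig': 800, 'diam_in': 12},
    completed_runs_df,
)
```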
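To make the solution scale better, build a set of tuples from the completed-runs DataFrame once, then test each header against that set. Set membership is a hash lookup that takes roughly constant time on average, so you avoid rescanning a list of rows on every iteration: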
# Create a set of tuples representing the completed runs
completed_runs_set = set(tuple(row) for row in completed_runs_df.to_records(index=False))
new_runs = []

# Iterate through the models
for model in models:
    header = {
        'lat': lat,
        'long': long,
        'press_psig': press_psig,
        'diam_in': diam_in,
    }
    # Convert header to a tuple
    header_tuple = tuple(header.values())
    # Check if the header tuple is already in the completed runs set
    if header_tuple in completed_runs_set:
        continue
    ###### Modeling goes here ######
    # Append the header dictionary to new_runs list
    new_runs.append(header)

# Update the DataFrame and CSV file if there are new runs
if new_runs:
    new_runs_df = pd.DataFrame(new_runs)
    updated_runs_df = pd.concat([completed_runs_df, new_runs_df], ignore_index=True)
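Since the stated goal is to resume after a hang, it may also be worth persisting each run as soon as it finishes instead of only after the loop. The following is a minimal sketch of that idea, not part of the code above: `COLUMNS`, `load_completed`, and `record_run` are hypothetical names, the column order is assumed to be fixed, and the demo writes to a temporary directory rather than the real `completed_runs.csv`.

```python
import os
import tempfile
import pandas as pd

COLUMNS = ['lat', 'long', 'press_psig', 'diam_in']  # assumed fixed column order

def load_completed(path):
    """Return the set of completed-run tuples stored in the CSV, if it exists."""
    if not os.path.exists(path):
        return set()
    df = pd.read_csv(path)
    return set(df[COLUMNS].itertuples(index=False, name=None))

def record_run(path, header, completed):
    """Append one finished run to the CSV and to the in-memory set."""
    row = pd.DataFrame([header], columns=COLUMNS)
    # Write the column header only when the file is being created
    row.to_csv(path, mode='a', header=not os.path.exists(path), index=False)
    completed.add(tuple(header[c] for c in COLUMNS))

# Demo with made-up values in a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), 'completed_runs.csv')
completed = load_completed(demo_path)  # empty on the first run
record_run(demo_path, {'lat': 29.5, 'long': -95.25, 'press_psig': 800, 'diam_in': 12}, completed)
record_run(demo_path, {'lat': 29.75, 'long': -95.5, 'press_psig': 650, 'diam_in': 8}, completed)

# Simulate a restart: reload from disk and check membership
reloaded = load_completed(demo_path)
already_done = (29.5, -95.25, 800, 12) in reloaded
```

Appending one row per run keeps the cost of each write independent of how many runs are already stored, and a crash loses at most the run in progress.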