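I have two CSVs with the same column names, and I want to get the row-wise differences and write them to a CSV file path.
I have also set the 'ID' column as the index in both files/DataFrames.
Sample DataFrames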
import pandas as pd

data1 = {
    'ID': [100, 21, 32, 42, 51, 81],
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'State': ['TX', 'FL', 'FL', 'CA', 'CA', 'TX']
}
data2 = {
    'ID': [100, 21, 32, 42, 51, 81],
    'Name': ['A', 'BB', 'C', 'DD', 'E', 'F'],  # Difference in the 2nd, 4th row
    'State': ['TX', 'TX', 'FL', 'CA', 'CA', 'TX']
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Indexed by 'ID'
df1 = df1.set_index('ID')
df2 = df2.set_index('ID')
My logic gives me a boolean error. I have tried several approaches, but none of them seem to work.
Approach 1
# Find common indices between DataFrames
common_index = df1.index.intersection(df2.index)
# Save differences to an output file
output_file_path = 'row_wise_differences.txt'
with open(output_file_path, 'w') as file:
    for idx in common_index:
        differences = []
        for col in df1.columns:
            if df1.loc[idx, col] != df2.loc[idx, col]:
                differences.append(f"{col}: {df1.loc[idx, col]} <> {df2.loc[idx, col]}")
        if differences:
            file.write(f"Index ID: {idx}, Differences: {', '.join(differences)}\n")
        else:
            file.write(f"Index ID: {idx}, No Differences\n")
print(f"Differences saved to {output_file_path}")
Approach 2
common_index = df1.index.intersection(df2.index)
# Save differences to an output file
output_file_path = 'row_wise_differences.txt'
with open(output_file_path, 'w') as file:
    for idx in common_index:
        differences = [
            f"{col}: {df1.loc[idx, col]} <> {df2.loc[idx, col]}"
            for col in df1.columns
            if df1.loc[idx, col] != df2.loc[idx, col]
        ]
        if differences:
            file.write(f"Index ID: {idx}, Differences: {', '.join(differences)}\n")
        else:
            file.write(f"Index ID: {idx}, No Differences\n")
print(f"Differences saved to {output_file_path}")
Approach 3
# Create a DataFrame showing differences as 'ID: Column: Value1 <> Value2'
diff_df = df1.loc[common_index][differences].stack().reset_index()
diff_df.columns = ['ID', 'Column', 'Difference']
diff_df['Difference'] = diff_df['Column'] + ': ' + diff_df['Difference'].astype(str)
# Save differences to an output CSV file
output_file_path = 'row_wise_differences.csv'
diff_df.to_csv(output_file_path, index=False)
print(f"Differences saved to {output_file_path}")
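Expected output
Index ID: 21, Differences: Name: B <> BB, State: FL <> TX
Index ID: 42, Differences: Name: D <> DD, State: CA <> CA
The output format does not matter, as long as it captures the df1 and df2 names and the row-wise differences.
Please help me with the comparison logic. All of my approaches hit the error below:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thanks in advance for taking the time to help me!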
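I don't get the error you describe at all, but judging from your comments, the only real problem seems to be that you need all of the columns in the output.
You can do that by adding some logic that writes every column value for any row that contains a difference: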
import pandas as pd

data1 = {
    'ID': [100, 21, 32, 42, 51, 81],
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'State': ['TX', 'FL', 'FL', 'CA', 'CA', 'TX']
}
data2 = {
    'ID': [100, 21, 32, 42, 51, 81],
    'Name': ['A', 'BB', 'C', 'DD', 'E', 'F'],  # Difference in the 2nd, 4th row
    'State': ['TX', 'TX', 'FL', 'CA', 'CA', 'TX']
}
df1 = pd.DataFrame(data1).set_index("ID")
df2 = pd.DataFrame(data2).set_index("ID")
# Find common indices between DataFrames
common_index = df1.index.intersection(df2.index)
# Save differences to an output file
output_file_path = 'row_wise_differences.txt'
with open(output_file_path, 'w') as file:
    for idx in common_index:
        differences = []
        # First pass: does any column differ for this ID?
        found = False
        for col in df1.columns:
            if df1.loc[idx, col] != df2.loc[idx, col]:
                found = True
                break
        if found:
            # Second pass: report every column, changed or not
            for col in df1.columns:
                differences.append(f"{col}: {df1.loc[idx, col]}/{df2.loc[idx, col]}")
            file.write(f"Index ID: {idx}, Differences: {', '.join(differences)}\n")
        else:
            file.write(f"Index ID: {idx}, No Differences\n")
print(f"Differences saved to {output_file_path}")
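One possible cause of the ValueError, which I can't confirm without your real files: if an ID appears more than once in the index, `df1.loc[idx, col]` returns a Series instead of a single value, and `if` cannot decide the truth of a Series. The sample data has unique IDs, which would explain why it runs cleanly here. A minimal sketch, using made-up data in which ID 21 is duplicated:

```python
import pandas as pd

# Hypothetical data (not from the question): ID 21 appears twice.
df1 = pd.DataFrame({'ID': [100, 21, 21], 'Name': ['A', 'B', 'C']}).set_index('ID')
df2 = pd.DataFrame({'ID': [100, 21, 21], 'Name': ['A', 'BB', 'C']}).set_index('ID')

# A unique label gives back a scalar; a duplicated label gives back a Series.
scalar_value = df1.loc[100, 'Name']
series_value = df1.loc[21, 'Name']

# The comparison from the question then yields a boolean Series,
# and using it in an `if` raises the ValueError.
try:
    if df1.loc[21, 'Name'] != df2.loc[21, 'Name']:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Quick check you could run on the real data before comparing.
has_duplicate_ids = not df1.index.is_unique
print(type(series_value).__name__, has_duplicate_ids)
print(error_message)
```

If `df1.index.is_unique` is False for your real files, you would need to decide how duplicated IDs should be matched before a row-by-row comparison makes sense.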
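# Alternatively, DataFrame.compare keeps only the rows that differ
# (the result_names argument requires pandas 1.5 or later)
result = df1.compare(df2, align_axis=1, keep_equal=True, result_names=('DF1', 'DF2'))
# Flatten the (column, source) MultiIndex into names such as 'DF1-Name'
result.columns = [f'{col[1]}-{col[0]}' for col in result.columns.values]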
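For completeness, here is a self-contained sketch of the `compare` approach above that also writes the CSV. It assumes both frames have identical index labels and columns, because `compare` only accepts identically labeled DataFrames. To stay usable on pandas versions older than 1.5, it relies on the default `self`/`other` labels and renames them afterwards; the `labels` mapping is my own choice for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [100, 21, 32, 42, 51, 81],
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'State': ['TX', 'FL', 'FL', 'CA', 'CA', 'TX'],
}).set_index('ID')
df2 = pd.DataFrame({
    'ID': [100, 21, 32, 42, 51, 81],
    'Name': ['A', 'BB', 'C', 'DD', 'E', 'F'],
    'State': ['TX', 'TX', 'FL', 'CA', 'CA', 'TX'],
}).set_index('ID')

# Rows with no differences are dropped; keep_equal=True shows unchanged
# values in the remaining rows instead of NaN.
result = df1.compare(df2, align_axis=1, keep_equal=True)

# The columns are a MultiIndex of (column, 'self' | 'other'); flatten them.
labels = {'self': 'DF1', 'other': 'DF2'}  # illustrative naming, an assumption
result.columns = [f'{labels[side]}-{col}' for col, side in result.columns]

# The index ('ID') is written as the first CSV column.
output_file_path = 'row_wise_differences.csv'
result.to_csv(output_file_path)
print(result)
```

Only IDs 21 and 42 end up in the file, each with the df1 and df2 values side by side for both columns.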