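I need to read an Excel file and loop over its rows, keeping one value of the `Location` column, anonymizing the other values in that column, and writing the result of every combination to a separate PDF file.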
Original DF
| Location | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| Leicester | 2000 | 13 | 1 |
| Coventry | 2200 | 24 | 2 |
| Norwich | 3000 | 56 | 3 |
| Sheffield | 2800 | 65 | 4 |
| Luton | 1800 | 90 | 5 |
Desired DF #1
| Location | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| Leicester | 2000 | 13 | 1 |
| ######## | 2200 | 24 | 2 |
| ######## | 3000 | 56 | 3 |
| ######## | 2800 | 65 | 4 |
| ######## | 1800 | 90 | 5 |
Desired DF #2
| Location | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| ######## | 2000 | 13 | 1 |
| Coventry | 2200 | 24 | 2 |
| ######## | 3000 | 56 | 3 |
| ######## | 2800 | 65 | 4 |
| ######## | 1800 | 90 | 5 |
I would like to use `pd.to_html` to write out each DF and then convert it.
I am finding it hard to work out how to do this in pandas, how best to use `iterrows` and faker for it, or whether `replace()` would be executed on every iteration.
The simplest approach is to use the index and change the values one at a time.
import pandas as pd
# Create the fake data
data = {
    'Location': ['Leicester', 'Coventry', 'Norwich', 'Sheffield', 'Luton'],
    'Patients': [2000, 2200, 3000, 2800, 1800],
    'Deceased': [13, 24, 56, 65, 90],
    'Rank': [1, 2, 3, 4, 5]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Step 2: Function to create a new DataFrame with anonymized Locations except for one row
def anonymize_locations(df, index_to_keep):
    # Work on a copy so the original DataFrame is not modified
    df_copy = df.copy()
    # Mask every Location value, then restore the one at index_to_keep
    df_copy['Location'] = '########'
    df_copy.at[index_to_keep, 'Location'] = df.at[index_to_keep, 'Location']
    return df_copy
# Step 3: Loop over each row, generate the desired DataFrame and save it to HTML
for index in df.index:
    anonymized_df = anonymize_locations(df, index)
    file_name = f"output_{index + 1}.html"
    anonymized_df.to_html(file_name)
    print(f"Saved {file_name}")
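As a possible alternative to overwriting the column and then restoring one value, the masking can be done in a single step with `Series.where`, which keeps values where a condition is true and substitutes a replacement elsewhere. The sketch below is only an illustration under some assumptions: the helper name `masked_variants` and its `column` and `mask` parameters are hypothetical, and the HTML is kept in memory as strings rather than written to disk. The conversion from HTML to PDF is not shown.

```python
import pandas as pd


def masked_variants(df, column="Location", mask="########"):
    """Yield (kept_value, frame) pairs, one per row of df.

    Hypothetical helper: in each yielded frame, every value of `column`
    is replaced by `mask` except the value in the current row.
    """
    for idx in df.index:
        out = df.copy()
        # Keep the value where the index matches idx, mask everything else
        out[column] = df[column].where(df.index == idx, mask)
        yield df.at[idx, column], out


df = pd.DataFrame({
    "Location": ["Leicester", "Coventry", "Norwich", "Sheffield", "Luton"],
    "Patients": [2000, 2200, 3000, 2800, 1800],
    "Deceased": [13, 24, 56, 65, 90],
    "Rank": [1, 2, 3, 4, 5],
})

variants = list(masked_variants(df))

# Render each variant to an HTML string, keyed by the location left visible
html_pages = {kept: frame.to_html(index=False) for kept, frame in variants}
```

Because the helper copies the frame on every iteration, the original `df` stays unchanged, so no `replace()` call accumulates across iterations and `iterrows` is not needed.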