Pandas: looping over rows and anonymizing data

Question

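I need to read an Excel file and loop over its rows. For each row, I want to keep that row's value in the Location column, anonymize every other value in the same column, and write the result of each combination to a separate PDF file.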

Original DF

| Location  | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| Leicester | 2000     | 13       | 1    |
| Coventry  | 2200     | 24       | 2    |
| Norwich   | 3000     | 56       | 3    |
| Sheffield | 2800     | 65       | 4    |
| Luton     | 1800     | 90       | 5    |

Desired DF #1

| Location  | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| Leicester | 2000     | 13       | 1    |
| ########  | 2200     | 24       | 2    |
| ########  | 3000     | 56       | 3    |
| ########  | 2800     | 65       | 4    |
| ########  | 1800     | 90       | 5    |

Desired DF #2

| Location  | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| ########  | 2000     | 13       | 1    |
| Coventry  | 2200     | 24       | 2    |
| ########  | 3000     | 56       | 3    |
| ########  | 2800     | 65       | 4    |
| ########  | 1800     | 90       | 5    |

I want to write out each DF with to_html and then convert the HTML to PDF.

I am finding it hard to work out how to do this in pandas: how best to use iterrows and faker for it, or whether replace() would be executed on every iteration.

Answer 1

The simplest approach is to loop over the index and change the values one at a time:

import pandas as pd

# Step 1: Create the sample data
data = {
    'Location': ['Leicester', 'Coventry', 'Norwich', 'Sheffield', 'Luton'],
    'Patients': [2000, 2200, 3000, 2800, 1800],
    'Deceased': [13, 24, 56, 65, 90],
    'Rank': [1, 2, 3, 4, 5]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Step 2: Function to create a new DataFrame with anonymized Locations except for one row
def anonymize_locations(df, index_to_keep):
    # Create a copy of the original DataFrame to avoid modifying it
    df_copy = df.copy()
    # Mask every Location value; the row to keep is restored on the next line
    df_copy['Location'] = '########'
    df_copy.at[index_to_keep, 'Location'] = df.at[index_to_keep, 'Location']
    return df_copy


# Step 3: Loop over each row and generate the desired DataFrames and save to html
for index in df.index:
    anonymized_df = anonymize_locations(df, index)
    file_name = f"output_{index + 1}.html"
    anonymized_df.to_html(file_name)
    print(f"Saved {file_name}")
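
On the question of iterrows, faker, and replace(): none of them appears to be needed for this. As a minimal sketch of an alternative to the function above, the masking can be done in one step per output with Series.where, which keeps the values where a condition is true and substitutes a replacement elsewhere. The names anonymized_views and MASK are hypothetical, chosen only for this illustration, and the sketch assumes the DataFrame has a unique index. It stops at the HTML string; the HTML-to-PDF conversion needs a separate tool and is not shown.

```python
import pandas as pd

MASK = "########"  # placeholder text, copied from the desired output in the question


def anonymized_views(df, column="Location", mask=MASK):
    """Yield (kept_value, frame) pairs, one per row of df.

    In each frame, every value of `column` is replaced by `mask`
    except the one in the current row. Assumes a unique index.
    """
    for idx in df.index:
        out = df.copy()  # leave the original DataFrame untouched
        # Keep the value where the index matches, mask everything else.
        out[column] = df[column].where(df.index == idx, mask)
        yield df.at[idx, column], out


df = pd.DataFrame({
    "Location": ["Leicester", "Coventry", "Norwich", "Sheffield", "Luton"],
    "Patients": [2000, 2200, 3000, 2800, 1800],
    "Deceased": [13, 24, 56, 65, 90],
    "Rank": [1, 2, 3, 4, 5],
})

# One anonymized frame per location, keyed by the location left visible.
frames = dict(anonymized_views(df))

# index=False drops the row index from the rendered HTML table.
html = frames["Coventry"].to_html(index=False)
```

Keying the frames by the visible location also makes it possible to name each output file after that location rather than after the row number.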
