I have a DataFrame that only contains data for business days. Here is a sample DataFrame:
import pandas as pd
df = pd.DataFrame({'BAS_DT': ['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-05', '2023-01-05', '2023-01-06', '2023-01-07'],
'CUS_NO': ['', '', '900816636', '900816636', '900816946', '900816931', '', '']})
df
BAS_DT CUS_NO
0 2023-01-02
1 2023-01-03
2 2023-01-04 900816636
3 2023-01-05 900816636
4 2023-01-05 900816946
5 2023-01-05 900816931
6 2023-01-06
7 2023-01-07
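I want to fill 2023-01-06 and 2023-01-07 with the same CUS_NO values as 2023-01-05. I tried ffill, but it only filled in the single value closest to the missing rows. Here is the output I want: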
BAS_DT CUS_NO
0 2023-01-02
1 2023-01-03
2 2023-01-04 900816636
3 2023-01-05 900816636
4 2023-01-05 900816946
5 2023-01-05 900816931
6 2023-01-06 900816636
7 2023-01-06 900816946
8 2023-01-06 900816931
9 2023-01-07 900816636
10 2023-01-07 900816946
11 2023-01-07 900816931
Thank you.
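Forward fill (ffill) does not behave as expected here because the empty strings in CUS_NO are not treated as missing values. ffill only propagates values into NaN/NA cells, so the rows for '2023-01-06' and '2023-01-07' stay empty instead of receiving the values from '2023-01-05'. Even once the empty strings are converted to NA, ffill copies only the single last non-missing value (900816931) into each empty row. It cannot turn one row into three, so producing one row per customer number requires duplicating the '2023-01-05' rows.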
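The snippet below checks both points on the question's sample data. It is a minimal sketch that rebuilds the same df: ffill leaves the empty strings untouched, and after converting them to NA it fills each gap with only the last value, 900816931, without adding any rows.

```python
import pandas as pd

df = pd.DataFrame({'BAS_DT': ['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-05', '2023-01-05', '2023-01-06', '2023-01-07'],
                   'CUS_NO': ['', '', '900816636', '900816636', '900816946', '900816931', '', '']})

# Empty strings are ordinary values, so ffill has nothing to fill
unchanged = df['CUS_NO'].ffill()
# Once '' becomes NA, ffill copies the single last non-missing value downward
filled = df['CUS_NO'].replace('', pd.NA).ffill()

print(unchanged.tolist())
print(filled.tolist())
```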
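So the first step is to replace the empty strings with real NA values (None or pd.NA). Because ffill by itself cannot add rows, the '2023-01-05' rows then have to be copied onto '2023-01-06' and '2023-01-07'. The BAS_DT strings also need to be parsed with pd.to_datetime before they are compared with pd.Timestamp values; otherwise the date filters match nothing. Here is the corrected code: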
import pandas as pd
# Assumes 'df' is the initial DataFrame from the question
# Parse the date strings so they can be compared with pd.Timestamp values
df['BAS_DT'] = pd.to_datetime(df['BAS_DT'])
# Replace empty strings with NA (assigning back avoids a chained inplace replace)
df['CUS_NO'] = df['CUS_NO'].replace('', pd.NA)
# Rows of the last date that has customer numbers
source_rows = df[df['BAS_DT'] == pd.Timestamp('2023-01-05')]
# Dates that should receive a copy of those rows
target_dates = [pd.Timestamp('2023-01-06'), pd.Timestamp('2023-01-07')]
# One copy of the 2023-01-05 rows per target date, with BAS_DT overwritten
rows_to_add = pd.concat([source_rows.assign(BAS_DT=d) for d in target_dates], ignore_index=True)
# Drop the empty placeholder rows for the target dates and append the copies
result_df = pd.concat([df[~df['BAS_DT'].isin(target_dates)], rows_to_add])
# A stable sort keeps the original customer order within each date
result_df = result_df.sort_values(by='BAS_DT', kind='stable').reset_index(drop=True)
# Show the remaining missing values as empty strings, as in the desired output
result_df['CUS_NO'] = result_df['CUS_NO'].fillna('')
# Display the final dataframe
print(result_df)
Run this code after creating the initial DataFrame. It produces the desired output, filling CUS_NO for '2023-01-06' and '2023-01-07' with the CUS_NO values from '2023-01-05'.
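If the dates to fill are not known in advance, a more general variant is to collect the customer numbers of each date into a list, forward-fill the lists, and expand them again with explode. This is a sketch under the assumption that every date without customer numbers should take the full set from the most recent earlier date that has any; dates before the first such date (here 2023-01-02 and 2023-01-03) stay empty.

```python
import pandas as pd

df = pd.DataFrame({'BAS_DT': ['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-05', '2023-01-05', '2023-01-06', '2023-01-07'],
                   'CUS_NO': ['', '', '900816636', '900816636', '900816946', '900816931', '', '']})

# One list of customer numbers per date (groups are sorted by BAS_DT)
lists = df.groupby('BAS_DT')['CUS_NO'].apply(list)
# Drop the empty-string placeholders inside each list
lists = lists.map(lambda xs: [x for x in xs if x != ''])
# Turn empty lists into NaN so that ffill treats them as missing
lists = lists.where(lists.map(len) > 0)
# Carry the whole list of the last date with data forward
lists = lists.ffill()
# Expand each list back into one row per customer number
result = lists.explode().reset_index()
# Dates with no earlier data remain missing; show them as empty strings
result['CUS_NO'] = result['CUS_NO'].fillna('')
print(result)
```

Working on lists keeps all three customers of 2023-01-05 together, which is exactly what a plain ffill on the CUS_NO column cannot do.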