Replacing NaN in a DataFrame with values from a dict keyed by datetime does not work


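I have a dictionary of hourly measurements in which some entries are missing (gaps). My current approach is to create a DataFrame with an hourly DatetimeIndex, pre-filled with NaN, and then replace the values in it with those from gasDict (see below). Afterwards the DataFrame is interpolated to eliminate the NaNs.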

import pandas as pd
import numpy as np

dataRange = pd.date_range(pd.to_datetime('2023-01-01 01:00:00'), pd.to_datetime('2023-01-01 05:00:00'), freq='H')
df = pd.DataFrame(np.nan, index=dataRange, columns=['gas'])
df['gas'] = pd.to_numeric(df['gas'], errors='coerce')

gasDict = {'2023-01-01 01:00:00' : 40,
           '2023-01-01 03:00:00' : 20  
          }

# these 3 methods do not work here
# methods from stackoverflow remap-values-in-pandas-column-with-a-dict-preserve-nans
df1 = df['gas'].map(gasDict).fillna(df['gas']) 
print(df1)

df2 = df['gas'].map(gasDict)
print(df2)

df3 = df.replace({'gas': gasDict})
print(df3)

# this code is correct but slow:
for key, value in gasDict.items():
    df.at[pd.to_datetime(key), 'gas'] = value  # name the column explicitly for scalar access

print(df) 

Results (only the last one is correct!):

2023-01-01 01:00:00   NaN
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00   NaN
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN
Freq: H, Name: gas, dtype: float64
2023-01-01 01:00:00   NaN
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00   NaN
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN
Freq: H, Name: gas, dtype: float64
                     gas
2023-01-01 01:00:00  NaN
2023-01-01 02:00:00  NaN
2023-01-01 03:00:00  NaN
2023-01-01 04:00:00  NaN
2023-01-01 05:00:00  NaN
                      gas
2023-01-01 01:00:00  40.0
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00  20.0
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN

However, the last approach is very slow (gasDict has about 10000 entries). What is the right way to do this?

pandas dataframe python-datetime
1 answer

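I think it is better to start from a DataFrame built from the dict and then expand its index. To create a DataFrame from a dictionary, you can use DataFrame.from_dict: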

df = pd.DataFrame.from_dict(gasDict, orient='index', columns=['gas'])

Then convert the index to a datetime type:

df.index = df.index.astype("datetime64[ns]")

After that, use the reindex method to expand the index:

df = df.reindex(dataRange)
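
Putting the three steps together, the whole pipeline might look like the sketch below. It reuses gasDict and dataRange from the question. Two things are assumptions rather than part of the answer: pd.to_datetime is used for the index conversion instead of astype, and the lowercase frequency alias 'h' replaces 'H', which recent pandas releases deprecate. The final interpolate call is the step the question describes and uses the default linear method.

```python
import pandas as pd

# Sample data copied from the question; the real gasDict has about 10000 entries.
gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 03:00:00': 20}

# Full hourly range the result should cover.
# Assumption: 'h' is used instead of the 'H' alias from the question.
dataRange = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00', freq='h')

# 1. Build the frame from the dict; the string keys become the index.
df = pd.DataFrame.from_dict(gasDict, orient='index', columns=['gas'])

# 2. Convert the string index to a DatetimeIndex so it can align with dataRange.
df.index = pd.to_datetime(df.index)

# 3. Expand to the full hourly range; hours missing from the dict become NaN.
df = df.reindex(dataRange)

# 4. Fill the gaps (default: linear interpolation).
filled = df['gas'].interpolate()

print(df)
print(filled)
```

No Python-level loop over the dict entries is involved, which is what makes the row-by-row df.at version slow. Note that reindex keeps only timestamps that appear in dataRange, so any dict key outside that range is dropped.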