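pandas astype() seems to unexpectedly switch to performing in-place operations after loading data from a pickle file
Concretely, with astype(str), the data types of the values in the input dataframe are modified. What is causing this behavior?
Pandas version: 2.0.3
Minimal example: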
import pandas as pd
import numpy as np
# create a test dataframe
df = pd.DataFrame({'col1': ['hi']*10 + [False]*20 + [np.nan]*30})
# print the data types of the cells, before and after casting to string
print(pd.unique([type(elem) for elem in df['col1'].values]))
_ = df.astype(str)
print(pd.unique([type(elem) for elem in df['col1'].values]))
# store the dataframe as pkl and directly load it again
outpath = 'C:/Dokumente/my_test_df.pkl'
df.to_pickle(outpath)
df2 = pd.read_pickle(outpath)
# print the data types of the cells, before and after casting to string
print(pd.unique([type(elem) for elem in df2['col1'].values]))
_ = df2.astype(str)
print(pd.unique([type(elem) for elem in df2['col1'].values]))
Output:
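This is a bug in DataFrame.astype(): when it is called with str on an unpickled array, the array might change in place (GH54654).
In the PR, regarding the pickle MRE:
The problem is not exactly pickle; that is just a quick way to reproduce the problem.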
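The problem is that the code here attempts to check whether two arrays have the same memory (or share memory), and it does so with result is arr, which tests object identity rather than shared memory.
See numpy/numpy#24478 for more technical details.
If you are using a version < 2.2 and cannot upgrade, you could manually apply the fix mentioned in the PR and recompile. The change is needed at:
## L759: ".../pandas/_libs/lib.pyx"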
if copy and result is arr:
    result = result.copy()
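To see why an identity check can miss shared memory, here is a minimal NumPy-only sketch. It does not reproduce the pandas code path. It uses a plain view as a stand-in for any case where two distinct array objects sit on the same buffer, and the variable names are made up for illustration. np.shares_memory is one way to detect the overlap; the exact patch is the one in the PR.

```python
import numpy as np

# Hypothetical stand-in data; this is not the pandas internals themselves.
arr = np.array(["hi", False, np.nan], dtype=object)
view = arr.view()  # a distinct ndarray object backed by the same buffer

same_object = view is arr                    # identity check
shares_buffer = np.shares_memory(arr, view)  # memory-overlap check

# Writing through the view also changes the original array.
view[1] = str(view[1])
leaked_type = type(arr[1])

print(same_object, shares_buffer, leaked_type)
# False True <class 'str'>
```

Because the identity check is False, a guard of the form "copy only if result is arr" would skip the copy here, and the string conversion would then land in the original data.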