How to convert a string representation of a list of strings into a list of floats in pandas

Question

I have the following dataframe (read from a large CSV file with pd.read_csv):

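import ast
import pandas as pd

# sal_filepath: path to the tab-separated input file
sal_vcf_to_df = pd.read_csv(sal_filepath, delimiter='\t', header=0, index_col=False,
                            low_memory=False, usecols=['listA', 'Amino_Acid_Change', 'Gene_Name'])

# Drop every row that has a missing value in any column
sal_df_wo_na = sal_vcf_to_df.dropna(axis=0, how='any')

# Parse each string into a list, then convert its items to float
sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x: ast.literal_eval(x))
sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x: list(map(float, x)))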

The dataframe I get:

            listA                Amino_Acid_Change        Gene_Name
0  "['133', '115', '3', '1']"        Q637K                 ATM                   
1  "['114', '115', '2', '3']"        I111                  PIK3R1
2  "['51', '59', '1', '1']"          T2491                 KMT2C

I want to convert the 'listA' column to lists of floats. So far I have tried the following steps:

sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x: ast.literal_eval(x))

Then:

sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x: list(map(float, x)))

But after the first step I got the following warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Does anyone know how to fix the warning, or have a better solution?

pandas
1 Answer

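Option 1: pd.eval (works for at most 100 rows)

A very fast way to convert this awkward column is to strip all the quotes and then call pd.eval:

# regex=True is required on pandas 2.0+, where str.replace matches literally by default
v = pd.eval(df.listA.str.replace("['\"]", '', regex=True)).astype(float)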

v
array([[ 133.,  115.,    3.,    1.],
       [ 114.,  115.,    2.,    3.],
       [  51.,   59.,    1.,    1.]])

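Assign the result back, one list per row (v is a 2-D array, so it is converted with tolist first):

df['listA'] = v.tolist()
df

                      listA Amino_Acid_Change Gene_Name
0  [133.0, 115.0, 3.0, 1.0]             Q637K       ATM
1  [114.0, 115.0, 2.0, 3.0]              I111    PIK3R1
2    [51.0, 59.0, 1.0, 1.0]             T2491     KMT2C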
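If the column has more than 100 rows, one possible vectorized alternative is to use plain string methods instead of pd.eval. This is only a sketch and not part of the original answer: the toy frame below is rebuilt from the question's sample rows, and it assumes every cell holds the same number of comma-separated numeric items.

```python
import pandas as pd

# Toy frame mirroring the question's sample rows (an assumption for illustration).
df = pd.DataFrame({
    "listA": ["['133', '115', '3', '1']",
              "['114', '115', '2', '3']",
              "['51', '59', '1', '1']"],
    "Gene_Name": ["ATM", "PIK3R1", "KMT2C"],
})

# Remove brackets, quotes and spaces, leaving e.g. "133,115,3,1".
cleaned = df["listA"].str.replace(r"[\[\]'\" ]", "", regex=True)

# Split on commas into one column per item and convert everything to float.
values = cleaned.str.split(",", expand=True).astype(float).to_numpy()

# Store each row of the 2-D array back as a Python list of floats.
df["listA"] = values.tolist()
```

Here values is a 2-D float array comparable to v above, and the final assignment puts one list of floats in each cell.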

Option 2: ast.literal_eval (the reliable workhorse)

Update: pd.eval only supports up to 100 rows, so a slower but more reliable fallback is to use ast.literal_eval:

from ast import literal_eval

df.listA = df.listA.str.replace("'", '').apply(literal_eval)
df 

              listA Amino_Acid_Change Gene_Name
0  [133, 115, 3, 1]             Q637K       ATM
1  [114, 115, 2, 3]              I111    PIK3R1
2    [51, 59, 1, 1]             T2491     KMT2C
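The lists produced this way hold integers, while the question asks for floats. A minimal sketch of one way to get floats directly, assuming each cell is the text of a list of numeric strings; the helper name to_float_list is made up for this example:

```python
from ast import literal_eval

import pandas as pd

# Two sample cells copied from the question's data.
df = pd.DataFrame({"listA": ["['133', '115', '3', '1']", "['51', '59', '1', '1']"]})


def to_float_list(cell):
    """Parse a cell such as "['133', '115']" into [133.0, 115.0]."""
    # Strip surrounding double quotes in case they are part of the cell text
    # (an assumption; this is a no-op when they are not present).
    items = literal_eval(cell.strip('"'))
    return [float(item) for item in items]


df["listA"] = df["listA"].apply(to_float_list)
```

Because the conversion happens in one pass per cell, there is no need to strip the inner single quotes beforehand.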

As for the SettingWithCopyWarning: in short, what you are doing is creating sal_df_wo_na by extracting a slice/view from a larger dataframe, something like this:

sal_df_wo_na = df[<some condition here>]

This can lead to chained indexing, which is what pandas is warning about. Instead, you need to do something like:

sal_df_wo_na = df[<some condition here>].copy()

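This creates a copy of the slice with the pd.DataFrame.copy function. The copy is deep by default (deep=True), so the argument does not need to be passed explicitly, even when the column holds objects. Note that a deep copy duplicates the data but does not recursively copy the Python objects stored in an object column.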
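Putting the pieces together, the following is a rough sketch of the copy-then-assign pattern applied to the question's workflow. The names raw and clean and the toy data are hypothetical, and dropna stands in for the filtering step as in the question; whether the warning appears without .copy() can depend on the pandas version.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical stand-in for the frame read from the CSV file.
raw = pd.DataFrame({
    "listA": ["['133', '115', '3', '1']", None, "['51', '59', '1', '1']"],
    "Gene_Name": ["ATM", "PIK3R1", "KMT2C"],
})

# .copy() makes the filtered frame independent of `raw`, so the later
# column assignment is not treated as a write to a slice of another frame.
clean = raw.dropna(axis=0, how="any").copy()

# Parse each string into a list and convert its items to float in one step.
clean["listA"] = clean["listA"].apply(
    lambda cell: [float(item) for item in literal_eval(cell)]
)
```

After this, clean holds lists of floats, while raw still holds the original strings.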
