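I'm working on a project that uses the MIMIC-IV dataset as its source. I found a preprocessing pipeline that is widely used across many projects. When I run the pipeline, everything works fine until I reach the module that generates the time-series data representation (I have not modified the data or the pipeline code in any way). The following error is raised: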
TypeError Traceback (most recent call last)
.../Downloads/MIMIC-IV-Data-Pipeline-main/mainPipeline.ipynb Cell 27 in <cell line: 20>()
18 impute=False
20 if data_icu:
---> 21 gen=data_generation_icu.Generator(cohort_output,data_mort,data_admn,data_los,diag_flag,proc_flag,out_flag,chart_flag,med_flag,impute,include,bucket,predW)
22 #gen=data_generation_icu.Generator(cohort_output,data_mort,diag_flag,False,False,chart_flag,False,impute,include,bucket,predW)
23 #if chart_flag:
24 # gen=data_generation_icu.Generator(cohort_output,data_mort,False,False,False,chart_flag,False,impute,include,bucket,predW)
25 else:
26 gen=data_generation.Generator(cohort_output,data_mort,data_admn,data_los,diag_flag,lab_flag,proc_flag,med_flag,impute,include,bucket,predW)
File ~/Downloads/MIMIC-IV-Data-Pipeline-main/model/data_generation_icu.py:22, in Generator.__init__(self, cohort_output, if_mort, if_admn, if_los, feat_cond, feat_proc, feat_out, feat_chart, feat_med, impute, include_time, bucket, predW)
20 self.cohort_output=cohort_output
21 self.impute=impute
---> 22 self.data = self.generate_adm()
23 print("[ READ COHORT ]")
25 self.generate_feat()
File ~/Downloads/MIMIC-IV-Data-Pipeline-main/model/data_generation_icu.py:64, in Generator.generate_adm(self)
62 data['los']=pd.to_timedelta(data['outtime']-data['intime'],unit='h')
63 data['los']=data['los'].astype(str)
---> 64 data[['days', 'dummy','hours']] = data['los'].str.split(' ', -1, expand=True)
65 data[['hours','min','sec']] = data['hours'].str.split(':', -1, expand=True)
66 data['los']=pd.to_numeric(data['days'])*24+pd.to_numeric(data['hours'])
...
127 )
128 raise TypeError(msg)
--> 129 return func(self, *args, **kwargs)
TypeError: split() takes from 1 to 2 positional arguments but 3 positional arguments (and 1 keyword-only argument) were given
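I assume the problem lies in how the pandas str.split() function is called (I'm using pandas version 2.0.3), but when I checked the documentation, as far as I can tell it should accept 3 keyword arguments. Since this isn't my code, I'm having a hard time debugging what goes wrong here, but maybe I'm missing something. Does anyone know what is wrong, or has anyone hit the same issue with this pipeline and found a fix?
Thanks a lot!!!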
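IIUC, pass the arguments by keyword (since pandas 2.0, every argument of str.split() other than pat is keyword-only, so the positional -1 is rejected):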
data[['days', 'dummy','hours']] = data['los'].str.split(pat= ' ', n=-1, expand=True)
Or, relying on the defaults (split on whitespace, no limit on the number of splits):
data[['days', 'dummy','hours']] = data['los'].str.split(expand=True)
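A minimal sketch of the fix on toy data, in case it helps to check it outside the pipeline. The two sample strings and the `los_hours` column name are made up for illustration; they only mimic the 'N days HH:MM:SS' form that line 63 of the traceback produces. Line 65 in the traceback uses the same positional pattern, so it would presumably need the same change:

```python
import pandas as pd

# Toy stand-in for the pipeline's 'los' column after .astype(str);
# the values are invented for illustration.
data = pd.DataFrame({"los": ["1 days 02:30:00", "3 days 12:00:00"]})

# n and expand are passed by keyword, which pandas 2.x requires.
data[["days", "dummy", "hours"]] = data["los"].str.split(pat=" ", n=-1, expand=True)
data[["hours", "min", "sec"]] = data["hours"].str.split(pat=":", n=-1, expand=True)

# Same arithmetic as line 66 of the traceback: whole hours of stay.
data["los_hours"] = pd.to_numeric(data["days"]) * 24 + pd.to_numeric(data["hours"])

print(data[["los", "los_hours"]])
```

The keyword form also runs on older pandas 1.x releases, so the change should not tie the pipeline to pandas 2.x.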