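I am a new user of the bigframes package from googleapis. I am trying to manipulate a DataFrame loaded from BigQuery.
I tried to run some code, but I ran into a problem I cannot solve.
I tried to use the apply function on a DataFrame with the argument axis=1, but it does not seem to work. I always get an error message.
Can you help me solve this problem?
Thanks.
Code example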
# example
def condition(row):
    print(row)
    if 1 <= row["month"] <= 6:
        return f"{row['year']:02}S1{row['CODPY']}{row['CODDE']}"
    else:
        return f"{row['year']:02}S2{row['CODPY']}{row['CODDE']}"

valodetail_df['IDT'] = valodetail_df.apply(condition, axis=1)
Stack trace
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
return method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/dataframe.py", line 3118, in apply
results = {name: func(col, *args, **kwargs) for name, col in self.items()}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/dataframe.py", line 3118, in <dictcomp>
results = {name: func(col, *args, **kwargs) for name, col in self.items()}
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<stdin>", line 3, in condition
File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
>>> valodetail_df['IDTDCI'] = valodetail_df.apply(condition,axis=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
return method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/dataframe.py", line 3118, in apply
results = {name: func(col, *args, **kwargs) for name, col in self.items()}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/dataframe.py", line 3118, in <dictcomp>
results = {name: func(col, *args, **kwargs) for name, col in self.items()}
^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: condition() got an unexpected keyword argument 'axis'
DataFrame.apply in bigframes does not support axis=1 yet: https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.dataframe.DataFrame#bigframes_dataframe_DataFrame_apply
There is already a feature request for the same thing: https://github.com/googleapis/python-bigquery-dataframes/issues/592
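The second traceback can be reproduced without bigframes. The sketch below is a plain-Python stand-in, not the library's code: apply_columnwise is a hypothetical helper that copies only the dict comprehension visible in the traceback (func(col, *args, **kwargs) for each column), and the columns dict is made-up sample data. It shows that the function receives whole columns rather than rows, and that axis=1 is simply forwarded to it as a keyword argument.

```python
def apply_columnwise(columns, func, *args, **kwargs):
    # Hypothetical stand-in for the line shown in the traceback:
    # one call per column, with every extra keyword forwarded to func.
    return {name: func(col, *args, **kwargs) for name, col in columns.items()}


def condition(row):
    # Despite the parameter name, this receives a whole column here.
    return row


# Made-up sample data, keyed by column name.
columns = {"month": [1, 7], "year": [8, 11]}

# Without axis, func is called once per column.
received = apply_columnwise(columns, condition)
print(received)

# With axis=1, the keyword is passed straight through to condition().
message = None
try:
    apply_columnwise(columns, condition, axis=1)
except TypeError as exc:
    message = str(exc)
    print(message)  # ends with: got an unexpected keyword argument 'axis'
```

This is why the call fails before any row logic runs: condition() has no axis parameter, so Python rejects the call itself.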
However, your specific use case can be achieved in another way.
Here is a guess at the kind of DataFrame you are working with:
import bigframes.pandas as bpd

df = bpd.DataFrame({
    "month": [1, 3, 6, 7, 12],
    "year": ["8", "9", "10", "11", "12"],
    "CODPY": ["PY", "PY", "PY", "PY", "PY"],
    "CODDE": ["DE", "DE", "DE", "DE", "DE"],
})
df
month year CODPY CODDE
0 1 8 PY DE
1 3 9 PY DE
2 6 10 PY DE
3 7 11 PY DE
4 12 12 PY DE
We can use other DataFrame and Series APIs to create the desired column:
condition = (df["month"] >= 1) & (df["month"] <= 6)
s1 = df["year"].str.pad(fillchar='0', width=2) + "S1" + df["CODPY"] + df["CODDE"]
s2 = df["year"].str.pad(fillchar='0', width=2) + "S2" + df["CODPY"] + df["CODDE"]
df['IDT'] = s1.where(condition, s2)
df
month year CODPY CODDE IDT
0 1 8 PY DE 08S1PYDE
1 3 9 PY DE 09S1PYDE
2 6 10 PY DE 10S1PYDE
3 7 11 PY DE 11S2PYDE
4 12 12 PY DE 12S2PYDE
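As a sanity check on the vectorized result, the row logic from the question can be run in plain Python over the same sample values. This is only a sketch: expected_idt is a hypothetical helper name, and it uses str.zfill for the two-digit year because the sample DataFrame above stores year as strings.

```python
def expected_idt(month, year, codpy, codde):
    """Plain-Python version of the row logic from the question."""
    half = "S1" if 1 <= month <= 6 else "S2"
    # zfill left-pads the year with zeros to two characters.
    return f"{str(year).zfill(2)}{half}{codpy}{codde}"


# Same sample rows as the guessed DataFrame: (month, year, CODPY, CODDE).
rows = [
    (1, "8", "PY", "DE"),
    (3, "9", "PY", "DE"),
    (6, "10", "PY", "DE"),
    (7, "11", "PY", "DE"),
    (12, "12", "PY", "DE"),
]

expected = [expected_idt(*row) for row in rows]
print(expected)
# ['08S1PYDE', '09S1PYDE', '10S1PYDE', '11S2PYDE', '12S2PYDE']
```

The list matches the IDT column printed above. On a small frame, one way to compare would be against df["IDT"].to_pandas().tolist(), assuming the result fits comfortably in local memory.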
Hope this helps.