Pandas DataFrame loc assigning a value (pybel object) to a cell: TypeError: object of type 'Molecule' has no len()

Question

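Background

I need to convert thousands of SMILES strings to SDF files with pybel from a Python script (the relevant lines are shown below). Because the SMILES are stored in a CSV file and I want to use pandas (or, in the future, dask for parallelism), I chose to load them into a pandas DataFrame.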

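df['ROMol'] = None  # Create a new, empty column
for ind, row in df.iterrows():
    # Parse the SMILES string of this row into a pybel Molecule
    mol = pybel.readstring("smi", row[args.smi_column])
    # This per-cell assignment is the line that raises the TypeError
    df.loc[ind, 'ROMol'] = mol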

Example CSV file:

SMILES
C1=CC=CC=C1
CC(=O)Oc1ccccc1C(=O)O
C1CCCCC1

Problem

Pandas raises an error:

/pandas/core/indexing.py", line 1984, in _setitem_with_indexer_split_path
    elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
TypeError: object of type 'Molecule' has no len()

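I worked around this by creating a list (called mol_list) and appending each object returned by pybel.readstring to it. At the end, I assign the list as a new column of the existing DataFrame: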

df['ROMol'] = mol_list

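However, I would like to use loc. How can I prevent pandas from checking the length of the object? I also converted the dtype of the new column and of the DataFrame to object, following this guide.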

I also tried a few other approaches. They work, but I don't think they address the root cause.

Assigning a list instead of the value

df.loc[ind, 'ROMol'] = [mol]

Using chained indexing (a view or a copy)

df["ROMol"][ind] = mol

This produces a warning:

Use df.loc[row_indexer, "col"] = values instead, to perform the assignment in a single step and ensure this keeps updating the original df.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Some related references: "BUG: TypeError: object of type 'int' has no len() when saving DataFrame with object dtype column" and "Pandas dataframe - TypeError: object of type '_io.TextIOWrapper' has no len()".

Answer 1

Why not just use the apply function?

df["ROMol"] = df[args.smi_column].apply(lambda x: pybel.readstring("smi", x))

It is not only cleaner, it should also be more efficient.
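For anyone who wants to try this without Open Babel installed, here is a minimal sketch. FakeMolecule and read_smiles are hypothetical stand-ins for pybel.Molecule and pybel.readstring, and the column name SMILES is taken from the example CSV. The stand-in is iterable but has no __len__, which is probably what leads loc to treat the value as list-like and call len() on it (an assumption based on the traceback, not something confirmed in the question). The sketch also shows df.at, the scalar setter, as a possible per-cell alternative to loc.

```python
import pandas as pd


class FakeMolecule:
    """Hypothetical stand-in for pybel.Molecule: iterable, but no __len__."""

    def __init__(self, smiles):
        self.smiles = smiles

    def __iter__(self):
        # A real molecule would iterate over atoms; here we use characters.
        return iter(self.smiles)


def read_smiles(smiles):
    # Stand-in for pybel.readstring("smi", smiles).
    return FakeMolecule(smiles)


df = pd.DataFrame(
    {"SMILES": ["C1=CC=CC=C1", "CC(=O)Oc1ccccc1C(=O)O", "C1CCCCC1"]}
)

# Build the whole column in one step; no per-cell assignment is involved.
df["ROMol"] = df["SMILES"].apply(read_smiles)

# Per-cell alternative: .at sets exactly one scalar value in a cell.
df["ROMol_at"] = None  # object-dtype column to hold arbitrary objects
for ind, smiles in df["SMILES"].items():
    df.at[ind, "ROMol_at"] = read_smiles(smiles)
```

With the real library, read_smiles would simply be replaced by the pybel.readstring call from the answer above.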
