to_list() function raises an error on first run (Python)

Question

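I am working on a Gemini-related sentiment analysis project in Google Colab. In one step I isolate the column containing iOS app reviews, convert it to a list, strip special characters from it, and then load it back into the main query. The problem is that the last code block of my cleaning stage does not work the first time; it only works after I re-run the earlier steps.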

my_data=df[['User','Review', 'Rating']]
my_data.columns=[['user','review','rating']]

my_data.head()

print(my_data.dtypes)

user      object
review    object
rating     int64
dtype: object

mydata=my_data['review']
print(mydata)

['xxxxxxx has given me great service so far...',....]

# cleaning the data and removing special characters
import re

def clean_text(text):
  if not isinstance(text, str): # Check if the input is a string
    return "" # Return an empty string if not
  # remove special characters
  text=re.sub(r"[^\w\s]"," ",text)
  # Remove single characters
  text = re.sub(r"\b[a-zA-Z]\b"," ", text)
  # Lowercase the text
  text = text.lower()

  #Remove extra whitespace
  text= re.sub(r"\s+"," ", text)

  # Trim leading and trailing spaces
  text= text.strip()

  return text

my_data.columns = my_data.columns.get_level_values(0)

import pandas as pd

# Extract the 'review' column as a list
reviews = mydata.tolist()
#apply the clean_text function to each review
cleaned_reviews = [clean_text(review) for review in reviews]

print(cleaned_reviews)

At this point the result is either an empty list [] or an error saying that tolist() does not exist.

Something like this:

AttributeError                            Traceback (most recent call last)
<ipython-input-45-b447fa2a5069> in <cell line: 6>()
      4 
      5 # Extract the 'review' column as a list
----> 6 reviews = mydata.tolist()
      7 
      8 

/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5987         ):
   5988             return self[name]
-> 5989         return object.__getattribute__(self, name)
   5990 
   5991     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'tolist'

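When I changed it to to_list() or list(mydata), as Gemini suggested, the same error occurred, except that it now said to_list()/list(mydata) does not exist.

When I re-run the earlier steps, starting from mydata=my_data['review'] (isolating the column) and the text-cleaning step, it eventually works.

I then get the correct result ["xxxxxxx has given me great service and convenience so far, hope it continues to be good", ....]

All the steps work in the end, but I am trying to understand why the last step does not work right away and only works after a re-run. I checked the execution log and it appears to be working. I tried changing the code in a few ways, such as using list(), but it still would not run, so I changed it back.

I would appreciate some insight into why it does not run consistently, because I want it to work on the first attempt or when I press Run all.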

Answer 1

Try print(mydata) again to see what it contains. My question is whether writing mydata=my_data['review'] is correct, given that you called the column 'Review' in the first line.
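Printing the type and the column index, not just the values, may make this easier to see. The sketch below uses a small made-up DataFrame (the rows are assumptions, not the asker's data) and repeats the column assignment from the question. As far as I know, assigning a nested list to .columns produces a one-level MultiIndex, and selecting 'review' from it returns a DataFrame rather than a Series, which would match the AttributeError in the traceback.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
df = pd.DataFrame({
    "User": ["a", "b"],
    "Review": ["Great service so far!", "Too slow..."],
    "Rating": [5, 2],
})

my_data = df[["User", "Review", "Rating"]].copy()
# Same assignment as in the question: note the double brackets.
my_data.columns = [["user", "review", "rating"]]

mydata = my_data["review"]
print(type(my_data.columns))        # kind of column index after the rename
print(type(mydata))                 # what the selection actually returned
print(hasattr(mydata, "tolist"))    # whether tolist() is available on it

# The later line from the question flattens the column index.
my_data.columns = my_data.columns.get_level_values(0)
mydata_again = my_data["review"]
print(type(mydata_again))           # selection repeated after flattening
```

If that is what happens in the notebook, it would also explain the re-run behaviour: mydata is created before the get_level_values(0) line flattens the columns, so it only becomes a Series when the selection cell is executed again afterwards.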

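tolist() is a method of Series, not of DataFrame, so the call only works if mydata is a Series; a DataFrame raises this AttributeError even when it has a single column. That is why I would like to know what mydata actually holds, to understand why it does not work as expected.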
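If mydata does turn out to be a DataFrame, one possible fix is to rename the columns with a flat list, so that the selection returns a Series, which does have tolist(). A minimal sketch under that assumption; the sample rows are made up, and clean_text here is a shortened version of the function in the question:

```python
import re
import pandas as pd

def clean_text(text):
    """Shortened version of the question's cleaner."""
    if not isinstance(text, str):
        return ""
    text = re.sub(r"[^\w\s]", " ", text)      # replace punctuation with spaces
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()  # collapse and trim whitespace

# Hypothetical sample data.
df = pd.DataFrame({
    "User": ["a", "b"],
    "Review": ["Great service, so far!", None],
    "Rating": [5, 2],
})

my_data = df[["User", "Review", "Rating"]].copy()
# Flat list: keeps a plain Index instead of building a MultiIndex.
my_data.columns = ["user", "review", "rating"]

reviews = my_data["review"].tolist()          # Series -> Python list
cleaned_reviews = [clean_text(r) for r in reviews]
print(cleaned_reviews)
```

Since the rename happens before the selection and nothing later changes the columns, this ordering should behave the same on a first run and under Run all.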
