Python: KeyError: 0 when looping

Question

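I am trying to extract key phrases from a pandas column, following this code. Link to the Jupyter notebook

Whenever I run it, I get an error on the line that performs the regex substitution.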

corpus = []
dataset['word_count'] = dataset[datacol].apply(lambda x: len(str(x).split(" ")))
ds_count = len(dataset.word_count)
for i in range(0, ds_count):
    # Remove punctuation
    text = re.sub('[^a-zA-Z]', ' ', str(dataset[datacol][i])) # the error is here

This is the error I get. Does anyone know what might be causing it?

---> 58     text = re.sub('[^a-zA-Z]', ' ', pr_df['Comment'][i])
    

File C:\Program Files\Anaconda3\Lib\site-packages\pandas\core\series.py:981, in Series.__getitem__(self, key)
    978     return self._values[key]
    980 elif key_is_scalar:
--> 981     return self._get_value(key)
    983 if is_hashable(key):
    984     # Otherwise index.get_value will raise InvalidIndexError
    985     try:
    986         # For labels that don't resolve as scalars like tuples and frozensets

File C:\Program Files\Anaconda3\Lib\site-packages\pandas\core\series.py:1089, in Series._get_value(self, label, takeable)
   1086     return self._values[label]
   1088 # Similar to Index.get_value, but we do not fall back to positional
-> 1089 loc = self.index.get_loc(label)
   1090 return self.index._get_values_for_loc(self, loc, label)

File C:\Program Files\Anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3804, in Index.get_loc(self, key, method, tolerance)
   3802     return self._engine.get_loc(casted_key)
   3803 except KeyError as err:
-> 3804     raise KeyError(key) from err
   3805 except TypeError:
   3806     # If we have a listlike key, _check_indexing_error will raise
   3807     #  InvalidIndexError. Otherwise we fall through and re-raise
   3808     #  the TypeError.
   3809     self._check_indexing_error(key)

KeyError: 0

Answer 1

Your code works fine for me with your input file rfi-data.tsv. Your error means that you don't have an index label named 0:

>>> dataset[datacol].head()
0    ||Hello there|I'm from Nepal looking for admis...
1    ||Hello,||My name is. Currently, I am a second...
2    |Could you give me more information on agronom...
3    |Dear, I am a Brazilian student and would like...
4    |Hello my name is and I've recently enrolled t...
Name: question, dtype: object

>>> dataset[datacol][0]
"||Hello there|I'm from Nepal looking for admis..."

As you can see, the index label 0 exists in the Series above, so your code works. However, if you modify the labels, the same error is raised:

# Do: dataset.index += 10
>>> dataset[datacol].head()
10    ||Hello there|I'm from Nepal looking for admis...
11    ||Hello,||My name is. Currently, I am a second...
12    |Could you give me more information on agronom...
13    |Dear, I am a Brazilian student and would like...
14    |Hello my name is and I've recently enrolled t...
Name: question, dtype: object

>>> dataset[datacol][0]
...
KeyError: 0
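The same situation often arises without anyone touching the index explicitly, for example after filtering rows. The sketch below uses a small made-up Series (not the data from the question) to show that `s[0]` looks up the label 0 on an integer index, while `.iloc[0]` looks up the first position regardless of labels:

```python
import pandas as pd

# Hypothetical data standing in for the question's column.
s = pd.Series(["Hello there!", "My name is...", "Could you give me more info?"])

# Filtering keeps the original labels, so label 0 can disappear
# even though the Series still has rows.
filtered = s[s.str.contains("name|info")]
print(list(filtered.index))

try:
    filtered[0]  # label-based lookup: there is no label 0 any more
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup does not depend on the labels.
first_by_position = filtered.iloc[0]
print(label_lookup_failed, first_by_position)
```

If the loop in the question has to stay index-based, replacing `dataset[datacol][i]` with `dataset[datacol].iloc[i]` is one way to make it independent of the labels.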

A simple way to avoid the index label problem is to iterate over the text values directly:

corpus = []
dataset['word_count'] = dataset[datacol].str.split(' ').str.len()
for text in dataset[datacol].str.lower():
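    # do stuff here, e.g. the cleanup step from the question (requires `import re`)
    text = re.sub('[^a-zA-Z]', ' ', text)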
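As a fuller illustration, here is a minimal self-contained sketch of that approach. The DataFrame contents and the index starting at 10 are made up for the example; only `datacol`, `dataset`, `corpus`, and the regex come from the question. It also shows `reset_index(drop=True)` as an alternative, assuming the original labels are not needed:

```python
import re

import pandas as pd

# Hypothetical frame; "question" stands in for the question's datacol.
dataset = pd.DataFrame(
    {"question": ["||Hello there|I'm from Nepal", "Dear, I am a student"]},
    index=[10, 11],  # deliberately not starting at 0
)
datacol = "question"

corpus = []
for raw in dataset[datacol]:
    # Iterating over the values never consults the index labels.
    # str() mirrors the question's code in case a value is not a string.
    text = re.sub("[^a-zA-Z]", " ", str(raw)).lower()
    # Collapse the runs of spaces left behind by the substitution.
    corpus.append(" ".join(text.split()))

# Alternative: renumber the labels 0..n-1 so dataset[datacol][i] works again.
renumbered = dataset.reset_index(drop=True)
first = renumbered[datacol][0]
```

Iterating over the values is usually the simpler choice; `reset_index(drop=True)` is more convenient when the rest of the code relies on `range(len(...))` style loops.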
