Reading an Excel file with an invalid sheet state using pandas

Question

I am using pandas to read this Excel file, which is downloaded from a website by an automation script. Here is my code:

import pandas as pd
df = pd.read_excel('CallHistory.xlsx')

But it raises the following error:

ValueError                                Traceback (most recent call last)
c:\Users\minhviet\Box\Telio\vietpm\python\crawler\test_crawl_3.ipynb Cell 7' in <module>
      1 import pandas as pd
----> 2 df = pd.read_excel('CallHistory.xlsx')
      3 df

File c:\Users\minhviet\Anaconda3\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File c:\Users\minhviet\Anaconda3\lib\site-packages\pandas\io\excel\_base.py:364, in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, storage_options)
    362 if not isinstance(io, ExcelFile):
    363     should_close = True
--> 364     io = ExcelFile(io, storage_options=storage_options, engine=engine)
    365 elif engine and engine != io.engine:
    366     raise ValueError(
    367         "Engine should not be specified when passing "
    368         "an ExcelFile - ExcelFile already has the engine set"
...
    127     if value not in self.values:
--> 128         raise ValueError(self.__doc__)
    129     super(Set, self).__set__(instance, value)

ValueError: Value must be one of {'visible', 'hidden', 'veryHidden'}

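I searched for this error and found some information. It seems that the state of a worksheet in the file is invalid: https://github.com/exceljs/exceljs/issues/678

I tried opening the file in Excel, editing something and saving it, and after that pandas could read it successfully. However, reading this file is part of an automation script, so opening and editing it in Excel is not an option.

You can download the file here. I hope someone can find a way to handle it in Python: https://app.box.com/s/8vds9zmhhxhn18p0ngodqeepfcgpzevv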
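For context, an .xlsx file is a ZIP archive, and each sheet's visibility is stored as the state attribute of a sheet element in the workbook part. The following is a minimal sketch for listing those values with the standard library only, so the offending value can be seen without opening Excel. It assumes the workbook part sits at the conventional path xl/workbook.xml and uses the transitional SpreadsheetML namespace; the function names sheet_states and invalid_sheet_states are hypothetical helpers, not part of pandas, openpyxl or xlrd.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML part (assumption: transitional OOXML).
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# The values listed in the openpyxl error message above.
VALID_STATES = {"visible", "hidden", "veryHidden"}


def sheet_states(xlsx_path):
    """Return {sheet name: state attribute} as stored in xl/workbook.xml."""
    with zipfile.ZipFile(xlsx_path) as archive:
        # Assumption: the workbook part is at its conventional location.
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    states = {}
    for sheet in root.iter(f"{{{MAIN_NS}}}sheet"):
        # A missing attribute means the default state, "visible".
        states[sheet.get("name")] = sheet.get("state", "visible")
    return states


def invalid_sheet_states(xlsx_path):
    """Return only the sheets whose state is not one of the allowed values."""
    return {
        name: state
        for name, state in sheet_states(xlsx_path).items()
        if state not in VALID_STATES
    }
```

Calling invalid_sheet_states('CallHistory.xlsx') would then show which sheet, if any, carries a value outside the allowed set.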

Answer 1

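I found a solution. I tried reading the file with pandas read_excel using different engines (openpyxl and xlrd). openpyxl raises the error above (ValueError: Value must be one of {'visible', 'hidden', 'veryHidden'}), while xlrd raises a different one (KeyError: 'show'). After some searching, I found that the xlrd error was fixed in a commit made after version 1.2.0 but before version 2.0 (which dropped support for xlsx files), so the fix is not in any official xlrd release. Details of the fix are here: https://github.com/python-excel/xlrd/commit/6ec98fc74796a6439c6dd64ed71597a3c50d4986#diff-74efe2926535c21edf8087564ce132fe

So I installed xlrd at that commit. Be sure to uninstall any other version of xlrd before installing it.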

pip install git+https://github.com/python-excel/xlrd.git@6ec98fc74796a6439c6dd64ed71597a3c50d4986

Then I read the file with the xlrd engine, and it worked.

import pandas as pd
df = pd.read_excel("C://Users/minhviet/Documents/CallHistory.xlsx", engine='xlrd')
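The approach of trying one engine after another can be wrapped in a small helper. This is only a sketch: read_with_first_working_engine and its reader parameter are hypothetical names introduced here, and it assumes the engines listed are installed. It passes engine= to pandas.read_excel and collects each failure so the final error shows what every engine reported.

```python
def read_with_first_working_engine(path, engines=("openpyxl", "xlrd"), reader=None):
    """Try each engine in turn; return (engine, result) for the first success."""
    if reader is None:
        # Imported lazily so the helper can be exercised with a stub reader.
        import pandas as pd
        reader = pd.read_excel

    errors = {}
    for engine in engines:
        try:
            return engine, reader(path, engine=engine)
        except Exception as exc:
            # Broad on purpose: the engines fail with different exception
            # types (ValueError from openpyxl, KeyError from xlrd above).
            errors[engine] = exc

    details = "; ".join(f"{name}: {err!r}" for name, err in errors.items())
    raise RuntimeError(f"No engine could read {path!r} ({details})")
```

With the patched xlrd installed, engine, df = read_with_first_working_engine("CallHistory.xlsx") would be expected to fall through from openpyxl to xlrd.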

Answer 2

import pandas as pd
df = pd.read_excel("C:/Users/minhviet/Documents/CallHistory.xlsx", engine='xlrd')

Using only a single forward slash (/) after C: worked for me.

Answer 3

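Can you check the slashes in the file location? Windows normally uses backslashes (\), while Mac/Unix/Linux use forward slashes (/). Pandas sometimes accepts both, but it can be inconsistent. You are better off using the default for your operating system.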

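When you write file locations in Python, you have to make sure the backslashes are not interpreted as escape characters. There are two ways to do this.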

Double the backslashes: "C:\Users\path\to\file" becomes "C:\\Users\\path\\to\\file".

Prefix the string with r: r"C:\Users\path\to\file". The r prefix stands for raw and tells the parser to take all backslashes literally rather than as escape characters.
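The two spellings above can be checked side by side. The short sketch below shows that the doubled-backslash string and the raw string are the same value, and uses pathlib.PureWindowsPath from the standard library to show that a forward-slash path refers to the same Windows location. The path itself is the placeholder from this answer, not a real file.

```python
from pathlib import PureWindowsPath

# Note: writing "C:\Users\..." with no escaping is a SyntaxError in Python 3,
# because \U starts a Unicode escape sequence.
escaped = "C:\\Users\\path\\to\\file"  # each backslash doubled
raw = r"C:\Users\path\to\file"         # raw string: backslashes kept literally
forward = "C:/Users/path/to/file"      # forward slashes need no escaping

# Both spellings produce exactly the same string.
print(escaped == raw)

# PureWindowsPath applies Windows path rules on any operating system and
# treats / and \ as equivalent separators.
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```

Both comparisons print True, so either spelling is safe to pass to pd.read_excel on Windows.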

Answer 4

The correct answer should be:

df = pd.read_excel(open('CallHistory.xlsx','rb'),sheet_name=0)
