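I want to delete rows from a DataFrame that was generated by reading a binary file with a `np.dtype` template. I have tried several ways of dropping rows and keep getting blocked by an error, usually:
TypeError: void() takes at least 1 positional argument (0 given)
Opening the variable explorer in the IDE shows the same error when I try to inspect the column names, which suggests that the way I extract the data is somehow corrupting the column names.
I load the data as follows (the number of variables is shortened here for brevity):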
```
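import numpy as np
import pandas as pd

# Record layout: 22 raw header bytes, then big-endian unsigned integers
data_template = np.dtype([
    ('header_a', 'V22'),
    ('variable_A', '>u2'),
    ('gpssec', '>u4'),
])
# source_file is the path to the binary file
with open(source_file, 'rb') as f:
    byte_data = f.read()
np_data = np.frombuffer(byte_data, data_template)
df = pd.DataFrame(np_data)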
```
When I try to reduce the DataFrame with any method, for example
`df = df[df['gpssec'] > 1000]`
I get:
```
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\frame.py:3798 in __getitem__
    return self._getitem_bool_array(key)
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\frame.py:3853 in _getitem_bool_array
    return self._take_with_is_copy(indexer, axis=0)
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\generic.py:3902 in _take_with_is_copy
    result = self._take(indices=indices, axis=axis)
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\generic.py:3886 in _take
    new_data = self._mgr.take(
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\internals\managers.py:978 in take
    return self.reindex_indexer(
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\internals\managers.py:751 in reindex_indexer
    new_blocks = [
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\internals\managers.py:752 in <listcomp>
    blk.take_nd(
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\internals\blocks.py:880 in take_nd
    new_values = algos.take_nd(
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\array_algos\take.py:117 in take_nd
    return _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\array_algos\take.py:134 in _take_nd_ndarray
    dtype, fill_value, mask_info = _take_preprocess_indexer_and_fill_value(
  File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\array_algos\take.py:582 in _take_preprocess_indexer_and_fill_value
    dtype, fill_value = arr.dtype, arr.dtype.type()
TypeError: void() takes at least 1 positional argument (0 given)
```
I've been able to work around the problem by copying each column of relevant data into a blank DataFrame that doesn't have the corrupt headers, but it's a kludgy solution. Not sure this qualifies as a bug as it's very likely it's a user error, but I can't find anything obvious I'm doing wrong.
```
In [230]: data_template = np.dtype([
     ...: ('header_a','V22'),
     ...: ('variable_A','>u2'),
     ...: ('gpssec','>u4')
     ...: ])
```
Make a dummy array from this dtype:
```
In [231]: arr = np.zeros(4, data_template)
In [232]: arr
Out[232]:
array([(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 0, 0),
       (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 0, 0),
       (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 0, 0),
       (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 0, 0)],
      dtype=[('header_a', 'V22'), ('variable_A', '>u2'), ('gpssec', '>u4')])
```
We can make a DataFrame from it:
```
In [233]: df = pd.DataFrame(arr)
In [234]: df.describe()
Out[234]:
       variable_A  gpssec
count         4.0     4.0
mean          0.0     0.0
std           0.0     0.0
min           0.0     0.0
25%           0.0     0.0
50%           0.0     0.0
75%           0.0     0.0
max           0.0     0.0
```
But displaying it, or calling `info`, raises an error:
```
In [235]: df.info()
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```
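
The last frame of the traceback in the question calls `arr.dtype.type()`. For the `V22` column that is `np.void()`, which cannot be called without arguments, so the raw-bytes field looks like the culprit rather than the column names. Here is a minimal sketch of one possible way around it, assuming the header bytes are not needed in the DataFrame. The `gpssec` values are made up for illustration, and converting to native byte order is an extra precaution of mine, not something the traceback requires:

```python
import numpy as np
import pandas as pd

data_template = np.dtype([
    ('header_a', 'V22'),
    ('variable_A', '>u2'),
    ('gpssec', '>u4'),
])

# Dummy records standing in for the file contents (values are made up)
arr = np.zeros(4, data_template)
arr['gpssec'] = [500, 1500, 2500, 900]

# The call pandas makes for the V22 column: the scalar type with no arguments
try:
    data_template['header_a'].type()
    void_call_failed = False
except TypeError:
    void_call_failed = True

# Keep only the non-void fields, converted to native byte order
keep = [name for name in arr.dtype.names if arr.dtype[name].kind != 'V']
df = pd.DataFrame({
    name: arr[name].astype(arr.dtype[name].newbyteorder('='))
    for name in keep
})

# Row filtering no longer touches a void column
filtered = df[df['gpssec'] > 1000]
print(filtered)
```

This is essentially the column-by-column copy described in the question, just driven by the dtype so the field names do not have to be listed by hand.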