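I inherited a CSV file from a previous project, and I am supposed to prepare some Python scripts to plot the values it contains. The dataset in this CSV file holds data from electrical and vibration signals. The data I am interested in is stored in the "DecompressedValue" column, where each row holds an array of 16,000 floating-point values representing a vibration/electrical signal.
I wanted to use Vaex to take advantage of its better performance, but while processing the signals I ran into what I believe is a bug. I started by adapting code that works in Pandas: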
import pandas as pd
import json
signal_df = pd.read_csv('csv_test.csv', sep=';')
# The DecompressedValue column looks like a regular array in the file, but it is read as one long string,
# so json.loads() has to be applied to each value of the column to turn it into a list
signal_df.DecompressedValue = signal_df.DecompressedValue.apply(lambda r: json.loads(r))
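To make the string-to-list step concrete, here is a minimal sketch that uses only the standard library. The two-row sample, the `Id` column and the values are made up for illustration; only the `;` separator and the `DecompressedValue` column name come from the question. It also collects the row lengths, since every signal is expected to have the same number of samples.

```python
import csv
import io
import json

# Hypothetical sample in the same layout as the real file: ';' separates the
# columns and each DecompressedValue cell is a JSON-style list stored as text.
sample = "Id;DecompressedValue\n1;[0.1, 0.2, 0.3]\n2;[0.4, 0.5]\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))

# Straight from the file, the cell is a plain string
raw = rows[0]["DecompressedValue"]
print(type(raw).__name__)  # str

# json.loads turns each string into a Python list of floats
signals = [json.loads(row["DecompressedValue"]) for row in rows]
print(signals[0])  # [0.1, 0.2, 0.3]

# Collect the distinct row lengths; more than one value means ragged rows
lengths = sorted({len(signal) for signal in signals})
print(lengths)  # [2, 3]
```

On the real file, a single entry in `lengths` would confirm that every row has the same number of samples.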
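However, when I try to replicate the same functionality in Vaex, the code below runs without complaint, but any later attempt to access the DataFrame raises an error (vaex_test.csv, a sample file for testing this code, was linked here).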
import json
import vaex
test = vaex.from_csv('vaex_test.csv', sep=';')
test['DecompressedValue'] = test['DecompressedValue'].apply(lambda r: json.loads(r))
test.head()
This produces a ValueError:
[12/19/24 12:50:48] ERROR    error evaluating: DecompressedValue at rows 0-5    dataframe.py:4101
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\Users\user\AppData\Local\anaconda3\envs\py310env\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\Users\user\AppData\Local\anaconda3\envs\py310env\lib\site-packages\vaex\expression.py", line 1629, in _apply
    result = np.array(result)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.
"""
DataFrames in Pandas and Vaex are not the same thing.
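One place the difference shows up is the `np.array(result)` call in the traceback: Vaex's `apply` packs the per-row results into a single NumPy array, and that only works cleanly when every row has the same shape. The sketch below reproduces just that NumPy behaviour with small made-up lists. The `try_stack` helper and the values are illustrative and are not part of Vaex or of the file.

```python
import numpy as np


def try_stack(rows):
    """Mimic the np.array(result) call from the traceback; return the error if it fails."""
    try:
        return np.array(rows)
    except ValueError as exc:
        return exc


equal = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # every row has 3 values
ragged = [[0.1, 0.2, 0.3], [0.4, 0.5]]       # rows differ in length

stacked = try_stack(equal)
print(stacked.shape)  # (2, 3): equal-length rows become a regular 2-D array

outcome = try_stack(ragged)
# On NumPy 1.24 and later this is a ValueError about an inhomogeneous shape;
# older releases build an object array and emit a deprecation warning instead.
print(type(outcome).__name__)
```

The wording "inhomogeneous shape after 1 dimensions" suggests that the parsed rows in the chunk did not all have the same length, so checking the row lengths in the file is a reasonable first diagnostic.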
To get the lists from the CSV file into a Vaex DataFrame as lists rather than strings, one option is to let Pandas do the parsing and then convert the result with vaex.from_pandas:
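import json
import pandas as pd
import vaex

# Let Pandas read the file and parse each cell into a Python list.
# sep=';' matches the call in the question; drop it if your file is comma-separated.
test_pd = pd.read_csv('vaex_test.csv', sep=';')
test_pd['DecompressedValue'] = test_pd['DecompressedValue'].apply(json.loads)
# Hand the already-parsed DataFrame over to Vaex
test = vaex.from_pandas(test_pd)
print(test.head())
print(type(test['DecompressedValue']))
print(test[3])        # 4th row, as a list of column values
print(test[3][0])     # 4th list from csv
print(test[3][0][0])  # first sample of that list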
Output:
# DecompressedValue
0 '[-0.004518906585872173, -0.004478906746953726, ...
1 '[-0.0005845219711773098, -0.0002945219748653471...
2 '[-0.006645397283136845, -0.006435397081077099, ...
3 '[0.003976251929998398, 0.0019852519035339355, 0...
4 '[0.003452450269833207, 0.0017284504137933254, 0...
<class 'vaex.expression.Expression'>
[[0.003976251929998398, 0.0019852519035339355, ...
[0.003976251929998398, 0.0019852519035339355, ...
0.003976251929998398