I have a txt file like this:
index timestamp polarisation current (A) signal (V) head temperature (°C) head relat.humidity (%RH) MUGS temperature (°C) laser voltage (V) laser current (A) driver temperature (°C)
0 16:11:24 0.4 0.0006019 26.51 43.5 32.0 11.37 0.3922 26.5
1 16:11:29 0.402 0.0006286 26.51 43.5 32.5 11.41 0.3972 31.5
2 16:11:34 0.404 0.0005828 26.51 43.5 32.5 11.42 0.4048 32.5
3 16:11:38 0.406 0.0006139 26.51 43.5 32.5 11.39 0.3984 32.5
I read the file like this:
with open(universal_path,'rt'):
    values = np.genfromtxt(universal_path, delimiter="", skip_header=1, encoding='unicode_escape')
The problem is that the second column is full of NaN. I tried setting dtype = None, but now it reads only the first column of the txt file.
I also tried
dtype = [int, str, float, float, float, float, float, float, float, float]
but that also reads only the first column.
How can I read the second column as a string?
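Here is a basic conversion using duckdb (I renamed the columns to names that seem "reasonable" to me). The result is a dictionary of numpy arrays, so you can work on it with numpy functions.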
import duckdb as ddb

conn = ddb.connect()  # in-memory db, disappears when closed
conn.execute("""create table camile as SELECT *
    FROM read_csv('test.csv',
        delim = '\t',
        header = true,
        columns = {
            'index': 'integer',
            'timestamp': 'time',
            'polarisation-current': 'double',
            'signal': 'double',
            'head-temp': 'double',
            'head-relat-humidity': 'double',
            'MUGS-temp': 'double',
            'laser-voltage': 'double',
            'laser-current': 'double',
            'driver-temp': 'double'
        } ); """
)
# fetchnumpy() returns a dict that maps each column name to a numpy array
myData = conn.sql("SELECT * from camile").fetchnumpy()
conn.close()

# print the first 5 items of each column
for key, values in myData.items():
    print(f"{key}: {values[:5]}")

# see https://duckdb.org/docs/guides/python/export_numpy.html
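For comparison, here is a minimal numpy-only sketch. It assumes the data columns are separated by whitespace (spaces or tabs) and that each row has the ten fields shown in the question. The column names are my own hypothetical choices, not something the file dictates. The NaN values most likely come from the default float dtype, which cannot parse a value like 16:11:24. With dtype=None, genfromtxt returns a one-dimensional structured array, which can look like a single column until you access the fields by name.

```python
import io
import numpy as np

# Sample rows copied from the question; with a real file, pass its path instead.
sample = io.StringIO(
    "index timestamp polarisation current (A) signal (V) head temperature (°C) "
    "head relat.humidity (%RH) MUGS temperature (°C) laser voltage (V) "
    "laser current (A) driver temperature (°C)\n"
    "0 16:11:24 0.4 0.0006019 26.51 43.5 32.0 11.37 0.3922 26.5\n"
    "1 16:11:29 0.402 0.0006286 26.51 43.5 32.5 11.41 0.3972 31.5\n"
    "2 16:11:34 0.404 0.0005828 26.51 43.5 32.5 11.42 0.4048 32.5\n"
    "3 16:11:38 0.406 0.0006139 26.51 43.5 32.5 11.39 0.3984 32.5\n"
)

# Hypothetical field names, one per data column.
names = [
    "index", "timestamp", "polarisation_current", "signal", "head_temp",
    "head_humidity", "mugs_temp", "laser_voltage", "laser_current", "driver_temp",
]

values = np.genfromtxt(
    sample,
    delimiter=None,    # None splits on any run of whitespace
    skip_header=1,     # skip the header line, whose names contain spaces
    dtype=None,        # infer a type per column (int, str, float, ...)
    names=names,
    encoding="utf-8",  # decode text so string columns come back as str
)

print(values["timestamp"])      # the second column, as strings
print(values["signal"].mean())  # numeric columns behave like normal arrays
```

Each field is then an ordinary 1-D array, so values["timestamp"] holds the time strings and values["signal"] holds floats.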