I have a csv file named input.csv with the following columns:
row_num | start_date_time | id | json_message |
---|---|---|---|
120 | 2024-02-02 00:01:00.001+00 | 1020240202450 | {'amount': 10000, 'currency': 'NZD','seqnbr': 161 } |
121 | 2024-02-02 00:02:00.001+00 | 1020240202451 | {'amount': 20000, 'currency': 'AUD','seqnbr': 162 } |
122 | 2024-02-02 00:03:00.001+00 | 1020240202452 | {'amount': 30000, 'currency': 'USD','seqnbr': None } |
123 | 2024-02-02 00:04:00.001+00 | 1020240202455 | {'amount': 40000, 'currency': 'INR','seqnbr': 163 } |
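I am using python3 to read this csv file, and I need to replace the seqnbr field inside the json_message column with a random integer for each row. If seqnbr contains None, that row should not get a random seqnbr and should be kept as is. The delimiter of my csv file is the pipe (|) symbol. I am using the Python code below to replace the values with randomly generated integers, but it still does not overwrite them. Here is my code: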
def update_seqnbr(cls):
    filename = 'input.csv'
    seqnbr_pattern = "'seqnbr': ([\s\d]+)"
    with open(filename, 'r') as csvfile:
        datareader = (csv.reader(csvfile, delimiter="|"))
        next(datareader, None) # skip the headers
        for row in datareader:
            json_message = row[3]
            match = re.findall(seqnbr_pattern, json_message)
            if len(match) != 0:
                replaced_json_message = json_message.replace(match[0], str(random.randint(500, 999)))
                row[3] = replaced_json_message
        x = open(FILENAME, "a")
        x.writelines(row)
        x.close()
My file should look like this:
row_num | start_date_time | id | json_message |
---|---|---|---|
120 | 2024-02-02 00:01:00.001+00 | 1020240202450 | {'amount': 10000, 'currency': 'NZD','seqnbr': 555 } |
121 | 2024-02-02 00:02:00.001+00 | 1020240202451 | {'amount': 20000, 'currency': 'AUD','seqnbr': 897 } |
122 | 2024-02-02 00:03:00.001+00 | 1020240202452 | {'amount': 30000, 'currency': 'USD','seqnbr': None } |
123 | 2024-02-02 00:04:00.001+00 | 1020240202455 | {'amount': 40000, 'currency': 'INR','seqnbr': 768 } |
Can someone help me solve this?
- str.json_decode() converts the json to a struct.
- struct.with_fields() updates the seqnbr field.
- np.random.randint() creates a list of random integers.
- struct.json_encode() converts back to json.

import numpy as np
import polars as pl

# df is assumed to be the DataFrame already read from input.csv
df.with_columns(
    pl.col.json_message.str.json_decode()
).with_columns(
    pl.col.json_message.struct.with_fields(
        seqnbr = np.random.randint(500, 999, len(df))
    ).struct.json_encode()
)
┌─────────┬────────────────────────────┬───────────────┬────────────────────────────────────────────────┐
│ row_num ┆ start_date_time ┆ id ┆ json_message │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞═════════╪════════════════════════════╪═══════════════╪════════════════════════════════════════════════╡
│ 120 ┆ 2024-02-02 00:01:00.001+00 ┆ 1020240202450 ┆ {"amount":10000,"currency":"NZD","seqnbr":850} │
│ 121 ┆ 2024-02-02 00:02:00.001+00 ┆ 1020240202451 ┆ {"amount":20000,"currency":"AUD","seqnbr":841} │
│ 122 ┆ 2024-02-02 00:03:00.001+00 ┆ 1020240202452 ┆ {"amount":30000,"currency":"USD","seqnbr":537} │
│ 123 ┆ 2024-02-02 00:04:00.001+00 ┆ 1020240202455 ┆ {"amount":40000,"currency":"INR","seqnbr":937} │
└─────────┴────────────────────────────┴───────────────┴────────────────────────────────────────────────┘
I made some changes to your code (also using csv to write the new csv, most importantly indenting the write part, and leaving the None rows as they are):
import csv
import random
import re

def update_seqnbr(cls):
    filename = 'input.csv'
    # raw string avoids invalid-escape warnings for \s and \d
    seqnbr_pattern = r"'seqnbr': ([\s\d]+)"
    # FILENAME is assumed to be defined elsewhere as the output path
    with open(FILENAME, 'a', newline='') as targetfile:
        targetcsv = csv.writer(targetfile, delimiter='|')
        with open(filename, 'r') as csvfile:
            datareader = (csv.reader(csvfile, delimiter="|"))
            targetcsv.writerow(next(datareader, None)) # write the header row unchanged
            for row in datareader:
                json_message = row[3]
                match = re.findall(seqnbr_pattern, json_message)
                if len(match) != 0:
                    replaced_json_message = json_message.replace(match[0], str(random.randint(500, 999)))
                    row[3] = replaced_json_message
                # written for every row, so rows with seqnbr None are kept as is
                targetcsv.writerow(row)
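The same idea can also be tried without touching any files. The following is only a minimal sketch, assuming the row layout shown in the question: it works on in-memory text and uses re.sub with a replacement function, so only the digits after 'seqnbr': are rewritten and a value of None never matches. The names randomize_seqnbr, SEQNBR_RE and sample, and the low/high parameters, are made up for illustration.

```python
import csv
import io
import random
import re

# Captures the "'seqnbr': " prefix and matches the digits after it.
# A value of None contains no digits, so such rows are never matched.
SEQNBR_RE = re.compile(r"('seqnbr':\s*)\d+")


def randomize_seqnbr(text, low=500, high=999):
    """Return pipe-delimited text with each numeric seqnbr replaced by a random integer."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for index, row in enumerate(reader):
        # Leave the header (first row) and any short rows untouched.
        if index > 0 and len(row) > 3:
            row[3] = SEQNBR_RE.sub(
                lambda m: m.group(1) + str(random.randint(low, high)),
                row[3],
            )
        writer.writerow(row)
    return out.getvalue()


sample = (
    "row_num | start_date_time | id | json_message\n"
    "120 | 2024-02-02 00:01:00.001+00 | 1020240202450 | "
    "{'amount': 10000, 'currency': 'NZD','seqnbr': 161 }\n"
    "122 | 2024-02-02 00:03:00.001+00 | 1020240202452 | "
    "{'amount': 30000, 'currency': 'USD','seqnbr': None }\n"
)
print(randomize_seqnbr(sample))
```

Writing the result to a separate output file, rather than appending to the file being read, keeps the input intact if something goes wrong halfway.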
import numpy as np
import polars as pl

(
    df
    # convert json_message to struct column
    .with_columns(pl.col("json_message").str.json_decode())
    # conditionally replace seqnbr values
    .with_columns(
        pl.col("json_message").struct.with_fields(
            pl.when(
                pl.field("seqnbr").is_not_null()
            ).then(
                pl.Series(np.random.randint(0, 200, size=df.height))
            ).alias("seqnbr")
        )
    )
    # convert json_message to json string
    .with_columns(
        pl.col("json_message").struct.json_encode()
    )
)
shape: (4, 4)
┌─────────┬─────────────────────────┬───────────────┬─────────────────────────────────────────────────┐
│ row_num ┆ start_date_time ┆ id ┆ json_message │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ str ┆ str │
╞═════════╪═════════════════════════╪═══════════════╪═════════════════════════════════════════════════╡
│ 120 ┆ 2024-02-02 00:01:00.001 ┆ 1020240202450 ┆ {"amount":10000,"currency":"NZD","seqnbr":108} │
│ 121 ┆ 2024-02-02 00:02:00.001 ┆ 1020240202451 ┆ {"amount":20000,"currency":"AUD","seqnbr":133} │
│ 122 ┆ 2024-02-02 00:03:00.001 ┆ 1020240202452 ┆ {"amount":30000,"currency":"USD","seqnbr":null} │
│ 123 ┆ 2024-02-02 00:04:00.001 ┆ 1020240202455 ┆ {"amount":40000,"currency":"INR","seqnbr":43} │
└─────────┴─────────────────────────┴───────────────┴─────────────────────────────────────────────────┘
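The json_message values as written in the question use single quotes and None, which is Python literal syntax rather than strict JSON, so the standard json.loads would reject them. As a rough sketch only, assuming every message is a valid Python dict literal, the field can be parsed with ast.literal_eval from the standard library and the None rule applied on the parsed value. The function name replace_seqnbr and its low/high parameters are hypothetical.

```python
import ast
import random


def replace_seqnbr(json_message, low=500, high=999):
    """Parse a dict-literal string and randomize seqnbr unless it is None."""
    record = ast.literal_eval(json_message.strip())
    if record.get("seqnbr") is not None:
        record["seqnbr"] = random.randint(low, high)
    # str() yields Python dict notation again (single quotes, None)
    return str(record)


print(replace_seqnbr("{'amount': 10000, 'currency': 'NZD','seqnbr': 161 }"))
print(replace_seqnbr("{'amount': 30000, 'currency': 'USD','seqnbr': None }"))
```

Note that this normalizes the spacing inside the message, so the output is equivalent to the input but not byte-for-byte identical in the untouched fields.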