How can I replace a specific field in the JSON string in each row of a csv file with a random value in Python?

Question  Votes: 0  Answers: 3

I have a csv file named input.csv with the following columns:

row_num  start_date_time             id             json_message
120      2024-02-02 00:01:00.001+00  1020240202450  {'amount': 10000, 'currency': 'NZD','seqnbr': 161 }
121      2024-02-02 00:02:00.001+00  1020240202451  {'amount': 20000, 'currency': 'AUD','seqnbr': 162 }
122      2024-02-02 00:03:00.001+00  1020240202452  {'amount': 30000, 'currency': 'USD','seqnbr': None }
123      2024-02-02 00:04:00.001+00  1020240202455  {'amount': 40000, 'currency': 'INR','seqnbr': 163 }

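I am using python3 to read this csv file, and I need to replace the seqnbr field inside the json_message column with a random integer for each row. If seqnbr contains None, that row should not get a random seqnbr and should be kept as is. The delimiter of my csv file is the pipe (|) symbol. I used the Python code below to replace the values with randomly generated integers, but it still does not overwrite them. Here is my code: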

def update_seqnbr(cls):
    filename = 'input.csv'
    seqnbr_pattern = "'seqnbr': ([\s\d]+)"
    with open(filename, 'r') as csvfile:
        datareader = (csv.reader(csvfile, delimiter="|"))
        next(datareader, None)  # skip the headers
        for row in datareader:
            json_message = row[3]
            match = re.findall(seqnbr_pattern, json_message)
            if len(match) != 0:
                replaced_json_message = json_message.replace(match[0], str(random.randint(500, 999)))
                row[3] = replaced_json_message
                x = open(FILENAME, "a")
                x.writelines(row)
                x.close()

My file should look like this:

row_num  start_date_time             id             json_message
120      2024-02-02 00:01:00.001+00  1020240202450  {'amount': 10000, 'currency': 'NZD','seqnbr': 555 }
121      2024-02-02 00:02:00.001+00  1020240202451  {'amount': 20000, 'currency': 'AUD','seqnbr': 897 }
122      2024-02-02 00:03:00.001+00  1020240202452  {'amount': 30000, 'currency': 'USD','seqnbr': None }
123      2024-02-02 00:04:00.001+00  1020240202455  {'amount': 40000, 'currency': 'INR','seqnbr': 768 }

Can someone help me solve this?

python python-3.x pandas python-polars
3 Answers

0 votes
  • str.json_decode() to convert the json to a struct.
  • struct.with_fields() to update the seqnbr field.
  • np.random.randint() to create a list of random integers.
  • struct.json_encode() to convert back to json.
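import numpy as np
import polars as pl

# df is assumed to already hold the parsed csv, with json_message as valid JSON text
df.with_columns(
    # decode the JSON string into a struct column
    pl.col.json_message.str.json_decode()
).with_columns(
    # overwrite seqnbr in every row with a random integer (upper bound 999 is exclusive)
    pl.col.json_message.struct.with_fields(
        seqnbr = np.random.randint(500, 999, len(df))
    ).struct.json_encode()  # encode the struct back to a JSON string
)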

┌─────────┬────────────────────────────┬───────────────┬────────────────────────────────────────────────┐
│ row_num ┆ start_date_time            ┆ id            ┆ json_message                                   │
│ ---     ┆ ---                        ┆ ---           ┆ ---                                            │
│ i64     ┆ str                        ┆ i64           ┆ str                                            │
╞═════════╪════════════════════════════╪═══════════════╪════════════════════════════════════════════════╡
│ 120     ┆ 2024-02-02 00:01:00.001+00 ┆ 1020240202450 ┆ {"amount":10000,"currency":"NZD","seqnbr":850} │
│ 121     ┆ 2024-02-02 00:02:00.001+00 ┆ 1020240202451 ┆ {"amount":20000,"currency":"AUD","seqnbr":841} │
│ 122     ┆ 2024-02-02 00:03:00.001+00 ┆ 1020240202452 ┆ {"amount":30000,"currency":"USD","seqnbr":537} │
│ 123     ┆ 2024-02-02 00:04:00.001+00 ┆ 1020240202455 ┆ {"amount":40000,"currency":"INR","seqnbr":937} │
└─────────┴────────────────────────────┴───────────────┴────────────────────────────────────────────────┘
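
One thing to note about the output above: row 122 also receives a random seqnbr (537), while the question asks for rows whose seqnbr is None to stay untouched. As a minimal sketch of the same decode, update, encode steps with that check added, here is a version that uses only Python's standard json module. It assumes the message is valid JSON (double quotes, null), as in the output above; the helper name randomize_seqnbr and its low/high parameters are hypothetical.

```python
import json
import random

def randomize_seqnbr(message, low=500, high=999):
    """Return message with seqnbr replaced by a random int, unless it is null."""
    data = json.loads(message)                      # decode JSON text into a dict
    if data.get("seqnbr") is not None:              # leave null or missing values alone
        data["seqnbr"] = random.randint(low, high)  # inclusive on both ends
    return json.dumps(data, separators=(",", ":"))  # encode back to compact JSON

rows = [
    '{"amount":10000,"currency":"NZD","seqnbr":161}',
    '{"amount":30000,"currency":"USD","seqnbr":null}',
]
updated = [randomize_seqnbr(r) for r in rows]
```

Here the second message comes back unchanged because its seqnbr is null, while the first gets a new value between 500 and 999.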

0 votes

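I made a few changes to your code: it now also uses csv to write the new csv and, most importantly, the write part sits outside the if block, so rows with None are written unchanged: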

import csv
import random
import re

FILENAME = 'output.csv'  # assumption: output path; FILENAME is never defined in the question

def update_seqnbr(cls):
    filename = 'input.csv'
    # raw string avoids the invalid "\s" escape; group 1 keeps the key, only the digits are replaced
    seqnbr_pattern = r"('seqnbr':\s*)\d+"
    with open(FILENAME, 'a', newline='') as targetfile:
        targetcsv = csv.writer(targetfile, delimiter='|')
        with open(filename, 'r', newline='') as csvfile:
            datareader = csv.reader(csvfile, delimiter="|")
            targetcsv.writerow(next(datareader, None))  # copy the header row unchanged
            for row in datareader:
                # rows whose seqnbr is None do not match and are left as they are
                row[3] = re.sub(
                    seqnbr_pattern,
                    lambda m: m.group(1) + str(random.randint(500, 999)),
                    row[3],
                )
                # written for every row, changed or not
                targetcsv.writerow(row)
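
As an alternative to the regular expression, the message can be parsed instead of pattern-matched. The sample rows in the question are Python dict literals (single quotes, None) rather than strict JSON, so the sketch below assumes ast.literal_eval can read them and writes changed rows back with repr. The function name update_rows and the in-memory io.StringIO streams are illustrative stand-ins for the real files, and repr normalizes the spacing inside the braces of any row it rewrites.

```python
import ast
import csv
import io
import random

def update_rows(source, target, low=500, high=999):
    """Copy pipe-delimited rows from source to target, randomizing seqnbr."""
    reader = csv.reader(source, delimiter="|")
    writer = csv.writer(target, delimiter="|", lineterminator="\n")
    header = next(reader, None)
    if header is None:          # empty input: nothing to copy
        return
    writer.writerow(header)
    for row in reader:
        message = ast.literal_eval(row[3])       # parse the dict literal safely
        if message.get("seqnbr") is not None:    # keep None rows exactly as they are
            message["seqnbr"] = random.randint(low, high)
            row[3] = repr(message)               # back to a dict-literal string
        writer.writerow(row)

sample = (
    "row_num|start_date_time|id|json_message\n"
    "120|2024-02-02 00:01:00.001+00|1020240202450|{'amount': 10000, 'currency': 'NZD','seqnbr': 161 }\n"
    "122|2024-02-02 00:03:00.001+00|1020240202452|{'amount': 30000, 'currency': 'USD','seqnbr': None }\n"
)
out = io.StringIO()
update_rows(io.StringIO(sample), out)
lines = out.getvalue().splitlines()
```

Parsing avoids accidentally touching other numbers in the message, at the cost of failing loudly on any row that is not a valid literal.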

0 votes
import numpy as np
import polars as pl

(
    df
    # convert json_message to struct column
    .with_columns(pl.col("json_message").str.json_decode())
    # conditionally replace seqnbr values
    .with_columns(
        pl.col("json_message").struct.with_fields(
            pl.when(
                pl.field("seqnbr").is_not_null()
            ).then(
                pl.Series(np.random.randint(0, 200, size=df.height))
            ).alias("seqnbr")
        )
    )
    # convert json_message to json string
    .with_columns(
        pl.col("json_message").struct.json_encode()
    )
)
shape: (4, 4)
┌─────────┬─────────────────────────┬───────────────┬─────────────────────────────────────────────────┐
│ row_num ┆ start_date_time         ┆ id            ┆ json_message                                    │
│ ---     ┆ ---                     ┆ ---           ┆ ---                                             │
│ i64     ┆ datetime[μs]            ┆ str           ┆ str                                             │
╞═════════╪═════════════════════════╪═══════════════╪═════════════════════════════════════════════════╡
│ 120     ┆ 2024-02-02 00:01:00.001 ┆ 1020240202450 ┆ {"amount":10000,"currency":"NZD","seqnbr":108}  │
│ 121     ┆ 2024-02-02 00:02:00.001 ┆ 1020240202451 ┆ {"amount":20000,"currency":"AUD","seqnbr":133}  │
│ 122     ┆ 2024-02-02 00:03:00.001 ┆ 1020240202452 ┆ {"amount":30000,"currency":"USD","seqnbr":null} │
│ 123     ┆ 2024-02-02 00:04:00.001 ┆ 1020240202455 ┆ {"amount":40000,"currency":"INR","seqnbr":43}   │
└─────────┴─────────────────────────┴───────────────┴─────────────────────────────────────────────────┘
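
The question also mentions that the original file never gets overwritten. Whichever approach produces the new rows, one common pattern is to write them to a temporary file in the same directory and then swap it in with os.replace, so a failure midway does not leave a half-written file behind. A rough sketch with the standard library follows; the function name rewrite_csv_in_place and the transform callback are hypothetical, not part of any answer above.

```python
import csv
import os
import tempfile

def rewrite_csv_in_place(path, transform, delimiter="|"):
    """Apply transform(row) to every data row of path, then replace the file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as tmp, open(path, newline="") as src:
            reader = csv.reader(src, delimiter=delimiter)
            writer = csv.writer(tmp, delimiter=delimiter)
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)          # header is copied unchanged
            for row in reader:
                writer.writerow(transform(row))  # transform returns the new row
        os.replace(tmp_path, path)               # swap the new file into place
    except BaseException:
        os.remove(tmp_path)                      # clean up the temp file on failure
        raise
```

A call such as rewrite_csv_in_place("input.csv", some_row_function) would then update input.csv itself, where some_row_function is whatever per-row replacement logic is chosen.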