How to accurately map data from one CSV file to another CSV file

Question

I have two CSV files that contain similar data, and I want to map and filter the data into JSON format.

Example

Sample data 1

sku_full	strength	type	pack	prescription	composition	company
Nizapas 102 Mg Tablet Mr 10	100/2 MG	Tablet Mr	Ms 10 Tablets	Prescription Required	Nimesulide+Tizanidine	Daniel Pasteur

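Sample data 2

name	manufacturers	salt_composition	Packaging	Package	Qty	Product Form	prescription_required
Nizapas 100mg/2mg Tablet Mr 10	Daniel Pasteur	Nimesulide+Tizanidine	strip of 10 tablets	strip	10	Tablet	Prescription Required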

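These two records are very similar: the strength values in the second file add up to 100 + 2 = 102 mg, which is the figure used in the first file.
The type / product form can be injection, capsule, Tablet Mr, Tablet Ms and so on, so I need code that maps the records accurately.
Is there an AI model or any other solution that can produce the correct data mapping as a JSON file?
If there is a solution for this, I need the code that maps the data.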
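The 100 + 2 = 102 mg observation above can be turned into a small normalization step. The following is a minimal sketch, not a definitive parser: the helper name `total_strength_mg` is hypothetical, and it assumes strengths are always given in mg and that multiple components are separated by "/" or "+".

```python
import re

# One or more numbers joined by "/" or "+", each optionally followed by "mg",
# ending in a final "mg" unit, e.g. "100/2 MG", "100mg/2mg", "102 Mg".
_STRENGTH = re.compile(
    r"((?:\d+(?:\.\d+)?\s*(?:mg)?\s*[/+]\s*)*\d+(?:\.\d+)?)\s*mg",
    re.IGNORECASE,
)


def total_strength_mg(text):
    """Return the summed milligram strength found in text, or None if absent."""
    match = _STRENGTH.search(text)
    if match is None:
        return None
    # Add up every number inside the matched strength expression.
    return sum(float(n) for n in re.findall(r"\d+(?:\.\d+)?", match.group(1)))


for sample in ("Nizapas 102 Mg Tablet Mr 10",
               "Nizapas 100mg/2mg Tablet Mr 10",
               "100/2 MG"):
    print(sample, "->", total_strength_mg(sample))
```

With the strength reduced to a single number on both sides, the two sample rows can be compared on equal terms. Other units (mcg, ml, IU) would need their own handling.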

Code snippet

import pandas as pd
import re
import json
from google.colab import files


# Raw strings keep the backslashes in the Windows paths literal
file_1_path = r'F:\dataset\sample1.csv'
file_2_path = r'F:\dataset\sample2.csv'


file_1 = pd.read_csv(file_1_path)
file_2 = pd.read_csv(file_2_path)


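def extract_name_dosage_and_type(medicine, medicine_type):
    # First word of the product name, used as a rough brand key
    first_word = medicine.split()[0].lower()
    # Collect every "<number>mg" figure (a space before the unit is allowed)
    # and sum them, so "100mg/2mg" and "102 Mg" both yield 102
    dosage_matches = re.findall(r'(\d+\.?\d*)\s*mg', medicine.lower())
    total_dosage = sum(map(float, dosage_matches)) if dosage_matches else None
    # Strip dosage figures and common form abbreviations from the name
    cleaned_name = ' '.join(re.sub(r'\d+\.?\d*\s*mg|\b(sr|md|dt|tab|capsule)\b', '', medicine, flags=re.IGNORECASE).split())

    return first_word, total_dosage, medicine_type.lower(), cleaned_name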


# Column names follow the sample headers shown above ('sku_full', not 'sku')
file_1[['first_name', 'total_dosage', 'medicine_type', 'cleaned_name']] = file_1.apply(lambda x: pd.Series(extract_name_dosage_and_type(x['sku_full'], x['type'])), axis=1)
file_2[['first_name', 'total_dosage', 'medicine_type', 'cleaned_name']] = file_2.apply(lambda x: pd.Series(extract_name_dosage_and_type(x['name'], x['Product Form'])), axis=1)


medicine_mapping = {}


# For every row of file 2, collect the file 1 SKUs that agree on all four criteria
for index, sample_row in file_2.iterrows():
    matches = file_1[
        (file_1['cleaned_name'] == sample_row['cleaned_name']) &
        (file_1['total_dosage'] == sample_row['total_dosage']) &
        (file_1['medicine_type'] == sample_row['medicine_type']) &
        # regex=False: treat the packaging text as a literal substring, not a pattern
        (file_1['pack'].str.contains(str(sample_row['Packaging']), case=False, na=False, regex=False))
    ]['sku_full'].tolist()

    medicine_mapping[sample_row['name']] = matches


json_file_path = 'medicine_mapping_strict.json'
with open(json_file_path, 'w') as json_file:
    json.dump(medicine_mapping, json_file, indent=4)


files.download(json_file_path)
print(f"Dictionary has been exported to '{json_file_path}' and is ready for download.")

I need the data in the two CSV files to be mapped with 100% accuracy.
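Because the names and forms are written differently in the two files, one way to approach the accuracy requirement is to match on a normalized composite key and send every row without exactly one candidate to manual review. The sketch below uses only the standard library and is an illustration under assumptions: the function names are hypothetical, the key (brand word, composition, company) is just one possible choice, and the second row of `rows2` is made-up data added to show the review path.

```python
import json
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so cosmetic differences do not block a match."""
    return " ".join(str(text).lower().split())


def brand(text):
    """First word of the product name, or an empty string for blank names."""
    words = normalize(text).split()
    return words[0] if words else ""


def composition_key(text):
    """Order-independent key for 'A+B' style compositions."""
    return tuple(sorted(normalize(part) for part in str(text).split("+")))


def key_file1(row):
    return (brand(row["sku_full"]), composition_key(row["composition"]), normalize(row["company"]))


def key_file2(row):
    return (brand(row["name"]), composition_key(row["salt_composition"]), normalize(row["manufacturers"]))


def match_records(rows1, rows2):
    """Return (matched, review): unique matches and rows needing a manual check."""
    index = defaultdict(list)
    for row in rows1:
        index[key_file1(row)].append(row["sku_full"])
    matched, review = {}, {}
    for row in rows2:
        candidates = index.get(key_file2(row), [])
        if len(candidates) == 1:
            matched[row["name"]] = candidates[0]
        else:
            # Zero or several candidates: do not guess, flag for review instead.
            review[row["name"]] = candidates
    return matched, review


rows1 = [{"sku_full": "Nizapas 102 Mg Tablet Mr 10",
          "composition": "Nimesulide+Tizanidine", "company": "Daniel Pasteur"}]
rows2 = [{"name": "Nizapas 100mg/2mg Tablet Mr 10",
          "salt_composition": "Nimesulide+Tizanidine", "manufacturers": "Daniel Pasteur"},
         # Hypothetical row with no counterpart in rows1
         {"name": "Examplex 500mg Capsule 10",
          "salt_composition": "Examplamide", "manufacturers": "Example Labs"}]

matched, review = match_records(rows1, rows2)
print(json.dumps({"matched": matched, "review": review}, indent=4))
```

Adding the summed strength or the pack quantity to the key would make it stricter. Keeping the `review` bucket separate means an uncertain row is never silently written into the JSON mapping.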

Answer 1

Maybe this will work. Adjust column_mapping and the column names to your needs.

import pandas as pd

# Load the CSV files
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

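# Map file 2 column names onto the matching file 1 names.
# rename() expects {old_name: new_name}, so the keys are df2's columns.
column_mapping = {
    'name': 'sku_full',
    'Product Form': 'type',
    'Packaging': 'pack',
    'prescription_required': 'prescription',
    'salt_composition': 'composition',
    'manufacturers': 'company'
}

df2 = df2.rename(columns=column_mapping)

# Exact-match join: only rows whose key values are identical in both files survive.
# Taking just the key columns from df2 avoids _x/_y suffixes on 'type' and 'pack'.
key_columns = ['sku_full', 'prescription', 'composition', 'company']
merged_df = pd.merge(df1, df2[key_columns], on=key_columns, how='inner')

result_df = merged_df[['sku_full', 'strength', 'type', 'pack', 'prescription', 'composition', 'company']]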

# Convert to JSON
result_json = result_df.to_json(orient='records')

# Save or display JSON
with open('output.json', 'w') as file:
    file.write(result_json)

print(result_json)  # Optional: print the JSON output
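An exact join like the one above only pairs rows whose key strings are identical, so "Nizapas 102 Mg Tablet Mr 10" and "Nizapas 100mg/2mg Tablet Mr 10" would not be joined on the name. As a possible fallback, and not something this answer itself proposes, the standard library's `difflib` can suggest the closest name. In this minimal sketch the helper `best_fuzzy_match`, the 0.8 cutoff, and the second candidate name are all assumptions chosen for illustration.

```python
import difflib


def best_fuzzy_match(name, candidates, cutoff=0.8):
    """Return the candidate most similar to name, or None if none reaches cutoff."""
    # Compare case-insensitively but return the candidate in its original spelling.
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


file1_names = ["Nizapas 102 Mg Tablet Mr 10",
               "Examplex 500 Mg Capsule 15"]  # second name is hypothetical

print(best_fuzzy_match("Nizapas 100mg/2mg Tablet Mr 10", file1_names))
print(best_fuzzy_match("Unrelated product", file1_names))
```

The similarity score is purely character-based, so a fuzzy hit is a suggestion to confirm against strength, composition, and company, not proof that two rows describe the same product.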
