Merging multiple columns of the same data into one column

Question    Votes: 0    Answers: 1

I have created a DataFrame by adding data from multiple sources. Here is a sample subset:

index   CompanyName   Source1site   Source2site   Source3site   City
1       Comp1         web1.com      Nan           web2.com      Paris
2       Comp1         Web2.com      web2.com      Nan           Nan
3       Comp2         Nan           site1.com     Nan           Oakland
4       Comp2         site2.com     Nan           Nan           London
5       Comp3         Nan           Nan           Nan           Mexico
6       Comp4         Nan           url1.com      Nan           Nan
7       Comp5         Nan           example.com   Nan           New York

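Source1site, Source2site and Source3site are all website domains collected for CompanyName from different sources. I want to merge these three columns into one in a way that also preserves the data in the other columns. Here is the sample output I am looking for: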

index   CompanyName   MergeSourceSite   City
1       Comp1         web1.com          Paris
2       Comp1         web2.com          Paris
3       Comp2         site1.com         Oakland
4       Comp2         Site1.com         London
5       Comp2         site2.com         Oakland
6       Comp2         site2.com         London
7       Comp3         Nan               Mexico
8       Comp4         url1.com          Nan
9       Comp5         example.com       New York

Any help I can get would be greatly appreciated.

Thanks,

python pandas dataframe
1 Answer
0
votes

You can achieve this as follows:

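# df is your existing DataFrame. "index" must be a regular column for melt();
# if it is the DataFrame's own unnamed index, call df = df.reset_index() first.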

# Merge source columns into a single column 
merged_sources = df.melt(
    id_vars=["index", "CompanyName", "City"],
    value_vars=["Source1site", "Source2site", "Source3site"],
    value_name="MergeSourceSite" 
)

# Remove rows with NaN in the MergeSourceSite column 
merged_sources = merged_sources.dropna(subset=["MergeSourceSite"])

# Remove the helper "variable" column and duplicate rows, then reset the index
merged_sources = (
    merged_sources.drop(columns=["variable"])
    .drop_duplicates()
    .reset_index(drop=True)
)

# Sort by index for a cleaner output 
merged_sources = merged_sources.sort_values(by="index").reset_index(drop=True)

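This gives you one row per distinct, non-missing site for each original row. Note that it does not match the sample output exactly: rows with no site at all (Comp3) are dropped, and a missing City is not filled in from other rows of the same company.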
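
If each site should be paired with every known city of the same company, as in the sample output (Comp2 appears with both sites and both cities), one option is to collect the distinct sites and the distinct cities per company separately and join them on CompanyName. The following is a minimal sketch, not part of the original answer. It assumes that "Nan" in the question stands for a real missing value, that domains should be compared case-insensitively, and that the original index column can be discarded. The names site_cols, sites, cities, companies and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "Nan" is assumed to mean a missing value.
df = pd.DataFrame({
    "CompanyName": ["Comp1", "Comp1", "Comp2", "Comp2", "Comp3", "Comp4", "Comp5"],
    "Source1site": ["web1.com", "Web2.com", np.nan, "site2.com", np.nan, np.nan, np.nan],
    "Source2site": [np.nan, "web2.com", "site1.com", np.nan, np.nan, "url1.com", "example.com"],
    "Source3site": ["web2.com", np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    "City": ["Paris", np.nan, "Oakland", "London", "Mexico", np.nan, "New York"],
})

site_cols = ["Source1site", "Source2site", "Source3site"]

# Distinct (company, site) pairs, lower-cased so "Web2.com" equals "web2.com".
sites = (
    df.melt(id_vars=["CompanyName"], value_vars=site_cols,
            value_name="MergeSourceSite")
    .drop(columns=["variable"])
    .dropna(subset=["MergeSourceSite"])
    .assign(MergeSourceSite=lambda d: d["MergeSourceSite"].str.lower())
    .drop_duplicates()
)

# Distinct (company, city) pairs.
cities = df[["CompanyName", "City"]].dropna().drop_duplicates()

# Start from every company so that left joins keep those without a site or city.
companies = df[["CompanyName"]].drop_duplicates()

# Joining on CompanyName pairs every site with every city of that company.
result = (
    companies.merge(sites, on="CompanyName", how="left")
    .merge(cities, on="CompanyName", how="left")
    .reset_index(drop=True)
)

print(result)
```

The left joins keep companies that have no site (Comp3) or no city (Comp4). On the sample data, result has nine rows, which matches the sample output up to row order and letter case.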
