Create new dataframe rows when a column has comma-separated values


Example dataframe:

name      col1    col2     col3
bob       bird     78       1000
alice     cat      55       500,600,700
rob       dog      333      20,30

Desired dataframe, with a new row added for each value wherever col3 holds a comma-separated string:

name     col1      col2     col3
bob      bird       78      1000
alice    cat        55      500
alice    cat        55      600
alice    cat        55      700
rob      dog        333      20
rob      dog        333      30

Any suggestions would be greatly appreciated! Thanks!

python pandas dataframe
1 Answer
import pandas as pd


class DataFrameExpander:
    """
    Expands a DataFrame by adding one row per item when the values in a column are comma-separated strings.
    """

    def __init__(self, dataframe):
        """
        Initialize the expander with a DataFrame.
        :param dataframe: The source DataFrame
        """
        self.dataframe = dataframe

    def expand_column(self, column_name):
        """
        Expand the DataFrame by adding a new row for each comma-separated value in the specified column.
        :param column_name: The name of the column whose comma-separated values should be split
        :return: A new, expanded DataFrame (the source DataFrame is left unchanged)
        """
        # Split each value into a list, then use explode to give every list item its own row
        df = self.dataframe.copy()
        df[column_name] = df[column_name].apply(lambda x: str(x).split(','))
        expanded_df = df.explode(column_name, ignore_index=True)

        # Convert the column back to a numeric type if every value parses.
        # errors='ignore' is deprecated as of pandas 2.2, so catch the error instead.
        try:
            expanded_df[column_name] = pd.to_numeric(expanded_df[column_name])
        except (ValueError, TypeError):
            pass

        return expanded_df


if __name__ == "__main__":

    data = {
        'name': ['bob', 'alice', 'rob'],
        'col1': ['bird', 'cat', 'dog'],
        'col2': [78, 55, 333],
        'col3': ['1000', '500,600,700', '20,30']
    }

    df = pd.DataFrame(data)
    expander = DataFrameExpander(df)
    expanded_df = expander.expand_column('col3')
    print(expanded_df)
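
If you don't need the class wrapper, the same split-then-explode idea can be sketched as a single method chain. This is only a minimal sketch: the helper name `explode_csv_column` is made up for illustration, and it assumes pandas 1.1 or newer (for `ignore_index` in `explode`) and that every item in `col3` is an integer string, which holds for the sample data in the question.

```python
import pandas as pd


def explode_csv_column(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a new frame with one row per comma-separated item in `column`.

    Hypothetical helper for illustration; not part of pandas.
    """
    return (
        frame
        # Turn each cell into a list of items, e.g. "20,30" -> ["20", "30"]
        .assign(**{column: frame[column].astype(str).str.split(",")})
        # Give each list item its own row and renumber the index from 0
        .explode(column, ignore_index=True)
        # Drop stray whitespace around items such as "500, 600"
        .assign(**{column: lambda d: d[column].str.strip()})
    )


df = pd.DataFrame({
    "name": ["bob", "alice", "rob"],
    "col1": ["bird", "cat", "dog"],
    "col2": [78, 55, 333],
    "col3": ["1000", "500,600,700", "20,30"],
})

result = explode_csv_column(df, "col3")
# Assumption: every item is an integer string, so a plain cast is enough here
result["col3"] = result["col3"].astype(int)
print(result)
```

On the sample data this gives six rows (one for bob, three for alice, two for rob), and the original `df` is not modified. Note that `astype(str)` would turn missing values into the string "nan", so handle NaNs first if your real column has any.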
