Referencing table columns by their column header in Python

Question  Votes: 1  Answers: 3

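Is there a Pythonic way to reference the columns of a 2D list by name?

I import a lot of tables from the web, so I wrote a general-purpose function that creates a 2D list from various HTML tables. So far so good. But the next step is usually to parse the table row by row.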

# Sample table. 
# In real life I would do something like: table = HTML_table('url', 'table id')
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6]
]

# Current code:
iA = table[0].index('Column A')
iC = table[0].index('Column C')
for row in table[1:]:
    process_row(row[iA], row[iC])

# Desired code:
for row in table[1:]:
    process_row(row['Column A'], row['Column C'])
python-2.7
3 Answers
2
votes

I think you would really like the pandas module! http://pandas.pydata.org/

Put your list into a DataFrame

This can also be done directly from HTML, CSV, etc.

import pandas as pd

df = pd.DataFrame(table[1:], columns=table[0]).astype(str)

Access columns

df['Column A']

Access first row by index

df.iloc[0]

Process row by row

# axis=1 applies the function to each row (axis=0 would walk the columns)
df.apply(lambda x: '_'.join(x), axis=1)

for index, row in df.iterrows():
    process_row(row['Column A'], row['Column C'])

Process a column

df['Column C'].astype(int).sum()

0
votes

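For your problem, wouldn't an ordered dict keyed by column name, with a list of row values for each column, be a better approach? I would use something like the following: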

table = {
    'Column A': [1, 4],
    'Column B': [2, 5],
    'Column C': [3, 6]
}

# And you would parse column by column...

for col, rows in table.iteritems():  # Python 2.7; use table.items() on Python 3
    pass  # do something with the column name and its list of values
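
For reference, the column-keyed layout above can be derived from the 2D list in the question with the standard library alone. The following is only a sketch, written for Python 3 and assuming the first row holds unique column headers; the names `header`, `body`, `columns` and `rows` are illustrative and not part of either post. It also builds one dict per row, which gives the name-based access the question asks for.

```python
from collections import OrderedDict

# Sample table from the question: first row is the header.
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]

header, body = table[0], table[1:]

# Column-oriented: one list of values per column name, in header order.
# zip(*body) transposes the rows into columns.
columns = OrderedDict(zip(header, (list(col) for col in zip(*body))))

# Row-oriented: one dict per row, so cells can be read by column name.
rows = [dict(zip(header, row)) for row in body]

for row in rows:
    print(row['Column A'], row['Column C'])
```

The row dicts cost one extra dict per row, which is usually negligible for tables scraped from a web page.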

0
votes

My QueryList is easy to use.

ql.filter(portfolio='123')

ql.group_by(['portfolio','ticker'])

from collections import OrderedDict, defaultdict


class QueryList(list):
    """filter and/or group_by a list of objects."""

    def group_by(self, attrs) -> dict:
        """Like a database group_by function.

        args:
            attrs: str or list.

        Returns:
            {value_of_the_group: list_of_matching_objects, ...}
            When attrs is a list, each key is a tuple.
            Ex:
            {'AMZN': QueryList(),
            'MSFT': QueryList(),
            ...
            }
            -- or --
            {('Momentum', 'FB'): QueryList(),
             ...,
            }
        """
        result = defaultdict(QueryList)
        if isinstance(attrs, str):
            for item in self:
                result[getattr(item, attrs)].append(item)
        else:
            for item in self:
                result[tuple(getattr(item, x) for x in attrs)].append(item)

        return result

    def filter(self, **kwargs):
        """Returns the subset of this QueryList that has matching attributes.

        args:
            kwargs: Attribute name/value pairs.

        Example:
            foo.filter(portfolio='123', account='ABC').
        """
        ordered_kwargs = OrderedDict(kwargs)
        match = tuple(ordered_kwargs.values())

        def is_match(item):
            # Compare the item's attribute values, in kwargs order, to the requested values.
            return tuple(getattr(item, y) for y in ordered_kwargs.keys()) == match

        result = QueryList([x for x in self if is_match(x)])

        return result

    def scalar(self, default=None, attr=None):
        """Returns the first item in this QueryList.

        args:
            default: The value to return if there is less than one item,
                or if the attr is not found.
            attr: Returns getattr(item, attr) if not None.
        """
        item, = self[0:1] or [default]

        if attr is None:
            result = item
        else:
            result = getattr(item, attr, default)
        return result

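I tried pandas. I wanted to like it, I really did. But in the end it was too complicated for my needs.

For example:

df[(df['portfolio'] == '123') & (df['ticker'] == 'MSFT')]

is not as simple as

ql.filter(portfolio='123', ticker='MSFT')

Also, creating a QueryList is simpler than creating a df.

That is because you tend to use custom classes with a QueryList. The data conversion code naturally goes into the custom class, which keeps it separate from the rest of the logic. With a df, the data conversion usually ends up inline with the rest of the code.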
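
To make that last point concrete, here is a rough sketch of keeping the conversion inside a custom class. Everything in it is an assumption for illustration: the `Holding` class, its fields, the `from_row` helper and the sample data are hypothetical, and `MiniQueryList` is a stripped-down stand-in for the QueryList above that only supports `filter`. It needs Python 3.7+ for `dataclasses`.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """Hypothetical record type; the field names are assumptions."""
    portfolio: str
    ticker: str
    shares: int

    @classmethod
    def from_row(cls, header, row):
        # Conversion from raw table cells lives here, not in the calling code.
        cells = dict(zip(header, row))
        return cls(
            portfolio=str(cells['portfolio']),
            ticker=str(cells['ticker']),
            shares=int(cells['shares']),
        )


class MiniQueryList(list):
    """Stripped-down stand-in for the QueryList above (filter only)."""

    def filter(self, **kwargs):
        # Keep the items whose attributes equal every requested value.
        return MiniQueryList(
            item for item in self
            if all(getattr(item, name) == value for name, value in kwargs.items())
        )


# Made-up sample data in the same 2D-list shape as the question's table.
table = [
    ['portfolio', 'ticker', 'shares'],
    ['123', 'MSFT', '10'],
    ['123', 'AMZN', '5'],
    ['456', 'MSFT', '7'],
]

ql = MiniQueryList(Holding.from_row(table[0], row) for row in table[1:])
matches = ql.filter(portfolio='123', ticker='MSFT')
```

The calling code only ever sees typed `Holding` objects, so the string-to-int conversion never leaks into the filtering logic.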
