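Is there a Pythonic way to reference the columns of a 2D list by name?
I import a lot of tables from the web, so I wrote a general-purpose function that builds 2D lists from various HTML tables. So far so good. But the next step is usually to parse the table row by row.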
    # Sample table.
    # In real life I would do something like: table = HTML_table('url', 'table id')
    table = [
        ['Column A', 'Column B', 'Column C'],
        ['One', 'Two', 3],
        ['Four', 'Five', 6]
    ]

    # Current code:
    iA = table[0].index('Column A')
    iC = table[0].index('Column C')
    for row in table[1:]:
        process_row(row[iA], row[iC])

    # Desired code:
    for row in table[1:]:
        process_row(row['Column A'], row['Column C'])
I think you would really like the pandas module! http://pandas.pydata.org/
A DataFrame can also be built directly from HTML, CSV, and so on.
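    import pandas as pd

    # Build a DataFrame from the 2D list; the first row supplies the column names.
    df = pd.DataFrame(table[1:], columns=table[0]).astype(str)
    df['Column A']                            # select a column by name
    df.iloc[0]                                # select the first row by position
    df.apply(lambda x: '_'.join(x), axis=0)   # join each column's values with '_'
    for index, row in df.iterrows():
        process_row(row['Column A'], row['Column C'])
    df['Column C'].astype(int).sum()          # convert back to int before summing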
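Wouldn't an ordered dict, with column names as keys and lists of row values as values, be a better approach for your problem? I would go with something like this: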
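    # Plain dicts keep insertion order on Python 3.7+;
    # use collections.OrderedDict on older versions.
    table = {
        'Column A': [1, 4],
        'Column B': [2, 5],
        'Column C': [3, 6]
    }
    # And you would parse column by column...
    for col, rows in table.items():
        pass  # do something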
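To get from the 2D list in the question to this column-keyed layout, something like the following might work. It is only a sketch using the standard library; the names `header`, `rows`, `columns`, and `records` are made up for illustration.

```python
# The sample table from the question: header row first, then data rows.
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]

header, *rows = table

# zip(*rows) transposes the data rows into columns; pairing each column
# with its header name gives a dict keyed by column name.
columns = {name: list(values) for name, values in zip(header, zip(*rows))}

# Row-wise access by name is still possible by building one dict per row.
records = [dict(zip(header, row)) for row in rows]

print(columns['Column C'])     # [3, 6]
print(records[0]['Column A'])  # One
```

The `records` list also gives the `row['Column A']` style of access the question asks for, at the cost of one small dict per row.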
My QueryList is simple to use.

    ql.filter(portfolio='123')
    ql.group_by(['portfolio', 'ticker'])
    from collections import OrderedDict, defaultdict


    class QueryList(list):
        """filter and/or group_by a list of objects."""

        def group_by(self, attrs) -> dict:
            """Like a database group_by function.

            args:
                attrs: str or list.

            Returns:
                {value_of_the_group: list_of_matching_objects, ...}
                When attrs is a list, each key is a tuple.
                Ex:
                {'AMZN': QueryList(),
                 'MSFT': QueryList(),
                 ...
                }
                -- or --
                {('Momentum', 'FB'): QueryList(),
                 ...,
                }
            """
            result = defaultdict(QueryList)
            if isinstance(attrs, str):
                for item in self:
                    result[getattr(item, attrs)].append(item)
            else:
                for item in self:
                    result[tuple(getattr(item, x) for x in attrs)].append(item)
            return result

        def filter(self, **kwargs):
            """Returns the subset of this QueryList that has matching attributes.

            args:
                kwargs: Attribute name/value pairs.

            Example:
                foo.filter(portfolio='123', account='ABC').
            """
            ordered_kwargs = OrderedDict(kwargs)
            match = tuple(ordered_kwargs.values())

            def is_match(item):
                return tuple(getattr(item, y) for y in ordered_kwargs.keys()) == match

            result = QueryList([x for x in self if is_match(x)])
            return result

        def scalar(self, default=None, attr=None):
            """Returns the first item in this QueryList.

            args:
                default: The value to return if there is less than one item,
                    or if the attr is not found.
                attr: Returns getattr(item, attr) if not None.
            """
            # Unpack the first item, or fall back to default when the list is empty.
            item, = self[0:1] or [default]
            if attr is None:
                result = item
            else:
                result = getattr(item, attr, default)
            return result
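For anyone wondering what calling it looks like, here is a rough usage sketch. To keep the snippet runnable on its own it uses a trimmed-down stand-in for the class above (only `filter` and `group_by`, written slightly differently), and the `Position` class with its `portfolio`, `ticker`, and `shares` attributes is purely hypothetical.

```python
from collections import defaultdict


class QueryList(list):
    """Trimmed-down stand-in for the class above (filter and group_by only)."""

    def filter(self, **kwargs):
        # Keep items whose attributes equal every given name/value pair.
        return QueryList(
            item for item in self
            if all(getattr(item, name) == value for name, value in kwargs.items())
        )

    def group_by(self, attrs):
        # A str groups on one attribute; a list groups on a tuple of attributes.
        result = defaultdict(QueryList)
        for item in self:
            if isinstance(attrs, str):
                key = getattr(item, attrs)
            else:
                key = tuple(getattr(item, name) for name in attrs)
            result[key].append(item)
        return result


class Position:
    """Hypothetical record type; any object with these attributes would do."""

    def __init__(self, portfolio, ticker, shares):
        self.portfolio = portfolio
        self.ticker = ticker
        self.shares = shares


ql = QueryList([
    Position('123', 'MSFT', 10),
    Position('123', 'AMZN', 5),
    Position('456', 'MSFT', 7),
])

msft_in_123 = ql.filter(portfolio='123', ticker='MSFT')
by_ticker = ql.group_by('ticker')
by_both = ql.group_by(['portfolio', 'ticker'])

print(len(msft_in_123))               # 1
print(sorted(by_ticker))              # ['AMZN', 'MSFT']
print(len(by_both[('456', 'MSFT')]))  # 1
```

Because both methods return QueryList objects, calls can be chained, for example `ql.filter(portfolio='123').group_by('ticker')`.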
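I tried pandas. I wanted to like it, I really did. But ultimately it is too complicated for my needs.
For example:

    df[(df['portfolio'] == '123') & (df['ticker'] == 'MSFT')]

is not as simple as

    ql.filter(portfolio='123', ticker='MSFT')

Also, creating a QueryList is simpler than creating a df.
That's because you tend to use custom classes with a QueryList. The data conversion code naturally goes into the custom class, which keeps it separate from the rest of the logic. With a df, the data conversion usually ends up inline with the rest of the code.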
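As a rough illustration of keeping the conversion inside the custom class, here is a sketch built on the sample table from the question. The `Row` dataclass, its field names, and `from_table_row` are hypothetical, and the numeric cells are assumed to arrive as strings, as scraped HTML text usually does.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical record for the sample table in the question."""
    column_a: str
    column_b: str
    column_c: int

    @classmethod
    def from_table_row(cls, header, row):
        # All conversion logic lives here, away from the processing code.
        cells = dict(zip(header, row))
        return cls(
            column_a=str(cells['Column A']),
            column_b=str(cells['Column B']),
            column_c=int(cells['Column C']),
        )


# Assumption: scraped cells are strings, so 'Column C' needs converting.
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', '3'],
    ['Four', 'Five', '6'],
]

header, *rows = table
records = [Row.from_table_row(header, row) for row in rows]

print(sum(r.column_c for r in records))  # 9
```

The resulting list of objects could then be wrapped in a QueryList and queried with `filter(column_a='One')`.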