Get a list of lists of values per group without pivoting the table in pandas

Question (votes: 0, answers: 2)

I have the following DataFrame:

import numpy as np
import pandas as pd

data = np.random.uniform(0, 1, (4, 5))
df = pd.DataFrame(data, columns = [2010,2011,2012,2013,2014])
df = df.stack().reset_index().drop(['level_0'], axis =1)

This gives me the following table:

    level_1 0
0   2010    0.490534
1   2011    0.292247
2   2012    0.809696
3   2013    0.198586
4   2014    0.642714
5   2010    0.854330
6   2011    0.637989
7   2012    0.229752
8   2013    0.017705
9   2014    0.632559
10  2010    0.596599
11  2011    0.919915
12  2012    0.622230
13  2013    0.991401
14  2014    0.983660
15  2010    0.351667
16  2011    0.439194
17  2012    0.532181
18  2013    0.205366
19  2014    0.226996

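I want to create a list of lists in which each inner list holds one row of the original wide table, but without pivoting the table first: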

[[0.4905338749671617,
  0.29224663913303917,
  0.8096956093927243,
  0.19858573316125572,
  0.6427138499319793],
 [0.8543300469401851,
  0.637988503570788,
  0.22975189294097909,
  0.017704963198544643,
  0.6325592815879836],
 [0.5965991619700056,
  0.9199147665832661,
  0.6222296923842731,
  0.9914005292156067,
  0.9836596573737321],
 [0.35166657263076084,
  0.43919406028150476,
  0.5321807826469648,
  0.2053657224576596,
  0.22699615245608507]]

I know I can do this easily when the table is in its pivoted (wide) form:

import numpy as np
import pandas as pd

data = np.random.uniform(0, 1, (4, 5))
df = pd.DataFrame(data, columns = [2010,2011,2012,2013,2014])
data = df.values.tolist()

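But I am curious whether the same can be done without pivoting the table first.

The expected output is: [[first value of each year], [second value of each year], [third value of each year], ...]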

pandas
2 answers
1 vote

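Assuming the data is properly sorted and evenly distributed (every year appears exactly once in each consecutive block, always in the same order), use to_numpy / reshape and tolist: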

cols = [2010, 2011, 2012, 2013, 2014]

data = df[0].to_numpy().reshape(-1, len(cols)).tolist()

Or, as suggested by @mozway:

data = df[0].to_numpy().reshape(-1, df["level_1"].nunique()).tolist()

Output:

# with np.random.seed(0)

[[0.5488135039273248,
  0.7151893663724195,
  0.6027633760716439,
  0.5448831829968969,
  0.4236547993389047],
 [0.6458941130666561,
  0.4375872112626925,
  0.8917730007820798,
  0.9636627605010293,
  0.3834415188257777],
 [0.7917250380826646,
  0.5288949197529045,
  0.5680445610939323,
  0.925596638292661,
  0.07103605819788694],
 [0.08712929970154071,
  0.02021839744032572,
  0.832619845547938,
  0.7781567509498505,
  0.8700121482468192]]
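
The reshape approach silently returns wrong rows if the sorting assumption does not hold, so it may be worth checking that assumption first. The following is a minimal sketch, not part of the original answer: the names `wide`, `years`, `n_years` and `blocks` are introduced here for illustration, and the guard conditions are just one possible way to test the layout.

```python
import numpy as np
import pandas as pd

# Rebuild the stacked frame from the question (seeded for reproducibility).
np.random.seed(0)
wide = pd.DataFrame(np.random.uniform(0, 1, (4, 5)),
                    columns=[2010, 2011, 2012, 2013, 2014])
df = wide.stack().reset_index().drop(["level_0"], axis=1)

years = df["level_1"].to_numpy()
n_years = df["level_1"].nunique()

# Guard 1: the number of rows must be a multiple of the number of years.
if len(df) % n_years != 0:
    raise ValueError("rows are not evenly distributed across years")

# Guard 2: every block of n_years rows must list the years in the same order.
blocks = years.reshape(-1, n_years)
if not (blocks == blocks[0]).all():
    raise ValueError("years are not repeated in a consistent order")

# Safe to reshape: each block of n_years values is one original row.
data = df[0].to_numpy().reshape(-1, n_years).tolist()
```

If either guard fails, a grouping-based approach is probably the safer choice, since it does not depend on row order across blocks.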

0 votes

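This will do the job, with one caveat: it groups by year, so the result has one list per year (5 lists of 4 values), which is the transpose of the expected output: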

[v.tolist() for v in df.set_index(0).groupby('level_1').groups.values()]
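
Because the comprehension above groups on `level_1`, it yields one list per year rather than one list per original row. A possible groupby-based variant that produces the orientation asked for in the question is sketched below. It is an assumption-laden illustration, not the answerer's code: it numbers each occurrence of a year with `cumcount` and groups on that number, which assumes each year appears once per original row and that rows were stacked in order. The names `wide` and `occurrence` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the stacked frame from the question (seeded for reproducibility).
np.random.seed(0)
wide = pd.DataFrame(np.random.uniform(0, 1, (4, 5)),
                    columns=[2010, 2011, 2012, 2013, 2014])
df = wide.stack().reset_index().drop(["level_0"], axis=1)

# For each row, count how many times its year has already appeared:
# 0 for the first 2010, 1 for the second 2010, and so on.
occurrence = df.groupby("level_1").cumcount()

# Group the value column (named 0) by that occurrence number and
# collect each group into a plain Python list.
data = df.groupby(occurrence)[0].agg(list).tolist()
```

Here `data` has one inner list per original row, with the values kept in the order in which they appear in the stacked frame.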