I have the following DataFrame:
import numpy as np
import pandas as pd
data = np.random.uniform(0, 1, (4, 5))
df = pd.DataFrame(data, columns=[2010, 2011, 2012, 2013, 2014])
df = df.stack().reset_index().drop(['level_0'], axis=1)
This gives me the following table:
level_1 0
0 2010 0.490534
1 2011 0.292247
2 2012 0.809696
3 2013 0.198586
4 2014 0.642714
5 2010 0.854330
6 2011 0.637989
7 2012 0.229752
8 2013 0.017705
9 2014 0.632559
10 2010 0.596599
11 2011 0.919915
12 2012 0.622230
13 2013 0.991401
14 2014 0.983660
15 2010 0.351667
16 2011 0.439194
17 2012 0.532181
18 2013 0.205366
19 2014 0.226996
I want to create a list of lists containing each original row, but without pivoting the table first:
[[0.4905338749671617,
0.29224663913303917,
0.8096956093927243,
0.19858573316125572,
0.6427138499319793],
[0.8543300469401851,
0.637988503570788,
0.22975189294097909,
0.017704963198544643,
0.6325592815879836],
[0.5965991619700056,
0.9199147665832661,
0.6222296923842731,
0.9914005292156067,
0.9836596573737321],
[0.35166657263076084,
0.43919406028150476,
0.5321807826469648,
0.2053657224576596,
0.22699615245608507]]
I know I can do this easily if I pivot the table:
import numpy as np
import pandas as pd
data = np.random.uniform(0, 1, (4, 5))
df = pd.DataFrame(data, columns=[2010, 2011, 2012, 2013, 2014])
data = df.values.tolist()
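But I would like to know whether the same thing can be done without pivoting the table first.
The expected output is: [[first value of each year], [second value of each year], [third value of each year], ...]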
Use to_numpy / reshape & tolist:
cols = [2010, 2011, 2012, 2013, 2014]
data = df[0].to_numpy().reshape(-1, len(cols)).tolist()
Or, as suggested by @mozway:
data = df[0].to_numpy().reshape(-1, df["level_1"].nunique()).tolist()
Output:
# with np.random.seed(0)
[[0.5488135039273248,
0.7151893663724195,
0.6027633760716439,
0.5448831829968969,
0.4236547993389047],
[0.6458941130666561,
0.4375872112626925,
0.8917730007820798,
0.9636627605010293,
0.3834415188257777],
[0.7917250380826646,
0.5288949197529045,
0.5680445610939323,
0.925596638292661,
0.07103605819788694],
[0.08712929970154071,
0.02021839744032572,
0.832619845547938,
0.7781567509498505,
0.8700121482468192]]
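As a sanity check, here is a minimal self-contained sketch of the reshape approach. It assumes the long table is still ordered row by row and that every year appears exactly once per original row; if either assumption fails, the reshape will silently mix values or raise. The names wide, long_df, n_years and rows are only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

# Wide frame: 4 rows, one column per year
wide = pd.DataFrame(np.random.uniform(0, 1, (4, 5)),
                    columns=[2010, 2011, 2012, 2013, 2014])

# Long frame as in the question: columns "level_1" (year) and 0 (value)
long_df = wide.stack().reset_index().drop(columns="level_0")

# Number of values per original row = number of distinct years
n_years = long_df["level_1"].nunique()

# Cut the flat value column into chunks of n_years values each
rows = long_df[0].to_numpy().reshape(-1, n_years).tolist()

# Same result as converting the wide frame directly
assert rows == wide.to_numpy().tolist()
```

The reshape never looks at the year labels, so it is fast but depends entirely on the ordering of the long table.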
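This will do the job (note that it groups by year, so it returns one list per year rather than one list per original row):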
[v.tolist() for v in df.set_index(0).groupby('level_1').groups.values()]
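If one list per original row is wanted from a groupby, a possible variant is to number the occurrences of each year with cumcount and group on that counter instead. This is only a sketch: it assumes each year's k-th occurrence belongs to the k-th original row, and the names wide, long_df, per_year, row_id and rows are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

wide = pd.DataFrame(np.random.uniform(0, 1, (4, 5)),
                    columns=[2010, 2011, 2012, 2013, 2014])
long_df = wide.stack().reset_index().drop(columns="level_0")

# The one-liner above: one list per year (5 lists of 4 values)
per_year = [v.tolist()
            for v in long_df.set_index(0).groupby("level_1").groups.values()]

# k-th occurrence of each year -> id of the k-th original row
row_id = long_df.groupby("level_1").cumcount()

# Group the value column by that row id: one list per original row
rows = [g.tolist() for _, g in long_df[0].groupby(row_id)]

assert rows == wide.to_numpy().tolist()
```

Unlike the reshape, this does not require the rows of the long table to be contiguous, only that the occurrences of each year appear in row order.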