$ cat n2.txt
apn,date
3704-156,11/04/2019
3704-156,11/22/2019
5515-004,10/23/2019
3732-231,10/07/2019
3732-231,11/15/2019
$ python3
Python 3.7.5 (default, Oct 25 2019, 10:52:18)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_csv("n2.txt")
>>> df
        apn        date
0  3704-156  11/04/2019
1  3704-156  11/22/2019
2  5515-004  10/23/2019
3  3732-231  10/07/2019
4  3732-231  11/15/2019
>>> g = df.groupby('apn')
>>> g.last()
                date
apn
3704-156  11/22/2019
3732-231  11/15/2019
5515-004  10/23/2019
>>> f = g.last()
>>> for r in f.itertuples(index=True, name='Pandas'):
...     print(getattr(r,'apn'), getattr(r,'date'))
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: 'Pandas' object has no attribute 'apn'
>>> for r in f.itertuples(index=True, name='Pandas'):
...     print(getattr(r,"apn"), getattr(r,"date"))
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: 'Pandas' object has no attribute 'apn'
What is the correct way to print this to a file?
For example:
apn, date
3704-156,11/22/2019
3732-231,11/15/2019
5515-004,10/23/2019
df = pd.read_csv("n2.txt")
g = df.groupby('apn').last()
print(g.to_csv())
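This should work the way you want.
If you type g.to_csv() in the console, it returns a string beginning with 'apn,date\r\n...' (the line terminator may be '\n' instead, depending on your platform). When print encounters '\r\n', it starts a new line, so the output ends up formatted as you want.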
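As a minimal sketch of the point above, the snippet below rebuilds the sample data inline (CSV_TEXT is a stand-in for n2.txt, not part of the original post) and checks that to_csv() called without a path returns the CSV text as a string. The lines are compared after splitlines() so the result does not depend on which line terminator the platform uses.

```python
import io
import pandas as pd

# Inline copy of the sample data, so the sketch does not need n2.txt.
CSV_TEXT = """apn,date
3704-156,11/04/2019
3704-156,11/22/2019
5515-004,10/23/2019
3732-231,10/07/2019
3732-231,11/15/2019
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
g = df.groupby('apn').last()

# With no path argument, to_csv() returns the CSV text as a str.
text = g.to_csv()

# splitlines() accepts both '\n' and '\r\n' terminators.
lines = text.splitlines()
print(lines)
```

Passing a path, as in g.to_csv(path), writes the same text to that file instead of returning it.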
Change your code to:
df = pd.read_csv("n2.txt")
g = df.groupby('apn')
f = g.last()
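Use DataFrame.to_csv, because f is a pandas DataFrame with apn as its index, and the index is written as the first column:
f.to_csv(file)
Or convert the index into a column with reset_index, which gives a two-column DataFrame, and write it with DataFrame.to_csv and index=False: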
f.reset_index().to_csv(file, index=False)
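The two variants above can be compared with a short sketch. It assumes the sample data is supplied inline (CSV_TEXT) and writes to two hypothetical file names inside a temporary directory; neither name comes from the original post. Both calls should produce the same file, since the index name apn is written as the header of the first column.

```python
import io
import os
import tempfile
import pandas as pd

# Inline copy of the sample data, so the sketch does not need n2.txt.
CSV_TEXT = """apn,date
3704-156,11/04/2019
3704-156,11/22/2019
5515-004,10/23/2019
3732-231,10/07/2019
3732-231,11/15/2019
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
f = df.groupby('apn').last()

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output file names, used only for this illustration.
    path_a = os.path.join(tmp, 'with_index.csv')
    path_b = os.path.join(tmp, 'reset_index.csv')

    # Variant 1: write the DataFrame with its index (apn) as the first column.
    f.to_csv(path_a)
    # Variant 2: turn the index into a regular column, then skip the new index.
    f.reset_index().to_csv(path_b, index=False)

    with open(path_a) as fa, open(path_b) as fb:
        content_a = fa.read().splitlines()
        content_b = fb.read().splitlines()

print(content_a == content_b)
```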
Or use a solution with DataFrame.drop_duplicates, which keeps only the last row for each apn:
df = pd.read_csv("n2.txt")
df = df.drop_duplicates('apn', keep='last')
df.to_csv(file, index=False)
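One difference worth noting is row order. The sketch below, again using an inline stand-in for n2.txt, suggests that drop_duplicates keeps the surviving rows in their original file order, whereas groupby sorts by apn. The sort_values call is an addition for illustration, in case the sorted order from the question is wanted.

```python
import io
import pandas as pd

# Inline copy of the sample data, so the sketch does not need n2.txt.
CSV_TEXT = """apn,date
3704-156,11/04/2019
3704-156,11/22/2019
5515-004,10/23/2019
3732-231,10/07/2019
3732-231,11/15/2019
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Keep the last row seen for each apn; rows stay in file order.
dedup = df.drop_duplicates('apn', keep='last')
file_order = dedup.to_csv(index=False).splitlines()

# Optional: sort by apn to match the order produced by groupby.
sorted_order = dedup.sort_values('apn').to_csv(index=False).splitlines()

print(file_order)
print(sorted_order)
```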
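In your solution, use Index to select the index of the DataFrame. Because apn is the index of f rather than a column, itertuples(index=True) exposes it as the attribute Index, which is why getattr(r, 'apn') raises AttributeError.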
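A minimal sketch of the loop from the question with that change applied is shown below. It assumes inline sample data in place of n2.txt, and it uses io.StringIO as a stand-in for a real file object opened with open(path, 'w'); both are illustrative choices rather than part of the original answer.

```python
import io
import pandas as pd

# Inline copy of the sample data, so the sketch does not need n2.txt.
CSV_TEXT = """apn,date
3704-156,11/04/2019
3704-156,11/22/2019
5515-004,10/23/2019
3732-231,10/07/2019
3732-231,11/15/2019
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
f = df.groupby('apn').last()

# io.StringIO stands in for a real file opened with open(path, 'w').
out = io.StringIO()
print('apn,date', file=out)
for r in f.itertuples(index=True, name='Pandas'):
    # The group key lives in the index, so it is r.Index, not r.apn.
    print(r.Index, r.date, sep=',', file=out)

written = out.getvalue().splitlines()
print(written)
```

For plain CSV output the to_csv variants are shorter, but the loop is useful when each row needs custom formatting.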