Odd .txt report to pandas DataFrame


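I have a .txt report that lists account numbers, addresses and credit limits in a printed-report layout.

It contains page breaks, but it generally looks like this: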

    Customer    Address             Credit limit
    A001        Wendy's             20000
                123 Main Street
                City, State Zip

I would like my DataFrame to look like this:

    Customer    Address                                     Credit Limit
    A001        Wendy's 123 Main Street, City, State Zip    20000

Here is a link to the sample CSV I am working with:

http://faculty.tlu.edu/mthompson/IDEA%20files/Customer.txt

I tried skipping rows, but that did not work.

Answer

Well, there is nothing really difficult about this format, but it is not CSV, so neither the Python csv module nor pandas read_csv can read it. We will have to parse it by hand.

The trickiest decision is how to identify the first and last line of each customer's block. I would use:

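  • the first line of a block starts with an account number (uppercase letters followed by digits, e.g. A001), ends with a number (the credit limit), and is more than 100 characters long
  • the block ends at the first blank line

Once a block has been found:

  • its first line contains the account number, the name, the first line of the address and the credit limit
  • the following lines contain the remaining lines of the address
  • the fields sit at fixed positions: [5,19), [23,49), [57,77), [90,end_of_line)

In Python this gives: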

    import re
    import pandas as pd

    file = 'Customer.txt'  # path to the downloaded report

    # position of the fields in the first line of a block, as [start, end)
    # (None instead of -1 so the last digit survives if the file lacks a final newline)
    fieldpos = [(5, 19), (23, 49), (57, 77), (90, None)]
    # regex patterns are compiled once for performance
    account_pat = re.compile(r'[A-Z]+\d+\s*$')
    limit_pat = re.compile(r'\s*\d+$')

    inblock = False  # we do not start inside a block
    data = []        # a list for the accounts
    with open(file) as fd:
        for line in fd:
            if not inblock:
                if len(line) > 100:
                    row = [line[start:end].strip() for start, end in fieldpos]
                    if account_pat.match(row[0]) and limit_pat.match(row[-1]):
                        inblock = True
                        data.append(row)
            else:
                line = line.strip()
                if line:
                    row[2] += ', ' + line  # extra address line
                else:
                    inblock = False        # a blank line ends the block

    # we can now build a dataframe
    df = pd.DataFrame(data, columns=['Account Number', 'Name', 'Address', 'Credit Limit'])
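To try the block rules without downloading Customer.txt, here is a minimal stdlib-only sketch of the same logic wrapped in a function that accepts any iterable of lines (an open file works too). The sample is made up: `parse_report` and `make_first_line` are hypothetical names, the A001/Wendy's values come from the question and A128/Ivan Aker from the output below, but the exact line layout is invented, and right-aligning the credit limit in a 15-character field is an assumption chosen so the first line exceeds 100 characters as the rule requires.

```python
import re

# Same rules as above: fixed [start, end) field positions in the first line
# of a block, continuation lines appended to the address, and a blank line
# (or the end of the input) closing the block.
FIELDPOS = [(5, 19), (23, 49), (57, 77), (90, None)]
ACCOUNT_PAT = re.compile(r'[A-Z]+\d+\s*$')
LIMIT_PAT = re.compile(r'\s*\d+$')


def parse_report(lines):
    """Return one [account, name, address, limit] row per customer block."""
    data = []
    row = None  # row being filled; None while outside a block
    for line in lines:
        if row is None:
            if len(line) > 100:
                fields = [line[start:end].strip() for start, end in FIELDPOS]
                if ACCOUNT_PAT.match(fields[0]) and LIMIT_PAT.match(fields[-1]):
                    row = fields
                    data.append(row)
        else:
            line = line.strip()
            if line:
                row[2] += ', ' + line  # extra address line
            else:
                row = None  # blank line: end of the block
    return data


def make_first_line(account, name, address, limit):
    """Hypothetical helper: lay out a made-up first line at the report's columns."""
    buf = [' '] * 90
    for (start, _), value in zip(FIELDPOS, (account, name, address)):
        buf[start:start + len(value)] = value
    # Assumption: the limit is right-aligned, so the line exceeds 100 characters.
    return ''.join(buf) + limit.rjust(15) + '\n'


# Made-up sample loosely following the layout described in the question.
sample = [
    'Customer            Address                         Credit limit\n',  # header: too short
    '\n',
    make_first_line('A001', "Wendy's", '123 Main Street', '20000'),
    ' ' * 57 + 'City, State Zip\n',
    '\n',
    make_first_line('A128', 'Ivan Aker', 'The Old House', '10000'),
    ' ' * 57 + 'Ottawa, Ontario\n',
    ' ' * 57 + 'P1D 8D4\n',
]

rows = parse_report(sample)
for r in rows:
    print(r)
```

The returned rows can be passed to `pd.DataFrame(rows, columns=[...])` exactly as in the answer. Tracking the current row (or `None`) replaces the separate `inblock` flag, but the behavior is the same.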

which finally gives:

        Account Number  Name            Address                                             Credit Limit
    0   A001            Dan Ackroyd     Audenshaw, 125 New Street, Montreal, Quebec, H...   20000
    1   A123            Mike Atsil      The Vetinary House, 123 Dog Row, Thunder Bay, ...   20000
    2   A128            Ivan Aker       The Old House, Ottawa, Ontario, P1D 8D4             10000
    3   B001            Kim Basinger    Mesh House, Fish Street, Rouyn, Quebec, J5V 2A9     12000
    4   B002            Richard Burton  Eagle Castle, Leafy Lane, Sudbury, Ontario, L3...   9000
    5   B004            Jeff Bridges    Arrow Road North, Lakeside, Kenora, Ontario, N...   20000
    6   B008            Denise Bent     The Dance Studio, Covent Garden, Montreal, Que...   20000
    7   B010            Carter Bout     Removals Close, No Fixed Abode Road, Toronto, ...   20000
    8   B022            Ronnie Biggs    Gotaway Cottage, Thunder Bay, Ontario, K3A 6F3      5000
    9   C001            Tom Cruise      The Firm, Gunnersbury, Waskaganish, Quebec, G1...   25000
    10  C003            John Candy      The Sweet Shop, High Street, Trois Rivieres, Q...   15000
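Because the parser slices text, every column in that frame, including Credit Limit, holds strings, and the long addresses end in "..." only because of pandas' default display width (the `display.max_colwidth` option), not because data was lost. A short sketch using two rows copied from the output above: `pd.to_numeric` and `pd.option_context` are standard pandas functions, while passing `None` for `display.max_colwidth` assumes a recent pandas version (1.0 or later).

```python
import pandas as pd

# Two rows copied from the output above (their addresses were not truncated).
df = pd.DataFrame(
    [['A128', 'Ivan Aker', 'The Old House, Ottawa, Ontario, P1D 8D4', '10000'],
     ['B022', 'Ronnie Biggs', 'Gotaway Cottage, Thunder Bay, Ontario, K3A 6F3', '5000']],
    columns=['Account Number', 'Name', 'Address', 'Credit Limit'],
)

# The parser produces strings; convert the credit limit to numbers.
df['Credit Limit'] = pd.to_numeric(df['Credit Limit'])

# Temporarily lift the column-width limit so long addresses print in full.
with pd.option_context('display.max_colwidth', None):
    print(df)

print(df['Credit Limit'].sum())
```

Using `option_context` rather than `pd.set_option` keeps the wider display local to that one `print`.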

© www.soinside.com 2019 - 2024. All rights reserved.