Problem opening DBF files in Python

Problem description (votes: 0, answers: 5)

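I am trying to open several DBF files and convert them to DataFrames. Most of them work fine, but for one file I get the error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte"

I have read about this error in other threads, for example on opening csv, xlsx and other files. The suggested solution was to include

encoding = 'utf-8'
in the file-reading call. Unfortunately, I have not found a solution for DBF files, and my knowledge of DBF files is very limited.

What I have tried so far: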

1)

import pandas as pd
from dbfread import DBF
dbf = DBF('file.DBF')
dbf = pd.DataFrame(dbf)

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8: character maps to <undefined>

2)

from simpledbf import Dbf5
dbf = Dbf5('file.DBF')
dbf = dbf.to_dataframe()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte

3)

# this block of code copied from https://gist.github.com/ryan-hill/f90b1c68f60d12baea81 
import pysal as ps

def dbf2DF(dbfile, upper=True): #Reads in DBF files and returns Pandas DF
    db = ps.table(dbfile) #Pysal to open DBF
    d = {col: db.by_col(col) for col in db.header} #Convert dbf to dictionary
    #pandasDF = pd.DataFrame(db[:]) #Convert to Pandas DF
    pandasDF = pd.DataFrame(d) #Convert to Pandas DF
    if upper == True: #Make columns uppercase if wanted 
        pandasDF.columns = map(str.upper, db.header) 
    db.close() 
    return pandasDF

dfb = dbf2DF('file.DBF')

AttributeError: module 'pysal' has no attribute 'open'

Finally, if I try to install the

dbfpy
module, I get: SyntaxError: invalid syntax

Any suggestions on how to solve this?

python dbf
5 answers
4
votes

Try using my dbf library:

import dbf

table = dbf.Table('file.DBF')

Print it to see whether an encoding is recorded in the file:

print(table)

One of my test tables looks like this:

    Table:         tempy.dbf
    Type:          dBase III Plus
    Codepage:      ascii (plain ol ascii)
    Status:        DbfStatus.CLOSED
    Last updated:  2019-07-26
    Record count:  1
    Field count:   2
    Record length: 31 
    --Fields--
      0) name C(20)
      1) desc M

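The important line is the Codepage line. It sounds like it is not set correctly for your DBF file. If you know what it should be, you can open the file with that codepage (temporarily):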

table = dbf.Table('file.DBF', codepage='...')

Or you can change it permanently (this updates the DBF file) with:

table.open()
table.codepage = dbf.CodePage('cp1252') # for example
table.close()
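
To make the Codepage line more concrete: in the dBASE/FoxPro file layout, byte 29 of the 32-byte header holds a language driver ID (sometimes called the code page mark). The standard-library sketch below reads that byte directly. The function name `read_language_driver` and the small `LANGUAGE_DRIVERS` table are illustrative assumptions, not part of the dbf library, and the table lists only a few commonly documented IDs.

```python
# Partial, illustrative mapping of language driver IDs to Python codec names.
# Assumption: only a handful of common IDs are listed; real files may use others.
LANGUAGE_DRIVERS = {
    0x00: None,      # no code page recorded in the header
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
}


def read_language_driver(path):
    """Return (driver_id, codec_name_or_None) read from a DBF file header."""
    with open(path, "rb") as fh:
        header = fh.read(32)  # fixed-size first part of the DBF header
    if len(header) < 32:
        raise ValueError("file is too short to contain a DBF header")
    driver_id = header[29]    # byte 29: language driver ID
    return driver_id, LANGUAGE_DRIVERS.get(driver_id)
```

Calling `read_language_driver('file.DBF')` on the problem file should show whether a code page is recorded at all. An ID of 0 means none is set, which would be consistent with readers falling back to a default encoding that does not match the data.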

0
votes
from simpledbf import Dbf5
dbf2 = Dbf5('/Users/.../TCAT_MUNICIPIOS.dbf', codec='latin')
df2 = dbf2.to_dataframe()
df2.head(3)

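0
votes
  1. Install the dbfread library:

    pip install dbfread

  2. from dbfread import DBF

  3. db_in_dbf = DBF('path/database.dbf')
    This line loads the database.

  4. df = pd.DataFrame(db_in_dbf)
    This line converts it to a pandas DataFrame (requires import pandas as pd).
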

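0
votes

Thanks to everyone who helped me with this issue. I had to repair a corrupted .dbf file (so the data came from a .dbf and had to go back into a .dbf). My specific problem was that dates throughout the .dbf were badly wrong. I tried many ways of taking the file apart and reassembling it, all of which failed with numerous errors, before the following worked: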

#Modify dbase3 file to recast null date fields as a default date and 
#reimport back into dbase3 file

import collections
import datetime
import dbf as dbf1
from simpledbf import Dbf5
from dbfread import DBF, FieldParser
import pandas as pd
import numpy as np

#Default date to overwrite NaN values
blank_date = datetime.date(1900, 1, 1)

#Read in dbase file from Old Path and point to new Path
old_path = r"C:\...\ex.dbf"
new_path = r"C:\...\newex.dbf"

#Establish 1st rule for resolving corrupted dates
class MyFieldParser(FieldParser):
    def parse(self, field, data):
        try:
            return FieldParser.parse(self, field, data)
        except ValueError:
            return blank_date

#Collect the original .DBF data while stepping over any errors
#(only the non-default options are passed, by keyword)
table = DBF(old_path, parserclass=MyFieldParser, char_decode_errors='ignore')

#Grab the Header Name, Old School Variable Format, and number of characters/length for each variable
dbfh = Dbf5(old_path, codec='utf-8')
headers = dbfh.fields
hdct = {x[0]: x[1:] for x in headers}
hdct.pop('DeletionFlag')
keys = hdct.keys()

#Position of Type and Length relative to field name
ftype = 0
characters = 1

# Reformat and join all old school DBF Header fields in required format
fields = list()

for key in keys:
    ftemp = hdct.get(key)
    k1 = str(key)
    res1 = ftemp[ftype]
    res2 = ftemp[characters]
    if k1 == "decimal_field_name":
        fields.append(k1 + " " + res1 + "(" + str(res2) + ",2)")
    elif res1 == 'N':
        fields.append(k1 + " " + res1 + "(" + str(res2) + ",0)")
    elif res1 == 'D':
        fields.append(k1 + " " + res1)
    elif res1 == 'L':
        fields.append(k1 + " " + res1)
    else: 
        fields.append(k1 + " " + res1 + "(" + str(res2) + ")")


addfields = '; '.join(str(f) for f in fields)

#load the records of the .dbf into a dataframe
df = pd.DataFrame(iter(table))

#go ham reformatting date fields to ensure they are in the correct format
df['DATE_FIELD1'] = df['DATE_FIELD1'].replace(np.nan, blank_date)

df['DATE_FIELD1'] = pd.to_datetime(df['DATE_FIELD1'])


# eliminate further errors in the dataframe
df = df.fillna('0')

#note: with inplace=False, set_index returns a new dataframe; the result is
#discarded here, so df itself is left unchanged
df.set_index('existing_primary_key', inplace=False)


#initialize defaultdict and convert the dataframe into a .DBF appendable format
dd = collections.defaultdict(list)
records = df.to_dict('records',into=dd)

#create the new .DBF file
new_table = dbf1.Table(new_path, addfields)

#append the dataframe to the new .DBF file
new_table.open(mode=dbf1.READ_WRITE)

for record in records:
    new_table.append(record)

new_table.close()
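
The core idea in the custom FieldParser above is "try the normal parse, fall back to a default date on ValueError". As a stripped-down sketch of that idea using only the standard library, and assuming the usual DBF convention of storing a date field as eight ASCII characters in YYYYMMDD form, it might look like this. The function name `parse_dbf_date` is hypothetical and is not part of dbfread.

```python
import datetime


def parse_dbf_date(raw, default=datetime.date(1900, 1, 1)):
    """Parse raw DBF date bytes (YYYYMMDD); return default if they are not a valid date."""
    text = raw.decode("ascii", errors="replace").strip()
    try:
        return datetime.datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Blank, zero-filled or otherwise corrupted dates end up here.
        return default


print(parse_dbf_date(b"20190726"))
print(parse_dbf_date(b"        "))
```

Using a sentinel such as 1900-01-01 keeps the column writable as a date field, at the cost of no longer being able to tell a genuinely missing date from a corrupted one.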

0
votes
import dbfread
import pandas as pd
 
dbf = dbfread.DBF('file.dbf')
print(dbf.encoding) #this will print the dbf encoding

dbf.encoding = 'utf8' #just change it to utf-8

pd.DataFrame(dbf)

You can access the DBF's encoding and change it to utf-8.
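
Overriding the encoding only helps when the stored bytes are actually valid in the codec you pick. The standard-library sketch below shows how Python's decode error handlers behave on bytes that are not valid UTF-8. The sample bytes are invented for illustration. As far as I know, dbfread's DBF constructor also accepts a char_decode_errors argument that plays the same role as the `errors` parameter shown here, but treat that as an assumption to verify against its documentation.

```python
raw = b"K\xf6ln"  # hypothetical bytes as they might be stored in a cp1252-encoded field

# Default handler ("strict") raises on the first invalid byte.
strict_ok = True
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    strict_ok = False

replaced = raw.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD
ignored = raw.decode("utf-8", errors="ignore")    # invalid byte is silently dropped
correct = raw.decode("cp1252")                    # decoding with a matching codec

print(strict_ok, repr(replaced), repr(ignored), repr(correct))
```

The "replace" and "ignore" handlers make the error go away but lose the original character, so they are best kept as a last resort once the matching codec cannot be identified.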
