"Input must be a unicode string, not 0 ..." when passing a DataFrame column: how do I pass a string instead of the DataFrame?

Question (votes: 0, answers: 1)

I am trying to manipulate a DataFrame, and I wrote a function to compute the distance between two cities.

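from geopy.distance import geodesic
from opencage.geocoder import OpenCageGeocode

def find_distance(A,B):
    key = '0377f0e6b42a47fe9d30a4e9a2b3bb63'  # get api key from:  https://opencagedata.com
    geocoder = OpenCageGeocode(key)

    # geocode city A and keep the coordinates of the first match
    result_A = geocoder.geocode(A)
    lat_A = result_A[0]['geometry']['lat']
    lng_A = result_A[0]['geometry']['lng']

    # same for city B
    result_B = geocoder.geocode(B)
    lat_B = result_B[0]['geometry']['lat']
    lng_B = result_B[0]['geometry']['lng']

    # geodesic distance, truncated to whole kilometres
    return int(geodesic((lat_A,lng_A), (lat_B,lng_B)).kilometers)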

Here is my DataFrame:

2          32              Mulhouse      1874.0       2 797         16.8             16,3 €  10.012786
13          13         Saint-Étienne      1994.0       3 005         14.3             13,5 €   8.009882
39          39               Roubaix      2845.0       2 591         17.4             15,0 €   6.830968
27          27             Perpignan      2507.0       3 119         15.1             13,3 €   6.727255
40          40             Tourcoing      3089.0       2 901         17.5             15,3 €   6.327547
25          25               Limoges      2630.0       2 807         14.2             12,5 €   6.030424
20          20               Le Mans      2778.0       3 202         14.4             12,3 €   5.789559

Here is my code:

def clean_text(row):
    # return the list of decoded cell in the Series instead 
    return [r.decode('unicode_escape').encode('ascii', 'ignore') for r in row]

def main():
    inFile = "prix_m2_france.xlsx" # Excel file to open
    inSheetName = "Sheet1" # name of the sheet

    cols = ['Ville', 'Prix_moyen', 'Loyer_moyen'] # columns to clean

    df =(pd.read_excel(inFile, sheet_name = inSheetName))
    
    df[cols] = df[cols].replace({'€': '', ",": ".", " ": "", "\u202f":""}, regex=True)
    # df['Prix_moyen'] = df.apply(clean_text)
    # df['Loyer_moyen'] = df.apply(clean_text)

    df['Prix_moyen'] = df['Prix_moyen'].astype(float)
    df['Loyer_moyen'] = df['Loyer_moyen'].astype(float)

    # df["Prix_moyen"] += 1
    df["revenu"] = (df['Loyer_moyen'] * 12) / (df["Prix_moyen"] * 1.0744) * 100
    # df['Ville'].replace({'Le-Havre': 'Le Havre', 'Le-Mans': 'Le Mans'})
    df["Ville"] = df['Ville'].replace(['Le-Havre', 'Le-Mans'], ['Le Havre', 'Le Mans'])
    df["distance"] = find_distance("Paris", df["Ville"])
    df2 = df.sort_values(by = 'revenu', ascending = False)
    print(df2.head(90))

main()
The line df["distance"] = find_distance("Paris", df["Ville"]) fails, and I get this error:

opencage.geocoder.InvalidInputError: Input must be a unicode string,
not 0 Paris
1 Marseille
2 Lyon
3 T

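I pictured this as a loop that would fill in the distance between Paris and each city, but I guess it is passing the whole DataFrame column as a single value instead.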

Thanks for your help.

(Edit: I have only pasted part of the DataFrame.)

python python-3.x pandas dataframe
1 Answer
Votes: 1

You could try something like this:

df["distance"] = [find_distance("Paris", city) for city in df["Ville"]]
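
The reason this works is that find_distance expects one city name per call, whereas df["Ville"] is a whole pandas Series, and the geocoder rejects anything that is not a string. Here is a minimal offline sketch of the difference. Note that find_distance_stub is a hypothetical stand-in I made up so the example runs without an API key or network access; it only imitates the "strings only" check and returns a dummy number, not a real distance.

```python
import pandas as pd


def find_distance_stub(a, b):
    """Hypothetical stand-in for find_distance (no network, no API key).

    Like the geocoder in the question, it only accepts plain strings.
    """
    for value in (a, b):
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__}")
    # Dummy value so the example runs offline; this is NOT a real distance.
    return abs(len(a) - len(b))


df = pd.DataFrame({"Ville": ["Paris", "Marseille", "Lyon"]})

# Passing the column hands the entire Series to the function in a single call.
try:
    find_distance_stub("Paris", df["Ville"])
    whole_column_error = None
except TypeError as exc:
    whole_column_error = str(exc)

# The list comprehension calls the function once per city name.
df["distance"] = [find_distance_stub("Paris", city) for city in df["Ville"]]

# Series.apply is an equivalent way to call it element by element.
df["distance_apply"] = df["Ville"].apply(
    lambda city: find_distance_stub("Paris", city)
)
```

With the real find_distance, keep in mind that every call geocodes both arguments, so "Paris" is looked up again for each row. If request limits matter, it is probably worth geocoding Paris once outside the loop and reusing its coordinates.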

    