How do I pass an entire column as an argument to the tldextract function?

Question

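tldextract is used to extract the domain name from a URL. Here, "url" is one of the column names in the dataframe "df". I can pass a single value of "url" as the argument, but I can't pass the entire column. The URL passed here is "https://www.google.com/search?source=hp&ei=7iE".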

import tldextract

# Extract the domain from the first URL in the column
listed = tldextract.extract(df['url'][0])
dom_name = listed.domain
print(dom_name)

Output: google

What I want is to create a new column in the dataframe, named "Domain", that contains the domain names extracted from the URLs.

Something like:

df['Domain'] = tldextract.extract(df['url'])

But this doesn't work.

Here is the code:

# IMPORTING PANDAS
import pandas as pd
from IPython.display import display

import tldextract

# Read data sample
df = pd.read_csv("bookcsv.csv")

df['Domain'] = df['url'].apply(lambda url: tldextract.extract(url).domain)

Here is the input data:

The dataframe looks like this. I couldn't paste the data here directly, so I posted a screenshot.

Answer 1

Use apply to apply the function to each element of the column; the results stay aligned row by row with the original URLs.

df['Domain'] = df['url'].apply(lambda url: tldextract.extract(url).domain)

Here is the full code I used for testing:

import pandas as pd, tldextract

df = pd.DataFrame([{'url':'https://google.com'}]*12)
df['Domain'] = df['url'].apply(lambda url: tldextract.extract(url).domain)
print(df)

Output:

                   url  Domain
0   https://google.com  google
1   https://google.com  google
2   https://google.com  google
3   https://google.com  google
4   https://google.com  google
5   https://google.com  google
6   https://google.com  google
7   https://google.com  google
8   https://google.com  google
9   https://google.com  google
10  https://google.com  google
11  https://google.com  google

Answer 2

@Neil is close, but you don't really need the lambda function.

import pandas as pd, tldextract

df = pd.DataFrame([{'url':'https://google.com'}]*12)
# Stores the full extraction result (subdomain, domain, suffix) in each cell, not only the domain
df['Domain'] = df['url'].apply(tldextract.extract)
print(df)
