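I'm using Selenium WebDriver to extract data points from LinkedIn profiles. In this example I want to pull each skill out of the Skills section, but the data comes back as HTML.
When I try to convert the HTML to text, I get the attached error message.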
from parsel import Selector
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
driver = webdriver.Chrome('/Users/davidcraven/Downloads/chromedriver')
# get profile URL
driver.get('https://www.linkedin.com/AnyProfileURL')
# assigning the source code for the web page to variable sel
sel = Selector(text=driver.page_source)
# get skills
skills = sel.xpath('//*[starts-with(@class, "skills searchable has-several ")]').extract()
newtext = BeautifulSoup(skills, "lxml").text
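extract() returns a list of HTML strings, while BeautifulSoup expects a single string of markup, so passing skills to it fails. You need to select an element first and then read its text: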
driver.get('https://www.linkedin.com/AnyProfileURL')
soup = BeautifulSoup(driver.page_source, "lxml")
elem = soup.select_one('.skills.searchable.has-several')
if elem:
    txt = elem.text
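If you want each skill as its own string rather than one block of text, something along these lines may help. This is only a sketch: SAMPLE_HTML is made-up markup standing in for driver.page_source, the li.skill selector is an assumption about how the skill entries are marked up, and the helper names extract_skills and fragments_to_text are hypothetical. It uses the built-in "html.parser" so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source; the real
# LinkedIn skills section may be structured differently.
SAMPLE_HTML = """
<section class="skills searchable has-several">
  <ul>
    <li class="skill">Python</li>
    <li class="skill">Web Scraping</li>
    <li class="skill"> Data Analysis </li>
  </ul>
</section>
"""


def extract_skills(page_source):
    """Return the text of each skill entry, or [] if the section is missing."""
    soup = BeautifulSoup(page_source, "html.parser")
    section = soup.select_one(".skills.searchable.has-several")
    if section is None:
        return []
    # get_text(strip=True) drops the tags and the surrounding whitespace.
    # "li.skill" is an assumed selector for the individual entries.
    return [li.get_text(strip=True) for li in section.select("li.skill")]


def fragments_to_text(fragments):
    """Convert a list of HTML strings (e.g. from .extract()) to plain text."""
    # Parse each fragment separately instead of passing the whole list
    # to BeautifulSoup at once.
    return [
        BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
        for html in fragments
    ]


skills = extract_skills(SAMPLE_HTML)
print(skills)  # ['Python', 'Web Scraping', 'Data Analysis']
```

fragments_to_text is for the case where you would rather keep the parsel XPath query from the question: pass it the list returned by extract() and it parses each fragment on its own.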