Fetching data from clinicalTrials.gov

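I'm working on a small Python function to fetch data from clinicalTrials.gov. From each study record, I want to get the conditions the study targets. For example, for this study record (NCT00550550), I want the following: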

conditions = ['Rhinoconjunctivitis', 'Rhinitis', 'Conjunctivitis', 'Allergy']

However, the number of conditions differs from one study record to the next. I wrote the following script to fetch the data:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://clinicaltrials.gov/ct2/show/study/NCT00550550')
soup = BeautifulSoup(page.text, 'html.parser')
studyDesign = soup.find_all(headers='studyInfoColData')
# Every <span> in the first data_table: header labels, conditions and interventions alike
condition = soup.find(attrs={'class': 'data_table'}).find_all('span')
for each in condition:
    print(each.text.encode('utf-8').strip())

It prints:

b'Condition or disease'
b'Intervention/treatment'
b'Phase'
b'Rhinoconjunctivitis'
b'Rhinitis'
b'Conjunctivitis'
b'Allergy'
b'Drug: Placebo'
b'Biological: SCH 697243'
b'Drug: Loratadine Syrup 1 mg/mL Rescue Treatment'
b'Drug: Loratadine 10 mg Rescue Treatment'
b'Drug: Olopatadine 0.1% Rescue Treatment'
b'Drug: Mometasone furoate 50 mcg Rescue Treatment'
b'Drug: Albuterol 108 mcg Rescue Treatment'
b'Drug: Fluticasone 44 mcg Rescue Treatment'
b'Drug: Prednisone 5 mg Rescue Treatment'
b'Phase 3'
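Why the extra lines appear: find_all('span') is called on the whole data_table, so it returns every span in document order, including the column header labels and the intervention cells. The minimal sketch below reproduces this offline on a small hand-written snippet; SAMPLE_HTML and its markup are assumptions modelled on the output above, not the actual page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the study page's "data_table":
# a header row, then one row whose first cell lists the conditions and
# whose second cell lists the interventions.
SAMPLE_HTML = """
<table class="data_table">
  <tr>
    <th><span>Condition or disease</span></th>
    <th><span>Intervention/treatment</span></th>
  </tr>
  <tr>
    <td><span>Rhinitis</span><span>Allergy</span></td>
    <td><span>Drug: Placebo</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="data_table")

# Searching the whole table mixes header labels, conditions and interventions.
all_spans = [span.get_text(strip=True) for span in table.find_all("span")]
print(all_spans)
```

The fix is therefore to narrow the search to the cell (or column) that holds the conditions before calling find_all('span').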

How can I get only the conditions, without the intervention/treatment information?

python web-scraping beautifulsoup python-requests
2 Answers

Answer 1 (1 vote)

You can take the first td in the table with class data_table and extract the span elements inside it:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://clinicaltrials.gov/ct2/show/study/NCT00550550')
soup = BeautifulSoup(page.text, 'html.parser')
# The first <td> of the first data_table holds the "Condition or disease" column
studyDesign = soup.find("table", {"class": "data_table"}).find('td')
# Each condition sits in its own <span>
conditions = [t.text.strip() for t in studyDesign.find_all('span')]
print(conditions)

This gives:

[u'Rhinoconjunctivitis', u'Rhinitis', u'Conjunctivitis', u'Allergy']
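This approach relies on the conditions being in the first cell of the table. If the column order might change, one option is to locate the column by its header text instead. The sketch below is illustrative only: the helper conditions_from_html and the sample markup are hypothetical, and it assumes the first row of the data_table holds th header cells, later rows hold td cells, and no cell uses colspan.

```python
from bs4 import BeautifulSoup


def conditions_from_html(html, header_text="Condition or disease"):
    """Return the span texts from the column whose header equals header_text.

    Hypothetical helper. Assumes the first <tr> of the first "data_table"
    holds <th> header cells, later rows hold <td> cells, and no colspan is used.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="data_table")
    if table is None:
        return []

    header_row = table.find("tr")
    if header_row is None:
        return []
    headers = [th.get_text(strip=True) for th in header_row.find_all("th")]
    if header_text not in headers:
        return []
    col = headers.index(header_text)

    conditions = []
    for row in table.find_all("tr"):
        # Only the row's own cells, not cells of any nested table
        cells = row.find_all("td", recursive=False)
        if len(cells) > col:
            conditions.extend(
                span.get_text(strip=True) for span in cells[col].find_all("span")
            )
    return conditions


# Hypothetical markup with the two columns in the opposite order
SAMPLE_HTML = """
<table class="data_table">
  <tr><th><span>Intervention/treatment</span></th><th><span>Condition or disease</span></th></tr>
  <tr><td><span>Drug: Placebo</span></td><td><span>Rhinitis</span><span>Allergy</span></td></tr>
</table>
"""

print(conditions_from_html(SAMPLE_HTML))
```

Here the conditions are still found even though they sit in the second column. Keeping the parsing separate from the download (for example, passing page.text from requests.get into the helper) also makes it easy to test without network access.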

Answer 2 (1 vote)

Perhaps this code will help:

import requests
from bs4 import BeautifulSoup

# Another study record to try:
# url = "https://clinicaltrials.gov/ct2/show/NCT02656888"
url = "https://clinicaltrials.gov/ct2/show/study/NCT00550550"

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
tables = soup.find_all("table", class_="data_table")

# Cells of every data_table; tds[0][0] is the first cell of the first table
tds = [table.find_all("td") for table in tables]
# The conditions are separated by newlines in that cell's text; drop empty lines
conditions = [condition for condition in tds[0][0].get_text().split("\n") if condition != ""]

print(conditions)