I want to get a person's details [closed]

Question · votes: -3 · answers: 2

I am not getting the address here; I want to fetch the address for each person. This code gives me every other detail except the address:

from bs4 import BeautifulSoup
import requests

for count in range(1, 2):
    r = requests.get('https://www.ratemds.com/best-doctors/?country=in&page=' + str(count))
    soup = BeautifulSoup(r.text, 'lxml')
    for links in soup.find_all('a', class_='search-item-doctor-link'):
        link = "https://www.ratemds.com" + links['href']
        r2 = requests.get(link)
        soup2 = BeautifulSoup(r2.text, 'lxml')
        try:
            name = soup2.select_one('h1').text
            print("NAME:" + name)
        except:
            print("NAME:NA")
        try:
            speciality = soup2.select_one('.search-item-info a').text
            print("SPECIALITY:" + speciality)
        except:
            print("SPECIALITY:NA")
        try:
            gender = soup2.select_one('i + a').text
            print("GENDER:" + gender)
        except:
            print("GENDER:NA")
        try:
            speciality1 = soup2.select_one('i ~ [itemprop=name]').text
            print("SPECIALITY1:" + speciality1)
        except:
            print("SPECIALITY1:NA")
        try:
            contact = soup2.select_one('[itemprop=telephone]')['content']
            print("CONTACT:" + contact)
        except:
            print("CONTACT:NA")
        try:
            website = soup2.select_one('[itemprop=sameAs]')['href']
            print("WEBSITE:" + website)
        except:
            print("WEBSITE:NA")
        try:
            # the address selector returns a list, so join it before concatenating
            add = [item['content'] for item in soup2.select('[itemprop=address] meta')]
            print("ADDRESS:" + ', '.join(add))
        except:
            print("ADDRESS:NA")
python web-scraping beautifulsoup
2 Answers

0 votes

Here is an example of selectors for retrieving a wider set of information:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.ratemds.com/doctor-ratings/dr-dilip-raja-mumbai-mh-in', headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(r.text, 'lxml')

details = {
    'name': soup.select_one('h1').text,
    'speciality': soup.select_one('.search-item-info a').text,
    'rating': soup.select_one('.star-rating')['title'],
    'gender': soup.select_one('i + a').text,
    'specialty_full': soup.select_one('i ~ [itemprop=name]').text,
    'phone': soup.select_one('[itemprop=telephone]')['content'],
    'address': [item['content'] for item in soup.select('[itemprop=address] meta')],
    'website': soup.select_one('[itemprop=sameAs]')['href']
}

print(details)
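Not part of the answer above, but since `select_one` returns `None` on a miss (making `.text` raise `AttributeError`), the repeated bare `try/except` blocks could be replaced by a small helper. A sketch, with a hypothetical `get_text` function and an inline HTML sample instead of the live site:

```python
from bs4 import BeautifulSoup

def get_text(soup, selector, default='NA'):
    """Return the stripped text of the first match, or a default if the selector misses."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else default

# Minimal stand-in for a fetched page
soup = BeautifulSoup('<h1> Dr. Example </h1>', 'html.parser')
print(get_text(soup, 'h1'))        # Dr. Example
print(get_text(soup, '.missing'))  # NA
```

The same idea extends to attribute lookups by returning `node.get(attr, default)` instead of the text.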

You can also target the script tag, which holds a large amount of information that can be converted to JSON. Sadly, a library for converting the escaped hex sequences back to ASCII did not seem to work, so the replacements are done with a dict instead.

import requests
import json
import re
from bs4 import BeautifulSoup as bs

res = requests.get('https://www.ratemds.com/doctor-ratings/dr-dilip-raja-mumbai-mh-in', headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(res.content, 'lxml')

# Find the script tag whose text assigns the doctor data as an escaped JSON string
r = re.compile(r'window\.DATA\.doctorDetailProps = JSON\.parse(.*)')
data = soup.find('script', string=r).text
script = r.findall(data)[0].rstrip('");').lstrip('("')

# Replace the escape sequences (and bare JS literals) so json.loads accepts the string
convert_dict = {
    '\\u0022': '"',
    '\\u002D': '-',
    '\\u003D': '=',
    '\\u005Cn': ' ',
    '\\u0027': "'",
    '\\u005Cr': '',
    'false': '"false"',
    'true': '"true"',
    'null': '"null"'
}

for k, v in convert_dict.items():
    script = script.replace(k, v)

items = json.loads(script)
doctor = items['doctor']
print(doctor['full_name'])
print(doctor['specialty_name'])
print(doctor['gender'])
print(doctor['geocode_address'])
print(doctor['rating'])
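An alternative to the manual replacement dict: if the payload uses only standard `\uXXXX` escapes (an assumption, not verified against the live page), Python's `unicode_escape` codec can decode them in one pass before `json.loads`. A sketch with a hypothetical payload in the same shape as the scraped string; note `unicode_escape` can mangle non-ASCII text, so the dict approach stays safer for arbitrary pages:

```python
import codecs
import json

# Hypothetical stand-in for the escaped string extracted from the script tag
raw = '{\\u0022doctor\\u0022: {\\u0022full_name\\u0022: \\u0022Dr. Example\\u0022}}'

# Decode \uXXXX escapes, then parse the resulting JSON
decoded = codecs.decode(raw, 'unicode_escape')
data = json.loads(decoded)
print(data['doctor']['full_name'])  # Dr. Example
```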

0 votes

Assuming you have run pip install lxml and pip install beautifulsoup4, the code you are using seems to work fine.

Working example here (click "run"): https://repl.it/repls/DarkorangeFinishedSoftwaresuite

If you are not getting the same results as my working example, the cause is probably the extra space in your requests.get() URL. In that case, copy the code I used and see whether it works for you.
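The stray space comes from wrapping the URL string literal across lines. A safe way to split a long URL is implicit string concatenation, where adjacent literals are joined at compile time with no whitespace added:

```python
count = 1
# Adjacent string literals inside parentheses are concatenated with
# no extra characters, so the URL can be wrapped across lines safely:
url = ('https://www.ratemds.com/best-doctors/'
       '?country=in&page=' + str(count))
print(url)  # https://www.ratemds.com/best-doctors/?country=in&page=1
```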
