Unable to extract all image links connected to the floor plans using the requests module

Question  Votes: 0  Answers: 1

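I'm trying to use the requests module to get the image links associated with the floor plans located in the middle of a web page. The links are available in the page source, but I can't grab all of them even with a regular expression, because they are scattered throughout the page. There are 11 images in total.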

import re
import json
import requests

link = 'https://www.livabl.com/abbotsford-bc/jem1'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Referer': 'https://www.livabl.com/',
}

def get_floor_plan_images(link,headers):
    res = requests.get(link,headers=headers)
    print(res.status_code)
    match = re.search(r"\{\\\"images\\\":(.*?]),",res.text)
    if match:
        image_links = match.group(1)
        return image_links

images = get_floor_plan_images(link,headers)
print(images)

How can I extract all of the image links connected to the floor plans using the requests module?

python python-3.x web-scraping python-requests
1 Answer
0
votes

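I think this is what you need. re.search stops at the first match, so you only ever get one block of links; re.finditer yields every match in the page: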

import re
import json
import requests

link = 'https://www.livabl.com/abbotsford-bc/jem1'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Referer': 'https://www.livabl.com/',
}

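def get_floor_plan_images(link, headers):
    res = requests.get(link, headers=headers)
    print(res.status_code)
    # finditer returns an iterator over every match, not just the first one
    return re.finditer(r"\{\\\"images\\\":(.*?]),", res.text)


# each match's group(1) is one escaped JSON array of image links
for img in get_floor_plan_images(link, headers):
    print(img.group(1))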
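
Each group(1) printed above is still an escaped JSON fragment (with \" around every link), not a clean Python list. If you want a flat list of URLs, something along these lines might work. This is only a sketch: the helper name extract_image_urls is mine, and it assumes each captured fragment is a backslash-escaped JSON array of plain strings, which I have not verified against the live page.

```python
import json
import re

# Same pattern as above: an escaped {"images":[ ... ], fragment in the page source
PATTERN = re.compile(r"\{\\\"images\\\":(.*?]),")


def extract_image_urls(html):
    """Collect unique image URLs from every escaped "images" array in html.

    Assumption: each captured fragment is a JSON array of strings whose
    quotes are backslash-escaped, e.g. [\\"https://...\\",\\"https://...\\"].
    """
    urls = []
    for match in PATTERN.finditer(html):
        escaped = match.group(1)
        try:
            # First pass undoes the backslash escaping, second pass parses the array
            unescaped = json.loads('"' + escaped + '"')
            items = json.loads(unescaped)
        except json.JSONDecodeError:
            continue  # fragment was not what we assumed; skip it
        if not isinstance(items, list):
            continue
        for item in items:
            # keep only plain string entries, preserving first-seen order
            if isinstance(item, str) and item not in urls:
                urls.append(item)
    return urls
```

You would call it as extract_image_urls(res.text) after the request. Fragments that do not decode cleanly are skipped, so check the count against the 11 images you expect.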