/profile at the end of my URL breaks my web scraping function

Question

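I'm trying to scrape data from Yahoo Finance as part of a function. The URL with "/profile" at the end doesn't work, but if I remove it, the page loads fine. Does anyone know why this happens? My code is below. I don't know why "/profile" breaks the URL, but I need it because I want to pull more data from the site, and that data is only on the profile page.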

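import re
import requests

# Browser-like User-Agent; placeholder value, since the original snippet used `ua` without defining it
ua = 'Mozilla/5.0'

def get_name(ticker):
    # put the URL here
    url = f'https://finance.yahoo.com/quote/{ticker}/profile'

    # download
    req = requests.get(url, headers={'User-Agent': ua})
    html = req.text

    # use a regular expression to find the name
    name = re.search(r'(?<=<title>)(.*?)(?=\s\([A-Z]{3,4}\))', html)
    if name is None:
        # re.search returns None when nothing matches; it does not raise
        return None

    # print and return the matched text rather than the match object
    print(name.group(0))
    return name.group(0)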

I tried removing /profile, and the request works, but that is simply not the link I need. I also tried changing {ticker} to '+ticker+'.

Answer 1

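It seems that if I call the URL too often, I get HTML back... and then for 24 hours it returns "content is currently unavailable", and then suddenly everything works again.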
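
If that is what is happening, it may help to check each response for that placeholder before running the regex on it. Below is a minimal sketch, not a confirmed fix. The names looks_throttled and fetch_with_backoff are hypothetical helpers, and the marker text is assumed from the message quoted above, so the exact wording on the live page may differ. The fetch argument is any callable that returns (status_code, html), which keeps the sketch runnable without network access.

```python
import time

# Assumption: marker text based on the message quoted in this answer;
# the wording served by the real site may differ.
BLOCK_MARKER = "content is currently unavailable"


def looks_throttled(status_code, html):
    """Heuristic: a non-200 status or the placeholder text suggests throttling."""
    return status_code != 200 or BLOCK_MARKER in html.lower()


def fetch_with_backoff(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status_code, html), retrying with doubling delays.

    Returns the HTML of the first response that does not look throttled,
    or None if every attempt looks throttled.
    """
    for attempt in range(retries):
        status, html = fetch(url)
        if not looks_throttled(status, html):
            return html
        if attempt < retries - 1:
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(delay * 2 ** attempt)
    return None
```

With requests, fetch could be a small wrapper that calls requests.get(url, headers=..., timeout=10) and returns (response.status_code, response.text). If the block really lasts about 24 hours, short retries will not get past it, so the main value here is detecting the condition and returning None instead of parsing a placeholder page.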
