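I'm trying to write a simple parser to fetch information for my bot. The code is simple, but all I get back is "You don't have permission to access this resource." instead of the page source I see in a web browser.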
using System;
using System.Net.Http;

namespace ConsoleApp3
{
    internal class Program
    {
        static void Main(string[] args)
        {
            using (var client = new HttpClient())
            {
                var endpoint = new Uri("https://www.investing.com/economic-calendar/");
                var result = client.GetAsync(endpoint).Result;
                var json = result.Content.ReadAsStringAsync().Result;
            }
        }
    }
}
The full content of the json variable looks like this:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access this resource.</p>
<script>(function(){if (!document.body) return;var js = "window['__CF$cv$params']={r:'88e838692a8f9d5a',t:'MTcxNzUwNzIyNy4yNjMwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/jsd/main.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script></body></html>
Can anyone help me get the full page source, just as it appears in the browser?
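Some websites inspect the request headers (including User-Agent) to decide whether a request comes from a browser or from a bot/crawler.
If you set a realistic User-Agent, you should receive the same response as a request made from a browser: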
var client = new HttpClient();
client.DefaultRequestHeaders.Add("User-Agent","Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0");
var endpoint = new Uri("https://www.investing.com/economic-calendar/");
var result = await client.GetAsync(endpoint);
var html = await result.Content.ReadAsStringAsync();
Also, prefer await over .Result, because .Result blocks the calling thread while the request is in flight.
.NET Fiddle: https://dotnetfiddle.net/sTbavn
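For reference, here is a minimal sketch of how the two suggestions in this answer (a browser-like User-Agent and await instead of .Result) might fit into the console program from the question. The CreateClient helper and the status check are my own additions for illustration, not part of the original code, and async Main assumes C# 7.1 or later. The site may still reject the request if it applies checks beyond the User-Agent header, so the sketch prints the status code rather than assuming success.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace ConsoleApp3
{
    internal static class Program
    {
        // Reuse a single HttpClient for the lifetime of the process.
        private static readonly HttpClient Client = CreateClient();

        // Hypothetical helper (not in the original answer): builds a client
        // that sends the same User-Agent string used in the snippet above.
        internal static HttpClient CreateClient()
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add(
                "User-Agent",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0");
            return client;
        }

        // async Main (C# 7.1+) lets us await instead of blocking on .Result.
        private static async Task Main()
        {
            var endpoint = new Uri("https://www.investing.com/economic-calendar/");

            using (var response = await Client.GetAsync(endpoint))
            {
                // Assumption: report failures such as 403 instead of treating
                // the error page as if it were the requested document.
                if (!response.IsSuccessStatusCode)
                {
                    Console.WriteLine(
                        $"Request failed: {(int)response.StatusCode} {response.ReasonPhrase}");
                    return;
                }

                var html = await response.Content.ReadAsStringAsync();
                Console.WriteLine($"Received {html.Length} characters of HTML.");
            }
        }
    }
}
```

Checking IsSuccessStatusCode before reading the body makes a 403 like the one in the question show up as an explicit failure rather than as unexpected HTML in the result string.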