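I want to reach the Facebook login page with curl. My goal is to log in to Facebook and then do some scraping. Because of the latest restrictions, I'm not using the Facebook API... I need to scrape the comments on posts, which isn't possible using only the API.
Here is some of my code: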
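$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://web.facebook.com");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Without CURLOPT_RETURNTRANSFER, curl_exec() prints the body itself and returns true.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);
echo $response;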
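I expected this to redirect to the login page; then, once the user fills in the login form, I would take the credentials, use them to get redirected to the home page, and start scraping.
Anyway, this is what I get: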
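(PS: I'm the author of this program.) this program logs in to Facebook to send messages. The login code can be found here; the login routine is done in the constructor,
but the gist of it is that you first need to do a GET request to get a cookie, a CSRF token, and some other values parsed out of the login form. You then send all of that back, together with the username and password, in an application/x-www-form-urlencoded
POST request to a login URL that is specific to your cookie session; you have to parse that URL out of the HTML received in the first GET request.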
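The parsing half of that flow can be tried without touching the network. What follows is only a minimal sketch, assuming PHP with the DOM extension: the HTML snippet, the csrf_token field name, the session URL, and the helper name buildLoginPostFields are all made up for illustration and are not Facebook's real markup, which changes over time.

```php
<?php
// Hypothetical stand-in for the HTML returned by the first GET request.
$html = <<<'HTML'
<html><body>
<form id="login_form" method="post" action="https://m.facebook.com/login/?session=abc123">
  <input type="hidden" name="csrf_token" value="tok-42">
  <input type="text" name="email" value="">
  <input type="password" name="pass" value="">
</form>
</body></html>
HTML;

/**
 * Collect the action URL and every named <input> of the first <form>,
 * then overwrite the two credential fields.
 */
function buildLoginPostFields(string $html, string $email, string $password): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings about sloppy real-world HTML
    $form = $dom->getElementsByTagName('form')->item(0);
    if ($form === null) {
        throw new RuntimeException('no <form> found in the response');
    }
    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            // Hidden inputs (tokens and so on) are echoed back unchanged.
            $fields[$name] = $input->getAttribute('value');
        }
    }
    $fields['email'] = $email;
    $fields['pass'] = $password;
    return ['url' => $form->getAttribute('action'), 'fields' => $fields];
}

$login = buildLoginPostFields($html, 'user@example.com', 'secret');
// Body for the application/x-www-form-urlencoded POST request.
$body = http_build_query($login['fields']);
```

$login['url'] is where the POST would go, and $body is what would be passed as CURLOPT_POSTFIELDS, on the same cURL handle that made the GET so the cookie is sent along.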
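It is also in your best interest to use a user agent that implies you have poor JavaScript support (because, with PHP, you actually have none). An example, the one this code uses, is 'Mozilla/5.0 (BlackBerry; U; BlackBerry 9300; en) AppleWebKit/534.8+ (KHTML, like Gecko) Version/6.0.0.570 Mobile Safari/534.8+'
(a.k.a. an old BlackBerry phone).
Some accounts, not all, get a page after the login POST asking whether to install the Facebook app instead of using the website, and you can't proceed until you answer. You can detect that page with the XPath "//a[contains(@href,'/login/save-device/cancel/')]" and follow that link to decline.
And a pro tip: a good way to confirm that you actually managed to log in is to look for the logout button, which in XPath looks like //a[contains(@href,"/logout.php")]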
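With plain PHP cURL and DOMXPath instead of the wrapper class this program uses, those tips might be sketched roughly like this. The helper names newBrowserLikeHandle and classifyLoginResponse and the three sample pages are hypothetical; only the two XPath expressions, the user agent, and the accept-language header come from the program.

```php
<?php
// Hypothetical helper: a plain cURL handle set up like an old mobile browser.
function newBrowserLikeHandle(string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_COOKIEFILE     => '',   // empty string turns on the in-memory cookie engine
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_HTTPHEADER     => ['accept-language: en-US,en;q=0.8'],
    ]);
    return $ch;
}

// Hypothetical helper: decide what kind of page came back after the login POST.
function classifyLoginResponse(string $html): string
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings about sloppy real-world HTML
    $xp = new DOMXPath($dom);
    if ($xp->query("//a[contains(@href,'/login/save-device/cancel/')]")->length > 0) {
        return 'save-device-prompt'; // must be declined before going further
    }
    if ($xp->query('//a[contains(@href,"/logout.php")]')->length > 0) {
        return 'logged-in';
    }
    return 'not-logged-in';
}

// Made-up sample pages, just to exercise the three outcomes.
$promptPage = '<html><body><a href="/login/save-device/cancel/?flow=x">Not now</a></body></html>';
$homePage   = '<html><body><a href="/logout.php?h=abc">Log out</a></body></html>';
$failedPage = '<html><body><form id="login_form"></form></body></html>';

$results = [
    classifyLoginResponse($promptPage),
    classifyLoginResponse($homePage),
    classifyLoginResponse($failedPage),
];
```

In real use you would call newBrowserLikeHandle() once with the BlackBerry user agent string and reuse that handle for the GET, the POST, and any follow-up request, so the cookies carry over.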
The most relevant part of the code is:
function __construct() {
    $this->recipientID = \MsgMe\getUserOption ( 'Facebook', 'recipientID', NULL );
    if (NULL === $this->recipientID) {
        throw new \Exception ( 'Error: cannot find [Facebook] recipientID option!' );
    }
    $this->email = \MsgMe\getUserOption ( 'Facebook', 'email', NULL );
    if (NULL === $this->email) {
        throw new \Exception ( 'Error: cannot find [Facebook] email option!' );
    }
    $this->password = \MsgMe\getUserOption ( 'Facebook', 'password', NULL );
    if (NULL === $this->password) {
        throw new \Exception ( 'Error: cannot find [Facebook] password option!' );
    }
    $this->hc = new \hhb_curl ();
    $hc = &$this->hc;
    $hc->_setComfortableOptions ();
    $hc->setopt_array ( array (
        CURLOPT_USERAGENT => 'Mozilla/5.0 (BlackBerry; U; BlackBerry 9300; en) AppleWebKit/534.8+ (KHTML, like Gecko) Version/6.0.0.570 Mobile Safari/534.8+',
        CURLOPT_HTTPHEADER => array (
            'accept-language:en-US,en;q=0.8'
        )
    ) );
    // step 1: GET the login page to obtain the cookie, the hidden form fields and the login url.
    $hc->exec ( 'https://m.facebook.com/' );
    // \hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () ) & die ();
    // calling loadHTML() statically throws an Error on PHP 8+, so use an instance.
    $domd = new \DOMDocument ();
    @$domd->loadHTML ( $hc->getResponseBody () );
    $form = (\MsgMe\tools\getDOMDocumentFormInputs ( $domd, true )) ['login_form'];
    $url = $domd->getElementsByTagName ( "form" )->item ( 0 )->getAttribute ( "action" );
    $postfields = (function () use (&$form): array {
        $ret = array ();
        foreach ( $form as $input ) {
            $ret [$input->getAttribute ( "name" )] = $input->getAttribute ( "value" );
        }
        return $ret;
    });
    $postfields = $postfields (); // sorry about that, eclipse can't handle IIFE syntax.
    assert ( array_key_exists ( 'email', $postfields ) );
    assert ( array_key_exists ( 'pass', $postfields ) );
    $postfields ['email'] = $this->email;
    $postfields ['pass'] = $this->password;
    // step 2: POST the form fields plus the credentials back to the session-specific login url.
    $hc->setopt_array ( array (
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query ( $postfields ),
        CURLOPT_HTTPHEADER => array (
            'accept-language:en-US,en;q=0.8'
        )
    ) );
    // \hhb_var_dump ($postfields ) & die ();
    $hc->exec ( $url );
    // \hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () ) & die ();
    $domd = new \DOMDocument ();
    @$domd->loadHTML ( $hc->getResponseBody () );
    $xp = new \DOMXPath ( $domd );
    $InstallFacebookAppRequest = $xp->query ( "//a[contains(@href,'/login/save-device/cancel/')]" );
    if ($InstallFacebookAppRequest->length > 0) {
        // not all accounts get this, but some do, not sure why, anyway, if this exist, fb is asking "ey wanna install the fb app instead of using the website?"
        // and won't let you proceed further until you say yes or no. so we say no.
        $url = 'https://m.facebook.com' . $InstallFacebookAppRequest->item ( 0 )->getAttribute ( "href" );
        $hc->exec ( $url );
        $domd = new \DOMDocument ();
        @$domd->loadHTML ( $hc->getResponseBody () );
        $xp = new \DOMXPath ( $domd );
    }
    unset ( $InstallFacebookAppRequest, $url );
    // step 3: confirm the login worked by looking for the logout link.
    $urlinfo = parse_url ( $hc->getinfo ( CURLINFO_EFFECTIVE_URL ) );
    $a = $xp->query ( '//a[contains(@href,"/logout.php")]' );
    if ($a->length < 1) {
        $debuginfo = $hc->getStdErr () . $hc->getStdOut ();
        $tmp = tmpfile ();
        fwrite ( $tmp, $debuginfo );
        $debuginfourl = shell_exec ( "cat " . escapeshellarg ( stream_get_meta_data ( $tmp ) ['uri'] ) . " | pastebinit" );
        fclose ( $tmp );
        throw new \RuntimeException ( 'failed to login to facebook! apparently... cannot find the logout url! debuginfo url: ' . $debuginfourl );
    }
    $a = $a->item ( 0 );
    $url = $urlinfo ['scheme'] . '://' . $urlinfo ['host'] . $a->getAttribute ( "href" );
    $this->logoutUrl = $url;
    // all initialized, ready to sendMessage();
}
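The constructor builds the logout URL by gluing the scheme and host of the effective URL onto the href taken from the page. If you want that step on its own, a small helper along these lines might do. The name absoluteUrl is made up, and the sketch assumes the href is either already absolute or root-relative (starting with /), which is what the code above relies on; it does not handle every relative form.

```php
<?php
// Hypothetical helper: turn a root-relative href into an absolute URL,
// using the scheme and host of the page it was found on.
function absoluteUrl(string $effectiveUrl, string $href): string
{
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href; // already absolute, nothing to do
    }
    $base = parse_url($effectiveUrl);
    if (!is_array($base) || !isset($base['scheme'], $base['host'])) {
        throw new InvalidArgumentException('effective URL must be absolute');
    }
    // Assumption: the href is root-relative, so any path of the base URL is ignored.
    return $base['scheme'] . '://' . $base['host'] . '/' . ltrim($href, '/');
}

$logoutUrl = absoluteUrl('https://m.facebook.com/home.php?refid=8', '/logout.php?h=abc');
$unchanged = absoluteUrl('https://m.facebook.com/', 'https://example.com/x');
```

Here the first argument would come from curl_getinfo() with CURLINFO_EFFECTIVE_URL, so the result still points at the right host if a redirect moved you somewhere else.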