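I've been given a staff list that is supposed to be up to date, but it doesn't match the intranet People Finder, which is written in ASP.NET.
Because the information is sensitive, I can't access the database the People Finder uses, so the only way I can get the information is to scrape the structure, starting from the top brass and working down one level at a time.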
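Each person has a staff number (SRN), which forms the URL
http://intranet/peoplefinder/index.aspx?srn=ABC1234
Everyone who reports to that person is listed below it in the format <a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" target="_self">,
where each URL gives the report's staff number and links to their own team.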
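Since every report is a link whose id follows the pattern gvEmployees_ctlNN_lnkFullName, the SRNs on one page can be pulled out with a regular expression. A minimal sketch, assuming the markup is rendered exactly as in the sample below (id attribute before href); the class and method names are hypothetical, and a regex is used for brevity where an HTML parser would be more robust if the markup varies.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SubordinateParser
{
    // Matches the per-row name links the GridView renders, e.g.
    // <a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" ...>
    // and captures the value of the srn query-string parameter.
    // The manager link (id="hlManager") does not match, so it is skipped.
    private static readonly Regex RowLink = new Regex(
        @"id=""gvEmployees_ctl\d+_lnkFullName""\s+href=""[^""]*[?&]srn=([^""&]+)""",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractSubordinateSrns(string html)
    {
        var srns = new List<string>();
        foreach (Match m in RowLink.Matches(html))
        {
            srns.Add(m.Groups[1].Value);
        }
        return srns;
    }
}
```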
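The problem arises when a team is large, because paging in the GridView is done through links such as
<a href="javascript:__doPostBack('gvEmployees','Page$2')">2</a>
which trigger a form postback rather than pointing at an ordinary URL.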
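The pager links differ only in their Page$N argument, so the page numbers reachable from the current page can be read straight out of the markup. A minimal sketch under the same assumptions (hypothetical names, markup as in the sample below); the current page is rendered as a plain <span> rather than a link, so it is not returned, and the Sort$... header links are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PagerParser
{
    // Matches pager links such as
    // <a href="javascript:__doPostBack('gvEmployees','Page$2')">2</a>
    // and captures the page number after "Page$".
    private static readonly Regex PageLink = new Regex(
        @"__doPostBack\('gvEmployees','Page\$(\d+)'\)");

    public static SortedSet<int> ExtractPageNumbers(string html)
    {
        var pages = new SortedSet<int>();
        foreach (Match m in PageLink.Matches(html))
        {
            pages.Add(int.Parse(m.Groups[1].Value));
        }
        return pages;
    }
}
```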
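How can I scrape this page, capturing the person's SRN and other details, plus everyone who reports to them across all pages of the GridView, and then loop through each of those reports and repeat the process until the whole list is complete?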
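The overall job is a breadth-first walk over SRNs. The sketch keeps a record of who has been crawled so nobody is visited twice, and for each person keeps following pager links until none are left. The two delegates are hypothetical hooks to be filled with real HTTP code: fetchPerson is assumed to GET index.aspx?srn=..., and fetchPage is assumed to perform the Page$N postback using the hidden fields of the HTML it is given. Re-reading the pager after every postback matters because a numeric GridView pager may show only a window of page numbers (with a "..." link for the rest).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OrgChartCrawler
{
    // Subordinate row links and pager links, as rendered in the sample page.
    private static readonly Regex RowLink = new Regex(
        @"id=""gvEmployees_ctl\d+_lnkFullName""\s+href=""[^""]*[?&]srn=([^""&]+)""");
    private static readonly Regex PageLink = new Regex(
        @"__doPostBack\('gvEmployees','Page\$(\d+)'\)");

    // fetchPerson(srn): HTML of the person's page (hypothetical hook).
    // fetchPage(srn, currentHtml, page): HTML after posting back Page$page (hypothetical hook).
    // Returns manager SRN -> list of direct-report SRNs.
    public static Dictionary<string, List<string>> Crawl(
        string rootSrn,
        Func<string, string> fetchPerson,
        Func<string, string, int, string> fetchPage)
    {
        var reports = new Dictionary<string, List<string>>();
        var queue = new Queue<string>();
        queue.Enqueue(rootSrn);

        while (queue.Count > 0)
        {
            string srn = queue.Dequeue();
            if (reports.ContainsKey(srn)) continue; // already crawled; also guards against cycles

            var direct = new List<string>();
            reports[srn] = direct;

            string html = fetchPerson(srn);
            var seenPages = new HashSet<int> { 1 }; // the initial GET shows page 1
            while (true)
            {
                foreach (Match m in RowLink.Matches(html))
                {
                    direct.Add(m.Groups[1].Value);
                    queue.Enqueue(m.Groups[1].Value);
                }

                // Take the first pager link not visited yet, if any.
                int next = -1;
                foreach (Match m in PageLink.Matches(html))
                {
                    int p = int.Parse(m.Groups[1].Value);
                    if (!seenPages.Contains(p)) { next = p; break; }
                }
                if (next < 0) break;

                seenPages.Add(next);
                html = fetchPage(srn, html, next); // postback built from the current page's state
            }
        }
        return reports;
    }
}
```

The result maps each manager to their direct reports; the detail fields for each person can be parsed from the same HTML as it is fetched.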
Sample of the resulting HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head><title>
People Finder: Name Surname
</title><link rel="stylesheet" href="/path/to/style.css" type="text/css" /><link rel="stylesheet" href="/path/to/anotherStyle.css" type="text/css" />
<script type="text/javascript" src="/path/to/peoplefinder.js"></script>
</head>
<body>
<form name="form1" method="post" action="/path/to/index.aspx" id="form1">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="### ViewState ###" />
</div>
<script type="text/javascript">
<!--
var theForm = document.forms['form1'];
if (!theForm) {
theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
// -->
</script>
<script src="/path/to/WebResource.axd?d=AueXWrgAf8xSxMTAt1Q4AA2&t=633311832634916698" type="text/javascript"></script>
<div class="HP3CHeader">
<div id="LWHPBanner">
<h1><span id="lblName">Name Surname</span></h1>
</div>
</div>
<div id='CPMain'>
<div id="mainBox">
<div id="pnlEmployeeDetails">
<div id='basicData'>
<img id="imgPhoto" class="photo" src="/path/to/photo.jpg" style="height:69px;width:69px;border-width:0px;" />
<span id="lblBusinessUnit">Business Unit</span>
<span id="lblCostCentreName">Cost Centre</span>
<span id="lblLocation">Location</span>
<a href='/path/to/checkcontactdetails.htm' target='_blank' onclick='return OpenCheckContactDetails();' >Find out how to change your details/photo.</a>
<div id="manager">
<strong>Reports to: </strong><a id="hlManager" href="/path/to/index.aspx?srn=ABC1234">Name Surname</a>
</div>
</div>
<div id='contactData'>
<div id="pnlSrn">
<strong>Staff number:</strong> <span id="lblSrn">ABC1234</span>
</div>
<div id="pnlEmailAddress">
<strong>Email Address:</strong> <span id="lblEmailAddress">Email</span>
</div>
<div style="clear: both"></div>
</div>
</div>
<div id="pnlGrid">
<h3><span id="lblGridTitle">Name's team</span></h3>
<div>
<table class="subordinates" cellspacing="0" cellpadding="2" rules="cols" border="1" id="gvEmployees" style="border-style:None;border-collapse:collapse;">
<tr style="color:Black;background-color:#EFF3FB;border-style:None;font-weight:bold;">
<th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$SRN')" style="color:Black;">SRN</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$FullName')" style="color:Black;">Full name</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$RACFID')" style="color:Black;">RACFID</a></th>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl02_lnkFullName" href="index.aspx?srn=1K5932" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl03_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl04_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl05_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl06_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl07_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl08_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl09_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl10_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl11_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="PagerStyle" style="color:#000039;border-style:None;">
<td colspan="3"><table border="0">
<tr>
<td><span>1</span></td><td><a href="javascript:__doPostBack('gvEmployees','Page$2')" style="color:#000039;">2</a></td>
</tr>
</table></td>
</tr>
</table>
</div>
</div>
</div>
<div id="searchBox">
<strong>Search People Finder:</strong>
<br /><br />
<span>Forename:</span><br/>
<span><input name="txtFirstname" type="text" id="txtFirstname" /></span><br/>
<span>Surname:</span><br/>
<span><input name="txtSurname" type="text" id="txtSurname" /></span><br/>
<span>RACFID:</span><br/>
<span><input name="txtRacfid" type="text" id="txtRacfid" /></span><br/>
<span>Staff number:</span><br/>
<span><input name="txtSrn" type="text" id="txtSrn" /></span><br/>
<div class="searchBoxItem" style="text-align:center;width:100%"><input type="submit" name="btnFind" value="Search" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;btnFind&quot;, &quot;&quot;, false, &quot;&quot;, &quot;index.aspx&quot;, false, false))" id="btnFind" title="Search for employees member" class="button" style="border-style:Outset;" /></div><br/>
<div>People Finder searches only UK staff.</div>
<!-- <div><a class="execBoardLink" href="/path/to/index.aspx?srn=ABC1234">Show Executive Board</a></div> -->
<div style="margin-top:5px;"><a href="/path/to/phonebook" target="phoneBook" onclick='return OpenPhonebook();' title="Open Phonebook in new window">Open Phonebook</a></div>
</div>
</div>
<div class="contentFooter" style="text-align:center;">
<table width="100%" cellpadding="0" cellspacing="0" border="0" summary="Navigation layout table">
<tr>
<td align="left"><span class="linkArrow">&lt;</span> <a href="javascript:history.back();">Back</a></td>
<td align="center"></td>
<td align="right"><span class="linkArrow">^ </span><a href="#top">Top</a></td>
</tr>
</table>
</div>
<div>
<input type="hidden" name="__PREVIOUSPAGE" id="__PREVIOUSPAGE" value="vy066Txz34y1E515UsTSTDabHKEmdBRCsq7xM0lpJls1" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWCgKM3uTTAgLP/83pDwLfwaTTAQKNguzjCAKt98LeCwLZh62pDwKKqdGpBwLd2q7jAwKa+5aMBAL5zb65C42zY4GBEUKujhjtZ/hZ8sLESfiF" />
</div></form>
</body>
</html>
You can do the paging by POSTing the form variables back to the page.
using System.IO;
using System.Net;
using System.Text;
using System.Web;   // HttpUtility

string lcUrl = "http://www.mysite.com/page.aspx";
HttpWebRequest loHttp = (HttpWebRequest)WebRequest.Create(lcUrl);

// *** Send the POST data. __doPostBack() puts the control name in __EVENTTARGET
// *** and the command in __EVENTARGUMENT; the page's __VIEWSTATE and
// *** __EVENTVALIDATION values normally have to be posted back as well.
string lcPostData =
    "__EVENTTARGET=" + HttpUtility.UrlEncode("gvEmployees") +
    "&__EVENTARGUMENT=" + HttpUtility.UrlEncode("Page$2");
loHttp.Method = "POST";
loHttp.ContentType = "application/x-www-form-urlencoded";
byte[] lbPostBuffer = Encoding.UTF8.GetBytes(lcPostData);
loHttp.ContentLength = lbPostBuffer.Length;
using (Stream loPostData = loHttp.GetRequestStream())
{
    loPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length);
}

// *** Read the response (ASP.NET sends UTF-8 by default)
string lcHtml;
using (HttpWebResponse loWebResponse = (HttpWebResponse)loHttp.GetResponse())
using (StreamReader loResponseStream =
           new StreamReader(loWebResponse.GetResponseStream(), Encoding.UTF8))
{
    lcHtml = loResponseStream.ReadToEnd();
}
Then parse the data you need out of the string.
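For the detail fields, the sample page puts each value in a <span> whose id starts with lbl (lblName, lblSrn, lblBusinessUnit, lblCostCentreName, lblLocation, lblEmailAddress) and the manager in the hlManager link. A minimal sketch that collects them into a dictionary keyed by span id; the class name and the managerSrn key are hypothetical, and the regexes assume the markup shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class EmployeeDetailsParser
{
    // Detail fields: <span id="lblSrn">ABC1234</span> and similar.
    private static readonly Regex LabelSpan = new Regex(
        @"<span id=""(lbl\w+)"">([^<]*)</span>");

    // Manager link: <a id="hlManager" href="/path/to/index.aspx?srn=ABC1234">
    private static readonly Regex ManagerLink = new Regex(
        @"id=""hlManager""\s+href=""[^""]*[?&]srn=([^""&]+)""");

    public static Dictionary<string, string> Parse(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in LabelSpan.Matches(html))
        {
            // Text may contain entities such as &amp;, so decode it.
            fields[m.Groups[1].Value] = WebUtility.HtmlDecode(m.Groups[2].Value.Trim());
        }

        Match mgr = ManagerLink.Match(html);
        if (mgr.Success)
        {
            fields["managerSrn"] = mgr.Groups[1].Value;
        }
        return fields;
    }
}
```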
-- Edit --
Here's what I would try (or something like it), sending all of the post data:
string lcPostData =
    "__EVENTTARGET=" + HttpUtility.UrlEncode("gvEmployees") +
    "&__EVENTARGUMENT=" + HttpUtility.UrlEncode("Page$2") +
    "&__VIEWSTATE=" + HttpUtility.UrlEncode("<Value of __VIEWSTATE>") +
    "&__EVENTVALIDATION=" + HttpUtility.UrlEncode("<Value of __EVENTVALIDATION>");
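Rather than pasting the values in by hand, they can be read from the page that is currently displayed and used to build the body for the next page. A minimal sketch, assuming the hidden inputs look like those in the sample (name attribute before value); it uses System.Net.WebUtility instead of HttpUtility (both URL-encode, but WebUtility needs no System.Web reference), and the class and method names are hypothetical.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class PostBackBody
{
    // Reads value="..." from an input such as
    // <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="..." />
    // Returns "" if the field is not on the page.
    public static string GetHiddenField(string html, string name)
    {
        Match m = Regex.Match(html,
            "<input[^>]*name=\"" + Regex.Escape(name) + "\"[^>]*value=\"([^\"]*)\"");
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : "";
    }

    // Builds the form body equivalent to __doPostBack('gvEmployees','Page$N'),
    // copying the state fields from the page currently displayed.
    public static string BuildPagingBody(string currentHtml, int page)
    {
        return "__EVENTTARGET=" + WebUtility.UrlEncode("gvEmployees")
             + "&__EVENTARGUMENT=" + WebUtility.UrlEncode("Page$" + page)
             + "&__VIEWSTATE=" + WebUtility.UrlEncode(GetHiddenField(currentHtml, "__VIEWSTATE"))
             + "&__EVENTVALIDATION=" + WebUtility.UrlEncode(GetHiddenField(currentHtml, "__EVENTVALIDATION"));
    }
}
```

Each postback should use the fields from the most recent response, since the view state changes from page to page.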
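Open Fiddler and load the second page of the table on the ASP.NET site. In Fiddler, go to the WebForms tab for that page's session and check in the request body which variables are being posted. Concatenate all of those variables in the same order and format, and post the data with HttpWebRequest. In my case it was: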
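// view_state, viewstategenerator, viewstateencrypted and eventvalidation hold the
// values of the matching hidden inputs, scraped from the previously loaded page
string PostData = "__EVENTTARGET="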
+ HttpUtility.UrlEncode("ctl00$ContentPlaceHolder2$grdDirectory")
+ "&"
+ "__EVENTARGUMENT="+HttpUtility.UrlEncode("Page$2")
+ "&"
+ "__VIEWSTATE="+ HttpUtility.UrlEncode(view_state)
+ "&"
+ "__VIEWSTATEGENERATOR="
+ HttpUtility.UrlEncode(viewstategenerator)
+ "&"
+ "__VIEWSTATEENCRYPTED="
+ HttpUtility.UrlEncode(viewstateencrypted)
+ "&"
+ "__EVENTVALIDATION=" + HttpUtility.UrlEncode(eventvalidation);
Hope this works.
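A variation on the Fiddler approach is to post every hidden input the page contains, so fields such as __VIEWSTATEGENERATOR, __VIEWSTATEENCRYPTED or __PREVIOUSPAGE are included without listing them by hand, and to override only __EVENTTARGET and __EVENTARGUMENT. A minimal sketch using HttpClient and FormUrlEncodedContent in place of HttpWebRequest; the class and method names are hypothetical, the regex assumes the attribute order seen in the sample (type, then name, then value), and url is assumed to be the address the page was loaded from.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class PostBackClient
{
    // Every <input type="hidden" name="..." ... value="..." /> on the page.
    private static readonly Regex HiddenInput = new Regex(
        @"<input type=""hidden"" name=""([^""]+)""[^>]*value=""([^""]*)""");

    // Collects all hidden fields, then overrides the two that __doPostBack sets.
    public static Dictionary<string, string> CollectFormFields(
        string html, string eventTarget, string eventArgument)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in HiddenInput.Matches(html))
        {
            fields[m.Groups[1].Value] = WebUtility.HtmlDecode(m.Groups[2].Value);
        }
        fields["__EVENTTARGET"] = eventTarget;
        fields["__EVENTARGUMENT"] = eventArgument;
        return fields;
    }

    // FormUrlEncodedContent URL-encodes the fields and sets the
    // application/x-www-form-urlencoded content type.
    public static async Task<string> PostBackAsync(
        HttpClient client, string url, string currentHtml, string eventTarget, string eventArgument)
    {
        using (var content = new FormUrlEncodedContent(
                   CollectFormFields(currentHtml, eventTarget, eventArgument)))
        using (HttpResponseMessage response = await client.PostAsync(url, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

If the intranet uses Windows authentication, the HttpClient probably needs to be created with new HttpClientHandler { UseDefaultCredentials = true }; the default handler also keeps cookies between requests, which helps if the site relies on session state.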