VB.NET: scraping from a website

Question

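I am trying to scrape usernames from a website, following this tutorial:

https://www.youtube.com/watch?v=FpAvBOhDrYk (part 1)

https://www.youtube.com/watch?src_vid=FpAvBOhDrYk (part 2)

I followed every step but could not get it to work. This is the VB.NET code I used: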

Imports System.Text.RegularExpressions

Public Class Form1

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim Request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("http://statigr.am/tag/anime")
    Dim response As System.Net.HttpWebResponse = Request.GetResponse

    Dim rs As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream())

    Dim rssourcecode As String = rs.ReadToEnd

    '<a href="/hannahotaku">hannahotaku</a>

    Dim r As New System.Text.RegularExpressions.Regex("<a href=""/.*"">hannahotaku</a>")
    Dim matches As MatchCollection = r.Matches(rssourcecode)


    For Each itemcode As Match In matches
        ListBox1.Items.Add(itemcode.Value.Split("""").GetValue(1))

    Next


End Sub

End Class

As you can see, I am using the site statigr.am, and the source I am trying to scrape is this:

<a href="/hannahotaku">hannahotaku</a>

Please let me know what I am doing wrong. I want to scrape the part in between:

(<a href="/**whatever username here**"></a>)

Answer 1

If you want to capture the whole link:

(<a href="\/.+?">hannahotaku<\/a>)

If you want to capture the username:

<a href="\/(.+?)">hannahotaku<\/a>

As far as I know, in VB.NET that would be:

<a href=""/(.+?)"">hannahotaku</a>

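Use lazy matching (+?) so that it matches only as much as needed and nothing extra. The plus sign requires at least one character, so a one-letter username still matches but an empty one does not.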
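To illustrate the difference, here is a minimal console sketch comparing the greedy pattern from the question with the lazy one above. The sample markup is made up for the demonstration (the same link repeated on one line) and is not taken from the real page.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyMatchDemo
    Sub Main()
        ' Hypothetical sample: the same link appears twice on one line.
        Dim html As String = "<a href=""/hannahotaku"">hannahotaku</a> <a href=""/hannahotaku"">hannahotaku</a>"

        ' Greedy: .* runs to the last possible ending, so both links collapse into one match.
        Dim greedy As MatchCollection = Regex.Matches(html, "<a href=""/(.*)"">hannahotaku</a>")
        Console.WriteLine(greedy.Count)   ' 1

        ' Lazy: .+? stops at the first possible ending, so each link is matched separately.
        Dim lazy As MatchCollection = Regex.Matches(html, "<a href=""/(.+?)"">hannahotaku</a>")
        Console.WriteLine(lazy.Count)     ' 2

        For Each m As Match In lazy
            Console.WriteLine(m.Groups(1).Value)   ' hannahotaku
        Next
    End Sub
End Module
```

A negated character class such as [^""]+ in place of .+? is another common way to keep the capture inside a single attribute value. Which one suits the real page depends on its markup.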

P.S. I am not very familiar with VB.NET, so let me know if any adjustments are needed.


Answer 2

Use this regular expression instead:

"<div><div>([^<]+)</div>"

In the For Each loop, use itemcode.Groups(1).Value instead of itemcode.Value.Split("""").GetValue(1). This gives you the part between the div tags.
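As a rough sketch of what that change does, the snippet below runs the pattern against a made-up fragment and prints both the whole match and the first capture group. The sample markup is an assumption, and the real page structure may differ.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    Sub Main()
        ' Hypothetical markup used only for this demonstration.
        Dim html As String = "<div><div>hannahotaku</div></div>"

        Dim m As Match = Regex.Match(html, "<div><div>([^<]+)</div>")
        If m.Success Then
            Console.WriteLine(m.Value)            ' whole match: <div><div>hannahotaku</div>
            Console.WriteLine(m.Groups(1).Value)  ' captured text: hannahotaku
        End If
    End Sub
End Module
```

With this pattern the matched text contains no double quotes, so splitting it on quotes would yield a single element and index 1 would not exist. Reading the capture group avoids that.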

To retrieve the matches, try writing them to a file:

Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Download the page source.
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create("http://statigr.am/tag/anime"), HttpWebRequest)
        Dim response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)

        Dim rs As New StreamReader(response.GetResponseStream())
        Dim rssourcecode As String = rs.ReadToEnd()

        ' Group 1 captures the text between the div tags.
        Dim r As New Regex("<div><div>([^<]+)</div>")
        Dim matches As MatchCollection = r.Matches(rssourcecode)

        ' Write each captured value to the file, one per line.
        Using addInfo As StreamWriter = File.CreateText("c:\Textfile.txt")
            For Each itemcode As Match In matches
                addInfo.WriteLine(itemcode.Groups(1).Value)
            Next
        End Using
    End Sub

End Class
