Scraping elements with Selenium


I'm trying to use Selenium (Python) to scrape a website with the following structure (I've anonymized the HTML), but the part that gets the data doesn't work.

I want to retrieve the following information from the HTML:

  • INFO 1
  • INFO 2
  • INFO 3
  • TEXT 1

Whenever I try the following code with an XPath, it doesn't work:

Code:

INFO_1= driver.find_elements(By.XPATH, "/html/body/div[1]/div/div/div[2]/section[2]/div[2]/div[1]/div/a/div[3]/div/span[1]") 

HTML code:

<a href=" anonymized href " title=" **TITLE** " class="suggestion-link">
   <div>
<a href="#" style="display: inherit;"><span class="fav"></span></a></div> 
<div class="image-wrap">
   <!----> <!----> <!----> <!----> <!----> <!----> 
   <div class="carousel">
      <span id="carousel_prev_8iikolcpmwi" style="display: none;"></span> 
      <div id="carousel_o2u8ikkx7nl" class="owl-carousel owl-theme owl-loaded owl-drag">
         <div class="owl-stage-outer">
            <div class="owl-stage" style="transform: translate3d(0px, 0px, 0px); transition: all 0s ease 0s; width: 970px;">
               <div class="owl-item active" style="width: 323.242px;">
                  <div class="item">
                     <div class="filigrane"><img src="/images/filigrane.png"></div>
                     <div class="loaded"><img src=" anonymised.jpg" alt=" **TITLE 2**" class="img"> </div>
                  </div>
               </div>
               <div class="owl-item" style="width: 323.242px;">
                  <div class="item">
                     <div class="filigrane"><img src="/images/filigrane.png"></div>
                     <div class="loaded"><img src=" anonymized.jpg" alt=" **TITLE** " class="img"> </div>
                  </div>
               </div>
               <div class="owl-item" style="width: 323.242px;">
                  <div class="item">
                     <div class="filigrane"><img src="/images/filigrane.png"></div>
                     <div class="loaded"><img src=" anonymised.jpg" alt=" **TITLE** " class="img"> </div>
                  </div>
               </div>
            </div>
         </div>
         <div class="owl-nav disabled">
            <div class="owl-prev">next</div>
            <div class="owl-next">prev</div>
         </div>
         <div class="owl-dots"><button role="button" class="owl-dot active"><span></span></button><button role="button" class="owl-dot"><span></span></button><button role="button" class="owl-dot"><span></span></button></div>
      </div>
      <span id="carousel_next_d6crxou5xy"></span>
   </div>
</div>
<div class="content-wrap">
   <div class="card-top">
      <span class="card-left uppercase"> **INFO 1**</span> 
      <span class="card-right">
         <!----> <!---->
         **INFO 2**
         <!---->
      </span>
   </div>
   <h3 class="title-wrap">**INFO 3**</h3>
   <p class=""><span class="moreup"></span> **TEXT 1**.</p>
</div>
</a>
python html selenium-webdriver web-scraping
1 answer

Given the HTML you shared, try the following relative XPath expressions:

  • To locate INFO 1:
//span[text()=' **INFO 1**' and @class='card-left uppercase']
  • To locate INFO 2:
//span[@class='card-right']
  • To locate INFO 3:
//h3[@class='title-wrap' and contains(text(),'INFO 3')]
  • To locate TEXT 1:
//p[contains(text(),'TEXT 1')]
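Several of the expressions above match on the very text you want to extract, so they only help if you already know the values. Here is a minimal sketch of an alternative. It assumes each card's fields sit inside a `div` with class `content-wrap`, as in the shared HTML. The code finds every card, then reads each field with XPaths that are relative to that card (`.//`) and match class tokens instead of text. `scrape_cards`, `has_class`, `CARD_XPATH` and `FIELD_XPATHS` are hypothetical names for illustration. The string `"xpath"` is the value of Selenium's `By.XPATH` constant, written out here so the sketch imports without Selenium installed. Keep in mind that `find_elements` returns an empty list instead of raising when nothing matches, so a broken absolute path like the one in the question fails silently. If the cards are rendered by JavaScript, you may also need to wait for them (for example with `WebDriverWait`) before calling this.

```python
# Same string value as selenium.webdriver.common.by.By.XPATH; written out so this
# sketch can be imported without Selenium installed.
XPATH = "xpath"


def has_class(name):
    """Return an XPath 1.0 predicate that is true when @class contains the token `name`."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


# Assumption: every card's fields live inside <div class="content-wrap">.
CARD_XPATH = f"//div[{has_class('content-wrap')}]"

# Field locators relative to one card; the leading "." keeps the search inside it.
FIELD_XPATHS = {
    "info_1": f".//span[{has_class('card-left')}]",
    "info_2": f".//span[{has_class('card-right')}]",
    "info_3": f".//h3[{has_class('title-wrap')}]",
    "text_1": "./p",
}


def scrape_cards(driver):
    """Return one dict of stripped field texts per card on the current page.

    `driver` is a Selenium WebDriver; only find_elements/find_element and
    WebElement.text are used.
    """
    rows = []
    for card in driver.find_elements(XPATH, CARD_XPATH):
        rows.append({
            # .text is the element's rendered text; HTML comments like <!----> are not part of it.
            name: card.find_element(XPATH, xpath).text.strip()
            for name, xpath in FIELD_XPATHS.items()
        })
    return rows
```

With an existing WebDriver session, `for row in scrape_cards(driver): print(row)` prints one dict per card. Because the fields are read with `find_element`, a card that lacks one of them raises `NoSuchElementException` instead of quietly producing an empty value.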