I want to use Python to scrape only the codes from the table below.
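As shown in the image, I only want to scrape CPT, CTC, PTC, STC, SPT, HTC, P5TC, P1A, P2A, P3A, P1E, P2E, P3E. These codes may change from time to time, for example P4E may be added or P1E removed.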
The HTML for the table above is:
<table class="list">
<tbody>
<tr>
<td>
<p>PRODUCT<br>DESCRIPTION</p>
</td>
<td>
<p><strong>Time Charter:</strong> CPT, CTC, PTC, STC, SPT, HTC, P5TC<br><strong>Time Charter Trip:</strong> P1A, P2A, P3A,<br>P1E, P2E, P3E</p>
</td>
<td><strong>Voyage: </strong>C3E, C4E, C5E, C7E</td>
</tr>
<tr>
<td>
<p>CONTRACT SIZE</p>
<p></p>
</td>
<td>
<p>1 day</p>
</td>
<td>
<p>1,000 metric tons</p>
</td>
</tr>
<tr>
<td>
<p>MINIMUM TICK</p>
<p></p>
</td>
<td>
<p>US$ 25</p>
</td>
<td>
<p>US$ 0.01</p>
</td>
</tr>
<tr>
<td>
<p>FINAL SETTLEMENT PRICE</p>
<p></p>
</td>
<td colspan="2" rowspan="1">
<p>The floating price will be the end-of-day price as supplied by the Baltic Exchange.</p>
<p><br><strong>All products:</strong> Final settlement price will be the mean of the daily Baltic Exchange spot price assessments for every trading day in the expiry month.</p>
<p><br><strong>Exception for P1A, P2A, P3A:</strong> Final settlement price will be the mean of the last 7 Baltic Exchange spot price assessments in the expiry month.</p>
</td>
</tr>
<tr>
<td>
<p>CONTRACT SERIES</p>
</td>
<td colspan="2" rowspan="1">
<p><strong><strong>CTC, CPT, PTC, STC, SPT, HTC, P5TC</strong>:</strong> Months, quarters and calendar years out to a maximum of 72 months</p>
<p><strong>C3E, C4E, C5E, C7E, P1A, P2A, P3A, P1E, P2E, P3E:</strong> Months, quarters and calendar years out to a maximum of 36 months</p>
</td>
</tr>
<tr>
<td>
<p>SETTLEMENT</p>
</td>
<td colspan="2" rowspan="1">
<p>At 13:00 hours (UK time) on the last business day of each month within the contract series</p>
</td>
</tr>
</tbody>
</table>
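Because the wanted codes sit in the second cell of the first row, one option is to get that table's own HTML (for example via Selenium's element.get_attribute('outerHTML')) and parse it with Python's standard-library html.parser, keeping only the text that is not inside a <strong> label. This is a minimal sketch under two assumptions: the codes always live in row 1, column 2, and every label such as "Time Charter:" is wrapped in <strong>. ProductCodeParser and TABLE_HTML are hypothetical names; TABLE_HTML is just the first row of the table above.

```python
from html.parser import HTMLParser


class ProductCodeParser(HTMLParser):
    """Hypothetical helper: collects comma-separated codes from the
    2nd <td> of the 1st <tr>, skipping text inside <strong> labels.

    Assumes the HTML fed in contains only the product table.
    """

    def __init__(self):
        super().__init__()
        self.row = 0            # 1-based index of the current <tr>
        self.col = 0            # 1-based index of the current <td> in that row
        self.in_target = False  # True while inside row 1, cell 2
        self.strong_depth = 0   # > 0 while inside a <strong> label
        self.codes = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag == "td":
            self.col += 1
            self.in_target = self.row == 1 and self.col == 2
        elif tag == "strong":
            self.strong_depth += 1

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_target = False
        elif tag == "strong":
            self.strong_depth -= 1

    def handle_data(self, data):
        # Text outside <strong> in the target cell is a comma-separated code list;
        # blank pieces (whitespace, trailing commas) are dropped.
        if self.in_target and self.strong_depth == 0:
            self.codes.extend(c.strip() for c in data.split(",") if c.strip())


TABLE_HTML = """<table class="list"><tbody><tr>
<td><p>PRODUCT<br>DESCRIPTION</p></td>
<td><p><strong>Time Charter:</strong> CPT, CTC, PTC, STC, SPT, HTC, P5TC<br><strong>Time Charter Trip:</strong> P1A, P2A, P3A,<br>P1E, P2E, P3E</p></td>
<td><strong>Voyage: </strong>C3E, C4E, C5E, C7E</td>
</tr></tbody></table>"""

parser = ProductCodeParser()
parser.feed(TABLE_HTML)
parser.close()
print(parser.codes)
# ['CPT', 'CTC', 'PTC', 'STC', 'SPT', 'HTC', 'P5TC', 'P1A', 'P2A', 'P3A', 'P1E', 'P2E', 'P3E']
```

Since the cell is split on commas rather than matched against a fixed list, a newly added code such as P4E would be picked up without changing the parser, and the Voyage codes in the third cell are ignored.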
You can see the codes on the website linked below.
If your use case is to scrape all of the text:
You have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://www.eex.com/en/products/global-commodities/freight')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "article div:last-child table>tbody>tr td:nth-child(2)>p"))).text)
Using XPATH:
driver.get('https://www.eex.com/en/products/global-commodities/freight')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[text()='Contract Specifications']//following::table[1]/tbody/tr//following::td[1]/p"))).text)
Console Output:
Time Charter: CPT, CTC, PTC, STC, SPT, HTC, P5TC
Time Charter Trip: P1A, P2A, P3A,
P1E, P2E, P3E
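The console output above is still the raw cell text, labels included. To reduce it to just the codes, a minimal sketch is to drop everything up to the last colon on each line and split the remainder on commas. This assumes the labels always end in a colon and the codes stay comma-separated, as they are on the page today; codes_from_text is a hypothetical helper name.

```python
def codes_from_text(text):
    """Hypothetical helper: turn the cell text into a list of codes.

    Assumes each line is either "Label: CODE, CODE, ..." or a bare
    continuation line of comma-separated codes.
    """
    codes = []
    for line in text.splitlines():
        # Keep only what follows the last colon ("Time Charter:" etc. is dropped);
        # lines without a colon are kept whole.
        tail = line.rpartition(":")[2]
        codes.extend(c.strip() for c in tail.split(",") if c.strip())
    return codes


text = """Time Charter: CPT, CTC, PTC, STC, SPT, HTC, P5TC
Time Charter Trip: P1A, P2A, P3A,
P1E, P2E, P3E"""
print(codes_from_text(text))
# ['CPT', 'CTC', 'PTC', 'STC', 'SPT', 'HTC', 'P5TC', 'P1A', 'P2A', 'P3A', 'P1E', 'P2E', 'P3E']
```

With the Selenium snippets, the same helper would be called on the .text of the located element instead of a literal string.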
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC