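I want to convert the Shiny for Python documentation to PDF. I can go to each section and print it to PDF, but I would like to know whether there is a more compact way to print all the sections at once.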
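Here is a solution using wkhtmltopdf and Python: scrape the links to the HTML files for the different sections of the docs and pass them to pdfkit, a Python library that wraps wkhtmltopdf, a utility for converting HTML to PDF.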
First, install wkhtmltopdf on your system (consult its installation instructions if you need help with the process; if you are a Windows user, remember to add wkhtmltopdf to PATH).
Then check that it is available from cmd/shell:
$ wkhtmltopdf --version
# wkhtmltopdf 0.12.6 (with patched qt)
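The same availability check can also be done from Python before running the conversion. This is a minimal sketch using only the standard library; the helper names `find_tool` and `tool_version` are hypothetical and not part of pdfkit or wkhtmltopdf.

```python
import shutil
import subprocess


def find_tool(name="wkhtmltopdf"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def tool_version(path):
    """Run `<path> --version` and return the first line it prints."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


path = find_tool()
if path is None:
    # On Windows this usually means the install folder is missing from PATH.
    print("wkhtmltopdf was not found on PATH")
else:
    print(tool_version(path))
```

If the tool is not found, fixing PATH and opening a new shell is usually enough for the check to pass.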
Now install these Python libraries (assuming you already have Python installed):
pip install requests beautifulsoup4 pdfkit
Then run this Python script:
$ python html2pdf.py
html2pdf.py
import re
import pdfkit
import requests
from bs4 import BeautifulSoup
# Making a GET request
r = requests.get('https://shiny.rstudio.com/py/docs/get-started.html')
# print(r.status_code)
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
a = soup.find_all('a', class_='sidebar-link')
# get the links
links = [link.get('href') for link in a if link.get('href') is not None]
site_link = 'https://shiny.rstudio.com/py'
full_links = [site_link + link[2:] for link in links]
# for file names
names = [re.findall(r"(?:.+/)(.+)(?:\.html)", link)[0] for link in full_links]
# convert the link of htmls to pdf
for i, link in enumerate(full_links):
    pdfkit.from_url(link, f"{names[i]}.pdf")
This converts all the HTML files (the links in the sidebar of https://shiny.rstudio.com/py/docs/) to PDF files in one go.
$ ls
get-started.pdf reactive-programming.pdf ui-navigation.pdf
html2pdf.py reactive-values.pdf ui-page-layouts.pdf
overview.pdf running-debugging.pdf ui-static.pdf
putting-it-together.pdf server.pdf user-interface.pdf
reactive-calculations.pdf ui-dynamic.pdf workflow-modules.pdf
reactive-events.pdf ui-feedback.pdf workflow-server.pdf
reactive-mutable.pdf ui-html.pdf
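The link-building and file-naming steps of the script can also be sketched with the standard library instead of string slicing and a regex. The example assumes the sidebar hrefs are relative paths such as `../docs/overview.html` (an inference from the `link[2:]` slice, not something verified against the live site). The helper names `absolute_link`, `pdf_name` and `to_single_pdf`, and the output name `shiny-docs.pdf`, are hypothetical. The last helper relies on pdfkit's documented ability to take a list of URLs in `from_url` and write them into one PDF, which may be closer to the single-document result the question asks for.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Page the sidebar links are scraped from (same URL as in the script above).
BASE_PAGE = "https://shiny.rstudio.com/py/docs/get-started.html"


def absolute_link(href, base=BASE_PAGE):
    """Resolve a relative sidebar href against the page it was found on."""
    return urljoin(base, href)


def pdf_name(url):
    """Derive an output file name such as 'overview.pdf' from a page URL."""
    return PurePosixPath(urlparse(url).path).stem + ".pdf"


def to_single_pdf(urls, output="shiny-docs.pdf"):
    """Write all pages into one PDF (requires pdfkit and wkhtmltopdf)."""
    import pdfkit  # imported lazily so the helpers above work without it

    pdfkit.from_url(list(urls), output)


# Assumed example hrefs, in the shape the original slice suggests.
hrefs = ["../docs/get-started.html", "../docs/overview.html"]
urls = [absolute_link(h) for h in hrefs]
print(urls)
print([pdf_name(u) for u in urls])
```

Calling `to_single_pdf(urls)` would then produce one combined file rather than one PDF per page; whether internal links and styling survive the conversion depends on wkhtmltopdf.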