Questions related to parsing

I have a dictionary in the following format: key (string): value (list[string]) my_dict = {'Foo': ['Lorem', 'Ipsum', 'Dolor', 'Baz'], 'Bar': ['Amet', 'Consectetur'], 'Baz': ['...'] , 'Lorem': ['......

python dictionary parsing

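3 answers, 0 votes

Scanner.readLine() with a custom line separator

I have a string that I need to read from start to finish using the Scanner class. The problem is that in my case the source stream may contain the character \u2028. I know...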

java parsing text java.util.scanner readline

1 answer, 0 votes

Now, how can I screen-scrape an HTML line like this (using Java)?

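I'm trying to screen-scrape an HTML page so that I can extract the valuable data I need from it and put it into a text file. Everything was going well until I came across this in the HTML page:

    <td><b>In inventory</b>: 0.3 kg <b>Equipped</b>: -4.5 kg

The contents of this line vary from page to page. So I need a way to scan the line (whatever it contains) for the weights (0.3 and -4.5 in this case) and store them in two separate doubles, like this:

    double inventoryWeight = 0.3;
    double equippedWeight = -4.5;

I'd like to do this in pure Java; if necessary, feel free to point me to any third-party program I can run from my Java application to achieve this (but if so, please explain thoroughly). Many thanks!

Regular expressions are usually a good solution for scraping text. Parentheses denote "capture groups", which are stored and can then be accessed with Matcher.group(). [-.\d]+ matches any run of one or more digits (0-9), periods, and hyphens. .* matches anything (by default it does not match line terminators unless Pattern.DOTALL is set). Here it is just used to "throw away" everything you don't care about.

    import java.util.regex.*;

    public class Foo {
        public static void main(String[] args) {
            String regex = ".*inventory<\\/b>: ([-.\\d]+).*Equipped<\\/b>: ([-.\\d]+).*";
            // The regex expects a closing </b> tag after each label
            String text = "<td><b>In inventory</b>: 0.3 kg <b>Equipped</b>: -4.5 kg";

            // Look for a match
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(text);

            // Get the matched text
            if (matcher.matches()) {
                String inventoryWeight = matcher.group(1);
                String equippedWeight = matcher.group(2);

                System.out.println("Inventory weight: " + inventoryWeight);
                System.out.println("Equipped weight: " + equippedWeight);
            } else {
                System.out.println("No match!");
            }
        }
    }

Do you have this HTML as a string? If so, just search for "Equipped". Then take the position of the last character of "Equipped" plus one. Then build a new string by appending characters one at a time until you reach one that is not a digit or a dot. Once you have these numbers in string variables, you can convert them to doubles with double aDouble = Double.parseDouble(aString).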
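The manual scan described in the last answer can be sketched as follows. This is only an illustration, written in Python for brevity; the same steps map onto Java's String.indexOf, String.charAt, and Double.parseDouble. The helper name weight_after is hypothetical, and two details are assumptions that go beyond the answer: the scan first skips any characters between the label and the number, and it accepts a minus sign so that -4.5 is read correctly.

```python
NUMBER_CHARS = set("0123456789.-")  # assumption: '-' added so negative weights work


def weight_after(label, text):
    """Return the number that follows `label` in `text`, or None if absent."""
    start = text.find(label)
    if start == -1:
        return None
    pos = start + len(label)

    # Skip whatever sits between the label and the number (tags, ':', spaces).
    while pos < len(text) and text[pos] not in NUMBER_CHARS:
        pos += 1

    # Collect characters one at a time until a non-numeric character appears.
    collected = []
    while pos < len(text) and text[pos] in NUMBER_CHARS:
        collected.append(text[pos])
        pos += 1

    return float("".join(collected)) if collected else None


line = "<td> In inventory: 0.3 kg Equipped: -4.5 kg"
inventory_weight = weight_after("inventory", line)
equipped_weight = weight_after("Equipped", line)
print(inventory_weight, equipped_weight)
```

Compared with the regular expression, this version does not care what markup sits between the label and the number, but it will raise a ValueError on malformed input such as a lone "-".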

java html parsing web-scraping

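2 answers, 0 votes

Error when parsing a .txt file with Python

I wrote a small snippet that helps me go through various .txt data files. Basically they look like this: x-value y-value x-value y-value x-value y-value ... or use the repr() function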

python parsing txt

1 answer, 0 votes

How to parse a large Excel sheet in JavaScript

I need to parse large Excel sheets of up to 5 GB in size, but I get this error: Error parsing XLSX: RangeError: Invalid string length at Array.join () at concat (C:\Users\Pie-Cyfer\Desktop\

excel typescript parsing exceljs sheetjs

1 answer, 0 votes

Extracting node values from a DOM tree

I expect the code below to echo the string found inside the equipped element. Shouldn't this work? loadHTML('http://website.com'); $电...

php html parsing web-scraping

1 answer, 0 votes

Parsing HTML with XPath in JMeter

I can't extract the links to the resources that Google finds during a search (page: https://google.com/search?q=example) with the following expression: //div[contains(@class, 'g')]//a[descendant::h3]/@href.

html xml parsing xpath jmeter

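1 answer, 0 votes

Is it possible to search for a value in column 2 and then extract the adjacent value from column 4?

I can search for and extract values in column 2 of the data table without any problem, but extracting the value from column 4 that lines up with it is where I run into trouble. I've tried...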

parsing variables multiple-columns powerbi-desktop power-automate

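1 answer, 0 votes

Importing many header files from a project into lldb

I'd like to know whether there is an option to import many header files (C++ headers) into lldb. My goal is to be able to interpret an address as a struct/object, for example: p *(some_struct *)<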

parsing header lldb

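1 answer, 0 votes

How to serialize Map[String, Any] with jsoniter

I want to adopt jsoniter, a great Scala JSON serializer library. I'm using Scala 3. The only obstacle is that I need to serialize this: Map[String, Any] where Any is a class with a given co...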

json scala parsing scala-3 jsoniter-scala

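1 answer, 0 votes

How to parse a regular expression into a (kind of) AST?

To be clear, I don't want to "parse using regular expressions" - I want to "parse a regular expression into a symbol tree". (Searching only turned up the former...) My use case: by...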
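As a rough illustration of what "parsing a regular expression into a symbol tree" can look like in Python, the sketch below leans on the parser that CPython's own re module uses internally. That parser is an undocumented implementation detail (re._parser on Python 3.11 and later, the sre_parse module on earlier versions), so its output format may change between releases; whether it suits the use case in the question is an assumption, since the excerpt is cut off. The helper names simplify and regex_to_tree are hypothetical.

```python
import pprint

try:
    from re import _parser as regex_parser  # Python 3.11+: private submodule
except ImportError:
    import sre_parse as regex_parser  # earlier versions: undocumented module


def simplify(node):
    """Convert the parser's internal objects into plain lists, tuples, and strings."""
    if isinstance(node, regex_parser.SubPattern):
        return [simplify(item) for item in node.data]
    if isinstance(node, tuple):
        return tuple(simplify(item) for item in node)
    if isinstance(node, list):
        return [simplify(item) for item in node]
    if isinstance(node, int) and type(node) is not int:
        # Opcodes are named int constants; their repr is the opcode name.
        return repr(node)
    return node


def regex_to_tree(pattern):
    """Parse `pattern` without compiling it and return a nested opcode tree."""
    return simplify(regex_parser.parse(pattern))


pprint.pprint(regex_to_tree(r"(foo|ba+r)\d"))
```

Each node is an (opcode name, argument) pair, with nested lists for sub-patterns, so the result can be walked with ordinary recursion.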

python regex parsing

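2 answers, 0 votes

Python library to parse a regular expression into an AST?

To be clear, I don't want to "parse using regular expressions" - I want to "parse a regular expression into a symbol tree". (Searching only turned up the former...) My use case: by...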

python regex parsing

2 answers, 0 votes

Parsing custom fields with Cassava and Attoparsec

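I have a CSV containing fields with unit values that I have to parse. A simple example: data EValue = Farads Double | MicroFarads Double | PicoFarads Double So I need to parse...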

csv parsing haskell attoparsec

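1 answer, 0 votes

What is the best practice for using a settings (config) file in Python? [closed]

The goal is to simplify the use of many parameters in a Python program by writing a config (settings) file to which items can be added dynamically. What is the best practice for using a settings (config) file or...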

python parsing configuration yaml settings

4 answers, 0 votes

How to parse HTML efficiently in Python?

I want to parse HTML code efficiently without external libraries. I have already tried using a for loop that checks which symbol it is:

    list = []
    html = """<html>Hello</html>"""
    m = 0
    for a in html:
        if a == "<":
            m = 1
            list.append([])
        elif a == ">":
            m = 0
            list.append([])
        else:
            list[-1] = a
    print(list)

But the code is very slow on a 50KB file.

May I suggest starting with a simple HTML parser like the one below? It uses the standard library that ships with Python and has no external dependencies. You may need to change and extend it to suit your needs, but it gives you a basic DOM API that should be a good starting point. The code works for the simple case it is meant to solve; depending on your needs, you may have to add more features to reach your end goal.

    #! /usr/bin/env python3
    import html.parser
    import pprint
    import xml.dom.minidom


    def main():
        # noinspection PyPep8
        document = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    '''
        parser = DocumentParser()
        parser.feed(document)
        parser.close()
        model = parser.document.documentElement
        model.normalize()
        print(model.toprettyxml())
        first_title = model.getElementsByTagName('title')[0]
        print(first_title.toxml())
        print(first_title.tagName)
        print(first_title.firstChild.data)
        print(first_title.parentNode.tagName)
        first_p = model.getElementsByTagName('p')[0]
        print(first_p.toxml())
        print(first_p.getAttribute('class'))
        all_a = model.getElementsByTagName('a')
        print(all_a[0].toxml())
        pprint.pprint([element.toxml() for element in all_a])
        pprint.pprint([element.toxml() for element in find(model, id='link3')])
        for element in all_a:
            print(element.getAttribute('href'))
        print(*get_text(model), sep='\n')


    class DocumentParser(html.parser.HTMLParser):
        # noinspection SpellCheckingInspection
        def __init__(self, *, convert_charrefs=True):
            super().__init__(convert_charrefs=convert_charrefs)
            # self.focus tracks the node that new children are appended to
            self.document = self.focus = xml.dom.minidom.DOMImplementation() \
                .createDocument(None, None, None)

        @property
        def document_has_focus(self):
            return self.document is self.focus

        def handle_starttag(self, tag, attrs):
            element = self.document.createElement(tag)
            for name, value in attrs:
                element.setAttribute(name, value)
            self.focus.appendChild(element)
            self.focus = element

        def handle_endtag(self, tag):
            # Climb past unclosed elements until the matching open tag is found
            while self.focus.tagName != tag:
                self.focus = self.focus.parentNode
            self.focus = self.focus.parentNode

        def handle_data(self, data):
            if not self.document_has_focus and not data.isspace():
                self.focus.appendChild(self.document.createTextNode(data.strip()))

        def error(self, message):
            raise RuntimeError(message)

        def close(self):
            super().close()
            while not self.document_has_focus:
                self.focus = self.focus.parentNode


    def find(element, **kwargs):
        get_attribute = getattr(element, 'getAttribute', None)
        if get_attribute and \
                all(get_attribute(key) == value for key, value in kwargs.items()):
            yield element
        for child in element.childNodes:
            yield from find(child, **kwargs)


    def get_nodes_by_type(node, node_type):
        if node.nodeType == node_type:
            yield node
        for child in node.childNodes:
            yield from get_nodes_by_type(child, node_type)


    def get_text(node):
        return (node.data for node in get_nodes_by_type(node, node.TEXT_NODE))


    if __name__ == '__main__':
        main()
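If a full DOM is more than is needed, a flat token list (closer to what the loop in the question seems to be building) can be collected with the same standard-library html.parser module. The sketch below is a minimal illustration under that assumption; the names TokenCollector and tokenize are hypothetical and not part of the answer above.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collects (kind, value) tokens in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only runs between tags
            self.tokens.append(("text", text))


def tokenize(html_text):
    """Return the list of tokens found in `html_text`."""
    collector = TokenCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tokens


print(tokenize("<html>Hello</html>"))
```

The parser scans the input in chunks instead of appending to a list once per character, which avoids most of the per-character overhead of the original loop.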

python parsing

1 answer, 0 votes
