Getting the current delimiter used by the Scanner class

Question

When the delimiter is a regular expression, is it possible to get the current delimiter match that Scanner is using? For example, I have this code:

        String dictionary = "computer: A computer is an electronic machine that can store\n"
                          + "          and deal with large amounts of information.\n"
                          + "computer-aided: done or improved by computer\n"; 
        Scanner src = new Scanner(dictionary);
        String delimiterRegex = "^(.+?:)"; // <-- Matches a new term
        Pattern delimiterPattern = Pattern.compile(delimiterRegex, Pattern.MULTILINE);
        src.useDelimiter(delimiterPattern);
        String definition = "";
        String term = "";

        while(src.hasNext())
        {
            definition = src.next();
            term = ???????; // <--- The term is the current delimiter match
        }

This would be a very simple way to get all the definitions, if only I could also get the term.

Answer 1

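The Scanner API cannot do this.

However, if you look at the source code of Scanner, you will see that it has a private Matcher object that it uses to match the delimiter. If you are willing to break open the Scanner abstraction (using nasty reflection), you could extract the information you need from that matcher, provided you examine it at the right time.

If you are going to try this, my advice is to create your own custom scanner class from the Scanner source code. That makes your code immune to implementation changes in the standard Scanner class.

Make sure you get the source code from OpenJDK and that you satisfy the requirements of the "GPLv2" license on the file.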

Answer 2

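This is an XY problem.

Instead of trying to get the delimiter that the scanner matched (which is an implementation detail), you should rewrite the delimiter regex so that next returns what you want.

For example: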

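// this matches both the zero-width string before the term, and the zero-width string after the colon
// note: (?<=:) matches after every colon, so a colon inside a definition would also split the input
String delimiterRegex = "^(?=.+?:)|(?<=:)";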
Pattern delimiterPattern = Pattern.compile(delimiterRegex, Pattern.MULTILINE);
src.useDelimiter(delimiterPattern);
String definition = "";
String term = "";

while(src.hasNext())
{
    term = src.next(); // read the term first!
    definition = src.next();
}

Alternatively, just use a single regex. I came up with:

Pattern p = Pattern.compile("([^:\r\n]+?:)([\\s\\S]+?)(?=^[^:\r\n]+?:|\\z)", Pattern.MULTILINE);
Matcher m = p.matcher(dictionary);
while (m.find()) {
    String term = m.group(1);
    String definition = m.group(2);
}
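
For anyone who wants to try the single-regex approach end to end, here is a minimal, self-contained sketch. The class name DictionaryParser, the parse method, and the trimming and whitespace collapsing are assumptions made for illustration, not part of the answer above. The pattern is a slightly adapted variant: it is anchored with ^ so a term can only start at the beginning of a line, and the colon sits outside group 1 so the term comes back without it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; name and normalisation rules are illustrative assumptions.
class DictionaryParser {
    // Group 1: the term (text from the start of a line up to its first colon).
    // Group 2: the definition, extended lazily up to the next term line or the end of input.
    private static final Pattern ENTRY = Pattern.compile(
            "^([^:\\r\\n]+?):([\\s\\S]+?)(?=^[^:\\r\\n]+?:|\\z)",
            Pattern.MULTILINE);

    // Returns term -> definition pairs in the order they appear in the input.
    static Map<String, String> parse(String dictionary) {
        Map<String, String> entries = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(dictionary);
        while (m.find()) {
            String term = m.group(1).trim();
            // Collapse line breaks and indentation inside a definition into single spaces.
            String definition = m.group(2).trim().replaceAll("\\s+", " ");
            entries.put(term, definition);
        }
        return entries;
    }

    public static void main(String[] args) {
        String dictionary = "computer: A computer is an electronic machine that can store\n"
                          + "          and deal with large amounts of information.\n"
                          + "computer-aided: done or improved by computer\n";
        parse(dictionary).forEach((term, definition) ->
                System.out.println(term + " -> " + definition));
    }
}
```

Like the original pattern, this sketch treats any line that contains a colon as the start of a new entry, so a continuation line with a colon in it would be misread as a term.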
