What's wrong with my Java tag content extractor (HackerRank)?

Question

My code passes the first two test cases but fails the third one. Can anyone help?

Link: https://www.hackerrank.com/challenges/tag-content-extractor

Problem statement:

In a tag-based language such as XML or HTML, contents are enclosed between a start tag and an end tag. Note that the corresponding end tag starts with `</`.

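Given a string of text in a tag-based language, parse this text and retrieve the contents enclosed within well-organized sequences of tags that meet the following criteria:

- The names of the start and end tags must be the same.
- Tags can be nested, but content between nested tags is considered invalid.
- Tags can contain any printable characters.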

Input Format:

The first line of input contains an integer, N (the number of lines). Each of the N subsequent lines contains a line of text.

Constraints:

- `1 <= N <= 100`
- Each line contains at most `10000` printable characters.
- The total number of characters across all test cases will not exceed `1000000`.

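Output Format:

For each line, print the content enclosed within valid tags. If a line contains multiple instances of valid content, print each instance on a new line; if no valid content is found, print None.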
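A minimal sketch of that per-line contract, assuming the matching rule can be expressed with the same regex that Answer 3 uses, `<(.+)>([^<]+)</\1>`: group 1 is the start-tag name, the backreference `\1` forces the end tag to use the same name, and `[^<]+` rejects empty content or content that still contains another tag. The class `TagContentSketch`, the helper `extract`, and the input lines are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagContentSketch {
    // Assumed rule: start tag, non-empty content without '<', end tag with the same name.
    private static final Pattern TAG = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: collects every valid content instance found in one line.
    static List<String> extract(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {
            "<h1>Nayeem loves counseling</h1>",
            "<h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par>",
            "<Amee>safat codes like a ninja</amee>"
        };
        for (String line : lines) {
            List<String> found = extract(line);
            if (found.isEmpty()) {
                System.out.println("None"); // no valid content on this line
            } else {
                found.forEach(System.out::println); // one instance per output line
            }
        }
    }
}
```

The backreference is case-sensitive, so `<Amee>...</amee>` produces None, and in the nested line only the innermost `h1` content and the `par` content are reported.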

My code:

```java
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());

        while (testCases > 0) {
            String line = in.nextLine();
            char[] A = line.toCharArray();
            String tag = "", tag1 = "";
            int a1 = 0, b1 = 0;
            int a = 0, b = 0;
            int flag = 0, end = 0;

            a = line.indexOf('<', a1);
            b = line.indexOf('>', b1);
            //System.out.println("Index of first '<' is " + a);
            //System.out.println("Index of first '>' is " + b);

            while ((a != -1) && (b != -1) && b < line.lastIndexOf(">")) {
                tag = "";
                tag1 = "";
                //System.out.println("Index of first '<' is " + a);
                //System.out.println("Index of first '>' is " + b);
                for (int k = a + 1; k < b; k++)
                    tag = tag + A[k];
                //System.out.println("tag is " + tag);

                a1 = line.indexOf('<', a + 1);
                b1 = line.indexOf('>', b + 1);

                if (A[a1+1] == '/') {
                    //System.out.println("Index of second '<' is " + a1);
                    //System.out.println("Index of second '>' is " + b1);
                    for (int k = a1 + 2; k < b1; k++)
                        tag1 = tag1 + A[k];
                    if ((!tag.isEmpty()) && (!tag1.isEmpty())) {
                        if (tag.equals(tag1)) {
                            if ((b + 1) == a1) {
                                System.out.println("None");
                                flag = 1;
                            } else {
                                for (int k = b + 1; k < a1; k++)
                                    System.out.print(A[k]);
                                System.out.println();
                                flag = 1;
                            }
                        } else if (flag == 0) {
                            System.out.println("None");
                            flag = 1;
                        }
                    }
                }
                a = a1;
                b = b1;
                //System.out.println("tag1 is " + tag1);
            }
            if ((b == -1 || a == -1 || tag1.isEmpty() || tag.isEmpty()) && (flag == 0)) {
                System.out.println("None");
            }
            testCases--;
        }
    }
}
```

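EDIT: For test case #3 I can't figure out the problem: why is the large input parsed line by line when it should be parsed as a whole paragraph? If it were parsed as a whole, I would get the correct output.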
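On the line-by-line question: `Scanner.nextLine()` returns one line per call (without the line terminator), and the output format asks for results for each line, so reading and matching one line at a time is what the problem itself requires. A tag opened on one line and closed on a later line should therefore not produce output. The sketch below only shows how the N input lines come out of `Scanner`; the class `LineByLineSketch`, the helper `readLines`, and the two-line input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineByLineSketch {
    // Reads N from the first line, then exactly N more lines, one nextLine() call each.
    static List<String> readLines(String input) {
        Scanner in = new Scanner(input);
        int n = Integer.parseInt(in.nextLine().trim());
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            lines.add(in.nextLine()); // the line terminator is not included
        }
        in.close();
        return lines;
    }

    public static void main(String[] args) {
        // The <p> tag opens on line 1 and closes on line 2, so it ends up in two separate strings.
        String input = "2\n<p>first line\n</p> second line\n";
        for (String line : readLines(input)) {
            System.out.println("[" + line + "]");
        }
        // prints:
        // [<p>first line]
        // [</p> second line]
    }
}
```

Under the per-line rule, neither of those two lines contains a complete, valid tag pair, so each would print None.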

Answer 1

You can use this:

```java
import java.util.Scanner;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while (testCases > 0) {
            String line = in.nextLine();
            int cur = 0;
            boolean none = true;
            for (;;) {
                // Find the next "<...>" tag starting at position cur.
                int start = line.indexOf("<", cur);
                if (start < 0) break;
                int end = line.indexOf(">", start);
                if (end < 0) break;
                String tag = line.substring(start + 1, end);
                // Skip empty tags and end tags; only start tags begin a candidate.
                if (tag.length() == 0 || tag.charAt(0) == '/') {
                    cur = end + 1;
                    continue;
                }
                // Search for the matching end tag only after this start tag,
                // otherwise an earlier "</tag>" would give an invalid substring range.
                int brk = line.indexOf("</" + tag + ">", end + 1);
                if (brk >= 0) {
                    String output = line.substring(end + 1, brk);
                    // Content that still contains '<' wraps a nested tag, so it is not valid.
                    if (output.length() > 0 && output.indexOf("<") < 0) {
                        none = false;
                        System.out.println(output);
                    }
                }
                cur = end + 1;
            }
            if (none) System.out.println("None");
            testCases--;
        }
    }
}
```

Answer 2

This can be solved easily with the power of regex:

```java
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;

public class Solution {
    public static void main(String[] args) {

        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());

        String patternString = "<(.+)>([\\w]+[^<]*)</(\\1)>";
        Pattern pattern = Pattern.compile(patternString);

        while (testCases > 0) {

            String line = in.nextLine();
            Matcher matcher = pattern.matcher(line);

            boolean found = false;

            while (matcher.find()) {
                System.out.println(matcher.group(2));
                found = true;
            }

            if (!found) {
                System.out.println("None");
            }

            testCases--;
        }
        in.close();
    }
}
```
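One detail worth checking in this pattern: `([\w]+[^<]*)` requires the content to start with a word character (`[A-Za-z0-9_]`), so content beginning with a space or punctuation is skipped. It is unclear whether HackerRank's hidden tests contain such content; the sketch below just compares this pattern with the looser `[^<]+` form used in Answer 3 on one made-up line. The class `PatternComparisonSketch` and the helper `firstContent` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternComparisonSketch {
    // Pattern from this answer: content must begin with a word character.
    static final Pattern WORD_FIRST = Pattern.compile("<(.+)>([\\w]+[^<]*)</(\\1)>");
    // Looser form (as in Answer 3): any non-empty content without '<'.
    static final Pattern ANY_TEXT = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Returns group 2 of the first match, or null when the line has no match.
    static String firstContent(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "<b>!important</b>";
        System.out.println(firstContent(WORD_FIRST, line)); // prints "null": '!' is not \w
        System.out.println(firstContent(ANY_TEXT, line));   // prints "!important"
    }
}
```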

Answer 3

We can do this easily using regular expressions:

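```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        // Start tag, non-empty content without '<', then the same tag closed (\1 is a backreference).
        // Compiled once, outside the loop.
        Pattern pattern = Pattern.compile("<(.+)>([^<]+)</\\1>");
        while (testCases > 0) {
            String line = in.nextLine();
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                do {
                    System.out.println(matcher.group(2));
                } while (matcher.find());
            } else {
                System.out.println("None");
            }
            testCases--;
        }
    }
}
```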
