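I want to write a function that takes a string and, if it contains an inline comment, removes it. I know this sounds simple, but I want to make sure I am doing it correctly. For example: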
private String filterString(String code) {
// lets say code = "some code //comment inside"
// return the string "some code" (without the comment)
}
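I have thought of two approaches; otherwise, feel free to suggest something else.
Can you tell me which approach is best and show me how it should be done? (Please do not suggest solutions that are too advanced.)
Edit: can this be done somehow with a Scanner object? (I am using that object anyway.)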
If you want a more efficient regular expression that really matches all types of comments, use this one:
replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)","");
Source: http://ostermiller.org/findcomment.html
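As a quick illustration of how that expression might be applied, here is a minimal sketch. The class name `CommentRegexDemo` and the helper `strip` are made up for this example; only the pattern itself comes from the line above. Keep in mind that the pattern knows nothing about string literals.

```java
import java.util.regex.Pattern;

public class CommentRegexDemo {
    // The expression quoted above: a /* ... */ block comment or a // line comment.
    private static final Pattern COMMENT = Pattern.compile(
            "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)");

    // Hypothetical helper: removes every match of COMMENT from the input.
    static String strip(String code) {
        return COMMENT.matcher(code).replaceAll("");
    }

    public static void main(String[] args) {
        String code = "int a = 1; // first\n/* block\n   comment */int b = 2;";
        System.out.println(strip(code));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which is what `String.replaceAll` does internally.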
Edit:
If you are not sure about using a regular expression, another solution is to design a small automaton, like this:
// needs: import java.util.Scanner;
public static String removeComments(String code){
final int outsideComment=0;
final int insideLineComment=1;
final int insideblockComment=2;
final int insideblockComment_noNewLineYet=3; // we want to have at least one new line in the result if the block is not inline.
int currentState=outsideComment;
String endResult="";
Scanner s= new Scanner(code);
s.useDelimiter("");
while(s.hasNext()){
String c=s.next();
switch(currentState){
case outsideComment:
if(c.equals("/") && s.hasNext()){
String c2=s.next();
if(c2.equals("/"))
currentState=insideLineComment;
else if(c2.equals("*")){
currentState=insideblockComment_noNewLineYet;
}
else
endResult+=c+c2;
}
else
endResult+=c;
break;
case insideLineComment:
if(c.equals("\n")){
currentState=outsideComment;
endResult+="\n";
}
break;
case insideblockComment_noNewLineYet:
if(c.equals("\n")){
endResult+="\n";
currentState=insideblockComment;
}
case insideblockComment:
while(c.equals("*") && s.hasNext()){
String c2=s.next();
if(c2.equals("/")){
currentState=outsideComment;
break;
}
c=c2; // re-check the new character: only a '/' directly after a '*' closes the comment
}
}
}
s.close();
return endResult;
}
The best way is to use regular expressions. First find the /* */ comments, then remove all the // comments. For example:
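private String filterString(String code) {
    // (?s) lets '.' match newlines; '.*?' is reluctant, so each block comment is matched on its own
    String partialFiltered = code.replaceAll("(?s)/\\*.*?\\*/", "");
    // without (?s), '.' stops at the line terminator, so this removes each // comment up to the end of its line
    String fullFiltered = partialFiltered.replaceAll("//.*", "");
    return fullFiltered;
}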
Just use the replaceAll method of the String class, combined with a simple regular expression. Here is how:
import java.util.*;
import java.lang.*;
class Main
{
public static void main (String[] args) throws java.lang.Exception
{
String s = "private String filterString(String code) {\n" +
" // lets say code = \"some code //comment inside\"\n" +
" // return the string \"some code\" (without the comment)\n}";
s = s.replaceAll("//.*?\n","\n");
System.out.println("s=" + s);
}
}
The key is this line:
s = s.replaceAll("//.*?\n","\n");
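The regex //.*?\n matches a string that starts with // and runs to the end of the line; the newline it consumes is put back by the replacement.
If you want to see this code in action, go here: http://www.ideone.com/e26Ve
Hope this helps!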
@Christian Hujer correctly pointed out that many or all of the posted solutions fail if the comment marker appears inside a string literal.
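To make that failure concrete, here is a small sketch. The class and method names are made up for illustration; the regex is the simple line-comment pattern used in the answers above.

```java
public class StringLiteralPitfall {
    // The naive approach: drop everything from the first "//" to the end of the line.
    static String naiveStrip(String line) {
        return line.replaceAll("//.*", "");
    }

    public static void main(String[] args) {
        String line = "String url = \"http://example.com\"; // homepage";
        // The "//" inside the string literal is mistaken for the start of a comment.
        System.out.println(naiveStrip(line)); // prints: String url = "http:
    }
}
```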
@Loïc Gammaitoni suggested that his automaton approach could easily be extended to handle that case. Here is that extension.
enum State { outsideComment, insideLineComment, insideblockComment, insideblockComment_noNewLineYet, insideString };
public static String removeComments(String code) {
State state = State.outsideComment;
StringBuilder result = new StringBuilder();
Scanner s = new Scanner(code);
s.useDelimiter("");
while (s.hasNext()) {
String c = s.next();
switch (state) {
case outsideComment:
if (c.equals("/") && s.hasNext()) {
String c2 = s.next();
if (c2.equals("/"))
state = State.insideLineComment;
else if (c2.equals("*")) {
state = State.insideblockComment_noNewLineYet;
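} else {
result.append(c).append(c2);
if (c2.equals("\"")) state = State.insideString; // a quote right after a lone '/' still opens a string
}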
} else {
result.append(c);
if (c.equals("\"")) {
state = State.insideString;
}
}
break;
case insideString:
result.append(c);
if (c.equals("\"")) {
state = State.outsideComment;
} else if (c.equals("\\") && s.hasNext()) {
result.append(s.next());
}
break;
case insideLineComment:
if (c.equals("\n")) {
state = State.outsideComment;
result.append("\n");
}
break;
case insideblockComment_noNewLineYet:
if (c.equals("\n")) {
result.append("\n");
state = State.insideblockComment;
}
case insideblockComment:
while (c.equals("*") && s.hasNext()) {
String c2 = s.next();
if (c2.equals("/")) {
state = State.outsideComment;
break;
}
c = c2; // re-check the new character: only a '/' directly after a '*' closes the comment
}
}
}
s.close();
return result.toString();
}
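For comparison, the same kind of state machine can be written over char indexes instead of a Scanner. This is only a sketch under stated assumptions, not the code from the answer above: the class name and state names are made up, char literals such as '"' are not handled, and newlines inside block comments are dropped rather than preserved.

```java
public class CommentStateMachine {
    private enum State { CODE, STRING, LINE_COMMENT, BLOCK_COMMENT }

    static String removeComments(String code) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            // One character of lookahead; '\0' stands for "no next character".
            char next = i + 1 < code.length() ? code.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;
                        i++; // skip the second '/'
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;
                        i++; // skip the '*'
                    } else {
                        out.append(c);
                        if (c == '"') {
                            state = State.STRING;
                        }
                    }
                    break;
                case STRING:
                    out.append(c);
                    if (c == '\\' && i + 1 < code.length()) {
                        out.append(next); // keep the escaped character, so \" does not end the string
                        i++;
                    } else if (c == '"') {
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n') {
                        out.append(c); // the newline ends the comment and is kept
                        state = State.CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = State.CODE;
                        i++; // skip the closing '/'
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "String s = \"a // b\"; // real comment\nint x = 1; /* gone */ int y = 2;";
        System.out.println(removeComments(code));
    }
}
```

Working with a lookahead character keeps each state's transition in one place, at the cost of the manual `i++` when two characters are consumed at once.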
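Using a regex replacement just to find the substring before a constant substring is overkill.
You can use indexOf() to check where the comment starts and substring() to get the first part, for example: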
String code = "some code // comment";
int offset = code.indexOf("//");
if (-1 != offset) {
code = code.substring(0, offset);
}
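If the input has several lines, the same indexOf()/substring() idea can be applied line by line. A minimal sketch, assuming the code contains only // comments and no string literal contains //; the class and method names are made up for illustration:

```java
public class IndexOfCommentStripper {
    // Cuts each line at the first "//", if any. Block comments and string literals are not handled.
    static String stripLineComments(String code) {
        StringBuilder result = new StringBuilder();
        // The -1 limit keeps trailing empty strings, so a final newline survives the round trip.
        String[] lines = code.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int offset = line.indexOf("//");
            if (offset != -1) {
                line = line.substring(0, offset);
            }
            result.append(line);
            if (i < lines.length - 1) {
                result.append('\n'); // put back the separator removed by split
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripLineComments("some code // comment\nmore code"));
    }
}
```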
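I wrote an open-source library (on GitHub) for this purpose. It is called CommentRemover, and it can remove single-line and multi-line Java comments.
It can either remove TODOs or leave them in place.
It also supports JavaScript, HTML, CSS, Properties, JSP, and XML comments.
A small code snippet showing how to use it (there are two kinds of usage):
First way: internal path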
public static void main(String[] args) throws CommentRemoverException {
// root dir is: /Users/user/Projects/MyProject
// example for startInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc.. goes like that
.removeTodos(false) // Do Not Touch Todos (leave them alone)
.removeSingleLines(true) // Remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app , leave it empty string when you want to start from root dir
.setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
Second way: external path
public static void main(String[] args) throws CommentRemoverException {
// example for externalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc..
.removeTodos(true) // Remove todos
.removeSingleLines(false) // Do not remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startExternalPath("/Users/user/Projects/MyOtherProject")// Give it full path for external directories
.setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory.
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
For Scanner, use a delimiter.
Delimiter example:
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
public class MainClass {
public static void main(String args[]) throws IOException {
FileWriter fout = new FileWriter("test.txt");
fout.write("2, 3.4, 5,6, 7.4, 9.1, 10.5, done");
fout.close();
FileReader fin = new FileReader("test.txt");
Scanner src = new Scanner(fin);
// Set delimiters to space and comma.
// ", *" tells Scanner to match a comma and zero or more spaces as
// delimiters.
src.useDelimiter(", *");
// Read and print numbers.
while (src.hasNext()) {
if (src.hasNextDouble()) {
System.out.println(src.nextDouble());
} else {
break;
}
}
fin.close();
}
}
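Applied to the original question, the same delimiter mechanism could cut a single line at its comment by using // itself as the delimiter. This is a sketch under the same caveats as the other simple approaches (string literals and block comments are not handled), and the class and method names are made up:

```java
import java.util.Scanner;

public class ScannerDelimiterDemo {
    // Returns the part of a single line that precedes the first "//".
    static String beforeComment(String line) {
        // Scanner skips a delimiter at the very start of the input, so handle that case by hand.
        if (line.startsWith("//")) {
            return "";
        }
        try (Scanner sc = new Scanner(line)) {
            sc.useDelimiter("//");
            // hasNext() is false for an empty line, where next() would throw.
            return sc.hasNext() ? sc.next() : "";
        }
    }

    public static void main(String[] args) {
        System.out.println(beforeComment("some code //comment inside"));
    }
}
```

Because the delimiter is only //, the space before the comment stays in the result; call trim() on it if that matters.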
For a plain string, use a tokenizer.
Tokenizer:
// start with a String of space-separated words
String tags = "pizza pepperoni food cheese";
// convert each tag to a token
StringTokenizer st = new StringTokenizer(tags," ");
while ( st.hasMoreTokens() )
{
String token = (String)st.nextToken();
System.out.println(token);
}
http://www.devdaily.com/blog/post/java/java-faq-stringtokenizer-example
It would be better if the code handled single-line and multi-line comments separately. Any suggestions?
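import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class RemovingCommentsFromFile {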
public static void main(String[] args) throws IOException {
BufferedReader fin = new BufferedReader(new FileReader("/home/pathtofilewithcomments/File"));
BufferedWriter fout = new BufferedWriter(new FileWriter("/home/result/File1"));
boolean multilinecomment = false;
boolean singlelinecomment = false;
int len,j;
String s = null;
while ((s = fin.readLine()) != null) {
StringBuilder obj = new StringBuilder(s);
len = obj.length();
for (int i = 0; i < len; i++) {
for (j = i; j < len; j++) {
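boolean pair = j + 1 < len; // only look at charAt(j + 1) when it exists
if (multilinecomment) {
// inside a block comment only the closing "*/" matters
if (pair && obj.charAt(j) == '*' && obj.charAt(j + 1) == '/') {
j += 2;
multilinecomment = false;
break;
}
continue;
} else if (pair && obj.charAt(j) == '/' && obj.charAt(j + 1) == '*') {
j++; // the loop's own j++ then moves past the "/*"
multilinecomment = true;
continue;
} else if (pair && obj.charAt(j) == '/' && obj.charAt(j + 1) == '/') {
singlelinecomment = true;
j = len;
break;
} else
break;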
}
if (j == len)
{
singlelinecomment=false;
break;
}
else
i = j;
System.out.print((char)obj.charAt(i));
fout.write((char)obj.charAt(i));
}
System.out.println();
fout.write((char)10);
}
fin.close();
fout.close();
}
}
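A simple solution that does not remove extra parts of the code (unlike the code above). It works with any reader; you can also iterate over a list of strings.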
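StringBuilder sb = new StringBuilder();
String s;
while ((s = reader.readLine()) != null)
{
    // readLine() strips the line terminator, so add it back for every line
    sb.append(s.replaceAll("//.*", "")).append("\n");
}
// (?s) lets '.' span lines; '.*?' stops at the first closing */
String str = sb.toString().replaceAll("(?s)/\\*.*?\\*/", " ");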
I am not sure this works in every case, but it seems to preserve string literals (it passed all 7 of my tests).
public static String removeJavaComments(String line) {
StringBuilder builder = new StringBuilder();
char[] lineChars = line.toCharArray();
boolean quoted = false;
boolean commented = false;
boolean line_commented = false;
for(int pos = 0; pos < lineChars.length; ++pos) {
switch(lineChars[pos]) {
case '"':
if(!(commented || line_commented)) {
quoted = !quoted;
builder.append(lineChars[pos]);
}
break;
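case '/':
if(quoted) {
builder.append(lineChars[pos]);
} else if(line_commented) {
// already inside a // comment: drop the character
} else if(commented) {
// "*/" closes the block comment; any other '/' inside it is dropped
if(pos > 0 && lineChars[pos-1] == '*') {
commented = false;
}
} else if(pos + 1 < lineChars.length && lineChars[pos+1] == '/') {
line_commented = true;
} else if(pos + 1 < lineChars.length && lineChars[pos+1] == '*') {
commented = true;
} else {
builder.append(lineChars[pos]); // a plain '/', e.g. division
}
break;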
case '\n':
line_commented = false;
builder.append(lineChars[pos]);
break;
default:
if(!(commented || line_commented)) {
builder.append(lineChars[pos]);
}
}
}
return builder.toString();
}