Trying to parse a CSV file delimited by commas (,) with quoted fields

Question (votes: 1, answers: 1)

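I haven't found much help online. I have a CSV file that I want to parse. The delimiter is a comma, but I want a comma to be ignored when it is part of a field, so I wrap such fields in quotes. My approach works fine when no field contains a comma. However, when I add a comma to one of the fields, expecting the line to still be treated as a single record, I get an ArrayIndexOutOfBoundsException. Here is my code, which I run in an AsyncTask. You'll notice the inserted calls r.get(1); r.get(2); they are there only for testing. r.get(1) is the line that throws the error.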

class ParseCsvTask extends AsyncTask<File, Void, Void>{

        @Override
        protected void onPreExecute() {
            mProgressBar.setVisibility(View.VISIBLE);
        }

        @Override
        protected Void doInBackground(File... files) {
            BufferedReader reader = null;
            CSVParser parser = null;


            File file = files[0];

            CSVFormat formatter = CSVFormat.RFC4180.withFirstRecordAsHeader();

            try {
                reader = new BufferedReader(new FileReader(file));

                parser = CSVParser.parse(reader, formatter);

                List<CSVRecord> list = parser.getRecords();

                for (CSVRecord r : list) {
                    r.get(1);
                    r.get(2);
                    Competitor competitor = new Competitor(r.get(1), r.get(2));
                    if (!r.get(0).equals("")) {
                        competitor.setMemberNum(r.get(0));
                    }
                    if(!r.get(4).equals("")){
                        competitor.setEmail(r.get(4));
                    }
                    if(!r.get(5).equals("")){
                        competitor.setPhone(r.get(5));
                    }

                    switch (r.get(7)){
                        case "":
                            competitor.setAge(Competitor.Age.ADULT);
                            break;
                        case "Junior":
                            competitor.setAge(Competitor.Age.JUNIOR);
                            break;
                        case "Senior":
                            competitor.setAge(Competitor.Age.SENIOR);
                            break;
                        case "Super Senior":
                            competitor.setAge(Competitor.Age.SUPER_SENIOR);
                            break;
                        default:
                            break;
                    }

                    if(r.get(8).equals("")){
                        competitor.setLady(false);
                    } else {
                        competitor.setLady(true);
                    }

                    mImportedComps.add(competitor);

                }

                FileHelper.writeMasterCompetitorsFile(mContext, mImportedComps);

                Intent intent = new Intent(mContext, MasterCompetitorListActivity.class);
                startActivity(intent);

            } catch (Exception e) {
                e.printStackTrace();
                Log.d("record", "what is going on");
            } finally {
                try {
                    assert parser != null;
                    parser.close();
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }

            return null;
        }

        @Override
        protected void onPostExecute(Void aVoid) {
            mProgressBar.setVisibility(View.INVISIBLE);
        }
    }

Keep in mind: it works when I don't use a comma inside a record. "First name" works fine, but if a record says "First, name" I get the error. Also, I'm using *org.apache.commons.csv*.

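Someone suggested that this question might be a duplicate of this post: Apache commons CSV: quoted input doesn't work. The error in that post is "invalid char between encapsulated token and delimiter", whereas mine is an array index out of bounds, which clearly shows we are dealing with different scenarios. I am not being told about any invalid character between delimiters. Something different is happening in my case.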

Here is the stack trace printed when I catch the error:

03-05 15:34:44.397 778-778/com.checkinsystems.ez_score D/ViewRootImpl@4ca832c[MasterCompetitorListActivity]: ViewPostImeInputStage processPointer 0
03-05 15:34:44.479 778-778/com.checkinsystems.ez_score D/ViewRootImpl@4ca832c[MasterCompetitorListActivity]: ViewPostImeInputStage processPointer 1
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err: java.lang.ArrayIndexOutOfBoundsException: length=1; index=1
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at org.apache.commons.csv.CSVRecord.get(CSVRecord.java:79)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at com.checkinsystems.ez_score.ImportMasterCompsFileFragment$ParseCsvTask.doInBackground(ImportMasterCompsFileFragment.java:186)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at com.checkinsystems.ez_score.ImportMasterCompsFileFragment$ParseCsvTask.doInBackground(ImportMasterCompsFileFragment.java:158)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at android.os.AsyncTask$2.call(AsyncTask.java:304)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at java.util.concurrent.FutureTask.run(FutureTask.java:237)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:243)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score W/System.err:     at java.lang.Thread.run(Thread.java:762)
03-05 15:34:44.550 778-825/com.checkinsystems.ez_score D/record: what is going on

So I found out why the ArrayIndexOutOfBoundsException is being thrown. I ran this code:

for(CSVRecord  r : list){
                    Log.d("record", r.toString());
                }

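right after getting the list. I noticed that, for some reason, I get a blank record followed by the correct record. This pattern repeats, so I end up with twice as many records as I need, and every other one is blank, which is why I get the index problem. But I still don't understand why I'm getting these blank records. Here is the onClick handler that calls the code: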

@Override
        public void onClick(View view) {

            File file = new File(Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS).getAbsolutePath()
                    + "/" + mFileName.getText().toString());

            new ParseCsvTask().execute(file);

        }

Here is some logcat output. I have changed the data to hide people's information:

03-05 16:25:40.223 13019-13633/com.checkinsystems.ez_score D/record: CSVRecord [comment=null, mapping={member=0, first name=1, last name=2, name=3, email=4, phone=5, squad=6, age=7, gender=8, division=9, power factor=10, class=11, special =12}, recordNumber=1, values=[]]
03-05 16:25:40.223 13019-13633/com.checkinsystems.ez_score D/record: CSVRecord [comment=null, mapping={member=0, first name=1, last name=2, name=3, email=4, phone=5, squad=6, age=7, gender=8, division=9, power factor=10, class=11, special =12}, recordNumber=2, values=[A9J41, Bob, Al,len, Bob Allen, [email protected], 5555555555, 7, , , Production, Minor, D, ]]
03-05 16:25:40.223 13019-13633/com.checkinsystems.ez_score D/record: CSVRecord [comment=null, mapping={member=0, first name=1, last name=2, name=3, email=4, phone=5, squad=6, age=7, gender=8, division=9, power factor=10, class=11, special =12}, recordNumber=3, values=[]]
03-05 16:25:40.223 13019-13633/com.checkinsystems.ez_score D/record: CSVRecord [comment=null, mapping={member=0, first name=1, last name=2, name=3, email=4, phone=5, squad=6, age=7, gender=8, division=9, power factor=10, class=11, special =12}, recordNumber=4, values=[TY912111, Fred , Jones , Fred Jones , [email protected], 5555555555, 5, , , Revolver, Minor, C, ]]

Keep in mind that this only happens when I add a comma in the middle of the last name in the first record. If I take that comma out, it works.
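As a side note on the quoting itself: in the log above, record 2 shows Al,len as a single value, which suggests the quoted comma is being handled as intended. The following is a minimal, library-free sketch of how quote-aware splitting of one line can work. The class QuotedCsvSketch and the method splitLine are hypothetical names for illustration; this is not how Commons CSV is implemented, and it ignores multi-line fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of org.apache.commons.csv.
class QuotedCsvSketch {

    /** Splits one CSV line into fields; commas inside double quotes stay in the field. */
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // includes commas inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // delimiter outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Quoted: the comma is part of the third field.
        System.out.println(splitLine("A9J41,Bob,\"Al,len\",Bob Allen").size());
        // Unquoted: the same comma splits the name into two fields.
        System.out.println(splitLine("A9J41,Bob,Al,len,Bob Allen").size());
    }
}
```

With the quoted input the line yields four fields, and with the unquoted input it yields five, so a missing or mangled quote would shift every later index rather than produce a blank record.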

java android csv delimiter
1 Answer
0 votes

I solved it! I am using a formatter based on the RFC4180 format, which has the following defaults:

withDelimiter(',')
withQuote('"')
withRecordSeparator("\r\n")
withIgnoreEmptyLines(false)

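The last property, withIgnoreEmptyLines, needs to be set to true. With it set to false, the parser does not insert anything by itself: it returns every empty line in the input as a blank record. That is where the blank record before each real record came from, so my file most likely ended up with empty lines between the rows when I edited it to add the comma. I fixed it with this line: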

CSVFormat formatter = CSVFormat.RFC4180.withFirstRecordAsHeader()
                    .withIgnoreEmptyLines(true);

That is why I was getting the ArrayIndexOutOfBoundsException.
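To illustrate the mechanism without the library, here is a small sketch of how keeping or ignoring empty lines changes the record list. EmptyLineSketch and toRecords are hypothetical names, the data is made up, and fields are split on plain commas with no quote handling; it only models the effect of the ignoreEmptyLines setting and is not the Commons CSV implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of org.apache.commons.csv.
class EmptyLineSketch {

    /** Builds one record per line; empty lines are skipped only if ignoreEmptyLines is true. */
    public static List<List<String>> toRecords(String text, boolean ignoreEmptyLines) {
        List<List<String>> records = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty() && ignoreEmptyLines) {
                continue; // drop the empty line entirely
            }
            // An empty line becomes a record holding a single empty value.
            records.add(Arrays.asList(line.split(",", -1)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Assumed input: an empty line sits before each real row.
        String data = "\nA9J41,Bob,Allen\n\nTY912111,Fred,Jones\n";

        List<List<String>> kept = toRecords(data, false);
        List<List<String>> skipped = toRecords(data, true);
        System.out.println("empty lines kept:    " + kept.size() + " records");
        System.out.println("empty lines ignored: " + skipped.size() + " records");

        try {
            kept.get(0).get(1); // blank record has length 1, so index 1 is out of range
        } catch (IndexOutOfBoundsException e) {
            System.out.println("index 1 is out of bounds for the blank record");
        }
    }
}
```

The blank record here has length 1, which matches the length=1; index=1 in the stack trace. Checking the number of values in a record before reading by index is another way to guard against such rows.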

I hope this helps someone else. Thanks, everyone, for helping me with this.
