Text classifier training data not loaded correctly by the spaCy debug-data CLI

Question

Background

I am trying to train a multiclass (mutually exclusive labels) text classification model with spaCy in a Google Colab notebook. The classes are:

Positive
Negative
Neutral

I put my training data into the annotation format specified here.

Below is a sample of the annotations I produced:

[.
.
["Happy #MothersDay to all ... ", {'cats': {'NEUTRAL': 1.0}}],
["Happy mothers day ..", {"cats": {"POSITIVE": 1.0}}],
.
.]
Problem

When I try to debug the data with the debug-data option of the spaCy CLI, using the following command (run in a Jupyter notebook):

%%bash
(python -m spacy debug-data en \
    /content/drive/My\ Drive/Spacy/Pretrained/train_clas.json \
    /content/drive/My\ Drive/Spacy/Pretrained/eval_clas.json \
    -p 'textcat' \
)
I get the following output:

=========================== Data format validation ===========================
✔ Corpus is loadable

=============================== Training stats ===============================
Training pipeline: textcat
Starting with blank model 'en'
0 training docs
0 evaluation docs
✘ No evaluation docs
✔ No overlap between training and evaluation data
✘ Low number of examples to train from a blank model (0)

============================== Vocab & Vectors ==============================
ℹ 0 total words in the data (0 unique)
ℹ No word vectors present in the model

============================ Text Classification ============================
ℹ Text Classification: 0 new label(s), 0 existing label(s)
ℹ The train data contains only instances with mutually-exclusive
classes.

================================== Summary ==================================
✔ 2 checks passed
✘ 2 errors

The data is not being read correctly, even though I have checked the files and they contain more than 1000 samples.
Links to the train and eval JSON files.

I cannot find anything wrong with the data. Can someone point out the mistake? Thanks!

Answer 1

The spacy debug-data command expects data in spaCy's internal JSON training format, described here: https://spacy.io/api/annotation#json-input

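There are some examples here: https://github.com/explosion/spaCy/tree/master/examples/training/textcat_example_data. The conversion script in the same directory shows how to convert from a JSONL format that is very similar to the TRAIN_DATA-style format used in the example scripts.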
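
To make the difference between the two formats concrete, here is a minimal sketch of such a conversion using only the standard library. It is not the conversion script from the linked directory. The function name `to_spacy_v2_json`, the sample texts, and the whitespace tokenization are all assumptions made for illustration, and the nested layout (documents, then paragraphs, then sentences and tokens, with `cats` as a list of label/value pairs) follows my reading of the format page linked above, so check it against that page before relying on it.

```python
import json


def to_spacy_v2_json(examples):
    """Convert [text, {"cats": {...}}] pairs into spaCy v2-style JSON docs.

    Hypothetical helper: the layout is an assumption based on the
    JSON input format described in the spaCy v2 annotation docs.
    """
    docs = []
    for doc_id, (text, annotations) in enumerate(examples):
        # Whitespace splitting is a stand-in for spaCy's tokenizer (assumption).
        tokens = [{"id": i, "orth": word} for i, word in enumerate(text.split())]
        # The dict of categories becomes a list of label/value records.
        cats = [
            {"label": label, "value": value}
            for label, value in annotations.get("cats", {}).items()
        ]
        docs.append(
            {
                "id": doc_id,
                "paragraphs": [
                    {
                        "raw": text,
                        "sentences": [{"tokens": tokens, "brackets": []}],
                        "cats": cats,
                    }
                ],
            }
        )
    return docs


# Illustrative sample texts, not taken from the asker's data.
examples = [
    ["The parcel arrived on Tuesday", {"cats": {"NEUTRAL": 1.0}}],
    ["Great service today", {"cats": {"POSITIVE": 1.0}}],
]

converted = to_spacy_v2_json(examples)
print(json.dumps(converted[0], indent=2))
```

Writing `converted` to a file with `json.dump` would give a file shaped like the internal format rather than a flat list of text/annotation pairs. For real data, the linked conversion script is the safer route, since it tokenizes with spaCy itself.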
