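I am using spaCy version 3.5.0 to train a custom NER model on some dummy data. My entire code and the dummy data are given below. It is exactly the same as the code given in the second part of this link. The code runs without errors, but training only gets as far as the "Initializing pipeline" step, and the "Training pipeline" step is never executed.
Any idea why the training pipeline is not being executed?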
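import pandas as pd
import os
import spacy
from tqdm import tqdm
from spacy.tokens import DocBin

# Assumption: nlp is not defined in the posted code; a blank English pipeline is used here
nlp = spacy.blank("en")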
train = [
("An average-sized strawberry has about 200 seeds on its outer surface and are quite edible.",{"entities":[(17,27,"Fruit")]}),
("The outer skin of Guava is bitter tasting and thick, dark green for raw fruits and as the fruit ripens, the bitterness subsides. ",{"entities":[(18,23,"Fruit")]}),
("Grapes are one of the most widely grown types of fruits in the world, chiefly for the making of different wines. ",{"entities":[(0,6,"Fruit")]}),
("Watermelon is composed of 92 percent water and significant amounts of Vitamins and antioxidants. ",{"entities":[(0,10,"Fruit")]}),
("Papaya fruits are usually cylindrical in shape and the size can go beyond 20 inches. ",{"entities":[(0,6,"Fruit")]}),
("Mango, the King of the fruits is a drupe fruit that grows in tropical regions. ",{"entities":[(0,5,"Fruit")]}),
("undefined",{"entities":[(0,6,"Fruit")]}),
("Oranges are great source of vitamin C",{"entities":[(0,7,"Fruit")]}),
("A apple a day keeps doctor away. ",{"entities":[(2,7,"Fruit")]})
]
db = DocBin() # create a DocBin object
for text, annot in tqdm(train): # data in previous format
    doc = nlp.make_doc(text) # create doc object from text
    ents = []
    for start, end, label in annot["entities"]: # add character indexes
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents # label the text with the ents
    db.add(doc)
db.to_disk("./train.spacy") # save the docbin object
!python -m spacy init fill-config base_config.cfg config.cfg
!python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy
Expected output:
Output I am getting:
I came across this myself while looking for spaCy training resources, and noticed a problem with the last line.
!python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy
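In this example, the training and dev datasets both come from the same source, "./train.spacy". It may be this overlap that causes training to end after the first epoch, where the evaluation score jumps from 0.12 to 1.00.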
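One way to avoid the overlap is to split the annotated examples before converting them, so that the dev set contains texts the model never trains on. Below is a minimal sketch, not a definitive fix: the helper name `split_examples`, the 20% dev fraction, and the fixed seed are all assumptions made for illustration, and the short `examples` list only stands in for the `train` list from the question.

```python
import random


def split_examples(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and return (train, dev) lists.

    Hypothetical helper: the name, default fraction and seed are
    illustrative choices, not part of spaCy.
    """
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")
    shuffled = list(examples)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    # Keep at least one example on each side of the split.
    n_dev = min(max(1, round(len(shuffled) * dev_fraction)), len(shuffled) - 1)
    return shuffled[n_dev:], shuffled[:n_dev]


# Stand-in data in the same (text, annotations) format as the question's list.
examples = [
    ("Oranges are great source of vitamin C", {"entities": [(0, 7, "Fruit")]}),
    ("A apple a day keeps doctor away. ", {"entities": [(2, 7, "Fruit")]}),
    ("Mango, the King of the fruits is a drupe fruit that grows in tropical regions. ", {"entities": [(0, 5, "Fruit")]}),
    ("Papaya fruits are usually cylindrical in shape and the size can go beyond 20 inches. ", {"entities": [(0, 6, "Fruit")]}),
    ("Watermelon is composed of 92 percent water and significant amounts of Vitamins and antioxidants. ", {"entities": [(0, 10, "Fruit")]}),
]

train_examples, dev_examples = split_examples(examples)
print(len(train_examples), "training examples,", len(dev_examples), "dev examples")
```

Each of the two lists could then go through the same DocBin loop as in the question and be saved to its own file (for instance a hypothetical `./dev.spacy`), with `--paths.dev` pointing at the dev file instead of `./train.spacy`. With only a handful of examples the dev score will still be noisy, so this mainly makes the evaluation honest rather than reliable.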