Why does spaCy not run the training pipeline even though the code runs without errors?


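I am using spaCy version 3.5.0 to train a custom NER model on some dummy data. My entire code and the dummy data are given below. It is exactly the same as the code given in the second part of this link. The code runs without errors, but it only gets as far as the "Initializing pipeline" step; the "Training pipeline" step is never executed.

Any idea why the training pipeline is not being executed?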

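import pandas as pd
import os
import spacy
from tqdm import tqdm
from spacy.tokens import DocBin

# nlp is not defined in the snippet as posted; a blank English pipeline is assumed here
nlp = spacy.blank("en")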

train = [
          ("An average-sized strawberry has about 200 seeds on its outer surface and are quite edible.",{"entities":[(17,27,"Fruit")]}),
          ("The outer skin of Guava is bitter tasting and thick, dark green for raw fruits and as the fruit ripens, the bitterness subsides. ",{"entities":[(18,23,"Fruit")]}),
          ("Grapes are one of the most widely grown types of fruits in the world, chiefly for the making of different wines. ",{"entities":[(0,6,"Fruit")]}),
          ("Watermelon is composed of 92 percent water and significant amounts of Vitamins and antioxidants. ",{"entities":[(0,10,"Fruit")]}),
          ("Papaya fruits are usually cylindrical in shape and the size can go beyond 20 inches. ",{"entities":[(0,6,"Fruit")]}),
          ("Mango, the King of the fruits is a drupe fruit that grows in tropical regions. ",{"entities":[(0,5,"Fruit")]}),
          ("undefined",{"entities":[(0,6,"Fruit")]}),
          ("Oranges are great source of vitamin C",{"entities":[(0,7,"Fruit")]}),
          ("A apple a day keeps doctor away. ",{"entities":[(2,7,"Fruit")]})
        ]

db = DocBin() # create a DocBin object

for text, annot in tqdm(train): # data in previous format
    doc = nlp.make_doc(text) # create doc object from text
    ents = []
    for start, end, label in annot["entities"]: # add character indexes
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents # label the text with the ents
    db.add(doc)

db.to_disk("./train.spacy") # save the docbin object

!python -m spacy init fill-config base_config.cfg config.cfg

!python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy

Expected output

enter image description here

The output I get

enter image description here

python nlp nltk spacy named-entity-recognition
1 Answer

I came across this myself while looking for spaCy training resources, and noticed a problem with the last line.

!python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy

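In this example the training set and the dev set both come from the same file, "./train.spacy". It may be this overlap that causes training to end after the first epoch, where the evaluation score jumps from 0.12 to 1.00.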
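If the overlap is indeed the cause (I have not confirmed that), one way to rule it out is to hold out part of the data as a separate dev file. Below is a minimal sketch, not a definitive fix. The helper names split_examples and write_docbin, the 20% dev fraction, the fixed seed and the "./dev.spacy" path are all my own assumptions and do not come from the question. The examples are expected in the same (text, {"entities": [...]}) format as the train list above.

```python
import random


def split_examples(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and return (train, dev) lists.

    Hypothetical helper: the name, the default fraction and the seed are
    illustrative choices, not part of spaCy.
    """
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")
    if not 0 < dev_fraction < 1:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one example on each side of the split.
    n_dev = min(max(1, round(len(shuffled) * dev_fraction)), len(shuffled) - 1)
    return shuffled[n_dev:], shuffled[:n_dev]


def write_docbin(examples, path, lang="en"):
    """Convert (text, {"entities": [...]}) pairs to a DocBin and save it.

    Hypothetical helper mirroring the loop in the question. spaCy is
    imported here so the split helper above works without it installed.
    """
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)
    db = DocBin()
    for text, annot in examples:
        doc = nlp.make_doc(text)
        spans = [
            doc.char_span(start, end, label=label, alignment_mode="contract")
            for start, end, label in annot["entities"]
        ]
        # Drop spans that do not align with token boundaries.
        doc.ents = [span for span in spans if span is not None]
        db.add(doc)
    db.to_disk(path)


# Usage with the `train` list from the question:
# train_examples, dev_examples = split_examples(train)
# write_docbin(train_examples, "./train.spacy")
# write_docbin(dev_examples, "./dev.spacy")
```

The train command would then point --paths.dev at ./dev.spacy instead of ./train.spacy. With only nine examples the dev set will be tiny, so the scores will still be noisy, but they will no longer be computed on the training data itself.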
