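I need to run an object detector on a Raspberry Pi 4B for real-time object detection, and I decided to use YOLOv8n. Since I don't have any hardware accelerator, my only idea for reaching real-time speed is to prune and quantize the model. I only need to detect three categories: humans, animals, and vehicles. I therefore customized the COCO dataset by discarding every class except the 11 that fall under my target categories, and then fine-tuned the model for 50 epochs.

My next task was to prune the model. I read that structured pruning is what is used for models on edge devices, so I tried to implement it and pruned my model. Then I fine-tuned it for 5 more epochs. Now I'm not sure whether I did it correctly. To check, I looked at the parameter count and found that it is the same before and after pruning. The code I used is given below.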
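Is my strategy for optimizing the model correct? Am I pruning it correctly? Once pruning is done, is there a way to quantize the model to INT8? Should I export the model to ONNX format? What is the difference between ONNX and ONNX Runtime? I'm also confused about the parameter counts: when I print the model summary it shows 3,007,793 parameters, but when I print the parameters it shows 3,012,993. Why is that?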
import torch
import torch.nn.utils.prune as prune
from ultralytics import YOLO
# Function to prune Conv2D layers (structured pruning for better performance)
def prune_model(model, amount=0.1):
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            prune.ln_structured(module, name='weight', amount=amount, n=2, dim=0)  # Channel pruning
            prune.remove(module, 'weight')  # Make pruning permanent
    return model
# Load YOLO model
model = YOLO('best.pt')
# Validate the original model
results_str = model.val(data="custom_coco.yaml")
print(f"Original mAP50-95: {results_str.box.map}")
# Access the PyTorch model
torch_model_structured = model.model
# Apply pruning
print("Pruning model...")
pruned_torch_model_structured = prune_model(torch_model_structured, amount=0.1)
print("Model pruned.")
# Save pruned model
torch.save(pruned_torch_model_structured.state_dict(), 'yolov8n_Structured_pruned_weights.pth')
print("Pruned model weights saved as 'yolov8n_Structured_pruned_weights.pth'.")
# Reload pruned weights into a YOLO model
pruned_model = YOLO('best.pt') # Load the original YOLO model
pruned_model.model.load_state_dict(torch.load('yolov8n_Structured_pruned_weights.pth'), strict=False)
# Validate pruned model
results_str = pruned_model.val(data="custom_coco.yaml")
print(f"Pruned mAP50-95: {results_str.box.map}")
# Fine-tune pruned model
print("Fine-tuning pruned model...")
results_str = pruned_model.train(
    data='custom_coco.yaml',
    epochs=5,
    imgsz=640,
    batch=8,
    lr0=0.001  # Lower learning rate for stability
)
pruned_model.save('yolov8n_structured_pruned_finetuned.pt')
print("Pruned and fine-tuned model saved as 'yolov8n_structured_pruned_finetuned.pt'.")
# Validate fine-tuned model
fine_tuned_model = YOLO('yolov8n_structured_pruned_finetuned.pt')
results_str = fine_tuned_model.val(data="custom_coco.yaml")
print(f"Fine-tuned mAP50-95: {results_str.box.map}")
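A note on the unchanged parameter count: `torch.nn.utils.prune` works by masking, so `ln_structured` followed by `prune.remove` sets whole output channels to zero while every weight tensor keeps its original shape. The sketch below is a plain-Python toy of that behaviour, not the PyTorch implementation. The helper names `l2_channel_mask`, `count_elements`, and `count_zero_channels` are hypothetical, and the 10-channel layer is made up for illustration.

```python
import math


def l2_channel_mask(weight, amount):
    """Toy stand-in for L2 structured pruning along dim 0.

    `weight` is a list of output channels, each a flat list of floats.
    Returns a copy in which the lowest-norm channels are set to zero.
    """
    n_prune = round(amount * len(weight))
    norms = [math.sqrt(sum(w * w for w in ch)) for ch in weight]
    # Indices of the channels with the smallest L2 norm.
    drop = set(sorted(range(len(weight)), key=lambda i: norms[i])[:n_prune])
    return [[0.0] * len(ch) if i in drop else list(ch)
            for i, ch in enumerate(weight)]


def count_elements(weight):
    """Total number of stored values, zero or not."""
    return sum(len(ch) for ch in weight)


def count_zero_channels(weight):
    """Number of channels whose values are all exactly zero."""
    return sum(1 for ch in weight if all(w == 0.0 for w in ch))


# Hypothetical 10-channel layer; channel i holds four copies of (i + 1).
weight = [[float(i + 1)] * 4 for i in range(10)]
pruned = l2_channel_mask(weight, amount=0.1)

print(count_elements(weight), count_elements(pruned))  # same size before and after
print(count_zero_channels(pruned))                     # one channel is now all zeros
```

If this is what happens in the script above, counting zeroed channels would be a more telling check than counting parameters. Getting a physically smaller model would additionally require rebuilding the affected layers with fewer channels.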
Output of this code:
Ultralytics 8.3.31 Python-3.10.14 torch-2.4.1+cu118 CUDA:0 (NVIDIA GeForce GTX 1060 6GB, 6144MiB)
Model summary (fused): 168 layers, 3,007,793 parameters, 0 gradients, 8.1 GFLOPs
val: Scanning J:\quantization\datasets\custom_data\labels\val.cache... 3321 images, 0 backgrounds, 0 corrupt: 100%|██████████| 3321/3321 [00:00<?, ?it/s]
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [00:57<00:00, 3.61it/s]
all             3321      15491      0.72     0.599     0.669     0.481
person          2693      10777     0.798     0.663     0.761     0.532
bicycle          149        314       0.7     0.379     0.438     0.257
car              535       1918     0.698     0.521     0.585     0.378
motorcycle       159        367     0.709     0.572     0.657     0.418
bus              189        283     0.815     0.653     0.743     0.615
truck            250        414     0.573     0.379     0.451     0.297
cat              184        202     0.782     0.835     0.867     0.672
dog              177        218     0.717     0.683     0.748     0.602
horse            128        272     0.742     0.624     0.741     0.558
sheep             65        354     0.624     0.653     0.666      0.46
cow               87        372     0.756     0.625     0.702     0.496
Speed: 0.4ms preprocess, 4.1ms inference, 0.0ms loss, 2.2ms postprocess per image
Results saved to runs\detect\val22
Original mAP50-95: 0.4805636008285908
Pruning model...
Model pruned.
Pruned model weights saved as 'yolov8n_Structured_pruned_weights.pth'.
Ultralytics 8.3.31 Python-3.10.14 torch-2.4.1+cu118 CUDA:0 (NVIDIA GeForce GTX 1060 6GB, 6144MiB)
Model summary (fused): 168 layers, 3,007,793 parameters, 0 gradients, 8.1 GFLOPs
val: Scanning J:\quantization\datasets\custom_data\labels\val.cache... 3321 images, 0 backgrounds, 0 corrupt: 100%|██████████| 3321/3321 [00:00<?, ?it/s]
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [01:07<00:00, 3.08it/s]
all             3321      15491     0.455  0.000775  4.48e-05  9.65e-06
person          2693      10777   0.00331  0.000186   3.5e-05  1.46e-05
bicycle          149        314         1         0         0         0
car              535       1918  0.000977   0.00834  0.000458  9.15e-05
motorcycle       159        367         1         0         0         0
bus              189        283         0         0         0         0
truck            250        414         1         0         0         0
cat              184        202         1         0         0         0
dog              177        218         0         0         0         0
horse            128        272         0         0         0         0
sheep             65        354         0         0         0         0
cow               87        372         1         0         0         0
Speed: 0.4ms preprocess, 4.0ms inference, 0.0ms loss, 2.6ms postprocess per image
Results saved to runs\detect\val23
Pruned mAP50-95: 9.649495810499685e-06
Fine-tuning pruned model...
New https://pypi.org/project/ultralytics/8.3.34 available Update with 'pip install -U ultralytics'
Ultralytics 8.3.31 Python-3.10.14 torch-2.4.1+cu118 CUDA:0 (NVIDIA GeForce GTX 1060 6GB, 6144MiB)
engine\trainer: task=detect, mode=train, model=best.pt, data=custom_coco.yaml, epochs=5, time=None, patience=100, batch=8, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train9, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=4, nms=False, lr0=0.001, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train9
             from  n   params  module                                arguments
  0            -1  1      464  ultralytics.nn.modules.conv.Conv      [3, 16, 3, 2]
  1            -1  1     4672  ultralytics.nn.modules.conv.Conv      [16, 32, 3, 2]
  2            -1  1     7360  ultralytics.nn.modules.block.C2f      [32, 32, 1, True]
  3            -1  1    18560  ultralytics.nn.modules.conv.Conv      [32, 64, 3, 2]
  4            -1  2    49664  ultralytics.nn.modules.block.C2f      [64, 64, 2, True]
  5            -1  1    73984  ultralytics.nn.modules.conv.Conv      [64, 128, 3, 2]
  6            -1  2   197632  ultralytics.nn.modules.block.C2f      [128, 128, 2, True]
  7            -1  1   295424  ultralytics.nn.modules.conv.Conv      [128, 256, 3, 2]
  8            -1  1   460288  ultralytics.nn.modules.block.C2f      [256, 256, 1, True]
  9            -1  1   164608  ultralytics.nn.modules.block.SPPF     [256, 256, 5]
 10            -1  1        0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 11       [-1, 6]  1        0  ultralytics.nn.modules.conv.Concat    [1]
 12            -1  1   148224  ultralytics.nn.modules.block.C2f      [384, 128, 1]
 13            -1  1        0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 14       [-1, 4]  1        0  ultralytics.nn.modules.conv.Concat    [1]
 15            -1  1    37248  ultralytics.nn.modules.block.C2f      [192, 64, 1]
 16            -1  1    36992  ultralytics.nn.modules.conv.Conv      [64, 64, 3, 2]
 17      [-1, 12]  1        0  ultralytics.nn.modules.conv.Concat    [1]
 18            -1  1   123648  ultralytics.nn.modules.block.C2f      [192, 128, 1]
 19            -1  1   147712  ultralytics.nn.modules.conv.Conv      [128, 128, 3, 2]
 20       [-1, 9]  1        0  ultralytics.nn.modules.conv.Concat    [1]
 21            -1  1   493056  ultralytics.nn.modules.block.C2f      [384, 256, 1]
 22  [15, 18, 21]  1   753457  ultralytics.nn.modules.head.Detect    [11, [64, 128, 256]]
Model summary: 225 layers, 3,012,993 parameters, 3,012,977 gradients, 8.2 GFLOPs
Transferred 70/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs\detect\train9', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed
train: Scanning J:\quantization\datasets\custom_data\labels\train.cache... 78140 images, 0 backgrounds, 0 corrupt: 100%|██████████| 78140/78140 [00:00<?, ?it/s]
val: Scanning J:\quantization\datasets\custom_data\labels\val.cache... 3321 images, 0 backgrounds, 0 corrupt: 100%|██████████| 3321/3321 [00:00<?, ?it/s]
Plotting labels to runs\detect\train9\labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.001' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000667, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
TensorBoard: model graph visualization added
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs\detect\train9
Starting training for 5 epochs...
Epoch  GPU_mem  box_loss  cls_loss  dfl_loss  Instances  Size
  1/5    1.76G     1.382     1.496     1.337         12   640: 100%|██████████| 9768/9768 [41:16<00:00, 3.94it/s]
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [00:59<00:00, 3.50it/s]
all             3321      15491     0.643     0.494     0.546     0.363
Epoch  GPU_mem  box_loss  cls_loss  dfl_loss  Instances  Size
  2/5    1.59G     1.178     1.189     1.202         24   640: 100%|██████████| 9768/9768 [37:35<00:00, 4.33it/s]
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [00:51<00:00, 4.04it/s]
all             3321      15491     0.668     0.533     0.598     0.408
Epoch  GPU_mem  box_loss  cls_loss  dfl_loss  Instances  Size
  3/5    1.61G     1.147     1.139     1.183         36   640: 100%|██████████| 9768/9768 [36:33<00:00, 4.45it/s]
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [00:50<00:00, 4.12it/s]
all             3321      15491     0.689     0.546     0.613     0.425
Epoch  GPU_mem  box_loss  cls_loss  dfl_loss  Instances  Size
  4/5    1.51G     1.115     1.086     1.164         14   640: 100%|██████████| 9768/9768 [36:16<00:00, 4.49it/s]
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [00:49<00:00, 4.23it/s]
all             3321      15491     0.697     0.558      0.63     0.437
Epoch  GPU_mem  box_loss  cls_loss  dfl_loss  Instances  Size
  5/5    1.66G     1.088     1.038     1.147         22   640: 100%|██████████| 9768/9768 [37:34<00:00, 4.33it/s]
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [00:53<00:00, 3.88it/s]
all             3321      15491     0.706     0.573      0.64     0.447
5 epochs completed in 3.234 hours.
Optimizer stripped from runs\detect\train9\weights\last.pt, 6.2MB
Optimizer stripped from runs\detect\train9\weights\best.pt, 6.2MB
Validating runs\detect\train9\weights\best.pt...
Ultralytics 8.3.31 Python-3.10.14 torch-2.4.1+cu118 CUDA:0 (NVIDIA GeForce GTX 1060 6GB, 6144MiB)
Model summary (fused): 168 layers, 3,007,793 parameters, 0 gradients, 8.1 GFLOPs
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [01:01<00:00, 3.39it/s]
all             3321      15491     0.705     0.573      0.64     0.447
person          2693      10777     0.795     0.641     0.741      0.51
bicycle          149        314     0.687     0.347     0.419     0.235
car              535       1918     0.697     0.493     0.563     0.351
motorcycle       159        367     0.731     0.549     0.633     0.379
bus              189        283     0.784     0.642     0.728     0.588
truck            250        414     0.584      0.36     0.414     0.264
cat              184        202     0.727     0.807     0.837     0.632
dog              177        218     0.704     0.643     0.703     0.549
horse            128        272     0.782      0.61     0.721     0.528
sheep             65        354     0.567     0.625     0.625     0.422
cow               87        372     0.702      0.59      0.66     0.457
Speed: 0.3ms preprocess, 3.7ms inference, 0.0ms loss, 2.4ms postprocess per image
Results saved to runs\detect\train9
Pruned and fine-tuned model saved as 'yolov8n_structured_pruned_finetuned.pt'.
Ultralytics 8.3.31 Python-3.10.14 torch-2.4.1+cu118 CUDA:0 (NVIDIA GeForce GTX 1060 6GB, 6144MiB)
Model summary (fused): 168 layers, 3,007,793 parameters, 0 gradients, 8.1 GFLOPs
val: Scanning J:\quantization\datasets\custom_data\labels\val.cache... 3321 images, 0 backgrounds, 0 corrupt: 100%|██████████| 3321/3321 [00:00<?, ?it/s]
Class         Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 208/208 [01:04<00:00, 3.25it/s]
all             3321      15491     0.706     0.573      0.64     0.447
person          2693      10777     0.796     0.641     0.741      0.51
bicycle          149        314     0.688     0.347     0.418     0.235
car              535       1918     0.698     0.492     0.562     0.351
motorcycle       159        367      0.73     0.552     0.632      0.38
bus              189        283     0.785      0.64     0.728     0.588
truck            250        414     0.585      0.36     0.414     0.264
cat              184        202     0.731     0.806     0.837     0.631
dog              177        218     0.703     0.641     0.703      0.55
horse            128        272     0.783      0.61     0.721     0.528
sheep             65        354      0.57     0.624     0.624     0.424
cow               87        372     0.699     0.589     0.661     0.455
Speed: 0.4ms preprocess, 4.2ms inference, 0.0ms loss, 2.7ms postprocess per image
Results saved to runs\detect\val24
Fine-tuned mAP50-95: 0.44696036711098114
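
A note on the two parameter counts: in the log, 3,007,793 always appears on lines marked "Model summary (fused)", while 3,012,993 comes from the unfused summary printed at the start of training. A plausible explanation, not verified layer by layer here, is that fusing folds each BatchNorm into its convolution, so the BatchNorm weight and bias (two values per channel) are replaced by a single convolution bias (one value per channel). The sketch below checks that reading against the first two rows of the layer table. The helper names are hypothetical, and the assumption is that each `Conv` block is a bias-free convolution followed by BatchNorm.

```python
def conv_bn_params(c_in, c_out, k):
    """Unfused block: bias-free k x k conv plus BatchNorm weight and bias."""
    return c_in * c_out * k * k + 2 * c_out


def fused_conv_params(c_in, c_out, k):
    """Fused block: the same conv with BatchNorm folded into a per-channel bias."""
    return c_in * c_out * k * k + c_out


# Layers 0 and 1 from the training log: Conv [3, 16, 3, 2] and Conv [16, 32, 3, 2].
layers = [(3, 16, 3), (16, 32, 3)]
unfused = [conv_bn_params(*layer) for layer in layers]
fused = [fused_conv_params(*layer) for layer in layers]

print(unfused)  # compare with the params column for layers 0 and 1
print(fused)    # each block shrinks by exactly c_out parameters

# Gaps reported in the log.
print(3_012_993 - 3_007_793)  # unfused summary minus fused summary
print(3_012_993 - 3_012_977)  # parameters minus gradients
```

The unfused values reproduce the 464 and 4672 listed for layers 0 and 1, which supports the assumption about how a `Conv` block is counted. The 16-parameter gap between parameters and gradients is consistent with the one layer the log reports as frozen, `model.22.dfl.conv.weight`, assuming that weight holds 16 values.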