Celery worker exited prematurely with signal 11: trying to run a Python script when a button is clicked in a Django view

Question (votes: 0, answers: 1)

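I am developing a Django application, part of which transcribes audio with timestamps. When the user clicks a button in the web interface, the Django server launches a Python script that performs the transcription.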

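Here is what I have tried so far: I have a separate transcribe.py file. When the user clicks the transcribe button on the web page, the request hits a view in the project's app. However, after the script has run partway, the Django server terminates in the terminal.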

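Since the Python script is a long-running process, I figured I should run it in the background so that the Django server would not terminate. So I set up Celery and Redis. When I run the transcribe.py script from the Django shell, it runs perfectly fine. But when I try to execute it from the view/web page, it terminates again.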

python manage.py shell

Now that I have implemented the Celery worker, the server no longer terminates, but the worker throws the following error.

[tasks]
  . transcribeApp.tasks.run_transcription

[2024-11-25 03:26:04,500: INFO/MainProcess] Connected to redis://localhost:6379/0
[2024-11-25 03:26:04,514: INFO/MainProcess] mingle: searching for neighbors
[2024-11-25 03:26:05,520: INFO/MainProcess] mingle: all alone
[2024-11-25 03:26:05,544: INFO/MainProcess] [email protected] ready.
[2024-11-25 03:26:16,253: INFO/MainProcess] Task searchApp.tasks.run_transcription[c684bdfa-ec21-4b4e-9542-0ca1f7729682] received
[2024-11-25 03:26:16,255: INFO/ForkPoolWorker-15] Starting transcription process.
[2024-11-25 03:26:16,509: WARNING/ForkPoolWorker-15] /Users/user/Desktop/project/django_app/django_venv/lib/python3.12/site-packages/whisper/__init__.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)

[2024-11-25 03:26:16,670: ERROR/MainProcess] Process 'ForkPoolWorker-15' pid:38956 exited with 'signal 11 (SIGSEGV)'
[2024-11-25 03:26:16,683: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 11 (SIGSEGV) Job: 0.')
Traceback (most recent call last):
  File "/Users/user/Desktop/project/django_app/django_venv/lib/python3.12/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.einfo.ExceptionWithTraceback: 
"""
Traceback (most recent call last):
  File "/Users/user/Desktop/project/django_app/django_venv/lib/python3.12/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV) Job: 0.
"""

The implementation looks like this:

# Views.py
from . import tasks
from django.shortcuts import render
from django.http import HttpResponse, JsonResponse

def trainVideos(request):
    try:
        tasks.run_transcription.delay()
        return JsonResponse({"status": "success", "message": "Transcription has started check back later."})
        # return render(request, 'embed.html', {'data': data})
    except Exception as e:
        # Return the error response; without `return` the view would return None.
        return JsonResponse({"status": "error", "message": str(e)})

This is what the transcription function looks like. With it, the Celery worker throws the "worker exited prematurely" error.

# Add one or two audios possibly .wav, .mp3 in a folder,
# and provide the file path here.
# transcribe.py 

import whisper_timestamped as whisper
import os
def transcribeTexts(model_id, filePath):
    result = []
    fileNames = os.listdir()
    
    model = whisper.load_model(model_id)

    for files in fileNames:
        audioPath = filePath + "/" + files

        audio = whisper.load_audio(audioPath)

        result.append(model.transcribe(audio, language="en"))
    
    return result
model_id = "tiny"
audioFilePath = "path/to/audio"  # placeholder: replace with the audio folder
transcribeTexts(model_id, audioFilePath)

Install the following libraries to reproduce the problem:

 pip install openai-whisper
 pip3 install whisper-timestamped
 pip install Django
 pip install celery redis
 pip install redis-server

Celery implementation (celery.py in the project's main_app directory):

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'main_app.settings')

app = Celery('main_app')

app.config_from_object('django.conf:settings', namespace='CELERY')

app.autodiscover_tasks()

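# bind=True passes the task instance as `self`, which is what makes
# self.request available.
@app.task(bind=True)
def debug_tasks(self):
    print(f"Request: {self.request!r}")

tasks.py in the transcribe_app directory: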

from __future__ import absolute_import, unicode_literals
from . import transcribe
from celery import shared_task

@shared_task
def run_transcription():
    transcribe.transcribe()
    return "Transcription Completed..."

settings.py was also updated with the following:

CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP = True

I also modified the __init__.py file in django_app:

from __future__ import absolute_import, unicode_literals

from .celery import app as celery_app

__all__ = ('celery_app',) 

For this application, some libraries depend on specific versions. All libraries and packages are listed below:

Package              Version
-------------------- -----------
amqp                 5.3.1
asgiref              3.8.1
billiard             4.2.1
celery               5.4.0
certifi              2024.8.30
charset-normalizer   3.3.2
click                8.1.7
click-didyoumean     0.3.1
click-plugins        1.1.1
click-repl           0.3.0
Cython               3.0.11
Django               5.1.2
django-widget-tweaks 1.5.0
dtw-python           1.5.3
faiss-cpu            1.9.0
ffmpeg               1.4
filelock             3.16.1
fsspec               2024.9.0
huggingface-hub      0.25.2
idna                 3.10
Jinja2               3.1.4
kombu                5.4.2
lfs                  0.2
llvmlite             0.43.0
MarkupSafe           3.0.1
more-itertools       10.5.0
mpmath               1.3.0
msgpack              1.1.0
networkx             3.3
numba                0.60.0
numpy                2.0.2
packaging            24.1
panda                0.3.1
pillow               10.4.0
pip                  24.3.1
prompt_toolkit       3.0.48
pydub                0.25.1
python-dateutil      2.9.0.post0
PyYAML               6.0.2
redis                5.2.0
regex                2024.9.11
requests             2.32.3
safetensors          0.4.5
scipy                1.14.1
semantic-version     2.10.0
setuptools           75.1.0
setuptools-rust      1.10.2
six                  1.16.0
sqlparse             0.5.1
sympy                1.13.3
tiktoken             0.8.0
tokenizers           0.20.1
torch                2.4.1
torchaudio           2.4.1
torchvision          0.19.1
tqdm                 4.66.5
transformers         4.45.2
txtai                7.4.0
typing_extensions    4.12.2
tzdata               2024.2
urllib3              2.2.3
vine                 5.1.0
wcwidth              0.2.13
whisper-timestamped  1.15.4

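Overall, the program runs fine when I run it on its own. But inside Django it terminates no matter how I execute it. I thought one cause might be that I was loading long audio, so I split it into chunks and tried running transcribe.py from the user interface; however, that also ended with the worker exiting prematurely: signal 11 (SIGSEGV) Job: 0. I tried increasing the worker's memory pool size, without success. I'm not sure what is needed to run transcribe.py from Django, since most of the known approaches have not worked for me. I may be missing something, so please help me solve this. Thank you for your time.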

python pytorch celery
1 answer
0 votes
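A SIGSEGV usually occurs when a program tries to access memory it is not allowed to access. I recreated your code and it works fine on my side. Here are some possible reasons it is happening for you: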

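  • The pool type you specified in the celery command is not working properly. --pool=solo appears to work because it does not fork the process.
  • Part of the code is executed as root and other parts are not.
  • The file path you provided is incorrect, or the permissions are wrong.
  • Perhaps you are running this on a virtual machine with very limited memory, and none is left because the AI model and libraries you load are already heavy?
  • There is a genuine problem with libc or with Celery itself on your machine, but the question does not make that clear.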
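
To narrow down which of these causes applies, it can help to make the crashing process report where it died. The following is a minimal sketch that assumes only Python's standard faulthandler module; the helper name enable_crash_log and the log path are illustrative and not part of the original project. Calling it at the start of the task should make the interpreter write the Python-level traceback to the log file when the process receives SIGSEGV.

```python
import faulthandler


def enable_crash_log(log_path):
    """Dump Python tracebacks to log_path on fatal signals such as SIGSEGV.

    enable_crash_log and log_path are hypothetical names used for illustration.
    """
    # Keep the file open for the life of the process: faulthandler writes
    # to its file descriptor from inside the signal handler.
    log_file = open(log_path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The traceback only shows the Python frames that were active at the time of the crash, but that is often enough to tell whether the fault happens while loading the model or while reading the audio.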

I will walk you through how I recreated the code. Perhaps a typo or a small mistake is causing the error you mentioned.

django-admin startproject project101
cd project101
python3 manage.py startapp app101

project101/urls.py:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include("app101.urls"))
]

project101/settings.py

INSTALLED_APPS = [
    # ...
    
    'app101'
]

# put this at the end of settings.py
CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP = True

project101/celery.py

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project101.settings')

app = Celery('project101')

app.config_from_object('django.conf:settings', namespace='CELERY')

app.autodiscover_tasks()

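# bind=True passes the task instance as `self`, which is what makes
# self.request available.
@app.task(bind=True)
def debug_tasks(self):
    print(f"Request: {self.request!r}")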

project101/__init__.py

from __future__ import absolute_import, unicode_literals

from .celery import app as celery_app

__all__ = ('celery_app',) 

app101/views.py

from . import tasks
from django.shortcuts import render
from django.http import HttpResponse, JsonResponse

def trainVideos(request):
    try:
        tasks.run_transcription.delay()
        return JsonResponse({"status": "success", "message": "Transcription has started check back later."})
        # return render(request, 'embed.html', {'data': data})
    except Exception as e:
        # Return the error response; without `return` the view would return None.
        return JsonResponse({"status": "error", "message": str(e)})

app101/urls.py

from django.urls import path, include
from . import views

urlpatterns = [
    path('transcribe', views.trainVideos)
]

app101/tasks.py

from __future__ import absolute_import, unicode_literals
from . import transcribe
from celery import shared_task

@shared_task
def run_transcription():
    transcribe.transcribe()
    return "Transcription Completed..."

app101/transcribe.py


import whisper_timestamped as whisper
import os

def transcribeTexts(model_id, audio_directory_path):
    result = []
    fileNames = os.listdir(audio_directory_path)
    
    model = whisper.load_model(model_id)

    for files in fileNames:
        print(files)
        audioPath = audio_directory_path + "/" + files

        audio = whisper.load_audio(audioPath)

        result.append(model.transcribe(audio, language="en"))
    print(result)
    return result

def transcribe():
    model_id = "tiny"
    audio_directory_path = 'audio_sample'
    transcribeTexts(model_id, audio_directory_path)

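Note that audio_sample is a folder outside app101, at the same level as app101 and project101. You can put it in another folder, but make sure to specify the correct directory path. I have added the directory structure below.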

.
├── app101
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── migrations
│   ├── models.py
│   ├── __pycache__
│   ├── tasks.py
│   ├── tests.py
│   ├── transcribe.py
│   ├── urls.py
│   └── views.py
├── audio_sample
│   └── some_audio.mp3
├── db.sqlite3
├── manage.py
└── project101
    ├── asgi.py
    ├── celery.py
    ├── __init__.py
    ├── __pycache__
    ├── settings.py
    ├── urls.py
    └── wsgi.py
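
Because 'audio_sample' is a relative path, it is resolved against the working directory of whichever process runs the task, so a worker started from a different directory will look in a different place. A small sketch of checking the directory up front, using only pathlib; the helper name list_audio_files and the set of extensions are assumptions for illustration, not part of the code above:

```python
from pathlib import Path

# Assumed set of extensions; adjust to the formats you actually use.
AUDIO_SUFFIXES = {".wav", ".mp3"}


def list_audio_files(audio_dir):
    """Return the audio files in audio_dir, failing early if it is missing."""
    directory = Path(audio_dir).resolve()
    if not directory.is_dir():
        # Report the absolute path so a wrong working directory is obvious.
        raise FileNotFoundError(f"Audio directory not found: {directory}")
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
```

In a Django project, building the path from settings.BASE_DIR (for example settings.BASE_DIR / "audio_sample") would likely remove the dependence on the working directory altogether.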

After that, run the following commands in separate terminals:

python3 manage.py runserver
celery -A project101 worker --pool=solo -l info

This will get your project up and running. But keep the following in mind:

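  • This is only meant to walk you through running Celery successfully; don't forget to implement the code in your own project and run the migrations accordingly.
  • You can run the celery command with different parameters, for example changing the pool from solo to gevent. --pool=solo appears to work fine.
  • Do everything as the same user, whether root (not recommended) or a normal user.
  • Make sure all files have the correct permissions.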
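
If passing --pool=solo on every invocation is inconvenient, the pool can presumably also be selected in settings.py. This is a sketch that assumes Celery's worker_pool and worker_concurrency settings, which under the CELERY namespace configured in celery.py are spelled as below. Celery's documentation advises against using this setting to select the eventlet or gevent pools; those should still be chosen on the command line.

```python
# settings.py (fragment): assumed equivalents of the command-line options.
# Read by app.config_from_object('django.conf:settings', namespace='CELERY').
CELERY_WORKER_POOL = "solo"    # run tasks in the worker process itself, no fork
CELERY_WORKER_CONCURRENCY = 1  # the solo pool handles one task at a time anyway
```

With this in place the worker should start with the solo pool when launched as celery -A project101 worker -l info, although the startup banner is worth checking to confirm which pool is actually in use.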