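I have built a machine learning API that uses Torch as the ML framework. When I upload the code to Google App Engine, it runs out of memory. After some debugging, I found that the problem lies in the installation of Torch.
I am using Torch 1.5.0 and Python 3.7.4.
How can I fix this error? Is there something I can change in app.yaml?
Error message: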
Step #1 - "builder": OSError: [Errno 12] Cannot allocate memory
Step #1 - "builder": self.pid = os.fork()
Step #1 - "builder": File "/usr/lib/python2.7/subprocess.py", line 938, in _execute_child
Step #1 - "builder": errread, errwrite)
Step #1 - "builder": File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__/ftl/python/layer_builder.py", line 346, in _python_version
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__/ftl/python/layer_builder.py", line 332, in GetCacheKeyRaw
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__/ftl/python/layer_builder.py", line 109, in GetCacheKeyRaw
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__/ftl/common/single_layer_image.py", line 60, in GetCacheKey
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__/ftl/python/layer_builder.py", line 153, in BuildLayer
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__/ftl/python/builder.py", line 114, in Build
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__.py", line 54, in main
Step #1 - "builder": File "/usr/local/bin/ftl.par/__main__.py", line 65, in <module>
Step #1 - "builder": exec code in run_globals
Step #1 - "builder": File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
Step #1 - "builder": "__main__", fname, loader, pkg_name)
Step #1 - "builder": File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
Step #1 - "builder": Traceback (most recent call last):
The same error message also appears when I do not include Torch in requirements.txt.
To reproduce:
app.yaml
runtime: python37
resources:
  memory_gb: 16
  disk_size_gb: 10
requirements.txt
gunicorn==20.0.4
aniso8601==8.0.0
beautifulsoup4==4.9.0
boto3==1.13.3
botocore==1.16.3
bs4==0.0.1
certifi==2020.4.5.1
chardet==3.0.4
click==7.1.2
colorama==0.4.3
docutils==0.15.2
filelock==3.0.12
Flask==1.1.2
Flask-RESTful==0.3.8
googletrans==2.4.0
idna==2.9
itsdangerous==1.1.0
Jinja2==2.11.2
jmespath==0.9.5
joblib==0.14.1
MarkupSafe==1.1.1
numpy==1.18.4
protobuf==3.11.3
python-dateutil==2.8.1
pytz==2020.1
regex==2020.4.4
requests==2.23.0
s3transfer==0.3.3
sacremoses==0.0.43
sentencepiece==0.1.86
six==1.14.0
soupsieve==2.0
tokenizers==0.5.2
tqdm==4.46.0
transformers==2.8.0
urllib3==1.25.9
Werkzeug==1.0.1
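Since the question turns on whether Torch is part of the build, it can help to confirm what the requirements file actually pins before deploying. The following is a minimal sketch using only the standard library; `pinned_packages` is a hypothetical helper name, and it only understands plain `name==version` lines, not the full requirements syntax.

```python
def pinned_packages(requirements_text):
    """Return {lowercased package name: version} for 'name==version' lines."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# A short excerpt of the list above, used here as sample input.
sample = """\
gunicorn==20.0.4
tokenizers==0.5.2
transformers==2.8.0
"""

pins = pinned_packages(sample)
print("torch pinned:", "torch" in pins)
print("transformers version:", pins.get("transformers"))
```

To check the real file, pass the result of `open("requirements.txt").read()` instead of `sample`.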
main.py
import json

import flask
from flask import Flask, request
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app)
production = False

# Import api code

# Create main api 'view'
class main_api(Resource):
    def get(self):
        question = request.args.get('question')
        # Run the script
        # But not necessary for the minimum working test
        return {
            'question': question,
            # 'results': results_from_script,
        }

# Adds resource
api.add_resource(main_api, '/')

# Starts the api
if __name__ == '__main__':
    host = '127.0.0.1'
    port = 8080
    app.run(host=host, port=port, debug=not production)
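To check the minimal API locally before deploying, a small standard-library client can be enough. This is only a sketch: `build_url` and `ask` are hypothetical helper names, and the base URL assumes the host and port from the `__main__` block of main.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server from main.py is running locally on this address.
BASE_URL = "http://127.0.0.1:8080/"


def build_url(question, base_url=BASE_URL):
    """Return the GET URL that passes `question` as a query parameter."""
    return base_url + "?" + urlencode({"question": question})


def ask(question, base_url=BASE_URL, timeout=5):
    """Send the request and decode the JSON body (the server must be running)."""
    with urlopen(build_url(question, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call ask(...) once the server is up.
    print(build_url("what is torch"))
```

With the server started via `python main.py`, `ask("what is torch")` should return a dictionary whose `question` key echoes the query string.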
I fixed this error by switching to the flex environment. The only thing I had to change was the content of app.yaml:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT main:app

runtime_config:
  python_version: 3

manual_scaling:
  instances: 1

resources:
  cpu: 2
  memory_gb: 5
  disk_size_gb: 10
After that, the app deployed without the error.
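In the flexible environment, `memory_gb` is tied to the number of cores, so it may be worth sanity-checking the two values together before deploying. The sketch below is an approximation, not an official validator: the per-core limits (about 1.0 to 6.5 GB) and the 0.4 GB overhead are assumptions recalled from the flexible-environment app.yaml reference and should be checked against the current documentation, and the function names are hypothetical.

```python
def requested_memory_range(cpu, per_core_min_gb=1.0, per_core_max_gb=6.5,
                           overhead_gb=0.4):
    """Return (low, high) bounds for memory_gb under the assumed limits."""
    low = cpu * per_core_min_gb - overhead_gb
    high = cpu * per_core_max_gb - overhead_gb
    return low, high


def memory_fits(cpu, memory_gb, **limits):
    """True if memory_gb falls inside the assumed range for this core count."""
    low, high = requested_memory_range(cpu, **limits)
    return low <= memory_gb <= high


if __name__ == "__main__":
    # Values from the working app.yaml above.
    print(memory_fits(cpu=2, memory_gb=5))
    # The 16 GB from the original app.yaml, tried with the same 2 cores.
    print(memory_fits(cpu=2, memory_gb=16))
```

Under these assumed limits, 5 GB with 2 cores is inside the range, while 16 GB would need more cores.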