I can't scale my simple Socket.IO app past roughly 980 concurrent connections when it runs in Docker. However, running it locally on macOS Sierra 10.12.6 I can get over 3000 connections. Here is a repo with the simple Socket.IO app I'm testing: https://github.com/gsccheng/simple-socketIO-app
My Docker for Mac is configured with 4 CPUs and 5 GB of memory. The version is:
Version 17.09.0-ce-mac35 (19611)
Channel: stable
a98b7c1b7c
I'm load testing it with Artillery version 1.6.0-9:
$ artillery run load-test.yaml
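The exact load-test.yaml is in the repo linked above; roughly, an Artillery 1.x Socket.IO scenario of this kind has the following shape (the phases and numbers here are only illustrative, not necessarily what is in the repo):
config:
  target: "http://localhost:8000"
  phases:
    - duration: 200      # 200 s at 20 new virtual users/s = 4000 connection attempts
      arrivalRate: 20
scenarios:
  - engine: "socketio"   # use Artillery's Socket.IO engine instead of plain HTTP
    flow:
      - think: 60        # hold each connection open for a while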
Some of the settings below are redundant; I'm including them to show they've been taken into account. Here are my steps to reproduce:
$ docker build . -t socket-test
$ docker run -p 8000:8000 -c 1024 -m 4096M --privileged --ulimit nofile=9000:9000 -it socket-test:latest /bin/sh
#> DEBUG=* npm start
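Before starting the app, the limit the Node process will actually inherit can be confirmed from the same container shell (standard commands, shown here for completeness):
#> ulimit -n                 # should print 9000, matching --ulimit nofile=9000:9000
#> cat /proc/self/limits     # the "Max open files" row shows the soft/hard pair in effect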
Up to about 980 connections I get logs like this:
Connected to Socket!
socket.io:client writing packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoding packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoded {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} as 2["news",{"hello":"world"}] +0ms
engine:socket sending packet "message" (2["news",{"hello":"world"}]) +0ms
socket.io:socket joined room 0ohCcHMWYASnfRgJAAPS +0ms
engine:ws received "2" +5ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine upgrading existing transport +2ms
engine:socket might upgrade socket transport from "polling" to "websocket" +0ms
engine intercepting request for path "/socket.io/" +2ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfqL&b64=1&sid=0ohCcHMWYASnfRgJAAPS" +0ms
engine setting new request for existing client +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "28:42["news",{"hello":"world"}]" +0ms
engine:socket executing batch send callback +1ms
engine:ws received "2probe" +4ms
engine:ws writing "3probe" +0ms
engine intercepting request for path "/socket.io/" +4ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfqV&b64=1&sid=0ohCcHMWYASnfRgJAAPS" +0ms
engine setting new request for existing client +0ms
engine:polling setting request +0ms
engine:socket writing a noop packet to polling for fast upgrade +10ms
engine:polling writing "1:6" +0ms
engine:ws received "5" +2ms
engine:socket got upgrade packet - upgrading +0ms
engine:polling closing +0ms
engine:polling transport discarded - closing right away +1ms
engine:ws received "2" +20ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +1ms
engine:ws writing "3" +0ms
engine intercepting request for path "/socket.io/" +1ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfr1&b64=1" +0ms
engine handshaking client "6ccAiZwbvrchxZEiAAPT" +0ms
engine:socket sending packet "open" ({"sid":"6ccAiZwbvrchxZEiAAPT","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
engine:socket sending packet "message" (0) +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "97:0{"sid":"6ccAiZwbvrchxZEiAAPT","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40" +0ms
engine:socket executing batch send callback +0ms
socket.io:server incoming connection with id 6ccAiZwbvrchxZEiAAPT +0ms
socket.io:client connecting to namespace / +1ms
socket.io:namespace adding socket to nsp / +0ms
socket.io:socket socket connected - writing packet +0ms
socket.io:socket joining room 6ccAiZwbvrchxZEiAAPT +0ms
socket.io:socket packet already sent in initial handshake +0ms
Connected to Socket!
At around 980 connections I start to see disconnect events like these:
disconnected to Socket!
transport close
engine intercepting request for path "/socket.io/" +27ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pg1T&b64=1" +0ms
engine handshaking client "C-pdSXFCbwQaTeYLAAPh" +0ms
engine:socket sending packet "open" ({"sid":"C-pdSXFCbwQaTeYLAAPh","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
engine:socket sending packet "message" (0) +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "97:0{"sid":"C-pdSXFCbwQaTeYLAAPh","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40" +0ms
engine:socket executing batch send callback +0ms
socket.io:server incoming connection with id C-pdSXFCbwQaTeYLAAPh +0ms
socket.io:client connecting to namespace / +0ms
socket.io:namespace adding socket to nsp / +0ms
socket.io:socket socket connected - writing packet +1ms
socket.io:socket joining room C-pdSXFCbwQaTeYLAAPh +0ms
socket.io:socket packet already sent in initial handshake +0ms
Connected to Socket!
socket.io:client writing packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoding packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoded {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} as 2["news",{"hello":"world"}] +0ms
engine:socket sending packet "message" (2["news",{"hello":"world"}]) +0ms
socket.io:socket joined room C-pdSXFCbwQaTeYLAAPh +0ms
engine intercepting request for path "/socket.io/" +13ms
engine handling "POST" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pg1g&b64=1&sid=C-pdSXFCbwQaTeYLAAPh" +0ms
engine setting new request for existing client +1ms
engine:polling received "1:1" +0ms
engine:polling got xhr close packet +0ms
socket.io:client client close with reason transport close +0ms
socket.io:socket closing socket - reason transport close +1ms
disconnected to Socket!
Then this repeats over and over:
engine:ws writing "3" +0ms
engine:ws received "2" +42ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +1ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +4ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +45ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +7ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
As you can see in my Dockerfile, I've applied some configuration I gathered from googling this problem:
COPY limits.conf /etc/security/
COPY sysctl.conf /etc/
COPY rc.local /etc/
COPY common-session /etc/pam.d/
COPY common-session-noninteractive /etc/pam.d/
COPY supervisord.conf /etc/supervisor/
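For reference, files like these typically carry entries along the following lines when used to raise connection limits (the values here are illustrative; the exact ones are in the repo):
limits.conf:
* soft nofile 65536
* hard nofile 65536
sysctl.conf:
fs.file-max = 100000
net.core.somaxconn = 4096
net.ipv4.ip_local_port_range = 1024 65535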
On my local system I've also done some tuning, such as following this example. Here is the state of my host:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 64000
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
How can I get more than ~980 concurrent socket connections? Why do new connections stop being established at that point? How should I tweak my repo (if necessary) to make this work?
Edit
When I lower the container's nofile limit to 500, the app fails with what look like the same disconnects. Halving or doubling the container's memory and CPU doesn't change the behavior either, so neither of those appears to be the problem.
There is a significant difference between the network path of the app running natively and of the app running inside Docker for Mac.
The path to the app on the Mac goes directly over the loopback interface:
mac
client -> lo -> nodejs
When using Docker for Mac, the path has more hops and includes two userland proxy processes, vpnkit on the Mac and docker-proxy, which accept TCP connections on the forwarded port and forward the data:
mac | vm | container
client -> lo -> vpnkit -> if -> docker-proxy -> NAT -> bridge -> if -> nodejs
Try a VM that has direct access to the Mac's network to see whether vpnkit makes a noticeable difference:
mac | vm | container
client -> if -> if -> docker-proxy -> NAT -> bridge -> if -> nodejs
You can also remove docker-proxy by attaching the container's interface directly to the VM's network, so the container doesn't need a port mapping (-p). This can be done by mapping a macvlan interface to the container or by placing the container on a bridge connected to the VM network; a rough sketch follows the diagrams below. Here is a vagrant setup I use for bridged networking.
mac | container <- there is a little vm here, but minimal.
client -> if -> if -> nodejs
mac | vm | container
client -> if -> if -> bridge -> if -> nodejs
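As a rough sketch of the macvlan variant (the interface name, subnet, and network name below are assumptions about your environment; adjust them to match the VM's network):
$ docker network create -d macvlan \
    --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
    -o parent=eth0 pub_net
$ docker run --network pub_net -it socket-test:latest /bin/sh   # no -p needed; traffic bypasses docker-proxy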
Once the network differences are out of the way, I would tune the VM and the container in more detail. My guess is you should see a 10-20% drop from running in a VM, not 66%.
I ran into engine:polling got xhr close packet as well. I searched all over Stack Overflow and only this question mentions it.
I've investigated it briefly: when the client sends the GET + POST HTTP requests for long-polling, the load balancer somehow rejects the GET while the POST may still go through; the same thing happened on our site.
The issue should be escalated to the stability of the load balancer (in particular, the stability of its session stickiness).
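For what it's worth, if the load balancer in front of Socket.IO is nginx (an assumption; yours may be something else, and app1/app2 below are placeholder backends), the usual way to keep a session's polling GET and POST on the same backend and to allow the websocket upgrade looks roughly like this:
upstream socketio_nodes {
    ip_hash;                                    # pin each client to one backend so its GET and POST land together
    server app1:8000;
    server app2:8000;
}
server {
    listen 80;
    location /socket.io/ {
        proxy_pass http://socketio_nodes;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade; # pass through the polling -> websocket upgrade
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}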