Suspicious large mmap in C2Compiler::compile_method / PhaseCFG::insert_anti_dependences

Problem description

We suspect that some peak-allocation behavior of the C2 compiler pushes the memory used by the Java process above the Docker limit and triggers the OOM killer.

The Docker OOM happens in our Java web service, hosted inside a Docker container with a 32 GB memory limit and running this JDK build:

openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)

with the following Java options:

-Xms28g
-Xmx28g
-Xss256k
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=12
-XX:MetaspaceSize=512m
-XX:MaxMetaspaceSize=512m
-XX:InitialCodeCacheSize=128m
-XX:ReservedCodeCacheSize=512m
-XX:MinHeapFreeRatio=30
-XX:MaxHeapFreeRatio=50
-XX:CICompilerCount=4
-XX:+UseCompressedOops
-XX:SoftRefLRUPolicyMSPerMB=0
-XX:-OmitStackTraceInFastThrow
-XX:+AggressiveOpts
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintClassHistogram
-XX:MaxTenuringThreshold=10
-XX:+IgnoreUnrecognizedVMOptions
-Djava.lang.Integer.IntegerCache.high=1000000
-Dcustomer.java.lang.Integer.IntegerCache.high=1000000
-Djava.util.concurrent.ForkJoinPool.common.parallelism=8
-Djdk.attach.allowAttachSelf=true
-Dfastjson.parser.safeMode=true
-Dlog4j2.formatMsgNoLookups=true
-Dio.netty.allocator.numDirectArenas=8

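Before looking at the compiler itself, a rough back-of-the-envelope budget against the 32 GB container limit (derived only from the flags above) already shows how little headroom is left for native allocations such as C2 compiler arenas, thread stacks and direct buffers:

  28 GB   heap        (-Xms28g / -Xmx28g)
+ 512 MB  metaspace   (-XX:MaxMetaspaceSize=512m)
+ 512 MB  code cache  (-XX:ReservedCodeCacheSize=512m)
≈ 29 GB committed up front, leaving roughly 3 GB of the 32 GB limit for everything else
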
To investigate this in an environment with a larger memory limit, we ran strace and observed continuous Arena growth inside C2Compiler::compile_method, with the following stack trace:

[pid 24629] 20:40:44 mmap(NULL, 335544320, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c35378000
> /usr/lib64/libc-2.17.so(mmap64+0x3a) [0xf8fca]
> /opt/container/lib/libjemalloc.so(je_pages_map+0x47) [0x50657]
> /opt/container/lib/libjemalloc.so(je_extent_alloc_mmap+0x13) [0x4ab23]
> /opt/container/lib/libjemalloc.so(extent_grow_retained+0x709) [0x48ed9]
> /opt/container/lib/libjemalloc.so(je_extent_alloc_wrapper+0x5bf) [0x4997f]
> /opt/container/lib/libjemalloc.so(je_arena_extent_alloc_large+0x173) [0x20123]
> /opt/container/lib/libjemalloc.so(je_large_malloc+0xb9) [0x4b739]
> /opt/container/lib/libjemalloc.so(je_malloc_default+0x6c3) [0xf3f3]
> /usr/java/jdk11/lib/server/libjvm.so(os::malloc(unsigned long, MemoryType, NativeCallStack const&)+0xfc) [0xc5edec]
> /usr/java/jdk11/lib/server/libjvm.so(Arena::grow(unsigned long, AllocFailStrategy::AllocFailEnum)+0x109) [0x44d1f9]
> /usr/java/jdk11/lib/server/libjvm.so(Arena::Arealloc(void*, unsigned long, unsigned long, AllocFailStrategy::AllocFailEnum)+0x201) [0x44d541]
> /usr/java/jdk11/lib/server/libjvm.so(Node_Array::grow(unsigned int)+0x56) [0xc348a6]
> /usr/java/jdk11/lib/server/libjvm.so(PhaseCFG::insert_anti_dependences(Block*, Node*, bool)+0x750) [0x827160]
> /usr/java/jdk11/lib/server/libjvm.so(PhaseCFG::schedule_late(VectorSet&, Node_Stack&)+0x447) [0x82a0c7]
> /usr/java/jdk11/lib/server/libjvm.so(PhaseCFG::global_code_motion()+0x314) [0x82d564]
> /usr/java/jdk11/lib/server/libjvm.so(PhaseCFG::do_global_code_motion()+0x49) [0x82dea9]
> /usr/java/jdk11/lib/server/libjvm.so(Compile::Code_Gen()+0x1d4) [0x66cc64]
> /usr/java/jdk11/lib/server/libjvm.so(Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0xd4a) [0x6704da]
> /usr/java/jdk11/lib/server/libjvm.so(C2Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0xd3) [0x588193]
> /usr/java/jdk11/lib/server/libjvm.so(CompileBroker::invoke_compiler_on_method(CompileTask*)+0x445) [0x67a8c5]
> /usr/java/jdk11/lib/server/libjvm.so(CompileBroker::compiler_thread_loop()+0x5a7) [0x67c1f7]
> /usr/java/jdk11/lib/server/libjvm.so(JavaThread::thread_main_inner()+0x1b9) [0xed4899]
> /usr/java/jdk11/lib/server/libjvm.so(Thread::call_run()+0x14e) [0xed149e]
> /usr/java/jdk11/lib/server/libjvm.so(thread_native_entry(Thread*)+0xed) [0xc714cd]
> /usr/lib64/libpthread-2.17.so(start_thread+0xc4) [0x7ea4]
> /usr/lib64/libc-2.17.so(__clone+0x6c) [0xfeb0c]

The mmap calls recorded by strace (all carrying the stack trace above) total roughly 1.6 GB:

[pid 24629] 20:40:33 mmap(NULL, 29360128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5ca6678000
[pid 24629] 20:40:35 mmap(NULL, 50331648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c9ee78000
[pid 24629] 20:40:36 mmap(NULL, 58720256, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c9b678000
[pid 24629] 20:40:36 mmap(NULL, 67108864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c97678000
[pid 24629] 20:40:36 mmap(NULL, 83886080, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c92678000
[pid 24629] 20:40:39 mmap(NULL, 100663296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c8c678000
[pid 24629] 20:40:41 mmap(NULL, 134217728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c7d678000
[pid 24629] 20:40:42 mmap(NULL, 167772160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c73678000
[pid 24629] 20:40:42 mmap(NULL, 201326592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c67678000
[pid 24629] 20:40:43 mmap(NULL, 234881024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c59378000
[pid 24629] 20:40:43 mmap(NULL, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c49378000
[pid 24629] 20:40:44 mmap(NULL, 335544320, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5c35378000
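
For reference, mmap calls with per-call stack traces like the ones above can be captured with an strace invocation along the following lines (the exact command is not reproduced here, so treat this as a sketch; -k requires an strace build with stack-unwinding support):

# attach to the running JVM, follow threads, timestamp each call, trace only mmap, and print a stack per call
strace -f -t -k -e trace=mmap -p <java_pid> -o strace-mmap.log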

In /proc/$pid/status we see the VmHWM metric rise from 31.04 GB to 32.12 GB over the same period, meaning that at some point the process used up to 32.12 GB. Yet with Native Memory Tracking we only ever sampled java.nmt.compiler.committed=122MB. With -XX:+PrintCompilation enabled, not many methods appear to be compiled in that window (20:40:33 to 20:40:44 corresponds to roughly the 1937 s to 1948 s range in the log below; some package names have been replaced):

1936335 58263       4       com.wecorp.webiz.product.service.v1.PublicParametersEntity::put (4606 bytes)
1936376 52134       3       org.apache.catalina.valves.AbstractAccessLogValve$RequestURIElement::addElement (24 bytes)   made not entrant
1936507 58259       3       com.wecorp.webiz.facade.mapper.PublicParametersEntityMapper::mapToInternal (6527 bytes)
1936634 58262       4       com.wecorp.webiz.cache.DataBusinessV2::getHash (143 bytes)
1936986 56014       3       com.wecorp.webiz.cache.DataBusinessV2::getHash (143 bytes)   made not entrant
1937112 58251       4       java.util.IdentityHashMap::containsKey (55 bytes)
1937141  898       3       java.util.IdentityHashMap::containsKey (55 bytes)   made not entrant
1937288 58264       4       com.wecorp.webiz.business.dynamic.simplefilter.Filter$$Lambda$1463/0x0000000800b54040::apply (13 bytes)
1937334 36436       2       com.wecorp.webiz.business.dynamic.simplefilter.Filter$$Lambda$1463/0x0000000800b54040::apply (13 bytes)   made not entrant
1938478 58267       4       com.wecorp.webiz.facade.mapper.WebizFacilitiesEntityMapper::mapToInternal (1541 bytes)
1938588 52139   !   3       org.apache.catalina.core.StandardHostValve::invoke (402 bytes)   made not entrant
1938739 58272   !   4       io.grpc.internal.MessageDeframer::readRequiredBytes (558 bytes)
1939036 53988       4       com.wecorp.webiz.cache.BaseInfoBusinessV10::getByBaseId (109 bytes)   made not entrant
1942257 23486   !   3       io.grpc.internal.MessageDeframer::readRequiredBytes (558 bytes)   made not entrant
1942481 58276       4       com.wecorp.webiz.business.plan.ResourceTag::getSpecialTypes (552 bytes)
1942733 47364       3       com.wecorp.webiz.facade.mapper.WebizFacilitiesEntityMapper::mapToInternal (1541 bytes)   made not entrant
1942768 58166       3       com.wecorp.webiz.business.plan.ResourceTag::getSpecialTypes (552 bytes)   made not entrant
1943829 58278       4       com.wecorp.webiz.business.plan.model.hour.HourStgy::isHour (14 bytes)
1943841 58282       4       com.wecorp.webiz.business.plan.model.hour.CodeHour::isHour (73 bytes)
1944863 58167       3       com.wecorp.webiz.business.plan.model.hour.CodeHour::isHour (73 bytes)   made not entrant
1944928 58168       3       com.wecorp.webiz.business.plan.model.hour.HourStgy::isHour (14 bytes)   made not entrant
1945127 58271       4       com.wecorp.webiz.business.factory.lazy.LazyToWebizVMS::isSupport (39 bytes)
1945134 46559       3       com.wecorp.webiz.business.factory.lazy.LazyToWebizVMS::isSupport (39 bytes)   made not entrant
1946397 58269       4       io.grpc.Deadline$SystemTicker::nanoTime (4 bytes)
1946418 23483       3       io.grpc.Deadline$SystemTicker::nanoTime (4 bytes)   made not entrant
1947083 58295       4       com.wecorp.infra.kafka.common.protocol.types.Struct::instance (13 bytes)
1947448 54974       4       com.wecorp.webiz.business.dynamic.webizfilterrules.FilterSpecialWebiz::filter (174 bytes)   made not entrant
1949042 40631       3       com.wecorp.infra.kafka.common.protocol.types.Struct::instance (13 bytes)   made not entrant
1949471 58297       4       com.wecorp.webiz.business.dynamic.webizfilterrules.FilterSpecialWebiz::filter (174 bytes)
1949474 58306       4       com.wecorp.webiz.product.cache.Cache::isDataReady (26 bytes)
1949479 27680       3       com.wecorp.webiz.product.cache.Cache::isDataReady (26 bytes)   made not entrant

Why does C2Compiler::compile_method, specifically PhaseCFG::insert_anti_dependences, produce this continuous allocation spike? The code of Arena::grow does not appear to free individual chunks, which would mean that at some point all of the allocated memory is in use at once. Can we avoid these sustained large allocations in a memory-constrained Docker environment while still keeping the C2 compiler enabled?

To troubleshoot the Docker OOM we have tried continuous measurement with many tools, including Native Memory Tracking, pmap and strace. We now suspect an allocation spike in C2Compiler::compile_method of JDK 11u, and we would like to understand this behavior and work around it.
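
For reference, the measurements mentioned above were taken roughly as follows (a sketch only; it assumes Native Memory Tracking was enabled at startup, which itself adds a small overhead):

# enable Native Memory Tracking when starting the JVM
java -XX:NativeMemoryTracking=summary ...

# sample native memory per category (Compiler, Arena Chunk, ...) while the service is running
jcmd <java_pid> VM.native_memory summary

# peak and current resident set size of the process
grep -E 'VmHWM|VmRSS' /proc/<java_pid>/status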

java java-11 jit
1 Answer

Perhaps refer to https://bugs.openjdk.org/browse/JDK-8259541 and try adding -XX:TieredStopAtLevel=1.
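
A minimal sketch of what that looks like, plus an alternative that keeps C2 enabled but excludes only the method whose compilation grows the arenas (the class/method below is a placeholder; the real one would have to be identified from -XX:+PrintCompilation first):

# suggestion above: stop tiered compilation at level 1, so C2 (and its node arenas) is never used
java ... -XX:TieredStopAtLevel=1 ...

# alternative: keep C2 but skip compiling the offending method
java ... -XX:CompileCommand=exclude,com.example.SomeClass::someHugeMethod ...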
