We are running into a Java process crash. The crash log contains the message "A fatal error has been detected by the Java Runtime Environment". The Java runtime is "JRE version: OpenJDK Runtime Environment (8.0_301-b02) (build 1.8.0_301-b02)". The problematic frame reported is: "# j io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(II)Lio/netty/buffer/ByteBuf;+7#".
We want to find the root cause. We have tried a number of things, but the problem is still not resolved.
We adjusted the JVM heap size and the dump parameters, but that did not help either.
I am not sure that adjusting the JVM heap size and dump parameters will help you much, because Netty (mostly) works with direct memory. Problems like this are common with libraries such as Netty that interact directly with native resources (for example, direct memory allocation), and your crash report
"Problematic frame: # j io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(II)Lio/netty/buffer/ByteBuf;+7#".
points to the same thing: the crash happened during a direct-memory allocation inside Netty's buffer management, while a new direct byte buffer was being allocated from Netty's pooled allocator. That makes this a direct-memory problem, not a Java heap memory problem.
There can be several reasons for the kind of problem you are seeing.
You should probably start by analyzing and managing the direct-memory usage pattern with tools such as Native Memory Tracking (NMT) or memory mapping.
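For example (the process id and the jar name here are placeholders), NMT can be enabled at startup and then queried with jcmd while the process is running:
java -XX:NativeMemoryTracking=summary -jar your-app.jar
jcmd <pid> VM.native_memory summary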
It is also worth verifying that direct buffers are actually released by Netty after use.
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID)
can help identify resource leaks (the exact setup I use for this is shown further below).
If everything still looks correct, it may be worth trying to tune the
PooledByteBufAllocator (an example of that also follows below).
Here is what I did in my applications to get past this kind of problem.
// IO netty related properties
{
// Reduce io netty thread cache intervals.
System.setProperty("io.netty.allocator.cacheTrimInterval", "256");
System.setProperty("io.netty.allocator.cacheTrimIntervalMillis", "60000");
// to fix https://github.com/spring-projects/spring-framework/issues/21174
System.setProperty("io.netty.allocator.useCacheForAllThreads", "false");
// due to a netty internal change, direct buffers are not cleared (https://github.com/netty/netty/issues/12333), hence reverting to the previously tested value
System.setProperty("io.netty.allocator.maxOrder", "9"); // 8192 << 9 = 4MB buffer size
// With 0, Netty uses the cleaner and does not enforce a max direct memory itself; it defers to the JDK limit.
System.setProperty("io.netty.maxDirectMemory", "0");
// restrict the default number of netty event loop threads to a maximum of 2
System.setProperty("io.netty.eventLoopThreads", "2");
// for netty optimization
System.setProperty("io.netty.tryReflectionSetAccessible", "true");
}
// micrometer netty related properties due to shaded libraries.
{
// Reduce io netty thread cache intervals.
System.setProperty("io.micrometer.shaded.io.netty.allocator.cacheTrimInterval", "256");
System.setProperty("io.micrometer.shaded.io.netty.allocator.cacheTrimIntervalMillis", "60000");
// to fix https://github.com/spring-projects/spring-framework/issues/21174
System.setProperty("io.micrometer.shaded.io.netty.allocator.useCacheForAllThreads", "false");
// due to a netty internal change, direct buffers are not cleared (https://github.com/netty/netty/issues/12333), hence reverting to the previously tested value
System.setProperty("io.micrometer.shaded.io.netty.allocator.maxOrder", "9"); // 8192 << 9 = 4MB buffer size
// With 0, Netty uses the cleaner and does not enforce a max direct memory itself; it defers to the JDK limit.
System.setProperty("io.micrometer.shaded.io.netty.maxDirectMemory", "0");
// restrict the default number of netty event loop threads to a maximum of 2
System.setProperty("io.micrometer.shaded.io.netty.eventLoopThreads", "2");
// for netty optimization
System.setProperty("io.micrometer.shaded.io.netty.tryReflectionSetAccessible", "true");
}
// https://www.evanjones.ca/java-bytebuffer-leak.html
System.setProperty("jdk.nio.maxCachedBufferSize", "262144");
@Service
@Slf4j
final class DirectBufferMonitorImpl implements DirectBufferMonitor {
private static final String POOL_NAME = "direct";
private static final Collection<Thread.State> THREAD_STATES = Set.of(State.BLOCKED,
State.RUNNABLE,
State.TIMED_WAITING,
State.WAITING);
private final MetricsService metricsService;
@Autowired
DirectBufferMonitorImpl(final MetricsService metricsService) {
this.metricsService = metricsService;
}
@Override
@Scheduled(fixedDelayString = "${" + AppProperties.EVENT_FORWARDER_DIRECT_BUFFER_MONITOR_JOB_INTERVAL_IN_MILLIS + "}")
public void record() {
try {
measureRuntimeSystemBehaviour();
} catch (final Exception exn) {
log.debug("Exception occurred while recording jvm/netty direct buffer");
}
}
private void measureRuntimeSystemBehaviour() {
recordJvmBufferPool();
recordNettyDirectMemory();
recordJvmThreadMetrics();
}
private void recordJvmBufferPool() {
final Collection<BufferPoolMXBean> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
for (BufferPoolMXBean pool : pools) {
// we only want to measure the direct buffer pool at this point because we aren't using other pools such as memory-mapped.
if (Objects.nonNull(pool) && POOL_NAME.equals(pool.getName())) {
log.trace("Started recording direct buffer pools");
this.metricsService.gaugeMetric(Metrics.JVM_DIRECT_BUFFER_COUNT)
.observe(pool, this::getBufferCount);
this.metricsService.gaugeMetric(Metrics.JVM_DIRECT_BUFFER_MEMORY_USED_IN_BYTES)
.observe(pool, this::getMemoryUsed);
}
}
}
private void recordNettyDirectMemory() {
final Current current = NettyDirectMemoryMonitor.measure();
if (current != null) {
log.debug("reservedMemory={}, maxMemory={}", current.getReservedMemory(), current.getMaxMemory());
this.metricsService.gaugeMetric(Metrics.NETTY_RESERVED_MEMORY_USED_IN_BYTES)
.observe(current, current::getReservedMemoryUsage);
this.metricsService.gaugeMetric(Metrics.NETTY_MAX_MEMORY_USED_IN_BYTES)
.observe(current, current::getMaxMemoryUsage);
}
}
private void recordJvmThreadMetrics() {
final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
observe(Metrics.JVM_THREAD_PEAK, threadBean, ThreadMXBean::getPeakThreadCount);
observe(Metrics.JVM_THREAD_LIVE, threadBean, ThreadMXBean::getThreadCount);
try {
for (State state : THREAD_STATES) {
switch (state) {
case BLOCKED:
observe(Metrics.JVM_BLOCKED_THREAD_COUNT, threadBean, toDoubleFunction(state));
break;
case RUNNABLE:
observe(Metrics.JVM_RUNNABLE_THREAD_COUNT, threadBean, toDoubleFunction(state));
break;
case TIMED_WAITING:
observe(Metrics.JVM_TIMED_WAITING_THREAD_COUNT, threadBean, toDoubleFunction(state));
break;
case WAITING:
observe(Metrics.JVM_WAITING_THREAD_COUNT, threadBean, toDoubleFunction(state));
break;
default:
break;
}
}
} catch (final Error ignore) {
// An error will be thrown for unsupported operations
}
}
private ToDoubleFunction<ThreadMXBean> toDoubleFunction(final State state) {
return (bean) -> getThreadStateCount(bean, state);
}
private <T> void observe(final String metricsName,
final T objectToObserve,
final ToDoubleFunction<T> valueHarvestingCallback) {
if (ObjectUtils.anyNull(objectToObserve, valueHarvestingCallback)) {
log.info("Recording OffHeap cache stats is skipped because either "
+ "objectToObserve: [{}] or valueHarvestingCallback: [{}] is null/empty",
objectToObserve,
valueHarvestingCallback);
return;
}
this.metricsService.gaugeMetric(metricsName)
.observe(objectToObserve, valueHarvestingCallback);
}
private long getBufferCount(final BufferPoolMXBean bufferPoolMXBean) {
return bufferPoolMXBean.getCount();
}
private long getMemoryUsed(final BufferPoolMXBean bufferPoolMXBean) {
return bufferPoolMXBean.getMemoryUsed();
}
private long getThreadStateCount(final ThreadMXBean threadBean, final State state) {
return Arrays.stream(threadBean.getThreadInfo(threadBean.getAllThreadIds()))
.filter(threadInfo -> threadInfo != null && threadInfo.getThreadState() == state)
.count();
}
/**
* A helper class for recording netty max/reserved memory usage.
*/
@AllArgsConstructor
@Getter
private static final class Current {
private final long maxMemory;
private final long reservedMemory;
private long getMaxMemoryUsage(final Current current) {
return current.getMaxMemory();
}
private long getReservedMemoryUsage(final Current current) {
return current.getReservedMemory();
}
}
/**
* A netty direct memory monitor helper class.
*/
private static final class NettyDirectMemoryMonitor {
private static final String CLASS_NAME = "io.netty.util.internal.PlatformDependent";
private static final Class<?> NETTY_INTERNAL_PLATFORM_DEPENDENT_CLASS = load();
private static final String DIRECT_MEMORY_LIMIT_FIELD_NAME = "DIRECT_MEMORY_LIMIT";
private static final String DIRECT_MEMORY_COUNTER_FIELD_NAME = "DIRECT_MEMORY_COUNTER";
private static final Supplier<Long> DIRECT_MEMORY_LIMIT_GETTER = getValue(DIRECT_MEMORY_LIMIT_FIELD_NAME);
private static final Supplier<Long> RESERVED_MEMORY_GETTER = getValue(DIRECT_MEMORY_COUNTER_FIELD_NAME);
private static Current measure() {
try {
return new Current(DIRECT_MEMORY_LIMIT_GETTER.get(), RESERVED_MEMORY_GETTER.get());
} catch (final Exception exception) {
log.debug("Error measuring direct memory.", exception);
return null;
}
}
@CheckForNull
private static Class<?> load() {
try {
return Class.forName(CLASS_NAME);
} catch (final Exception exception) {
log.debug("Unable to load netty platform dependent class", exception);
return null;
}
}
private static Supplier<Long> getValue(final String fieldName) {
if (Objects.isNull(NETTY_INTERNAL_PLATFORM_DEPENDENT_CLASS)) {
log.debug("No netty internal platform dependent class instance found");
return () -> 0L;
}
try {
final Field field = NETTY_INTERNAL_PLATFORM_DEPENDENT_CLASS.getDeclaredField(fieldName);
field.setAccessible(true);
return safeSupplier(() -> {
final Object object = field.get(null);
if (Objects.isNull(object)) {
log.debug("Netty direct max/resrved memory value is null");
return 0L;
}
if (object instanceof AtomicLong) {
final AtomicLong value = (AtomicLong)object;
return value.get();
}
return ((Long)object);
});
} catch (final Exception exception) {
log.debug("Unable to retrieve direct memory related monitor", exception);
return () -> 0L;
}
}
}
}
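If the NETTY_RESERVED_MEMORY_USED_IN_BYTES gauge keeps climbing towards NETTY_MAX_MEMORY_USED_IN_BYTES (and the JVM direct buffer pool gauges grow with it) in the period before the crashes, you are most likely leaking or over-retaining direct buffers rather than hitting a one-off spike.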
The PooledByteBufAllocator is configured as follows:
/**
* The PooledByteBufAllocator from Netty creates ThreadLocal caches even for non-Netty Threads.
* These caches quickly move to Old Gen and do not get collected during normal G1 collections.
* Hence, setting {@code useCacheForAllThreads} as false fixes underlying issue by only using ThreadLocal caches in Netty Threads.
*
* @see <a href="https://github.com/spring-projects/spring-framework/issues/21174">Resource leak</a>
* @see <a href="https://www.kaper.com/notes/netty-cache-thread-memory-issues/">Resource leak</a>
*/
private DataBufferFactory getNettyDataBufferFactory() {
final ByteBufAllocator byteBufAllocator = new PooledByteBufAllocator(PlatformDependent.directBufferPreferred(),
PooledByteBufAllocator.defaultNumHeapArena(),
PooledByteBufAllocator.defaultNumDirectArena(),
PooledByteBufAllocator.defaultPageSize(),
PooledByteBufAllocator.defaultMaxOrder(),
PooledByteBufAllocator.defaultSmallCacheSize(),
PooledByteBufAllocator.defaultNormalCacheSize(),
false);
return new NettyDataBufferFactory(byteBufAllocator);
}
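The final boolean argument passed here is useCacheForAllThreads; passing false is the programmatic counterpart of the io.netty.allocator.useCacheForAllThreads=false system property set earlier, and is what actually stops non-Netty threads from building up thread-local caches.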
The ResourceLeakDetector level is set like this:
/**
* Post construct.
*/
@PostConstruct
public void init() {
setResourceLeakDetectionLevel();
}
/**
* This will assist in diagnosing potential resource leak problems when reference counting
* to handle {@link io.netty.buffer.ByteBuf}s is not properly released.
* Since it has a heavy impact on performance, we would like to use only for the debugging purpose.
*
* @see <a href="https://netty.io/4.1/api/io/netty/util/ResourceLeakDetector.Level.html">Resource leak detection</a>
*/
private void setResourceLeakDetectionLevel() {
final String level = this.env.getRequiredProperty(AppProperties.RESOURCE_LEAK_DETECTION_LEVEL);
log.trace("Setting resource leak detector level to {}", level);
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.valueOf(level.toUpperCase()));
}
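With the level raised (for example to PARANOID), Netty logs suspected leaks, including the recent access records of the offending buffer, which is usually enough to find the code path that never calls release(). Because of the performance cost we only turn this on while debugging, as the comment above notes.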
We also moved to the Epoll event loop. Compared to NIO, Epoll generally consumes less memory because it manages events with native, off-heap data structures, which means less memory on the Java heap and therefore less garbage-collection pressure; on Linux it also tends to perform better because it handles large numbers of connections efficiently.
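A minimal sketch of that switch (the TransportFactory class name and the thread counts are only illustrative; the epoll classes are the standard ones from io.netty.channel.epoll and need the netty-transport-native-epoll dependency on Linux):
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.ServerChannel;
import io.netty.channel.epoll.Epoll;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

final class TransportFactory {
    /**
     * Prefer the native epoll transport when it is available and fall back to NIO otherwise.
     * The thread counts are only examples.
     */
    static ServerBootstrap newServerBootstrap() {
        final boolean epoll = Epoll.isAvailable();
        final EventLoopGroup bossGroup = epoll ? new EpollEventLoopGroup(1) : new NioEventLoopGroup(1);
        final EventLoopGroup workerGroup = epoll ? new EpollEventLoopGroup(2) : new NioEventLoopGroup(2);
        final Class<? extends ServerChannel> channelClass =
                epoll ? EpollServerSocketChannel.class : NioServerSocketChannel.class;
        // Wire the chosen groups and channel class into the server bootstrap.
        return new ServerBootstrap()
                .group(bossGroup, workerGroup)
                .channel(channelClass);
    }
}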