Spring Boot application crashed

Problem description (votes: 0, answers: 1)

We are hitting a crash in our Java application. The system generates a crash log containing the message "A fatal error has been detected by the Java Runtime Environment". The Java runtime is "JRE version: OpenJDK Runtime Environment (8.0_301-b02) (build 1.8.0_301-b02)". The location of the failure is reported as: "Problematic frame: # j io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(II)Lio/netty/buffer/ByteBuf;+7#".

We want to find the root cause. We have tried many things, but none of them resolved the problem.

We adjusted the JVM heap size and the dump parameters, but that did not help either.

java spring-boot crash netty
1 Answer

I am not sure that adjusting the JVM heap size and dump parameters will help you much here, because Netty (mostly) uses direct memory. Issues like this are common when working with libraries such as Netty that interact directly with native resources (for example, direct memory allocation), and your exception stack trace:

"Problematic frame: # j io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(II)Lio/netty/buffer/ByteBuf;+7#".

points to exactly that: the crash happened during a direct-memory allocation inside Netty's buffer management, specifically while allocating a new direct byte buffer from Netty's pooled allocator. That makes this a direct-memory problem, not a Java heap memory problem.

There are several possible reasons you might be seeing this kind of issue:

  • Direct memory exhaustion: you may be exceeding the JVM's direct (off-heap) memory limit, which is separate from the heap size. Netty uses direct memory for I/O operations, and that memory is managed outside the heap (see the demo right after this list).
  • Memory fragmentation: even with enough total direct memory, fragmentation can prevent large contiguous allocations and cause allocation failures.
  • Memory leaks: buffers that are not released properly, or references that are retained, can leak memory and eventually exhaust direct memory.
  • Native memory corruption: this can be caused by bugs in native code used by Netty or by underlying system libraries.
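
To make the distinction between heap and direct memory concrete, here is a minimal, self-contained demonstration (hypothetical demo code, not taken from the application). Run it with a large heap and a small direct cap, e.g. -Xmx4g -XX:MaxDirectMemorySize=64m, and it dies with "OutOfMemoryError: Direct buffer memory" while the heap is still almost empty:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectMemoryDemo {
    public static void main(String[] args) {
        final List<ByteBuffer> pinned = new ArrayList<>();
        while (true) {
            // Each call reserves 1 MiB off-heap; only a tiny wrapper object lives on the heap.
            pinned.add(ByteBuffer.allocateDirect(1024 * 1024));
        }
    }
}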

You should probably use tooling such as Native Memory Tracking (NMT) or a memory profiler to analyze and manage your direct memory usage patterns.
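
For instance, on HotSpot you can enable NMT at startup and cap direct memory explicitly (both flags exist on JDK 8; the 512m value below is only an illustration):

-XX:NativeMemoryTracking=summary
-XX:MaxDirectMemorySize=512m

and then inspect native memory at runtime with:

jcmd <pid> VM.native_memory summary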

It is also worth verifying that direct buffers are released by Netty after use;

ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID)

can help identify resource leaks.
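
If changing code is inconvenient, the same thing can usually be switched on with a system property (property name as of Netty 4.1; check your version):

-Dio.netty.leakDetection.level=paranoid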

If everything still looks correct, it may be worth trying to tune the PooledByteBufAllocator.

I will share what I did in my application to overcome this kind of issue.

  • Tuned some environment settings to help keep memory under control. A lot of libraries use Netty these days, often as shaded copies, and those shaded copies also contribute to the application's resource usage while mostly staying hidden and overlooked:
// IO netty related properties
{
// Reduce io netty thread cache intervals.
System.setProperty("io.netty.allocator.cacheTrimInterval", "256");
System.setProperty("io.netty.allocator.cacheTrimIntervalMillis", "60000");
// to fix https://github.com/spring-projects/spring-framework/issues/21174
System.setProperty("io.netty.allocator.useCacheForAllThreads", "false");
// due to netty internal change direct buffers are not cleared - https://github.com/netty/netty/issues/12333 hence reverting to previously tested value
System.setProperty("io.netty.allocator.maxOrder", "9"); // 8192 << 9 = 4MB buffer size
// Netty will use cleaner, and it will not enforce max memory, and instead will defer to JDK.
System.setProperty("io.netty.maxDirectMemory", "0");
// restrict default netty threads to be maximum 2
System.setProperty("io.netty.eventLoopThreads", "2");
// for netty optimization
System.setProperty("io.netty.tryReflectionSetAccessible", "true");
}
// micrometer netty related properties due to shaded libraries.
{
// Reduce io netty thread cache intervals.
System.setProperty("io.micrometer.shaded.io.netty.allocator.cacheTrimInterval", "256");
System.setProperty("io.micrometer.shaded.io.netty.allocator.cacheTrimIntervalMillis", "60000");
// to fix https://github.com/spring-projects/spring-framework/issues/21174
System.setProperty("io.micrometer.shaded.io.netty.allocator.useCacheForAllThreads", "false");
// due to netty internal change direct buffers are not cleared - https://github.com/netty/netty/issues/12333 hence reverting to previously tested value
System.setProperty("io.micrometer.shaded.io.netty.allocator.maxOrder", "9"); // 8192 << 9 = 4MB buffer size
// Netty will use cleaner, and it will not enforce max memory, and instead will defer to JDK.
System.setProperty("io.micrometer.shaded.io.netty.maxDirectMemory", "0");
// restrict default netty threads to be maximum 2
System.setProperty("io.micrometer.shaded.io.netty.eventLoopThreads", "2");
// for netty optimization
System.setProperty("io.micrometer.shaded.io.netty.tryReflectionSetAccessible", "true");
}
// https://www.evanjones.ca/java-bytebuffer-leak.html
System.setProperty("jdk.nio.maxCachedBufferSize", "262144");
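
Note that most of these io.netty.* properties are read once in static initializers, so they only take effect if they are set before the first Netty (or shaded Netty) class loads, e.g. at the very top of main() or via -D flags on the command line.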
  • I am not sure whether you use any application observability tooling, but if you do, you can observe and monitor the health of the application resources specifically related to Netty's behaviour, along the lines of the monitor below:
// MetricsService, Metrics, AppProperties and DirectBufferMonitor are application-specific types (not shown).
import java.lang.Thread.State;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Collection;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

import javax.annotation.CheckForNull;

import org.apache.commons.lang3.ObjectUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.extern.slf4j.Slf4j;

@Service
@Slf4j
final class DirectBufferMonitorImpl implements DirectBufferMonitor {

    private static final String POOL_NAME = "direct";
    private static final Collection<Thread.State> THREAD_STATES = Set.of(State.BLOCKED,
            State.RUNNABLE,
            State.TIMED_WAITING,
            State.WAITING);

    private final MetricsService metricsService;

    @Autowired
    DirectBufferMonitorImpl(final MetricsService metricsService) {
        this.metricsService = metricsService;
    }

    @Override
    @Scheduled(fixedDelayString = "${" + AppProperties.EVENT_FORWARDER_DIRECT_BUFFER_MONITOR_JOB_INTERVAL_IN_MILLIS + "}")
    public void record() {
        try {
            measureRuntimeSystemBehaviour();
        } catch (final Exception exn) {
            log.debug("Exception occurred while recording jvm/netty direct buffer", exn);
        }
    }

    private void measureRuntimeSystemBehaviour() {
        recordJvmBufferPool();

        recordNettyDirectMemory();

        recordJvmThreadMetrics();
    }

    private void recordJvmBufferPool() {
        final Collection<BufferPoolMXBean> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            // we only want to measure the direct buffer pool at this point because we aren't using other pools, such as memory-mapped.
            if (Objects.nonNull(pool) && POOL_NAME.equals(pool.getName())) {
                log.trace("Started recording direct buffer pools");

                this.metricsService.gaugeMetric(Metrics.JVM_DIRECT_BUFFER_COUNT)
                        .observe(pool, this::getBufferCount);
                this.metricsService.gaugeMetric(Metrics.JVM_DIRECT_BUFFER_MEMORY_USED_IN_BYTES)
                        .observe(pool, this::getMemoryUsed);
            }
        }
    }

    private void recordNettyDirectMemory() {
        final Current current = NettyDirectMemoryMonitor.measure();
        if (current != null) {
            log.debug("reservedMemory={}, maxMemory={}", current.getReservedMemory(), current.getMaxMemory());
            this.metricsService.gaugeMetric(Metrics.NETTY_RESERVED_MEMORY_USED_IN_BYTES)
                    .observe(current, current::getReservedMemoryUsage);
            this.metricsService.gaugeMetric(Metrics.NETTY_MAX_MEMORY_USED_IN_BYTES)
                    .observe(current, current::getMaxMemoryUsage);
        }
    }

    private void recordJvmThreadMetrics() {
        final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

        observe(Metrics.JVM_THREAD_PEAK, threadBean, ThreadMXBean::getPeakThreadCount);
        observe(Metrics.JVM_THREAD_LIVE, threadBean, ThreadMXBean::getThreadCount);

        try {
            for (State state : THREAD_STATES) {
                switch (state) {
                case BLOCKED:
                    observe(Metrics.JVM_BLOCKED_THREAD_COUNT, threadBean, toDoubleFunction(state));
                    break;
                case RUNNABLE:
                    observe(Metrics.JVM_RUNNABLE_THREAD_COUNT, threadBean, toDoubleFunction(state));
                    break;
                case TIMED_WAITING:
                    observe(Metrics.JVM_TIMED_WAITING_THREAD_COUNT, threadBean, toDoubleFunction(state));
                    break;
                case WAITING:
                    observe(Metrics.JVM_WAITING_THREAD_COUNT, threadBean, toDoubleFunction(state));
                    break;
                default:
                    break;
                }
            }
        } catch (final Error ignore) {
            // An error will be thrown for unsupported operations
        }
    }

    private ToDoubleFunction<ThreadMXBean> toDoubleFunction(final State state) {
        return (bean) -> getThreadStateCount(bean, state);
    }

    private <T> void observe(final String metricsName,
            final T objectToObserve,
            final ToDoubleFunction<T> valueHarvestingCallback) {
        if (ObjectUtils.anyNull(objectToObserve, valueHarvestingCallback)) {
            log.info("Recording OffHeap cache stats is skipped because either "
                            + "objectToObserve: [{}] or  valueHarvestingCallback: [{}] is null/empty",
                    objectToObserve,
                    valueHarvestingCallback);
            return;
        }
        this.metricsService.gaugeMetric(metricsName)
                .observe(objectToObserve, valueHarvestingCallback);
    }

    private long getBufferCount(final BufferPoolMXBean bufferPoolMXBean) {
        return bufferPoolMXBean.getCount();
    }

    private long getMemoryUsed(final BufferPoolMXBean bufferPoolMXBean) {
        return bufferPoolMXBean.getMemoryUsed();
    }

    private long getThreadStateCount(final ThreadMXBean threadBean, final State state) {
        return Arrays.stream(threadBean.getThreadInfo(threadBean.getAllThreadIds()))
                .filter(threadInfo -> threadInfo != null && threadInfo.getThreadState() == state)
                .count();
    }

    /**
     * A helper class for recording netty max/reserved memory usage.
     */
    @AllArgsConstructor
    @Getter
    private static final class Current {

        long maxMemory;
        long reservedMemory;

        private long getMaxMemoryUsage(final Current current) {
            return current.getMaxMemory();
        }

        private long getReservedMemoryUsage(final Current current) {
            return current.getReservedMemory();
        }
    }

    /**
     * A netty direct memory monitor helper class.
     */
    private static final class NettyDirectMemoryMonitor {

        private static final String CLASS_NAME = "io.netty.util.internal.PlatformDependent";
        private static final Class<?> NETTY_INTERNAL_PLATFORM_DEPENDENT_CLASS = load();
        private static final String DIRECT_MEMORY_LIMIT_FIELD_NAME = "DIRECT_MEMORY_LIMIT";
        private static final String DIRECT_MEMORY_COUNTER_FIELD_NAME = "DIRECT_MEMORY_COUNTER";
        private static final Supplier<Long> DIRECT_MEMORY_LIMIT_GETTER = getValue(DIRECT_MEMORY_LIMIT_FIELD_NAME);
        private static final Supplier<Long> RESERVED_MEMORY_GETTER = getValue(DIRECT_MEMORY_COUNTER_FIELD_NAME);

        private static Current measure() {
            try {
                return new Current(DIRECT_MEMORY_LIMIT_GETTER.get(), RESERVED_MEMORY_GETTER.get());
            } catch (final Exception exception) {
                log.debug("Error measuring direct memory.", exception);
                return null;
            }
        }

        @CheckForNull
        private static Class<?> load() {
            try {
                return Class.forName(CLASS_NAME);
            } catch (final Exception exception) {
                log.debug("Unable to load netty platform dependent class", exception);
                return null;
            }
        }

        private static Supplier<Long> getValue(final String fieldName) {
            if (Objects.isNull(NETTY_INTERNAL_PLATFORM_DEPENDENT_CLASS)) {
                log.debug("No netty internal platform dependent class instance found");
                return () -> 0L;
            }

            try {
                final Field field = NETTY_INTERNAL_PLATFORM_DEPENDENT_CLASS.getDeclaredField(fieldName);
                field.setAccessible(true);

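                // safeSupplier (definition omitted here) wraps the checked-exception-throwing
                // lambda into a plain Supplier<Long>.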
                return safeSupplier(() -> {
                    final Object object = field.get(null);
                    if (Objects.isNull(object)) {
                        log.debug("Netty direct max/reserved memory value is null");
                        return 0L;
                    }
                    if (object instanceof AtomicLong) {
                        final AtomicLong value = (AtomicLong)object;
                        return value.get();
                    }
                    return ((Long)object);
                });
            } catch (final Exception exception) {
                log.debug("Unable to retrieve direct memory related monitor", exception);
                return () -> 0L;
            }
        }
    }
}
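
One caveat with this reflection-based monitor (based on Netty 4.1 internals, so verify against your version): PlatformDependent.DIRECT_MEMORY_COUNTER is only allocated when Netty enforces its own direct-memory limit. With io.netty.maxDirectMemory=0 as set earlier, Netty defers to the JDK cleaner and the counter stays null, so the reserved-memory gauge will read 0.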
  • My PooledByteBufAllocator settings are:
    /**
     * The PooledByteBufAllocator from Netty creates ThreadLocal caches even for non-Netty Threads.
     * These caches quickly move to Old Gen and do not get collected during normal G1 collections.
     * Hence, setting {@code useCacheForAllThreads} as false fixes underlying issue by only using ThreadLocal caches in Netty Threads.
     *
     * @see <a href="https://github.com/spring-projects/spring-framework/issues/21174">Resource leak</a>
     * @see <a href="https://www.kaper.com/notes/netty-cache-thread-memory-issues/">Resource leak</a>
     */
    private DataBufferFactory getNettyDataBufferFactory() {
        final ByteBufAllocator byteBufAllocator = new PooledByteBufAllocator(PlatformDependent.directBufferPreferred(),
                PooledByteBufAllocator.defaultNumHeapArena(),
                PooledByteBufAllocator.defaultNumDirectArena(),
                PooledByteBufAllocator.defaultPageSize(),
                PooledByteBufAllocator.defaultMaxOrder(),
                PooledByteBufAllocator.defaultSmallCacheSize(),
                PooledByteBufAllocator.defaultNormalCacheSize(),
                false);
        return new NettyDataBufferFactory(byteBufAllocator);
    }
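
The important part is the final false argument (useCacheForAllThreads), which mirrors the io.netty.allocator.useCacheForAllThreads=false property set earlier; the other arguments just restate Netty's defaults.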
  • My ResourceLeakDetector setup is:
    /**
     * Post construct.
     */
    @PostConstruct
    public void init() {
        setResourceLeakDetectionLevel();
    }

    /**
     * This will assist in diagnosing potential resource leak problems when reference counting
     * to handle {@link io.netty.buffer.ByteBuf}s is not properly released.
     * Since it has a heavy impact on performance, we would like to use only for the debugging purpose.
     *
     * @see <a href="https://netty.io/4.1/api/io/netty/util/ResourceLeakDetector.Level.html">Resource leak detection</a>
     */
    private void setResourceLeakDetectionLevel() {
        final String level = this.env.getRequiredProperty(AppProperties.RESOURCE_LEAK_DETECTION_LEVEL);
        log.trace("Setting resource leak detector level to {}", level);
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.valueOf(level.toUpperCase()));
    }
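
With this in place the level can be dialed up per environment, e.g. simple (the Netty default) in production and paranoid while chasing a leak; detected leaks are then reported in the application log when the leaked buffers are garbage-collected.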
  • In our application we use the Epoll event loop instead of NIO. Epoll generally consumes less memory than NIO because it uses native, off-heap data structures to manage events, which means less memory on the Java heap and therefore less garbage-collection pressure; it also performs better on Linux thanks to its efficient handling of large numbers of connections. A minimal sketch of the transport selection follows below.
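
A minimal sketch of that transport selection (standard Netty 4.1 API; the two-thread sizing just mirrors the io.netty.eventLoopThreads=2 property above, and the bootstrap wiring is illustrative rather than our exact production code):

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.epoll.Epoll;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

final class TransportFactory {

    /** Prefer the native epoll transport on Linux; fall back to NIO elsewhere. */
    static ServerBootstrap newServerBootstrap() {
        final boolean epollAvailable = Epoll.isAvailable();
        final EventLoopGroup group = epollAvailable
                ? new EpollEventLoopGroup(2)
                : new NioEventLoopGroup(2);
        return new ServerBootstrap()
                .group(group)
                .channel(epollAvailable
                        ? EpollServerSocketChannel.class
                        : NioServerSocketChannel.class);
    }
}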