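Warning: the question is a bit long, but the part below the separation line is for curiosity only.
Oracle's JDK 7 implementation of AtomicInteger includes the following methods: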
    public final int addAndGet(int delta) {
        for (;;) {
            int current = get();
            int next = current + delta; // Only difference
            if (compareAndSet(current, next))
                return next;
        }
    }

    public final int incrementAndGet() {
        for (;;) {
            int current = get();
            int next = current + 1; // Only difference
            if (compareAndSet(current, next))
                return next;
        }
    }
It seems clear that the second method could have been written as:
    public final int incrementAndGet() {
        return addAndGet(1);
    }
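There are several other examples of similar code duplication in that class. I can't think of any reason to do this other than performance considerations (*), and I am pretty sure the authors did some in-depth testing before settling on that design.
Why (or in what circumstances) would the first version perform better than the second?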
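To make the comparison concrete, here is a minimal sketch of the two designs side by side. It is not the JDK source: CasCounter and its method names are hypothetical, and the retry loop is built on the public AtomicInteger API only. It shows that the duplicated and the delegating variants are functionally interchangeable, so any difference between them can only be one of performance.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class, not part of the JDK.
public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // General form: retry until the compare-and-set succeeds.
    public int addAndGet(int delta) {
        for (;;) {
            int current = value.get();
            int next = current + delta;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    // Duplicated form: same loop, with the constant 1 written inline.
    public int incrementAndGetDuplicated() {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    // Delegating form: one extra call, which the JIT may or may not inline.
    public int incrementAndGetDelegating() {
        return addAndGet(1);
    }

    public int get() {
        return value.get();
    }

    public static void main(String[] args) {
        CasCounter a = new CasCounter();
        CasCounter b = new CasCounter();
        for (int i = 0; i < 1000; i++) {
            a.incrementAndGetDuplicated();
            b.incrementAndGetDelegating();
        }
        System.out.println(a.get() + " " + b.get()); // prints "1000 1000"
    }
}
```

Both counters end at the same value, as expected.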
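(*) I couldn't resist and wrote a quick micro-benchmark. It shows (post-JIT) a systematic performance gap of 2-4% in favor of addAndGet(1) over incrementAndGet() (admittedly small, but very consistent). To be honest, I can't really explain that result either...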
Output:
incrementAndGet(): 905
addAndGet(1): 868
incrementAndGet(): 902
addAndGet(1): 863
incrementAndGet(): 891
addAndGet(1): 867
...
Code:
    public static void main(String[] args) throws Exception {
        final int size = 100_000_000;
        long start, end;
        AtomicInteger ai;

        System.out.println("JVM warmup");
        for (int j = 0; j < 10; j++) {
            start = System.nanoTime();
            ai = new AtomicInteger();
            for (int i = 0; i < size / 10; i++) {
                ai.addAndGet(1);
            }
            end = System.nanoTime();
            System.out.println("addAndGet(1): " + ((end - start) / 1_000_000));

            start = System.nanoTime();
            ai = new AtomicInteger();
            for (int i = 0; i < size / 10; i++) {
                ai.incrementAndGet();
            }
            end = System.nanoTime();
            System.out.println("incrementAndGet(): " + ((end - start) / 1_000_000));
        }

        System.out.println("\nStart measuring\n");
        for (int j = 0; j < 10; j++) {
            start = System.nanoTime();
            ai = new AtomicInteger();
            for (int i = 0; i < size; i++) {
                ai.incrementAndGet();
            }
            end = System.nanoTime();
            System.out.println("incrementAndGet(): " + ((end - start) / 1_000_000));

            start = System.nanoTime();
            ai = new AtomicInteger();
            for (int i = 0; i < size; i++) {
                ai.addAndGet(1);
            }
            end = System.nanoTime();
            System.out.println("addAndGet(1): " + ((end - start) / 1_000_000));
        }
    }
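As a variation on the benchmark above, the sketch below moves each measured loop into its own method and uses the final counter value, so that each loop is compiled as a regular method and its result cannot be discarded as dead code. The class and method names (AtomicBench, runIncrement, runAdd) are hypothetical, and this is still a hand-rolled micro-benchmark: its numbers should be read with the same caution as the ones above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical benchmark harness, a sketch only.
public class AtomicBench {

    // Increments a fresh counter `iterations` times and returns the final value.
    static int runIncrement(int iterations) {
        AtomicInteger ai = new AtomicInteger();
        for (int i = 0; i < iterations; i++) {
            ai.incrementAndGet();
        }
        return ai.get();
    }

    // Same loop, but going through addAndGet(1).
    static int runAdd(int iterations) {
        AtomicInteger ai = new AtomicInteger();
        for (int i = 0; i < iterations; i++) {
            ai.addAndGet(1);
        }
        return ai.get();
    }

    public static void main(String[] args) {
        final int size = 100_000_000;
        // Early rounds double as warmup; compare only the later ones.
        for (int round = 0; round < 10; round++) {
            long t0 = System.nanoTime();
            int r1 = runIncrement(size);
            long t1 = System.nanoTime();
            int r2 = runAdd(size);
            long t2 = System.nanoTime();
            System.out.println("incrementAndGet(): " + ((t1 - t0) / 1_000_000)
                    + " ms (result " + r1 + ")");
            System.out.println("addAndGet(1): " + ((t2 - t1) / 1_000_000)
                    + " ms (result " + r2 + ")");
        }
    }
}
```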
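I'll offer a new hypothesis. If we look at the bytecode of AtomicInteger, we see that the main difference between the two methods is that addAndGet uses an iload_ instruction, while incrementAndGet uses an iconst_ instruction: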
public final int addAndGet(int);
...
4: istore_2
5: iload_2
6: iload_1
7: iadd
public final int incrementAndGet();
...
4: istore_1
5: iload_1
6: iconst_1
7: iadd
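It seems that iconst_ + iadd is translated to an INC instruction, whereas iload_ ... iadd is translated to an ADD instruction. This all relates to the well-known question of ADD 1 vs INC and similar questions.
This could be the answer to why addAndGet is slightly faster than incrementAndGet.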
Out of curiosity, here is the assembly code generated by the JIT. To summarize, the main difference is:
incrementAndGet
mov r8d,eax
inc r8d ;*iadd
addAndGet
mov r9d,r8d
add r9d,eax ;*iadd
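The rest of the code is essentially identical. This confirms that the difference comes down to INC vs ADD 1.
I am not good enough at reading assembly to know why that makes a difference, and it does not really answer my initial question.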
Full listing (incrementAndGet):
# {method} 'incrementAndGet' '()I' in 'java/util/concurrent/atomic/AtomicInteger'
# [sp+0x20] (sp of caller)
0x00000000026804c0: mov r10d,DWORD PTR [rdx+0x8]
0x00000000026804c4: shl r10,0x3
0x00000000026804c8: cmp rax,r10
0x00000000026804cb: jne 0x0000000002657b60 ; {runtime_call}
0x00000000026804d1: data32 xchg ax,ax
0x00000000026804d4: nop DWORD PTR [rax+rax*1+0x0]
0x00000000026804dc: data32 data32 xchg ax,ax
[Verified Entry Point]
0x00000000026804e0: sub rsp,0x18
0x00000000026804e7: mov QWORD PTR [rsp+0x10],rbp ;*synchronization entry
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@-1 (line 204)
0x00000000026804ec: mov eax,DWORD PTR [rdx+0xc] ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)
0x00000000026804ef: mov r8d,eax
0x00000000026804f2: inc r8d ;*iadd
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@7 (line 205)
0x00000000026804f5: lock cmpxchg DWORD PTR [rdx+0xc],r8d
0x00000000026804fb: sete r11b
0x00000000026804ff: movzx r11d,r11b ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)
0x0000000002680503: test r11d,r11d
0x0000000002680506: je 0x0000000002680520 ;*iload_2
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@18 (line 207)
0x0000000002680508: mov eax,r8d
0x000000000268050b: add rsp,0x10
0x000000000268050f: pop rbp
0x0000000002680510: test DWORD PTR [rip+0xfffffffffdbafaea],eax # 0x0000000000230000
; {poll_return}
0x0000000002680516: ret
0x0000000002680517: nop WORD PTR [rax+rax*1+0x0] ; OopMap{rdx=Oop off=96}
;*goto
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@20 (line 208)
0x0000000002680520: test DWORD PTR [rip+0xfffffffffdbafada],eax # 0x0000000000230000
;*goto
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@20 (line 208)
; {poll}
0x0000000002680526: mov r11d,DWORD PTR [rdx+0xc] ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)
0x000000000268052a: mov r8d,r11d
0x000000000268052d: inc r8d ;*iadd
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@7 (line 205)
0x0000000002680530: mov eax,r11d
0x0000000002680533: lock cmpxchg DWORD PTR [rdx+0xc],r8d
0x0000000002680539: sete r11b
0x000000000268053d: movzx r11d,r11b ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)
0x0000000002680541: test r11d,r11d
0x0000000002680544: je 0x0000000002680520 ;*ifeq
; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@15 (line 206)
0x0000000002680546: jmp 0x0000000002680508
Full listing (addAndGet):
# {method} 'addAndGet' '(I)I' in 'java/util/concurrent/atomic/AtomicInteger'
# this: rdx:rdx = 'java/util/concurrent/atomic/AtomicInteger'
# parm0: r8 = int
# [sp+0x20] (sp of caller)
0x0000000002680d00: mov r10d,DWORD PTR [rdx+0x8]
0x0000000002680d04: shl r10,0x3
0x0000000002680d08: cmp rax,r10
0x0000000002680d0b: jne 0x0000000002657b60 ; {runtime_call}
0x0000000002680d11: data32 xchg ax,ax
0x0000000002680d14: nop DWORD PTR [rax+rax*1+0x0]
0x0000000002680d1c: data32 data32 xchg ax,ax
[Verified Entry Point]
0x0000000002680d20: sub rsp,0x18
0x0000000002680d27: mov QWORD PTR [rsp+0x10],rbp ;*synchronization entry
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@-1 (line 233)
0x0000000002680d2c: mov eax,DWORD PTR [rdx+0xc] ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@12 (line 235)
0x0000000002680d2f: mov r9d,r8d
0x0000000002680d32: add r9d,eax ;*iadd
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@7 (line 234)
0x0000000002680d35: lock cmpxchg DWORD PTR [rdx+0xc],r9d
0x0000000002680d3b: sete r11b
0x0000000002680d3f: movzx r11d,r11b ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@12 (line 235)
0x0000000002680d43: test r11d,r11d
0x0000000002680d46: je 0x0000000002680d60 ;*iload_3
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@18 (line 236)
0x0000000002680d48: mov eax,r9d
0x0000000002680d4b: add rsp,0x10
0x0000000002680d4f: pop rbp
0x0000000002680d50: test DWORD PTR [rip+0xfffffffffdbaf2aa],eax # 0x0000000000230000
; {poll_return}
0x0000000002680d56: ret
0x0000000002680d57: nop WORD PTR [rax+rax*1+0x0] ; OopMap{rdx=Oop off=96}
;*goto
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@20 (line 237)
0x0000000002680d60: test DWORD PTR [rip+0xfffffffffdbaf29a],eax # 0x0000000000230000
;*goto
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@20 (line 237)
; {poll}
0x0000000002680d66: mov r11d,DWORD PTR [rdx+0xc] ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@12 (line 235)
0x0000000002680d6a: mov r9d,r11d
0x0000000002680d6d: add r9d,r8d ;*iadd
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@7 (line 234)
0x0000000002680d70: mov eax,r11d
0x0000000002680d73: lock cmpxchg DWORD PTR [rdx+0xc],r9d
0x0000000002680d79: sete r11b
0x0000000002680d7d: movzx r11d,r11b ;*invokevirtual compareAndSwapInt
; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@12 (line 235)
0x0000000002680d81: test r11d,r11d
0x0000000002680d84: je 0x0000000002680d60 ;*ifeq
; - java.util.concurrent.atomic.AtomicInteger::addAndGet@15 (line 235)
0x0000000002680d86: jmp 0x0000000002680d48
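To expand on @AlexeiKaigorodov's answer: if this is the real Java code, the duplicated version would be faster because it eliminates an extra frame on the call stack. That makes it run quicker (why not?), and it may also mean that multiple concurrent calls are less likely to fail the compare-and-set and repeat the loop. (Though I can't come up with any concrete reason for that off the top of my head.)
However, judging by your micro-benchmark, it is possible that the code is not the real one, and that incrementAndGet() is implemented in native code in the manner you describe, or that both methods are just intrinsics (delegating to lock xadd on x86, for example). In general it is hard to divine what the JVM is doing at any given time, and there could be other factors causing this result.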
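Just to complete the discussion, the same question was also asked on concurrency-interest, the discussion mailing list for JSR-166, at nearly the same time as it was asked here.
Here is the start of the thread: [concurrency-interest] AtomicInteger implementation, which discusses the AtomicInteger implementation.
The reason is that they preferred to make the code faster, at the expense of code size.
I am sure the sources are real. If these methods were intrinsics, they would be marked as native.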