LOOPNE (Rel8)
Summary:
"Loop According to ECX Counter"
Reference:
https://www.felixcloutier.com/x86/LOOP:LOOPcc.html
Extension:
BASE
Category:
COND_BR
ISA-Set:
I86
CPL:
3
iform:
LOOPNE_RELBRb
iclass:
LOOPNE
ASM:
LOOPNE
Operands
Operand 1 (r):
Operand 2 (r/w, suppressed): Register (RCX)
Operand 3 (r/w, suppressed): Register (RIP)
Operand 4 (r, suppressed): Flags (ZF: r)
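The operand list above matches the architectural behaviour of LOOPNE: the count register is decremented without modifying any flags, and the branch is taken only if the new count is non-zero and ZF is clear. A minimal Python model of one execution follows, assuming 64-bit address size (so RCX is the counter). The names `loopne`, `next_rip` and `rel8` are illustrative and do not come from the listing.

```python
MASK64 = (1 << 64) - 1

def loopne(rcx, zf, next_rip, rel8):
    """Model one LOOPNE execution and return (new_rcx, new_rip).

    rcx      -- counter value before the instruction (operand 2, r/w)
    zf       -- zero flag, read but never written (operand 4, r)
    next_rip -- address of the instruction after LOOPNE (operand 3, r/w)
    rel8     -- signed 8-bit branch displacement (operand 1, r)
    """
    if not -128 <= rel8 <= 127:
        raise ValueError("rel8 must fit in a signed byte")
    new_rcx = (rcx - 1) & MASK64       # decrement wraps around; flags untouched
    taken = new_rcx != 0 and not zf    # loop while count != 0 and ZF == 0
    new_rip = (next_rip + rel8) & MASK64 if taken else next_rip
    return new_rcx, new_rip

# Count still non-zero and ZF clear: the branch is taken.
print(loopne(3, False, 0x1000, -16))   # (2, 4080)
# Count reaches zero: fall through.
print(loopne(1, False, 0x1000, -16))   # (0, 4096)
# ZF set: fall through even though the count is non-zero.
print(loopne(3, True, 0x1000, -16))    # (2, 4096)
```

The model also shows why the listing reports latencies into operand 2 from both RCX and ZF: the new RCX value depends only on the old one, but the instruction as a whole reads both before it can resolve the branch.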
Available performance data
Alder Lake-P
Alder Lake-E
Rocket Lake
Tiger Lake
Ice Lake
Cascade Lake
Cannon Lake
Skylake-X
Coffee Lake
Kaby Lake
Skylake
Broadwell
Haswell
Ivy Bridge
Sandy Bridge
Westmere
Nehalem
Wolfdale
Conroe
Tremont
Goldmont Plus
Goldmont
Airmont
Bonnell
AMD Zen 4
AMD Zen 3
AMD Zen 2
AMD Zen+
Alder Lake-P
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
1
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
5.97
Number of μops
Executed: 10
Retire slots: 12
Decoded (MITE): 4
Microcode Sequencer (MS): 8
Port usage:
3*p0156B+6*p06+1*p1
Alder Lake-E
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Measured (loop):
5.00
Measured (unrolled):
5.00
Number of μops
Executed: 8
Microcode Sequencer (MS): 7
Rocket Lake
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 12
Decoded (MITE): 4
Microcode Sequencer (MS): 8
Port usage:
4*p0156+6*p06+1*p1
Tiger Lake
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 12
Decoded (MITE): 4
Microcode Sequencer (MS): 8
Port usage:
4*p0156+6*p06+1*p1
Ice Lake
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 12
Decoded (MITE): 4
Microcode Sequencer (MS): 8
Port usage:
4*p0156+6*p06+1*p1
Cascade Lake
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
1*p015+3*p0156+6*p06+1*p1
Cannon Lake
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
3
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
1*p015+3*p0156+6*p06+1*p1
Skylake-X
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
1*p015+3*p0156+6*p06+1*p1
IACA 2.3
Throughput
Computed from the port usage: 2.75
IACA:
10.62
Number of μops:
11
Port usage:
9*p0156+2*p06
IACA 3.0
Throughput
Computed from the port usage: 2.75
IACA:
10.42
Number of μops:
11
Port usage:
9*p0156+2*p06
Coffee Lake
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
1*p015+3*p0156+6*p06+1*p1
Kaby Lake
Measurements
Latencies
Latency operand 2 → 2:
2
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
1*p015+3*p0156+6*p06+1*p1
Skylake
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
1
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
1*p015+3*p0156+6*p06+1*p1
IACA 2.3
Throughput
Computed from the port usage: 2.75
IACA:
10.62
Number of μops:
11
Port usage:
9*p0156+2*p06
IACA 3.0
Throughput
Computed from the port usage: 2.75
IACA:
10.42
Number of μops:
11
Port usage:
9*p0156+2*p06
Broadwell
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
1
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.05
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
2*p015+2*p0156+6*p06+1*p1
IACA 2.2
Throughput
Computed from the port usage: 2.75
IACA:
11.00 (with the -no_interiteration flag: 8.14)
Number of μops:
11
Port usage:
9*p0156+2*p06
IACA 2.3
Throughput
Computed from the port usage: 2.75
IACA:
10.57
Number of μops:
11
Port usage:
9*p0156+2*p06
IACA 3.0
Throughput
Computed from the port usage: 2.75
IACA:
10.42
Number of μops:
11
Port usage:
9*p0156+2*p06
Haswell
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.00
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
2*p015+2*p0156+6*p06+1*p1
IACA 2.2
Throughput
Computed from the port usage: 2.75
IACA:
11.05 (with the -no_interiteration flag: 7.62)
Number of μops:
11
Port usage:
9*p0156+2*p06
IACA 2.3
Throughput
Computed from the port usage: 2.75
IACA:
10.62
Number of μops:
11
Port usage:
9*p0156+2*p06
IACA 3.0
Throughput
Computed from the port usage: 2.75
IACA:
10.42
Number of μops:
11
Port usage:
9*p0156+2*p06
Ivy Bridge
Measurements
Latencies
Latency operand 2 → 2:
4
Latency operand 4 → 2:
4
Throughput
Computed from the port usage: 3.67
Measured (loop):
6.00
Measured (unrolled):
5.91
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
5*p015+2*p05+1*p1+3*p5
IACA 2.1
Latency:
1
Throughput
Computed from the port usage: 1.00
IACA:
1.00 (with the -no_interiteration flag: 1.00)
Number of μops:
1
Port usage:
1*p5
IACA 2.2
Throughput
Computed from the port usage: 1.00
IACA:
1.00 (with the -no_interiteration flag: 1.00)
Number of μops:
1
Port usage:
1*p5
IACA 2.3
Throughput
Computed from the port usage: 1.00
IACA:
1.00
Number of μops:
1
Port usage:
1*p5
Sandy Bridge
Measurements
Latencies
Latency operand 2 → 2:
4
Latency operand 4 → 2:
3
Throughput
Computed from the port usage: 3.67
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 11
Retire slots: 11
Decoded (MITE): 4
Microcode Sequencer (MS): 7
Port usage:
5*p015+2*p05+1*p1+3*p5
IACA 2.1
Latency:
1
Throughput
Computed from the port usage: 1.00
IACA:
1.00 (with the -no_interiteration flag: 1.00)
Number of μops:
1
Port usage:
1*p5
IACA 2.2
Throughput
Computed from the port usage: 1.00
IACA:
1.00 (with the -no_interiteration flag: 1.00)
Number of μops:
1
Port usage:
1*p5
IACA 2.3
Throughput
Computed from the port usage: 1.00
IACA:
1.00
Number of μops:
1
Port usage:
1*p5
Westmere
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.67
Measured (loop):
8.00
Measured (unrolled):
8.00
Number of μops
Executed: 11
Retire slots: 11
Microcode Sequencer (MS): 22
Port usage:
6*p015+3*p05+2*p5
IACA 2.1
Latency:
1
Throughput
Computed from the port usage: 1.00
IACA:
1.00 (with the -no_interiteration flag: 1.00)
Number of μops:
1
Port usage:
1*p5
IACA 2.2
Throughput
Computed from the port usage: 1.00
IACA:
1.00 (with the -no_interiteration flag: 1.00)
Number of μops:
1
Port usage:
1*p5
Nehalem
Measurements
Latencies
Latency operand 2 → 2:
3
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.67
Measured (loop):
8.00
Measured (unrolled):
8.00
Number of μops
Executed: 11
Retire slots: 11
Microcode Sequencer (MS): 22
Port usage:
6*p015+3*p05+2*p5
IACA 2.1
Latency:
1
Throughput
Computed from the port usage: 1.00
IACA:
1.00 (with the -no_interiteration flag: 1.00)
Number of μops:
1
Port usage:
1*p5
IACA 2.2
Throughput
Computed from the port usage: 1.00
IACA:
1.00 (with the -no_interiteration flag: 1.00)
Number of μops:
1
Port usage:
1*p5
Wolfdale
Measurements
Latencies
Latency operand 2 → 2:
2
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.67
Measured (loop):
6.00
Measured (unrolled):
6.03
Number of μops
Executed: 11
Port usage:
7*p015+1*p05+1*p1+2*p5
Conroe
Measurements
Latencies
Latency operand 2 → 2:
2
Latency operand 4 → 2:
2
Throughput
Computed from the port usage: 3.67
Measured (loop):
6.00
Measured (unrolled):
6.08
Number of μops
Executed: 11
Port usage:
7*p015+1*p05+1*p1+2*p5
Tremont
Measurements
Latencies
Latency operand 2 → 2:
2
Latency operand 4 → 2:
2
Throughput
Measured (loop):
6.00
Measured (unrolled):
6.00
Number of μops
Executed: 9
Microcode Sequencer (MS): 8
Goldmont Plus
Measurements
Latencies
Latency operand 2 → 2:
2
Latency operand 4 → 2:
2
Throughput
Measured (loop):
12.00
Measured (unrolled):
12.00
Number of μops
Executed: 9
Microcode Sequencer (MS): 9
Goldmont
Measurements
Latencies
Latency operand 2 → 2:
2
Latency operand 4 → 2:
2
Throughput
Measured (loop):
12.03
Measured (unrolled):
12.00
Number of μops
Executed: 9
Microcode Sequencer (MS): 9
Airmont
Measurements
Latencies
Latency operand 2 → 2:
4
Latency operand 4 → 2:
4
Throughput
Measured (loop):
13.00
Measured (unrolled):
12.98
Number of μops
Executed: 8
Microcode Sequencer (MS): 8
Bonnell
Measurements
Latencies
Latency operand 2 → 2:
7
Latency operand 4 → 2:
7
Throughput
Measured (loop):
11.00
Measured (unrolled):
11.00
Number of μops
Executed: 8
Microcode Sequencer (MS): 8
AMD Zen 4
Measurements
Latencies
Latency operand 2 → 2:
1
Latency operand 4 → 2:
1
Throughput
Measured (loop):
0.50
Measured (unrolled):
0.56
Number of μops
Executed: 1
AMD Zen 3
Measurements
Latencies
Latency operand 2 → 2:
1
Latency operand 4 → 2:
1
Throughput
Measured (loop):
0.50
Measured (unrolled):
0.56
Number of μops
Executed: 1
Documentation
Latency: 1
Throughput: 0.50
Number of μops: 1
Port usage: BRU
AMD Zen 2
Measurements
Latencies
Latency operand 2 → 2:
1
Latency operand 4 → 2:
1
Throughput
Measured (loop):
0.57
Measured (unrolled):
0.56
Number of μops
Executed: 1
Documentation
Latency: 1
Throughput: 0.50
Number of μops: 1
Port usage: ALU0/ALU3
AMD Zen+
Measurements
Latencies
Latency operand 2 → 2:
1
Latency operand 4 → 2:
1
Throughput
Measured (loop):
0.70
Measured (unrolled):
0.56
Number of μops
Executed: 1
Documentation
Latency: 1
Throughput: 0.50
Number of μops: 1
Port usage: ALU0/ALU3
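The "Computed from the port usage" figures in the Intel listings above are consistent with a simple lower bound: for every subset of ports, count the μops that can run only on ports in that subset, divide by the subset size, and take the maximum. The sketch below reproduces the listed values under that assumption. It is not necessarily how the source computes them, and the function names are hypothetical. It parses only the Intel-style `N*pXYZ` notation, not the AMD documentation entries such as `BRU` or `ALU0/ALU3`.

```python
from itertools import combinations

def parse_port_usage(usage):
    """Turn '3*p0156B+6*p06+1*p1' into [(3, frozenset('0156B')), ...]."""
    groups = []
    for term in usage.split("+"):
        count, ports = term.split("*p")
        groups.append((int(count), frozenset(ports)))  # one char per port
    return groups

def throughput_bound(usage):
    """Cycles per instruction implied by port pressure alone."""
    groups = parse_port_usage(usage)
    all_ports = sorted(set().union(*(ports for _, ports in groups)))
    best = 0.0
    for size in range(1, len(all_ports) + 1):
        for subset in combinations(all_ports, size):
            chosen = set(subset)
            # μops whose allowed ports all lie inside the chosen subset
            uops = sum(n for n, ports in groups if ports <= chosen)
            best = max(best, uops / size)
    return best

print(round(throughput_bound("3*p0156B+6*p06+1*p1"), 2))        # 3.0  (Alder Lake-P)
print(round(throughput_bound("1*p015+3*p0156+6*p06+1*p1"), 2))  # 3.0  (Skylake)
print(round(throughput_bound("5*p015+2*p05+1*p1+3*p5"), 2))     # 3.67 (Ivy Bridge)
print(round(throughput_bound("9*p0156+2*p06"), 2))              # 2.75 (IACA 3.0)
```

On the Skylake-family entries the bound is set by the six μops restricted to ports 0 and 6 (6 / 2 = 3.00), while the measured throughput is about 6.00 cycles, so port pressure alone does not explain the measured cost of this microcoded instruction.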