MUL (R64) - Throughput and Uops
With unroll_count=500 and no inner loop
Code:
0: 49 f7 e0 mul r8
Results:
Instructions retired: 1.0
Core cycles: 3.13
Reference cycles: 2.89
UOPS_RETIRED.ANY: 3.01
RETIRE_SLOTS: 3.0
UOPS_MS: 0.0
UOPS_PORT_0: 1.01
UOPS_PORT_1: 1.36
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=1000 and unroll_count=10
Code:
0: 49 f7 e0 mul r8
Results:
Instructions retired: 1.2
Core cycles: 3.1
Reference cycles: 2.86
UOPS_RETIRED.ANY: 3.2
RETIRE_SLOTS: 3.2
UOPS_MS: 0.0
UOPS_PORT_0: 1.15
UOPS_PORT_1: 1.3
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=100 and unroll_count=100
Code:
0: 49 f7 e0 mul r8
Results:
Instructions retired: 1.02
Core cycles: 3.13
Reference cycles: 2.88
UOPS_RETIRED.ANY: 3.02
RETIRE_SLOTS: 3.02
UOPS_MS: 0.0
UOPS_PORT_0: 1.03
UOPS_PORT_1: 1.37
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
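The three configurations above differ slightly in Instructions retired (1.0, 1.2, 1.02) and UOPS_RETIRED.ANY (3.01, 3.2, 3.02). The excess matches 2/unroll_count, which is consistent with a fixed loop-control cost per loop iteration amortized over the unrolled copies. The sketch below assumes that cost is 2 instructions per iteration. That figure is inferred from these numbers, not taken from nanoBench documentation, and `remove_loop_overhead` and `LOOP_OVERHEAD` are hypothetical names used only for illustration.

```python
# Per-copy "Instructions retired" values copied from the three MUL (R64) runs above.
MEASURED = {
    # (loop_count, unroll_count): instructions retired per copy of the code
    (None, 500): 1.0,   # unroll_count=500, no inner loop
    (1000, 10): 1.2,
    (100, 100): 1.02,
}

# Assumption inferred from the data: each loop iteration adds this many
# loop-control instructions, shared among unroll_count copies of the code.
LOOP_OVERHEAD = 2


def remove_loop_overhead(value, loop_count, unroll_count, overhead=LOOP_OVERHEAD):
    """Subtract the amortized loop overhead from a per-copy counter value."""
    if loop_count is None:
        return value  # fully unrolled: there are no loop instructions to subtract
    return value - overhead / unroll_count


if __name__ == "__main__":
    for (loops, unroll), value in MEASURED.items():
        corrected = remove_loop_overhead(value, loops, unroll)
        print(loops, unroll, round(corrected, 3))
```

Under this assumption, all three runs reduce to about 1.0 instruction per copy. Applying the same subtraction to UOPS_RETIRED.ANY (3.2 and 3.02) gives about 3.0 uops per copy, which agrees with the fully unrolled run.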
With additional dependency-breaking instructions
With unroll_count=500 and no inner loop
Code:
0: 48 31 c0 xor rax,rax
3: 49 f7 e0 mul r8
Results:
Instructions retired: 2.0
Core cycles: 2.0
Reference cycles: 1.84
UOPS_RETIRED.ANY: 4.0
RETIRE_SLOTS: 4.0
UOPS_MS: 0.0
UOPS_PORT_0: 1.28
UOPS_PORT_1: 1.39
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=1000 and unroll_count=10
Code:
0: 48 31 c0 xor rax,rax
3: 49 f7 e0 mul r8
Results:
Instructions retired: 2.2
Core cycles: 2.1
Reference cycles: 1.94
UOPS_RETIRED.ANY: 4.2
RETIRE_SLOTS: 4.2
UOPS_MS: 0.0
UOPS_PORT_0: 1.36
UOPS_PORT_1: 1.5
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=100 and unroll_count=100
Code:
0: 48 31 c0 xor rax,rax
3: 49 f7 e0 mul r8
Results:
Instructions retired: 2.02
Core cycles: 2.01
Reference cycles: 1.86
UOPS_RETIRED.ANY: 4.02
RETIRE_SLOTS: 4.02
UOPS_MS: 0.0
UOPS_PORT_0: 1.28
UOPS_PORT_1: 1.42
UOPS_PORT_2: 0.0
UOPS_PORT_3: 0.0
UOPS_PORT_4: 0.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
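The 64-bit `mul r8` computes RDX:RAX = RAX * R8. It reads RAX implicitly, so in the first variant each copy depends on the RAX written by the previous copy. `xor rax,rax` writes RAX without depending on its old value, which removes that chain. The sketch below compares the unroll_count=500 runs of the two variants counter by counter. It uses only a subset of the counters, copied from above. `parse_results` and `compare` are hypothetical helpers written for this illustration, not part of nanoBench.

```python
# Subset of counters copied from the unroll_count=500 runs above (values are per copy of the code).
DEPENDENT = """
Instructions retired: 1.0
Core cycles: 3.13
UOPS_RETIRED.ANY: 3.01
UOPS_PORT_0: 1.01
UOPS_PORT_1: 1.36
"""

BROKEN = """
Instructions retired: 2.0
Core cycles: 2.0
UOPS_RETIRED.ANY: 4.0
UOPS_PORT_0: 1.28
UOPS_PORT_1: 1.39
"""


def parse_results(text):
    """Parse 'NAME: value' lines into a {name: float} dict, skipping lines without a colon."""
    counters = {}
    for line in text.strip().splitlines():
        name, sep, value = line.rpartition(":")
        if sep:
            counters[name.strip()] = float(value)
    return counters


def compare(dep_text, brk_text):
    """Return per-copy differences (dependency-broken minus dependent) for shared counters."""
    dep, brk = parse_results(dep_text), parse_results(brk_text)
    return {name: round(brk[name] - dep[name], 2) for name in dep if name in brk}


if __name__ == "__main__":
    for name, delta in compare(DEPENDENT, BROKEN).items():
        print(f"{name}: {delta:+.2f}")
```

The differences show one extra instruction and about one extra retired uop per copy (the added `xor`), while core cycles per copy fall by about 1.13. These figures only describe what the counters report. They do not by themselves identify what limits the dependency-broken variant to about 2 cycles per copy.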