MUL (R64) - Throughput and Uops
With unroll_count=500 and no inner loop
Code:
0: 49 f7 e0 mul r8
Results:
Instructions retired: 1.0
Core cycles: 14.11
Reference cycles: 14.1
UOPS_RETIRED.ANY: 8.0
UOPS_MS: 8.0
With loop_count=1000 and unroll_count=10
Code:
0: 49 f7 e0 mul r8
Results:
Instructions retired: 1.2
Core cycles: 14.11
Reference cycles: 14.11
UOPS_RETIRED.ANY: 8.2
UOPS_MS: 8.01
With loop_count=100 and unroll_count=100
Code:
0: 49 f7 e0 mul r8
Results:
Instructions retired: 1.02
Core cycles: 14.17
Reference cycles: 14.17
UOPS_RETIRED.ANY: 8.02
UOPS_MS: 8.0
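The fractional counts in the looped runs (1.2 and 1.02 instructions, 8.2 and 8.02 uops) are consistent with a fixed per-iteration loop overhead spread over the unrolled copies. The sketch below reproduces them under the assumption that the loop adds 2 instructions (and 2 uops) per iteration; that value is inferred from the numbers above, not stated in the results, and `per_copy` is a hypothetical helper name.

```python
def per_copy(count_per_copy, unroll_count, loop_overhead=2):
    """Average event count per unrolled copy, including loop overhead.

    loop_overhead is an ASSUMED number of extra events per loop
    iteration (inferred from the measured values, not documented).
    """
    return (count_per_copy * unroll_count + loop_overhead) / unroll_count


# One MUL retires as 1 instruction and 8 uops in the run without a loop.
for unroll in (10, 100):
    instructions = round(per_copy(1, unroll), 2)
    uops = round(per_copy(8, unroll), 2)
    print(unroll, instructions, uops)
# prints:
# 10 1.2 8.2
# 100 1.02 8.02
```

With `loop_overhead=0` the same function gives 1.0 and 8.0, matching the run with unroll_count=500 and no inner loop.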
With additional dependency-breaking instructions
With unroll_count=500 and no inner loop
Code:
0: 48 31 c0 xor rax,rax
3: 49 f7 e0 mul r8
Results:
Instructions retired: 2.0
Core cycles: 18.0
Reference cycles: 18.0
UOPS_RETIRED.ANY: 9.0
UOPS_MS: 8.0
With loop_count=1000 and unroll_count=10
Code:
0: 48 31 c0 xor rax,rax
3: 49 f7 e0 mul r8
Results:
Instructions retired: 2.2
Core cycles: 18.17
Reference cycles: 18.17
UOPS_RETIRED.ANY: 9.2
UOPS_MS: 8.0
With loop_count=100 and unroll_count=100
Code:
0: 48 31 c0 xor rax,rax
3: 49 f7 e0 mul r8
Results:
Instructions retired: 2.02
Core cycles: 18.03
Reference cycles: 18.03
UOPS_RETIRED.ANY: 9.02
UOPS_MS: 8.0
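One way to summarize the six runs is to take the lowest core-cycle count for each code variant. The sketch below does that with the values listed above; taking the minimum as the summary figure is an assumption made here for illustration, and the variant and configuration labels are hypothetical names.

```python
# Core cycles copied from the results above, keyed by
# (code variant, measurement configuration). Labels are illustrative.
core_cycles = {
    ("plain", "unroll=500"): 14.11,
    ("plain", "loop=1000,unroll=10"): 14.11,
    ("plain", "loop=100,unroll=100"): 14.17,
    ("dep_breaking", "unroll=500"): 18.0,
    ("dep_breaking", "loop=1000,unroll=10"): 18.17,
    ("dep_breaking", "loop=100,unroll=100"): 18.03,
}


def best_cycles(variant):
    """Lowest measured core cycles across configurations for one variant."""
    return min(c for (v, _), c in core_cycles.items() if v == variant)


plain = best_cycles("plain")
dep_breaking = best_cycles("dep_breaking")
print(plain, dep_breaking, round(dep_breaking - plain, 2))
# prints: 14.11 18.0 3.89
```

In these measurements the variant with the added `xor rax,rax` is slower per iteration than the plain `mul r8` sequence, and it retires one extra uop (9.0 versus 8.0).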