MOVSB - Throughput and Uops
With unroll_count=500 and no inner loop
Code:
0: a4 movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
Results:
Instructions retired: 1.0
Core cycles: 5.0
Reference cycles: 4.61
UOPS_RETIRED.ANY: 5.0
RETIRE_SLOTS: 5.0
UOPS_MS: 1.0
UOPS_PORT_0: 0.19
UOPS_PORT_1: 0.81
UOPS_PORT_2: 1.0
UOPS_PORT_3: 1.0
UOPS_PORT_4: 1.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=1000 and unroll_count=10
Code:
0: a4 movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
Results:
Instructions retired: 1.2
Core cycles: 5.1
Reference cycles: 4.71
UOPS_RETIRED.ANY: 5.2
RETIRE_SLOTS: 5.2
UOPS_MS: 1.0
UOPS_PORT_0: 0.3
UOPS_PORT_1: 0.8
UOPS_PORT_2: 1.0
UOPS_PORT_3: 1.0
UOPS_PORT_4: 1.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
With loop_count=100 and unroll_count=100
Code:
0: a4 movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
Results:
Instructions retired: 1.02
Core cycles: 5.01
Reference cycles: 4.62
UOPS_RETIRED.ANY: 5.02
RETIRE_SLOTS: 5.02
UOPS_MS: 1.0
UOPS_PORT_0: 0.2
UOPS_PORT_1: 0.81
UOPS_PORT_2: 1.0
UOPS_PORT_3: 1.0
UOPS_PORT_4: 1.0
UOPS_PORT_5: 0.0
DIV_CYCLES: 0.0
ILD_STALL.LCP: 0.0
INST_DECODED.DEC0: 1.0
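
The fractional instruction and uop counts in the two looped runs are consistent with loop overhead being averaged over the unrolled copies. The following is a minimal sketch of that arithmetic. It assumes the reported values are per-copy averages and that the loop adds a fixed number of extra events per iteration; neither is stated in the results themselves, and the function name is hypothetical.

```python
def loop_overhead_per_iteration(reported_per_copy, base_per_copy, unroll_count):
    """Estimate extra events per loop iteration.

    Assumes (not confirmed by the results) that the reported value equals
    (unroll_count * base_per_copy + overhead) / unroll_count.
    """
    return (reported_per_copy - base_per_copy) * unroll_count


# Values copied from the results above. The run without an inner loop
# (1.0 instructions, 5.0 uops per MOVSB) serves as the baseline.
runs = [
    {"unroll_count": 10, "instructions": 1.2, "uops": 5.2},
    {"unroll_count": 100, "instructions": 1.02, "uops": 5.02},
]

for run in runs:
    instr = loop_overhead_per_iteration(run["instructions"], 1.0, run["unroll_count"])
    uops = loop_overhead_per_iteration(run["uops"], 5.0, run["unroll_count"])
    print(run["unroll_count"], round(instr, 2), round(uops, 2))
```

Under these assumptions both looped runs point to roughly two extra instructions and two extra uops per loop iteration, which would account for the difference from the 1.0 and 5.0 values measured without an inner loop.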