KMOVW (K, M16)
Summary: "Move from and to Mask Registers"
Reference: https://www.felixcloutier.com/x86/KMOVW:KMOVB:KMOVQ:KMOVD.html
Extension: AVX512VEX
Category: KMASK
ISA-Set: AVX512F_KOP
CPL: 3
iform: KMOVW_MASKmskw_MEMu16_AVX512
iclass: KMOVW
ASM: KMOVW
Operands
Operand 1 (w): Register (K0, K1, K2, K3, K4, K5, K6, K7)
Operand 2 (r): Memory
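As a rough illustration of the operand roles listed above (operand 1 is written, operand 2 is a 16-bit memory read), the following Python sketch models the load form of the instruction: it reads two bytes from a buffer in little-endian order and zero-extends them into the mask value. The function name `kmovw_load` and the `bytes`-based memory model are assumptions made for illustration only; they are not part of any real tool or library.

```python
def kmovw_load(memory: bytes, address: int) -> int:
    """Model KMOVW k, m16: load 16 bits and zero-extend them into a mask value."""
    if address < 0 or address + 2 > len(memory):
        raise IndexError("16-bit read falls outside the modelled memory")
    # x86 is little-endian: the byte at the lower address is the low byte.
    value = int.from_bytes(memory[address:address + 2], "little")
    # Only bits 15:0 come from memory; the upper bits of the mask are zero.
    return value & 0xFFFF


mem = bytes([0x34, 0x12, 0xFF, 0xFF])  # hypothetical memory contents
k1 = kmovw_load(mem, 0)
print(hex(k1))  # 0x1234
```

The model ignores faults, alignment, and timing; it only shows which bits end up in the destination mask register.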
Available performance data
Alder Lake-P
Rocket Lake
Tiger Lake
Ice Lake
Cascade Lake
Cannon Lake
Skylake-X
AMD Zen 4
Alder Lake-P
Measurements
Throughput
Computed from the port usage: 1.00 (if an indexed addressing mode is used: 0.33)
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 3
Decoded (MITE): 3
Microcode Sequencer (MS): 0
Requires the complex decoder (3 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p23A+1*p5 (if an indexed addressing mode is used: 1*p23A)
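The "computed from the port usage" throughput can be approximated with a simple bound: for any set of ports, the μops that can only run on those ports need at least (number of μops) / (number of ports) cycles. The Python sketch below applies that bound to port-usage strings in the notation used on this page. It is one common way to derive such a figure, not necessarily the exact method behind the values listed here. The function names and the parsing rules (a `p` or `FP` prefix followed by one character per port) are assumptions.

```python
from itertools import combinations


def parse_port_usage(usage: str) -> list[tuple[int, frozenset[str]]]:
    """Parse a string such as '1*p23+1*p5' into (uop count, port set) pairs."""
    groups = []
    for term in usage.split("+"):
        count, ports = term.strip().split("*")
        # Assumed prefixes: 'FP' for AMD floating-point pipes, 'p' for Intel ports.
        prefix = "FP" if ports.startswith("FP") else "p"
        names = frozenset(prefix + ch for ch in ports[len(prefix):])
        groups.append((int(count), names))
    return groups


def throughput_bound(usage: str) -> float:
    """Lower bound on cycles per instruction implied by the port usage."""
    groups = parse_port_usage(usage)
    best = 0.0
    for size in range(1, len(groups) + 1):
        for chosen in combinations(groups, size):
            # Union of the ports covered by the chosen groups.
            ports = frozenset().union(*(p for _, p in chosen))
            # Every uop restricted to a subset of these ports competes for them.
            uops = sum(n for n, p in groups if p <= ports)
            best = max(best, uops / len(ports))
    return best


for usage in ("1*p23A+1*p5", "1*p23A", "1*p23+1*p5", "1*FP01"):
    print(usage, f"{throughput_bound(usage):.2f}")
```

For the port usages on this page, the single μop on `p5` is the bottleneck (1.00 cycles). Where only `p23A` or `FP01` is used, the μop is spread over three or two ports, giving 0.33 and 0.50 respectively.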
Rocket Lake
Measurements
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 3
Decoded (MITE): 3
Microcode Sequencer (MS): 0
Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p23+1*p5
Tiger Lake
Measurements
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 3
Decoded (MITE): 3
Microcode Sequencer (MS): 0
Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p23+1*p5
Ice Lake
Measurements
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 3
Decoded (MITE): 3
Microcode Sequencer (MS): 0
Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p23+1*p5
Cascade Lake
Measurements
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 3
Decoded (MITE): 3
Microcode Sequencer (MS): 0
Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p23+1*p5
Cannon Lake
Measurements
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 3
Decoded (MITE): 3
Microcode Sequencer (MS): 0
Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p23+1*p5
Skylake-X
Measurements
Throughput
Computed from the port usage: 1.00
Measured (loop): 1.00
Measured (unrolled): 1.00
Number of μops
Executed: 2
Retire slots: 3
Decoded (MITE): 3
Microcode Sequencer (MS): 0
Requires the complex decoder (2 other instructions can be decoded with simple decoders in the same cycle)
Port usage: 1*p23+1*p5
IACA 2.3
Throughput
Computed from the port usage: 1.00
IACA: 1.00
Number of μops: 3
Port usage: 1*p0156+1*p23+1*p5
IACA 3.0
Throughput
Computed from the port usage: 1.00
IACA: 1.00
Number of μops: 3
Port usage: 1*p0156+1*p23+1*p5
AMD Zen 4
Measurements
Throughput
Computed from the port usage: 0.50
Measured (loop): 0.50
Measured (unrolled): 0.50
Number of μops
Executed: 2
Port usage: 1*FP01