2015-04-01 11:11:34

by Denys Vlasenko

Subject: Re: [PATCH 7/9] x86/asm/entry/32: tidy up some instructions

On 04/01/2015 12:21 AM, Brian Gerst wrote:
> On Tue, Mar 31, 2015 at 1:00 PM, Denys Vlasenko <[email protected]> wrote:
>> After TESTs, use the logically correct JZ mnemonic instead of JE
>> (this doesn't change the generated code).
>>
>> Tidy up CMPW insns:
>>
>> Modern CPUs handle 16-bit operations poorly.
>> Instructions with 16-bit immediates are especially bad:
>> on many CPUs they cause a length-changing prefix (LCP) stall
>> in the decoders, costing ~6 cycles to recover.
>>
>> Replace CMPWs with CMPLs.
>> Of these, the forms with 8-bit sign-extended immediates
>> are a win because they are now smaller
>> (no 0x66 prefix anymore);
>> the forms with 16-bit immediates are faster.
>>
>> @@ -708,7 +708,7 @@ END(sysenter_badsys)
>> #ifdef CONFIG_X86_ESPFIX32
>> movl %ss, %eax
>> /* see if on espfix stack */
>> - cmpw $__ESPFIX_SS, %ax
>> + cmpl $__ESPFIX_SS, %eax
>> jne 27f
>> movl $__KERNEL_DS, %eax
>> movl %eax, %ds
>
> This is incorrect. 32-bit reads from a segment register are not
> necessarily zero-extended. The upper 16 bits are implementation-defined.
> Most processors will clear them, but it's not guaranteed.
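To make Brian's point concrete, here is a small portable C model (not kernel code) of what the original and patched comparisons see when "movl %ss, %eax" leaves junk in the upper half of EAX. The function names and the selector value 0xD0 are arbitrary, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable model, not kernel code: 'eax' stands for whatever
 * "movl %ss, %eax" left in EAX. On pre-P6 IA-32 CPUs the upper
 * 16 bits of that value may be undefined. */

/* Original "cmpw $sel, %ax": only the low 16 bits take part. */
static bool match_cmpw(uint32_t eax, uint16_t sel)
{
    return (uint16_t)eax == sel;
}

/* Patched "cmpl $sel, %eax": the upper 16 bits take part too. */
static bool match_cmpl(uint32_t eax, uint16_t sel)
{
    return eax == (uint32_t)sel;
}

/* 32-bit compare made safe by clearing the upper half first,
 * e.g. "movzwl %ax, %eax" before the cmpl. */
static bool match_cmpl_masked(uint32_t eax, uint16_t sel)
{
    return (eax & 0xffffu) == (uint32_t)sel;
}
```

With a clean upper half, as on P6 and later, all three agree; only the junk case separates them. Keeping the original "cmpw" avoids the extra masking instruction, at the cost of the 0x66 prefix the patch was trying to remove.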

I did not know that. I was sure they are always zero extended.
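The size and stall trade-off described in the patch text at the top of this message can be checked with a small length calculator for "cmp $imm, %reg" with a register destination. This is a sketch based on the standard opcode forms (83 /7 ib, 3D iw/id, 81 /7 iw/id); the struct and function names are hypothetical, and it ignores memory operands, REX prefixes and 8-bit operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encoded size of "cmp $imm, %reg" for a 16- or 32-bit register:
 *   83 /7 ib      cmp r/m, imm8 (sign-extended)
 *   3D iw/id      cmp %ax/%eax, imm16/imm32 (accumulator short form)
 *   81 /7 iw/id   cmp r/m, imm16/imm32
 * A 16-bit operand size adds one 0x66 prefix byte. */
struct cmp_enc {
    int  length;    /* total bytes */
    bool lcp_risk;  /* 0x66 changes the immediate's length (imm16 vs imm32) */
};

/* 'imm' is the signed immediate value at the chosen operand size. */
static struct cmp_enc cmp_reg_imm(int opsize_bits, bool is_accumulator, int32_t imm)
{
    struct cmp_enc e = { 0, false };
    int prefix = (opsize_bits == 16) ? 1 : 0;
    int imm_bytes = opsize_bits / 8;

    if (imm >= -128 && imm <= 127) {
        e.length = prefix + 1 + 1 + 1;          /* [66] 83 ModRM ib */
    } else if (is_accumulator) {
        e.length = prefix + 1 + imm_bytes;      /* [66] 3D iw/id */
        e.lcp_risk = (opsize_bits == 16);
    } else {
        e.length = prefix + 1 + 1 + imm_bytes;  /* [66] 81 ModRM iw/id */
        e.lcp_risk = (opsize_bits == 16);
    }
    return e;
}
```

For an immediate such as 100, cmp_reg_imm(16, true, 100) gives 4 bytes and cmp_reg_imm(32, true, 100) gives 3. For 200, the 16-bit accumulator form is 4 bytes with an LCP-prone 16-bit immediate, while the 32-bit form is 5 bytes with no 0x66 prefix.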


2015-04-01 15:50:08

by Linus Torvalds

Subject: Re: [PATCH 7/9] x86/asm/entry/32: tidy up some instructions

On Wed, Apr 1, 2015 at 4:10 AM, Denys Vlasenko <[email protected]> wrote:
>
> I did not know that. I was sure they are always zero extended.

On all half-way modern CPUs they are. But on some older CPUs
(possibly just the original 386) the segment move instructions are
basically always 16-bit, and the operand size is ignored (so the
32-bit version is just smaller and faster to decode, because it
doesn't have a 16-bit operand-size prefix).
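To see why the operand size of a segment-register MOV only affects the prefix, here is a small encoder sketch for the register-destination form "MOV r/m, Sreg" (opcode 8C /r). The function name is hypothetical, and only the legacy registers 0..7 are handled (no REX.B).

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "mov %sreg, %gpr" (opcode 8C /r, register destination).
 * Segment register number goes in ModRM.reg (ES=0, CS=1, SS=2, DS=3,
 * FS=4, GS=5); the GPR (EAX=0 .. EDI=7) goes in ModRM.rm.
 * Only the prefix depends on operand size. Writes at most 3 bytes. */
static size_t encode_mov_sreg_to_reg(int opsize_bits, unsigned sreg,
                                     unsigned gpr, uint8_t *buf)
{
    size_t n = 0;
    if (opsize_bits == 16)
        buf[n++] = 0x66;   /* operand-size prefix */
    else if (opsize_bits == 64)
        buf[n++] = 0x48;   /* REX.W, 64-bit mode only */
    buf[n++] = 0x8C;       /* MOV r/m, Sreg */
    buf[n++] = (uint8_t)(0xC0 | ((sreg & 7u) << 3) | (gpr & 7u)); /* mod=11 */
    return n;
}
```

%ss is segment register 2 and %eax is register 0, so "movl %ss, %eax" encodes as 8C D0 (2 bytes) and "movw %ss, %ax" as 66 8C D0 (3 bytes).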

IIRC, the same is true for the values pushed to memory on exceptions,
so the 'cs/ss' values on the exception stack may not be reliable in
the upper 16 bits.
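If the saved CS/SS slots can carry junk in their upper halves, code reading them should compare only the low 16 bits. A minimal sketch, assuming a hypothetical struct laid out like the frame a 32-bit CPU pushes on a privilege change (EIP, CS, EFLAGS, ESP, SS, lowest address first); 0x73 and 0x7B are example selector values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit exception frame: each saved selector occupies
 * a 32-bit slot, but only bits 0..15 are trusted. */
struct saved_frame {
    uint32_t ip;
    uint32_t cs;     /* selector in bits 0..15, bits 16..31 unreliable */
    uint32_t flags;
    uint32_t sp;
    uint32_t ss;     /* same caveat */
};

static uint16_t frame_selector(uint32_t slot)
{
    return (uint16_t)(slot & 0xffffu);   /* drop the untrusted upper half */
}

/* Equality tests must use the masked value... */
static bool frame_cs_is(const struct saved_frame *f, uint16_t sel)
{
    return frame_selector(f->cs) == sel;
}

/* ...while an RPL test (selector bits 0..1) is unaffected by bits 16..31. */
static bool frame_from_ring3(const struct saved_frame *f)
{
    return (f->cs & 3u) == 3u;
}
```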

I don't remember if the same might be true of "pushl %Sseg". The Intel
architecture manual says segment registers are zero-extended on push.

Linus

2015-04-01 20:52:11

by Denys Vlasenko

Subject: Re: [PATCH 7/9] x86/asm/entry/32: tidy up some instructions

On 04/01/2015 05:50 PM, Linus Torvalds wrote:
> On all half-way modern cpu's they are. But on some older cpu's
> (possibly just the original 386) the segment move instructions
> basically are always 16-bit, and the operand size is ignored (so the
> 32-bit version is just smaller and faster to decode, because it
> doesn't have a 16-bit operand size prefix)

BTW, AMD64 docs do explicitly say that MOVs from segment registers
to gpregs are zero-extending.

2015-04-01 20:57:50

by H. Peter Anvin

Subject: Re: [PATCH 7/9] x86/asm/entry/32: tidy up some instructions

On 04/01/2015 01:52 PM, Denys Vlasenko wrote:
> BTW, AMD64 docs do explicitly say that MOVs from segment registers
> to gpregs are zero-extending.
>

For Intel processors it is true for Pentium Pro and later processors, as
far as I know.

-hpa

2015-04-01 22:14:21

by Linus Torvalds

Subject: Re: [PATCH 7/9] x86/asm/entry/32: tidy up some instructions

On Wed, Apr 1, 2015 at 1:52 PM, Denys Vlasenko <[email protected]> wrote:
>
> BTW, AMD64 docs do explicitly say that MOVs from segment registers
> to gpregs are zero-extending.

Yeah, I think anything even *remotely* recent enough to do 64-bit
zero-extends.

Even on the 32-bit side, anything that does register renaming is much
better off with zero-extension than with partial register writes.

And I found the "push" thing. It's actually documented:

"When pushing a segment selector onto the stack, the Pentium 4, Intel
Xeon, P6 family, and Intel486 processors decrement the ESP register by
the operand size and then write 2 bytes. If the operand size is
32-bits, the upper two bytes of the write are not modified"

but I can't find any similar documentation for the "mov
Sreg->register" thing. So now I'm starting to doubt my own memory.
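The documented PUSH behavior in the excerpt above can be modeled with a byte array standing in for the stack: ESP drops by 4, but only the 2-byte selector is stored, so a later 32-bit read sees whatever was left in the upper two bytes. This models only the quoted text for the processors it names; the function names are hypothetical.

```c
#include <stdint.h>

/* Model of "pushl %sreg" per the quoted excerpt: decrement ESP by the
 * operand size (4), then write only the 2-byte selector.
 * 'stack' stands in for memory, 'esp' is an index into it. */
static uint32_t pushl_sreg_model(uint8_t *stack, uint32_t esp, uint16_t sel)
{
    esp -= 4;                                /* decrement by operand size */
    stack[esp]     = (uint8_t)(sel & 0xffu); /* little-endian: low byte first */
    stack[esp + 1] = (uint8_t)(sel >> 8);
    /* stack[esp + 2] and stack[esp + 3] are deliberately left untouched */
    return esp;
}

/* Read the 32-bit slot back, as a later "popl %eax" would see it. */
static uint32_t load32_le(const uint8_t *stack, uint32_t addr)
{
    return (uint32_t)stack[addr]
         | ((uint32_t)stack[addr + 1] << 8)
         | ((uint32_t)stack[addr + 2] << 16)
         | ((uint32_t)stack[addr + 3] << 24);
}
```

Filling the stack with 0xAA and pushing selector 0x7B leaves the slot reading 0xAAAA007B, which is why only the low 16 bits of such a slot can be trusted.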

Linus

2015-04-02 00:32:08

by Brian Gerst

Subject: Re: [PATCH 7/9] x86/asm/entry/32: tidy up some instructions

On Wed, Apr 1, 2015 at 6:14 PM, Linus Torvalds
<[email protected]> wrote:
> but I can't find any similar documentation for the "mov
> Sreg->register" thing. So now I'm starting to doubt my own memory.

It's in the description of MOV:

"When the processor executes the instruction with a 32-bit
general-purpose register, it assumes that the 16 least-significant
bits of the general-purpose register are the destination or source
operand. If the register is a destination operand, the resulting
value in the two high-order bytes of the register is implementation
dependent. For the Pentium 4, Intel Xeon, and P6 family processors,
the two high-order bytes are filled with zeros; for earlier 32-bit
IA-32 processors, the two high order bytes are undefined."

AMD processors always zero-extend, although the documentation
describes this specifically for 64-bit processors:

"When reading segment-registers with a 32-bit operand size, the
processor zero-extends the 16-bit selector results to 32 bits. When
reading segment-registers with a 64-bit operand size, the processor
zero-extends the 16-bit selector to 64 bits."

So I think it's safe to assume zero-extension on 64-bit, but not 32-bit.
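Brian's conclusion can be restated as a lookup over the two manual excerpts quoted above. The enum is a hypothetical simplification for illustration, not a real CPU-detection API.

```c
#include <stdbool.h>

/* Hypothetical CPU classes, simplified from the two excerpts above. */
enum cpu_class {
    CPU_IA32_PRE_P6,    /* earlier 32-bit IA-32: upper bytes undefined */
    CPU_INTEL_P6_PLUS,  /* P6 family, Pentium 4, Xeon: filled with zeros */
    CPU_AMD64,          /* AMD 64-bit processors: zero-extended */
};

/* Does "mov %sreg, %r32" guarantee zeros in bits 16..31? */
static bool sreg_read_zero_extends(enum cpu_class c)
{
    switch (c) {
    case CPU_INTEL_P6_PLUS:
    case CPU_AMD64:
        return true;
    case CPU_IA32_PRE_P6:
    default:
        return false;
    }
}
```

A 32-bit kernel may run on the first class, so it cannot drop the 16-bit compare or an explicit mask; 64-bit code only runs on processors in the other two classes, which is the basis for assuming zero-extension there.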

--
Brian Gerst