2005-03-27 22:24:47

by H. J. Lu

Subject: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

It turns out that 2.4 kernel has

arch/i386/kernel/process.c: asm volatile("movl %%" #seg ",%0":"=m" (*(int *)&(value)))
arch/i386/kernel/process.c: asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->fs));
arch/i386/kernel/process.c: asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));
arch/x86_64/kernel/process.c: asm("movl %%gs,%0" : "=m" (p->thread.gsindex));
arch/x86_64/kernel/process.c: asm("movl %%fs,%0" : "=m" (p->thread.fsindex));
arch/x86_64/kernel/process.c: asm("movl %%es,%0" : "=m" (p->thread.es));
arch/x86_64/kernel/process.c: asm("movl %%ds,%0" : "=m" (p->thread.ds));
arch/x86_64/kernel/process.c: asm volatile("movl %%es,%0" : "=m" (prev->es));
arch/x86_64/kernel/process.c: asm volatile ("movl %%ds,%0" : "=m" (prev->ds));

2.6 kernel has

arch/i386/kernel/process.c: asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->fs));
arch/i386/kernel/process.c: asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));
arch/x86_64/kernel/process.c: asm("movl %%gs,%0" : "=m" (p->thread.gsindex));
arch/x86_64/kernel/process.c: asm("movl %%fs,%0" : "=m" (p->thread.fsindex));
arch/x86_64/kernel/process.c: asm("movl %%es,%0" : "=m" (p->thread.es));
arch/x86_64/kernel/process.c: asm("movl %%ds,%0" : "=m" (p->thread.ds));
arch/x86_64/kernel/process.c: asm volatile("movl %%es,%0" : "=m" (prev->es));
arch/x86_64/kernel/process.c: asm volatile ("movl %%ds,%0" : "=m" (prev->ds));
arch/x86_64/kernel/process.c: asm volatile("movl %%fs,%0" : "=g" (fsindex));
arch/x86_64/kernel/process.c: asm volatile("movl %%gs,%0" : "=g" (gsindex));

The new assembler will disallow them since those instructions with
memory operand will only use the first 16bits. If the memory operand
is 16bit, you won't see any problems. But if the memory destination
is 32bit, the upper 16bits may have random values. The new assembler
will force people to use

mov (%eax),%ds
movw (%eax),%ds
movw %ds,(%eax)
mov %ds,(%eax)

Will it be a big problem for kernel people?

BTW, I haven't checked glibc yet. It may have similar issues.

H.J.
---
On Fri, Mar 25, 2005 at 06:05:06PM -0800, H. J. Lu wrote:
> X86 segment register access is special. We can move between a segment
> register and a 16/32/64bit general-purpose register. But we can only
> move between a segment register and a 16bit memory address. The current
> assembler allows "movl (%eax),%ds", but doesn't allow "movq %rax,%ds".
> The disassembler displays "movl (%eax),%ds". This patch tries to fix
> those.
>
>
> H.J.
> ----
> gas/testsuite/
>
> 2005-03-25 H.J. Lu <[email protected]>
>
> * gas/i386/i386.exp: Run segment and inval-seg for i386. Run
> x86-64-segment and x86-64-inval-seg for x86-64.
>
> * gas/i386/intel.d: Expect movw for moving between memory and
> segment register.
> * gas/i386/naked.d: Likewise.
> * gas/i386/opcode.d: Likewise.
> * gas/i386/x86-64-opcode.d: Likewise.
>
> * gas/i386/opcode.s: Use movw for moving between memory and
> segment register.
> * gas/i386/x86-64-opcode.s: Likewise.
>
> * : Likewise.
>
> * gas/i386/inval-seg.l: New.
> * gas/i386/inval-seg.s: New.
> * gas/i386/segment.l: New.
> * gas/i386/segment.s: New.
> * gas/i386/x86-64-inval-seg.l: New.
> * gas/i386/x86-64-inval-seg.s: New.
> * gas/i386/x86-64-segment.l: New.
> * gas/i386/x86-64-segment.s: New.
>
> include/opcode/
>
> 2005-03-25 H.J. Lu <[email protected]>
>
> * i386.h (i386_optab): Don't allow the `l' suffix for moving
> between memory and segment register. Allow movq for
> moving between general-purpose register and segment register.
>
> opcodes/
>
> 2005-03-25 H.J. Lu <[email protected]>
>
> * i386-dis.c (SEG_Fixup): New.
> (Sv): New.
> (dis386): Use "Sv" for 0x8c and 0x8e.
>


2005-03-28 15:47:10

by Andi Kleen

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

"H. J. Lu" <[email protected]> writes:
> The new assembler will disallow them since those instructions with
> memory operand will only use the first 16bits. If the memory operand
> is 16bit, you won't see any problems. But if the memory destination
> is 32bit, the upper 16bits may have random values. The new assembler

Does it really have random values on existing x86 hardware?

If it is only a "theoretical" problem that does not happen
in practice I would advise to not do the change.

> will force people to use
>
> mov (%eax),%ds
> movw (%eax),%ds
> movw %ds,(%eax)
> mov %ds,(%eax)
>
> Will it be a big problem for kernel people?

Well, we're getting used to the tool chain regularly breaking
perfectly good code.

You would not get more than the usual curses and only waste
a couple hundred man hours of testers worldwide scratching their heads
why their kernel does not compile anymore. World economy
will probably survive it ;-)

-Andi

2005-03-28 17:59:58

by H. J. Lu

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Mon, Mar 28, 2005 at 05:47:06PM +0200, Andi Kleen wrote:
> "H. J. Lu" <[email protected]> writes:
> > The new assembler will disallow them since those instructions with
> > memory operand will only use the first 16bits. If the memory operand
> > is 16bit, you won't see any problems. But if the memory destination
> > is 32bit, the upper 16bits may have random values. The new assembler
>
> Does it really have random values on existing x86 hardware?

The x86 hardware will only change the first 16bits. The remaining bits
are unchanged. A simple test program can verify that.
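
A minimal sketch of such a test program, for illustration only. The function
name is made up here, and the non-x86 branch merely emulates a 16-bit store of
a zero selector so the file still builds elsewhere. It pre-fills a 32-bit slot
with a known pattern, stores %ds into it through a memory operand, and returns
the slot so the caller can inspect the upper half:

```c
#include <stdint.h>

/* Store %ds into a 32-bit slot that already holds `pattern` and return the
 * slot afterwards.  Hypothetical helper, not taken from the kernel. */
static uint32_t store_ds_over_pattern(uint32_t pattern)
{
    uint32_t slot = pattern;
#if defined(__i386__) || defined(__x86_64__)
    /* "+m": the compiler must treat the slot as read-modify-write, because
     * the instruction only overwrites its low 16 bits. */
    __asm__ volatile("movw %%ds,%0" : "+m"(slot));
#else
    /* Assumption for non-x86 builds: emulate a 16-bit store of selector 0. */
    slot &= 0xffff0000u;
#endif
    return slot;
}
```

Calling store_ds_over_pattern(0xdeadbeef) and printing the result in hex
should show 0xdead still sitting in the upper 16 bits, with the selector in
the lower 16 bits.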

>
> If it is only a "theoretical" problem that does not happen
> in practice I would advise to not do the change.
>

It depends on what the initial value in the upper bits is. The
assembler in CVS generates the same binary code as

movw %ds,(%eax)

for

movl %ds,(%eax)

But the previous assemblers will generate

66 8c 18 movw %ds,(%eax)

for

movw %ds,(%eax)

This bug has been fixed for a while. I guess that may be why Linux
kernel uses

movl %ds,(%eax)


H.J.

2005-03-28 18:26:37

by H. J. Lu

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Mon, Mar 28, 2005 at 09:46:00AM -0800, H. J. Lu wrote:
> On Mon, Mar 28, 2005 at 05:47:06PM +0200, Andi Kleen wrote:
> > "H. J. Lu" <[email protected]> writes:
> > > The new assembler will disallow them since those instructions with
> > > memory operand will only use the first 16bits. If the memory operand
> > > is 16bit, you won't see any problems. But if the memory destination
> > > is 32bit, the upper 16bits may have random values. The new assembler
> >
> > Does it really have random values on existing x86 hardware?
>
> The x86 hardware will only change the first 16bits. The remaining bits
> are unchanged. A simple test program can verify that.
>
> >
> > If it is only a "theoretical" problem that does not happen
> > in practice I would advise to not do the change.
> >
>
> It depends on what the initial value in the upper bits is. The
> assembler in CVS generates the same binary code as
>
> movw %ds,(%eax)
>
> for
>
> movl %ds,(%eax)
>
> But the previous assemblers will generate
>
> 66 8c 18 movw %ds,(%eax)
>
> for
>
> movw %ds,(%eax)
>
> This bug has been fixed for a while. I guess that may be why Linux
> kernel uses
>
> movl %ds,(%eax)

It turns out that both old and new assemblers will generate

0: 8c 18 movw %ds,(%eax)

for
mov %ds,(%eax)

So kernel can use "mov" instead of "movl" and the binary output will
be the same.
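
To make the byte sequences quoted above easier to compare, here is a small
sketch that classifies an encoding of "mov %sreg,mem" by whether it carries
the 0x66 operand-size prefix, and pulls the segment register number out of the
ModRM byte. The function names are invented for this illustration, and only
the two forms shown in this thread are handled (no other prefixes):

```c
#include <stddef.h>

/* Returns 1 for "66 8c /r" (prefixed), 0 for "8c /r" (no prefix), and -1 for
 * anything else.  Hypothetical helper covering only the encodings quoted in
 * this thread. */
static int seg_store_has_opsize_prefix(const unsigned char *code, size_t len)
{
    if (len >= 2 && code[0] == 0x8c)
        return 0;
    if (len >= 3 && code[0] == 0x66 && code[1] == 0x8c)
        return 1;
    return -1;
}

/* The ModRM reg field (bits 5..3) selects the segment register:
 * es=0, cs=1, ss=2, ds=3, fs=4, gs=5. */
static int modrm_segment_reg(unsigned char modrm)
{
    return (modrm >> 3) & 7;
}
```

For the bytes "8c 18" the first helper reports no prefix, and the ModRM byte
0x18 decodes to register 3, i.e. %ds, with (%eax) as the memory operand.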


H.J.

2005-03-29 19:19:40

by H. J. Lu

Subject: PATCH: i386/x86_64 segment register access update

The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,

movl (%eax),%ds
movl %ds,(%eax)

To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,

mov (%eax),%ds
mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support

movw (%eax),%ds
movw %ds,(%eax)

without the 0x66 prefix. I am enclosing patches for 2.4 and 2.6 kernels
here. The resulting kernel binaries should be the same as before, with
old and new assemblers, if gcc never generates memory access for

unsigned gsindex;
asm volatile("movl %%gs,%0" : "=g" (gsindex));

If gcc does generate memory access for the code above, the upper bits
in gsindex are undefined and the new assembler doesn't allow it.


H.J.


Attachments:
(No filename) (987.00 B)
linux-2.4-seg-4.patch (4.02 kB)
linux-2.6-seg-5.patch (3.87 kB)

2005-03-30 00:28:43

by Linus Torvalds

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)



On Mon, 28 Mar 2005, Andi Kleen wrote:
>
> "H. J. Lu" <[email protected]> writes:
> > The new assembler will disallow them since those instructions with
> > memory operand will only use the first 16bits. If the memory operand
> > is 16bit, you won't see any problems. But if the memory destination
> > is 32bit, the upper 16bits may have random values. The new assembler
>
> Does it really have random values on existing x86 hardware?

The upper bits are not written at all, so it's not random.

> If it is only a "theoretical" problem that does not happen
> in practice I would advise to not do the change.

My preference too. The reason we use "movl" is because we really do want
the 32-bit versions, since they are faster. It's a conscious choice. In
contrast "movw" generates bigger and slower code on all assemblers out
there, and "mov" doesn't make it clear which one it is. Is it the slow
one, or the fast one?

For example, "mov %ds,%eax" does seem to generate the (faster) 32-bit code
on modern assemblers, while "mov %ds,%ax" generates (slower) 16-bit code
that leaves the high bits of %eax untouched. Sometimes you may want the
slower one, sometimes the faster one. I have this pretty strong memory of
old versions of gas not making any difference between %ax and %eax as a
target, and that you really needed to set the size explicitly.

Now, those versions of gas may be so old that nobody cares, but the
explicit size still is a GOOD THING. The size DOES MATTER. People who want
the smaller and faster version do not want to just rely on gas
automatically getting it right, especially since gas has historically been
very very bad at getting things right.

What is the advantage of not allowing "movl %ds,mem"? Really? Especially
since I suspect the kernel is pretty much the only one who does this, and
the kernel really does do it on purpose. The kernel explicitly wants the
32-bit version, knowing that the upper bits are undefined.

Linus

2005-03-30 01:53:23

by H. J. Lu

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Tue, Mar 29, 2005 at 04:30:01PM -0800, Linus Torvalds wrote:
>
>
> On Mon, 28 Mar 2005, Andi Kleen wrote:
> >
> > "H. J. Lu" <[email protected]> writes:
> > > The new assembler will disallow them since those instructions with
> > > memory operand will only use the first 16bits. If the memory operand
> > > is 16bit, you won't see any problems. But if the memory destination
> > > is 32bit, the upper 16bits may have random values. The new assembler
> >
> > Does it really have random values on existing x86 hardware?
>
> The upper bits are not written at all, so it's not random.
>
> > If it is only a "theoretical" problem that does not happen
> > in practice I would advise to not do the change.
>
> My preference too. The reason we use "movl" is because we really do want
> the 32-bit versions, since they are faster. It's a conscious choice. In
> contrast "movw" generates bigger and slower code on all assemblers out
> there, and "mov" doesn't make it clear which one it is. Is it the slow
> one, or the fast one?

"mov" shouldn't generate the 0x66 prefix, at least with the assembler
since binutils 2.14.90.0.4 20030523. The assembler in CVS won't generate
0x66 for "movw" either.

> Now, those versions of gas may be so old that nobody cares, but the
> explicit size still is a GOOD THING. The size DOES MATTER. People who want

Suggesting "mov" instead of "movw" is for the existing assemblers. Or
kernel can check assembler version to decide if "movw" should be used.
I can verify the first Linux assembler which won't generate 0x66 for
"movw".

> the smaller and faster version do not want to just rely on gas
> automatically getting it right, especially since gas has historically been
> very very bad at getting things right.

We are fixing those issues in assembler. If people run into problems
like that with gas, they can report them. They will be fixed.

>
> What is the advantage of not allowing "movl %ds,mem"? Really? Especially
> since I suspect the kernel is pretty much the only one who does this, and
> the kernel really does do it on purpose. The kernel explicitly wants the
> 32-bit version, knowing that the upper bits are undefined.
>

Kernel has

unsigned gsindex;
asm volatile("movl %%gs,%0" : "=g" (gsindex));
...
if (gsindex)


It is OK if gcc never generates memory access like

movl %gs,0x128(%rsp)

Otherwise, the upper bits in gsindex are undefined. The new
assembler will make sure that it won't happen.


H.J.

2005-03-30 02:43:21

by Linus Torvalds

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)



On Tue, 29 Mar 2005, H. J. Lu wrote:
>
> > the smaller and faster version do not want to just rely on gas
> > automatically getting it right, especially since gas has historically been
> > very very bad at getting things right.
>
> We are fixing those issues in assembler. If people run into problems
> like that with gas, they can report them. They will be fixed.

It's fine if gas fixes things. It's not fine if gas breaks things that
used to work, for no really good reason.

> > What is the advantage of not allowing "movl %ds,mem"? Really? Especially
> > since I suspect the kernel is pretty much the only one who does this, and
> > the kernel really does do it on purpose. The kernel explicitly wants the
> > 32-bit version, knowing that the upper bits are undefined.
> >
>
> Kernel has
>
> unsigned gsindex;
> asm volatile("movl %%gs,%0" : "=g" (gsindex));

Ok, that's a real x86-64 bug, it seems. Andi, please fix, preferably by
just making the "g" be a "r".

However, your argument isn't very valid, since:

> The new assembler will make sure that it won't happen.

Not true, since the suggestion was just to change all segment "movl"
things to "mov", at which point the same old bug is still there, and the
assembler didn't really help us at all.

See the problem? You're not actually protecting anything. The change just
makes it _harder_ to make sizes explicit, and suddenly we have to trust an
assembler to be clever about sizes, when that assembler historically has
definitely _not_ been very clever about them at all.

Linus

2005-03-30 04:00:38

by H. J. Lu

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Tue, Mar 29, 2005 at 06:44:18PM -0800, Linus Torvalds wrote:
>
>
> On Tue, 29 Mar 2005, H. J. Lu wrote:
> >
> > > the smaller and faster version do not want to just rely on gas
> > > automatically getting it right, especially since gas has historically been
> > > very very bad at getting things right.
> >
> > We are fixing those issues in assembler. If people run into problems
> > like that with gas, they can report them. They will be fixed.
>
> It's fine if gas fixes things. It's not fine if gas breaks things that
> used to work, for no really good reason.
>
> > > What is the advantage of not allowing "movl %ds,mem"? Really? Especially
> > > since I suspect the kernel is pretty much the only one who does this, and
> > > the kernel really does do it on purpose. The kernel explicitly wants the
> > > 32-bit version, knowing that the upper bits are undefined.
> > >
> >
> > Kernel has
> >
> > unsigned gsindex;
> > asm volatile("movl %%gs,%0" : "=g" (gsindex));
>
> Ok, that's a real x86-64 bug, it seems. Andi, please fix, preferably by
> just making the "g" be a "r".
>
> However, your argument isn't very valid, since:
>
> > The new assembler will make sure that it won't happen.
>
> Not true, since the suggestion was just to change all segment "movl"
> things to "mov", at which point the same old bug is still there, and the
> assembler didn't really help us at all.

The new assembler won't accept

movl %gs,128(%rsp)

It makes it harder to generate binary code the user doesn't intend. FWIW,
what I suggested is in

http://sourceware.org/ml/binutils/2005-03/msg00873.html

There are things like

- asm volatile("movl %%fs,%0" : "=g" (fsindex));
+ asm volatile("movl %%fs,%0" : "=r" (fsindex));
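
As a self-contained illustration of the "=r" form in the hunk above (a sketch,
not the kernel's code; the function name and the non-x86 fallback are
assumptions made here), the constraint forces gcc to pick a general-purpose
register, so the assembler never sees a 32-bit memory destination:

```c
/* Read the %fs selector through a register operand.  With "=r" the compiler
 * cannot choose a memory operand, so "movl" is always paired with a 32-bit
 * register.  Hypothetical helper for illustration. */
static unsigned int read_fs_index(void)
{
    unsigned int fsindex;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("movl %%fs,%0" : "=r"(fsindex));
#else
    fsindex = 0; /* assumption: placeholder so non-x86 builds still compile */
#endif
    return fsindex;
}
```

As far as I know, processors from the P6 family onward zero the upper 16 bits
of a 32-bit register destination here, so a later "if (fsindex)" test sees
only the selector.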

>
> See the problem? You're not actually protecting anything. The change just
> makes it _harder_ to make sizes explicit, and suddenly we have to trust an
> assembler to be clever about sizes, when that assembler historically has
> definitely _not_ been very clever about them at all.
>

There is no such an instruction of "movl %ds,(%eax)". The old assembler
accepts it and turns it into "movw %ds,(%eax)". It won't catch problems
like

unsigned fsindex;
asm volatile("movl %%fs,%0" : "=m" (fsindex));

The "movw %ds,(%eax)" bug was fixed in binutils 2.15.94.0.1. Gas no
longer generates 0x66 for it. If you find gas preventing you from doing
what the hardware supports, I will be happy to fix it.


H.J.

2005-03-30 15:25:32

by Andi Kleen

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

> > unsigned gsindex;
> > asm volatile("movl %%gs,%0" : "=g" (gsindex));
>
> Ok, that's a real x86-64 bug, it seems. Andi, please fix, preferably by
> just making the "g" be a "r".

Will do.

-Andi

2005-03-30 15:58:47

by Linus Torvalds

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)


[ binutils and libc back in the discussion - I don't know why they got
dropped ]

On Tue, 29 Mar 2005, H. J. Lu wrote:
>
> There is no such an instruction of "movl %ds,(%eax)". The old assembler
> accepts it and turns it into "movw %ds,(%eax)".

I disagree. Violently. As does the old assembler, which does not turn
"mov" into "movw" as you say. AT ALL.

A "movw" has a 0x66 prefix. The assembler agrees with me. Plain logic
agrees with me. Being consistent _also_ agrees with me (it's the same damn
instruction to move to a register, for chrissake!)

"movw" is totally different from "movl". They _act_ the same, but that's
like saying that "orw $5,%ax" is the same as "orl $5,%eax". They also
_act_ the same, but that IN NO WAY makes them the same.

According to your logic, the assembler should disallow "orl $5,%eax" because
it does the same thing as "or $5,%eax" and "orw $5,%eax", and thus to
"protect" the user, the user should not be able to say the size
explicitly.

The fact is, every single "mov" instruction takes the size hint, and it
HAS MEANING, even if the meaning is only about performance, not about
semantics. In other words, yes, in the specific case of "mov segment to
memory", it ends up being only a performance hit, but as such IT DOES HAVE
MEANING. And in fact, even if it didn't end up having any meaning at all,
it's still a good idea as just a consistency issue.

Dammit, if I say "orl $5,%eax", I mean "orl $5,%eax", and if the assembler
complains about it or claims it is the same as "orw $5,%ax", then the
assembler is fundamentally BROKEN.

None of your arguments have in any way responded to this fact.

If you think people should use just "mov", then fine, let people use
"mov". That's their choice - the same way you can write just "or $5,%eax"
and gas will pick the 32-bit version based on the register name, yes, you
should be able to write just "mov %fs,mem", and gas will pick whatever
version using its heuristics for the size (in this case the 32-bit, since
it does the same thing and is smaller and faster).

And "mov" has always worked. The kernel just doesn't use it much, because
the kernel - for good historical reasons - doesn't trust gas to pick sizes
of instructions automagically.

And the fact that it is obvious that gas _should_ pick the 32-bit format
of the instruction when you do not specify a size does NOT MEAN that it's
wrong to specify the size explicitly.

And your arguments that there is no semantic difference between the 16-bit
and the 32-bit version IS MEANINGLESS. An assembler shouldn't care. This
is not an argument about semantic difference. This is an argument over a
user wanting to make the size explicit, to DOCUMENT it.

The fact is, if users use "movl" and "movw" explicitly (and the kernel has
traditionally been _very_ careful to use all instruction sizes explicitly,
partly exactly because gas itself has been very happy-go-lucky about
them), then that is a GOOD THING. It means that the instruction is
well-defined to somebody who knows the x86 instruction set, and he never
needs to worry or use "objdump" to see if gas was being stupid and
generated the 16-bit version.

Linus

2005-03-30 16:25:27

by linux-os (Dick Johnson)

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Wed, 30 Mar 2005, Linus Torvalds wrote:

>
> [ binutils and libc back in the discussion - I don't know why they got
> dropped ]
>
> On Tue, 29 Mar 2005, H. J. Lu wrote:
>>
>> There is no such an instruction of "movl %ds,(%eax)". The old assembler
>> accepts it and turns it into "movw %ds,(%eax)".
>
> I disagree. Violently. As does the old assembler, which does not turn
> "mov" into "movw" as you say. AT ALL.
>
> A "movw" has a 0x66 prefix. The assembler agrees with me. Plain logic
> agrees with me. Being consistent _also_ agrees with me (it's the same damn
> instruction to move to a register, for chrissake!)
>
> "movw" is totally different from "movl". They _act_ the same, but that's
> like saying that "orw $5,%ax" is the same as "orl $5,%eax". They also
> _act_ the same, but that IN NO WAY makes them the same.
>
> According to your logic, the assembler should disallow "orl $5,%eax" because
> it does the same thing as "or $5,%eax" and "orw $5,%eax", and thus to
> "protect" the user, the user should not be able to say the size
> explicitly.
>
> The fact is, every single "mov" instruction takes the size hint, and it
> HAS MEANING, even if the meaning is only about performance, not about
> semantics. In other words, yes, in the specific case of "mov segment to
> memory", it ends up being only a performance hit, but as such IT DOES HAVE
> MEANING. And in fact, even if it didn't end up having any meaning at all,
> it's still a good idea as just a consistency issue.
>
> Dammit, if I say "orl $5,%eax", I mean "orl $5,%eax", and if the assembler
> complains about it or claims it is the same as "orw $5,%ax", then the
> assembler is fundamentally BROKEN.
>
> None of your arguments have in any way responded to this fact.
>
> If you think people should use just "mov", then fine, let people use
> "mov". That's their choice - the same way you can write just "or $5,%eax"
> and gas will pick the 32-bit version based on the register name, yes, you
> should be able to write just "mov %fs,mem", and gas will pick whatever
> version using its heuristics for the size (in this case the 32-bit, since
> it does the same thing and is smaller and faster).
>
> And "mov" has always worked. The kernel just doesn't use it much, because
> the kernel - for good historical reasons - doesn't trust gas to pick sizes
> of instructions automagically.
>
> And the fact that it is obvious that gas _should_ pick the 32-bit format
> of the instruction when you do not specify a size does NOT MEAN that it's
> wrong to specify the size explicitly.
>
> And your arguments that there is no semantic difference between the 16-bit
> and the 32-bit version IS MEANINGLESS. An assembler shouldn't care. This
> is not an argument about semantic difference. This is an argument over a
> user wanting to make the size explicit, to DOCUMENT it.
>
> The fact is, if users use "movl" and "movw" explicitly (and the kernel has
> traditionally been _very_ careful to use all instruction sizes explicitly,
> partly exactly because gas itself has been very happy-go-lucky about
> them), then that is a GOOD THING. It means that the instruction is
> well-defined to somebody who knows the x86 instruction set, and he never
> needs to worry or use "objdump" to see if gas was being stupid and
> generated the 16-bit version.
>
> Linus
> -


We went over this stuff when we first started using the
Intel 486. (Ref Intel 486 Microprocessor Programmers
reference manual, ISBN 1-55512-192-4)

Segment registers are really 32 bits in length. They
have a 'visible' part and an invisible part. The
visible part contains the 16-bit selector. The
invisible part contains the base address, limit,
etc., that was loaded from the GDT or the LDT.
(Ref. pp 5-9)

All access to these registers is 32 bits. If you
execute 'push ds' or 'pop ds' the stack-pointer
will move 4 bytes. An 0x66 override prefix is
ignored when accessing segment registers. It
should never be used. There is another override
prefix that can be used instead. The push ds
opcode is 0x1e and the pop ds opcode is 0x1f
if somebody wants to experiment.

Even a move from a CPU general purpose register
to a segment register is a 32-bit operation. If
you want to move the contents of a segment register
to memory or a register as a 16-bit action, for
instance not overwriting the high-word of a register,
the override prefix is 0x67, not 0x66. (Ref. pp 26-210)

This means that segment values stored in memory
should really be aligned on 32-bit boundaries
so that extra clock-cycles are not wasted
accessing these registers. This also means
that they should be treated as (Posix) uint32_t
not uint16_t, even though the value will never
exceed 8192.

So if there are any "movw (mem), %ds" and
"movw %ds, (mem)" in the code, the sizeof(mem)
needs to be 32 bits and the 'w' needs to be removed.
Otherwise, we are wasting CPU cycles and/or fooling
ourselves. GAS needs to continue to generate whatever
it was fed, with appropriate diagnostics if it
is fed the wrong stuff.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

2005-03-30 21:11:49

by H. J. Lu

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Wed, Mar 30, 2005 at 07:57:28AM -0800, Linus Torvalds wrote:
>
> [ binutils and libc back in the discussion - I don't know why they got
> dropped ]

Removing glibc since it accesses segment register with proper
instructions.

>
> On Tue, 29 Mar 2005, H. J. Lu wrote:
> >
> > There is no such an instruction of "movl %ds,(%eax)". The old assembler
> > accepts it and turns it into "movw %ds,(%eax)".
>
> I disagree. Violently. As does the old assembler, which does not turn
> "mov" into "movw" as you say. AT ALL.

I should have made myself clear. By "movw %ds,(%eax)", I meant:

8c 18 movw %ds,(%eax)

That is what the assembler generates, and should have generated, for
"movw %ds,(%eax)" since Nov. 4, 2004.

>
> A "movw" has a 0x66 prefix. The assembler agrees with me. Plain logic
> agrees with me. Being consistent _also_ agrees with me (it's the same damn
> instruction to move to a register, for chrissake!)

This is a bug in the assembler and was fixed on Nov. 4, 2004. If
you want the 0x66 prefix for "movw %ds,(%eax)", you need to use
"word movw %ds,(%eax)" with the new assembler.

>
> The fact is, every single "mov" instruction takes the size hint, and it
> HAS MEANING, even if the meaning is only about performance, not about
> semantics. In other words, yes, in the specific case of "mov segment to
> memory", it ends up being only a performance hit, but as such IT DOES HAVE
> MEANING. And in fact, even if it didn't end up having any meaning at all,
> it's still a good idea as just a consistency issue.

Accessing segment register is a very special case. It has been treated
differently by gas. Try "movw (%eax),%ds" with your gas. Gas doesn't
generate 0x66. The "movw %ds,(%eax)" bug was fixed last year.

> If you think people should use just "mov", then fine, let people use

I only suggested "mov" for old assemblers.

> "mov". That's their choice - the same way you can write just "or $5,%eax"
> and gas will pick the 32-bit version based on the register name, yes, you
> should be able to write just "mov %fs,mem", and gas will pick whatever
> version using its heuristics for the size (in this case the 32-bit, since
> it does the same thing and is smaller and faster).
>
> And "mov" has always worked. The kernel just doesn't use it much, because
> the kernel - for good historical reasons - doesn't trust gas to pick sizes
> of instructions automagically.
>
> And the fact that it is obvious that gas _should_ pick the 32-bit format
> of the instruction when you do not specify a size does NOT MEAN that it's
> wrong to specify the size explicitly.
>
> And your arguments that there is no semantic difference between the 16-bit
> and the 32-bit version IS MEANINGLESS. An assembler shouldn't care. This

For segment register access, there is no 16-bit nor 32-bit version.
There is only one version.

> is not an argument about semantic difference. This is an argument over a
> user wanting to make the size explicit, to DOCUMENT it.

Are you suggesting that gas should put back 0x66 for both
"movw %ds,(%eax)" and "movw (%eax),%ds"?

>
> The fact is, if users use "movl" and "movw" explicitly (and the kernel has
> traditionally been _very_ careful to use all instruction sizes explicitly,
> partly exactly because gas itself has been very happy-go-lucky about
> them), then that is a GOOD THING. It means that the instruction is
> well-defined to somebody who knows the x86 instruction set, and he never
> needs to worry or use "objdump" to see if gas was being stupid and
> generated the 16-bit version.

Allowing "movl %ds,(%eax)" creates the possibility that people assume it will
update a 32bit memory location. That is how this issue was uncovered.
If you really don't like "mov %ds,(%eax)" and want to support the
old assembler, I can write a kernel patch that checks the assembler and
uses "movl" for the old assembler and "movw" for the new assembler.
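
A rough sketch of what such a compile-time switch might look like. Everything
named here is hypothetical: AS_SEG_MOVW_HAS_NO_PREFIX stands in for whatever
the build system would define after probing the assembler, and this sketch
falls back to the suffix-less "mov" discussed earlier rather than "movl" so
that it assembles with both old and new gas. The register-based reader is
included only so the two paths can be compared:

```c
#include <stdint.h>

/* HYPOTHETICAL: a build-time probe would define this macro when "movw" to or
 * from a segment register is known to assemble without the 0x66 prefix. */
#ifdef AS_SEG_MOVW_HAS_NO_PREFIX
#define SEG_STORE_MNEMONIC "movw"
#else
#define SEG_STORE_MNEMONIC "mov"
#endif

/* Store %ds into a 16-bit memory slot using the selected mnemonic. */
static uint16_t ds_via_memory(void)
{
    uint16_t selector = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile(SEG_STORE_MNEMONIC " %%ds,%0" : "=m"(selector));
#endif
    return selector;
}

/* Read %ds through a general-purpose register for comparison. */
static uint16_t ds_via_register(void)
{
    unsigned int selector = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("mov %%ds,%0" : "=r"(selector));
#endif
    return (uint16_t)selector;
}
```

Keeping the memory slot 16 bits wide means there are no upper bits to be left
stale, whichever mnemonic ends up being used.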

BTW, to report problems with assembler, there is

http://www.sourceware.org/bugzilla/

Or I can be reached at [email protected].


H.J.

2005-03-30 21:15:00

by H. J. Lu

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Wed, Mar 30, 2005 at 11:23:25AM -0500, linux-os wrote:
>
> So if there are any "movw (mem), %ds" and
> "movw %ds, (mem)" in the code, the sizeof(mem)
> needs to be 32 bits and the 'w' needs to be removed.
> Otherwise, we are wasting CPU cycles and/or fooling
> ourselves. GAS needs to continue to generate whatever
> it was fed, with appropriate diagnostics if it
> is fed the wrong stuff.

FYI, gas hasn't generated 0x66 for "movw (%eax),%ds" for a long time
and has stopped generating it for "movw %ds,(%eax)" since Nov. 4, 2004.


H.J.

2005-03-30 22:18:59

by Pau Aliagas

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Wed, 30 Mar 2005, H. J. Lu wrote:

> On Wed, Mar 30, 2005 at 07:57:28AM -0800, Linus Torvalds wrote:

>>> There is no such an instruction of "movl %ds,(%eax)". The old assembler
>>> accepts it and turns it into "movw %ds,(%eax)".
>>
>> I disagree. Violently. As does the old assembler, which does not turn
>> "mov" into "movw" as you say. AT ALL.
>
> I should have made myself clear. By "movw %ds,(%eax)", I meant:
>
> 8c 18 movw %ds,(%eax)
>
> That is what the assembler generates, and should have generated, for
> "movw %ds,(%eax)" since Nov. 4, 2004.

Could this be the reason for the reported slowdown in the last six months?

--

Pau

2005-03-31 00:37:49

by H. J. Lu

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Thu, Mar 31, 2005 at 12:18:55AM +0200, Pau Aliagas wrote:
> On Wed, 30 Mar 2005, H. J. Lu wrote:
>
> >On Wed, Mar 30, 2005 at 07:57:28AM -0800, Linus Torvalds wrote:
>
> >>>There is no such an instruction of "movl %ds,(%eax)". The old assembler
> >>>accepts it and turns it into "movw %ds,(%eax)".
> >>
> >>I disagree. Violently. As does the old assembler, which does not turn
> >>"mov" into "movw" as you say. AT ALL.
> >
> >I should have made myself clear. By "movw %ds,(%eax)", I meant:
> >
> > 8c 18 movw %ds,(%eax)
> >
> >That is what the assembler generates, and should have generated, for
> >"movw %ds,(%eax)" since Nov. 4, 2004.
>
> Could this be the reason for the reported slowdown in the last six months?
>

Can you elaborate?


H.J.

2005-03-31 00:58:09

by Pau Aliagas

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Wed, 30 Mar 2005, H. J. Lu wrote:

>>> That is what the assembler generates, and should have generated, for
>>> "movw %ds,(%eax)" since Nov. 4, 2004.
>>
>> Could this be the reason for the reported slowdown in the last six months?
>
> Can you elaborate?

There's an unexplained slowdown of kernel 2.6 detailed in this thread:
http://kerneltrap.org/node/4940

I don't want at all to justify it with the change you talk about in gas,
but maybe it is worth checking whether it has anything to do with it. The
slowdown happened in the last six months.

--

Pau

2005-03-31 01:53:12

by H. J. Lu

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

On Thu, Mar 31, 2005 at 02:57:57AM +0200, Pau Aliagas wrote:
> On Wed, 30 Mar 2005, H. J. Lu wrote:
>
> >>>That is what the assembler generates, and should have generated, for
> >>>"movw %ds,(%eax)" since Nov. 4, 2004.
> >>
> >>Could this be the reason for the reported slowdown in the last six months?
> >
> >Can you elaborate?
>
> There's an unexplained slowdown of kernel 2.6 detailed in this thread:
> http://kerneltrap.org/node/4940
>

It is dated as "November 13, 2002 - 13:58". The assembler change was
made on Nov. 4, 2004. I don't think they are related at all.

> I don't want at all to justify it with the change you talk about in gas,
> but maybe it is worth checking whether it has anything to do with it. The
> slowdown happened in the last six months.


H.J.

2005-03-31 10:19:56

by Andi Kleen

Subject: Re: i386/x86_64 segment register issues (Re: PATCH: Fix x86 segment register access)

> >That is what the assembler generates, and should have generated, for
> >"movw %ds,(%eax)" since Nov. 4, 2004.
>
> Could this be the reason for the reported slowdown in the last six months?

No.

-Andi