LinuxLists.cc - i386 flags register clober in inline assembly

2001-11-17 14:09:44

Subject: i386 flags register clober in inline assembly

There are several inline assembly routines in the i386 port, which
clobber the flags register, nevertheless fail to declare that.
E.g. atomic_inc

__asm__ __volatile__(
LOCK "incl %0"
:"=m" (v->counter)
:"m" (v->counter));

should be

__asm__ __volatile__(
LOCK "incl %0"
:"=m" (v->counter)
:"m" (v->counter)
: "cc");

since "incl" clobbers flags.

Any ideas if these functions should be corrected ?

Although gcc documentation says "cc" has no effect on some machines,
it seems this is not the case with i386, judging from the "(set (reg:CC 17) ..."
and "(clobber (reg:CC 17))" in i386.md.

Regards,
-velco

2001-11-17 15:15:15

by Jan Hubicka

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

>
> There are several inline assembly routines in the i386 port, which
> clobber the flags register, nevertheless fail to declare that.
> E.g. atomic_inc
>
> __asm__ __volatile__(
> LOCK "incl %0"
> :"=m" (v->counter)
> :"m" (v->counter));
>
> should be
>
> __asm__ __volatile__(
> LOCK "incl %0"
> :"=m" (v->counter)
> :"m" (v->counter)
> : "cc");
>
> since "incl" clobbers flags.
>
> Any ideas if these functions should be corrected ?
They don't need to be. On i386, the flags are (partly for historical reasons) clobbered
by default.
>
> Although gcc documentation says "cc" has no effect on some machines,
> it seems this is not the case with i386, judging from the "(set (reg:CC 17) ..."
> and "(clobber (reg:CC 17))" in i386.md.
i386.md does add the clobber to each asm statement.

Honza
>
> Regards,
> -velco
>

2001-11-17 19:26:30

by Linus Torvalds

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

In article <[email protected]> you write:
>
>They don't need to be. On i386, the flags are (partly for historical reasons) clobbered
>by default.

However, this is one area where I would just be tickled pink if gcc were
to allow asm's to return status in eflags, even if that means that we
need to fix all our existing asms.

We have some really _horrid_ code where we use operations that
intrinsically set the flag bits, and we actually want to use them.
Using things like cmpxchg, and atomic decrement-and-test-with-zero have
these horrid asm statements that have to move the eflags value (usually
just one bit) into a register, so that we can tell gcc where it is.

(Example code:

atomic_sub_and_test:
__asm__ __volatile__(
LOCK "subl %2,%0; sete %1"
:"=m" (v->counter), "=qm" (c)
:"ir" (i), "m" (v->counter) : "memory");

Where we first get the value we _really_ want in ZF in eflags, then we
use "sete" to move it to a register, and then gcc will end up generating
code to test that register by hand, so the end result is usually
something like:

#APP
lock ; decl 20(%edi); sete %al
#NO_APP
testb %al, %al
je .L1570

even though what we'd _want_ is really

lock ; decl 20(%edi)
jne .L1570

which is not only smaller and faster, but is often _really_ faster
because at least some Intel CPU's will forward the flags values to the
branch prediction stuff, and going through a register dependency will
add non-forwarded state and thus extra cycles.

So I would personally _really_ really like for some way to expose the
internal gcc

"(set (cc0) ..asm..)"

construct, together with some way of setting the cc_status.flags.

>From what I can tell, all the x86 machine description already uses "cc0"
together with the notion of comparing it to zero (either signed or
unsigned), so something like this _might_ just work

unsigned long result;
asm volatile(
LOCK "decl %m"
:"+m" (v->counter),
"=cc" (result)
: :"memory");
if (result > 0) /* "jnb" */
...

which would be wonderful, and would expand to

(set (cc0) ..asm..)
(set (pc)
(if_then_else (gtu (cc0) (const_int 0))
(label_ref (match_operand ..
(pc))

Which _should_ just automatically give us

lock ; decl ..
ja ..

which is exactly what we want.

I know this used to be impossible in gcc, because the x86 didn't
actually track the flags values, and conditional jumps were really a
_combination_ of the conditional and the jump, and splitting it up so
that the conditional would be in an asm was thus not possible.

But I think gcc makes cc0 explicit on x86 these days, and that the above
kind of setup might be possible today, no?

Linus

2001-11-17 19:59:05

by H. Peter Anvin

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

Followup to: <[email protected]>
By author: Linus Torvalds <[email protected]>
In newsgroup: linux.dev.kernel
>
> In article <[email protected]> you write:
> >
> >They don't need to be. On i386, the flags are (partly for historical reasons) clobbered
> >by default.
>
> However, this is one area where I would just be tickled pink if gcc were
> to allow asm's to return status in eflags, even if that means that we
> need to fix all our existing asms.
>
> We have some really _horrid_ code where we use operations that
> intrinsically set the flag bits, and we actually want to use them.
> Using things like cmpxchg, and atomic decrement-and-test-with-zero have
> these horrid asm statements that have to move the eflags value (usually
> just one bit) into a register, so that we can tell gcc where it is.
>

The clean way to do that would be for gcc to implement _Bool, the C99
boolean data type, and add a new kind of register for the flags, i.e.

_Bool c;

asm volatile(LOCK "subl %2,%0"
: "=m" (v->counter), "=zf" (c)
: "ir" (i), "0" (v->counter) : "memory", "cc");

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2001-11-17 20:27:48

by Momchil Velikov

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

>>>>> "Linus" == Linus Torvalds <[email protected]> writes:

Linus> (Example code:

Linus> atomic_sub_and_test:
Linus> __asm__ __volatile__(
Linus> LOCK "subl %2,%0; sete %1"
Linus> :"=m" (v->counter), "=qm" (c)
Linus> :"ir" (i), "m" (v->counter) : "memory");

Linus> Where we first get the value we _really_ want in ZF in eflags, then we
Linus> use "sete" to move it to a register, and then gcc will end up generating
Linus> code to test that register by hand, so the end result is usually
Linus> something like:

Linus> #APP
Linus> lock ; decl 20(%edi); sete %al
Linus> #NO_APP
Linus> testb %al, %al
Linus> je .L1570

Linus> even though what we'd _want_ is really

Linus> lock ; decl 20(%edi)
Linus> jne .L1570

Linus> which is not only smaller and faster, but is often _really_ faster
Linus> because at least some Intel CPU's will forward the flags values to the
Linus> branch prediction stuff, and going through a register dependency will
Linus> add non-forwarded state and thus extra cycles.

Linus> So I would personally _really_ really like for some way to expose the
Linus> internal gcc

Linus> "(set (cc0) ..asm..)"

Linus> construct, together with some way of setting the cc_status.flags.

Linux> From what I can tell, all the x86 machine description already uses "cc0"
Linus> together with the notion of comparing it to zero (either signed or
Linus> unsigned), so something like this _might_ just work

Linus> unsigned long result;
Linus> asm volatile(
Linus> LOCK "decl %m"
Linus> :"+m" (v->counter),
Linus> "=cc" (result)
Linus> : :"memory");
Linus> if (result > 0) /* "jnb" */
Linus> ...

Linus> which would be wonderful, and would expand to

Linus> (set (cc0) ..asm..)
Linus> (set (pc)
Linus> (if_then_else (gtu (cc0) (const_int 0))
Linus> (label_ref (match_operand ..
Linus> (pc))

Indeed, with pattern like:
(define_insn "*ja"
[(set (pc)
(if_then_else (gtu (match_operand:CC 0 "" "") (const_int 0))
(label_ref (match_operand 1 "" ""))
(pc)))]
""
"ja ..."
...)

Linus> Which _should_ just automatically give us

Linus> lock ; decl ..
Linus> ja ..

Linus> which is exactly what we want.

Regards,
-velco

2001-11-17 20:35:20

by Linus Torvalds

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

On 17 Nov 2001, Momchil Velikov wrote:
>
> Indeed, with pattern like:
> (define_insn "*ja"
> [(set (pc)
> (if_then_else (gtu (match_operand:CC 0 "" "") (const_int 0))
> (label_ref (match_operand 1 "" ""))
> (pc)))]
> ""
> "ja ..."
> ...)

As far as I can tell, that pattern already exists in i386.md.

So it shouldn't need any new patterns, it would only need a way to allow
the setting of cc0 from the asm - so that we can get the first part,
namely the "(set (cc0) ..asm..)" thing.

Linus

2001-11-17 20:41:09

by Jan Hubicka

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

> unsigned long result;
> asm volatile(
> LOCK "decl %m"
> :"+m" (v->counter),
> "=cc" (result)
> : :"memory");
> if (result > 0) /* "jnb" */
> ...
>
> which would be wonderful, and would expand to
>
> (set (cc0) ..asm..)
> (set (pc)
> (if_then_else (gtu (cc0) (const_int 0))
> (label_ref (match_operand ..
> (pc))
>
> Which _should_ just automatically give us
>
> lock ; decl ..
> ja ..
>
> which is exactly what we want.
>
> I know this used to be impossible in gcc, because the x86 didn't
> actually track the flags values, and conditional jumps were really a
> _combination_ of the conditional and the jump, and splitting it up so
> that the conditional would be in an asm was thus not possible.
>
> But I think gcc makes cc0 explicit on x86 these days, and that the above
> kind of setup might be possible today, no?
Actually the main dificulty I see is storing cc0 to variable. CC0 is hard
register and pretty strange one - you can't move it, you can't spill. Using
the syntax above you can easilly make cc0 from asm statement to span another
cc0 set resulting in incorrect code or compiler crash.
(the code generator may insert any code in between statements as it don't
know he can't clobber cc0. In fact this is happening in from of if
construct as deffered stack deallocators are flushed).

If i386 were IA-64 this would be possible as their flags behave more regularry,
but in i386 way I don't see easy trick how to get this (or something
equivalent) working. Maybe someone do have idea how to get around, but I was
thinking about this some time ago too and failed to find feasible sollution.

Honza
>
> Linus

2001-11-17 20:47:49

by Linus Torvalds

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

On Sat, 17 Nov 2001, Jan Hubicka wrote:
>
> Actually the main dificulty I see is storing cc0 to variable. CC0 is hard
> register and pretty strange one - you can't move it, you can't spill.

Well, you _can_ spill it, but you need to use "pushfl/popfl" to
spill/restore.

Also, if you're clever you don't spill cc0 itself, but the _comparison_,
ie if you need to spill in

asm(.. "=cc" (cc0))
if (cc0 > 0)

a sufficiently clever spill-engine would spill not eflags, but instead
spill "cc0 > 0", which it can do with the "seq" expansions..

gcc already does know about "store-flag" instructions, although I
certainly agree that the _patterns_ of usage may end up being very
different than existing conditional comparisons..

Linus

2001-11-17 21:01:10

by H. Peter Anvin

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

Followup to: <[email protected]>
By author: Jan Hubicka <[email protected]>
In newsgroup: linux.dev.kernel
>
> Actually the main dificulty I see is storing cc0 to variable. CC0 is hard
> register and pretty strange one - you can't move it, you can't spill. Using
> the syntax above you can easilly make cc0 from asm statement to span another
> cc0 set resulting in incorrect code or compiler crash.
> (the code generator may insert any code in between statements as it don't
> know he can't clobber cc0. In fact this is happening in from of if
> construct as deffered stack deallocators are flushed).
>

Why can't you move or spill it? There are a whole lot of ways you
could do either: pushf, sahf, setcc...

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2001-11-18 01:10:17

by Jan Hubicka

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

>
> On Sat, 17 Nov 2001, Jan Hubicka wrote:
> >
> > Actually the main dificulty I see is storing cc0 to variable. CC0 is hard
> > register and pretty strange one - you can't move it, you can't spill.
>
> Well, you _can_ spill it, but you need to use "pushfl/popfl" to
> spill/restore.
I know, I can, but we are speaking about sollution how to get programs faster,
not slow down them to scrawl.
Sadly gcc's reload pass is one (or the) most tricky part of gcc and adding any
new feature to that beast is extreme problem, especially with later maitenance.

Actually what can be feasible is to make asm statement set flags and follow
it by store flag instruction that will be used in the conditional. Later
the combine pass should be able to get it connected.

Problem with this sollution I see is that it would need to redesign way
the conditionals are output. Currently gcc believes there is single flags
register and emits comparison and user of flags separately.

Backends plays around this and remembers the comparisons instead of emitting
and do use it later once asked for emitting the flags user. If the asm
statement is emit, backend would not be notified and whole machinery needs
to be tweaked.

Also I am not quite sure how to nicely design the syntax representing the
asm and flag store together.

The 3.1 feature freeze is in few weeks, not enought to implement something
so drastic, but I will try to keep it in the mind and discuss later for 3.2.

The design of asm statements should be IMO re-tought. I think it has been
mistake to make them so low level and allow user to write constraints directly,
so perhaps we can think about big change for future gcc...

Honza
>
> Also, if you're clever you don't spill cc0 itself, but the _comparison_,
> ie if you need to spill in
>
> asm(.. "=cc" (cc0))
> if (cc0 > 0)
>
> a sufficiently clever spill-engine would spill not eflags, but instead
> spill "cc0 > 0", which it can do with the "seq" expansions..
>
> gcc already does know about "store-flag" instructions, although I
> certainly agree that the _patterns_ of usage may end up being very
> different than existing conditional comparisons..
>
> Linus

2001-11-18 02:53:34

by Linus Torvalds

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

On Sun, 18 Nov 2001, Jan Hubicka wrote:
> >
> > Well, you _can_ spill it, but you need to use "pushfl/popfl" to
> > spill/restore.
>
> I know, I can, but we are speaking about sollution how to get programs faster,
> not slow down them to scrawl.

Agreed, which is why I suggested the "spill the _comparison_, not the
actual cc0 register" approach.

That way, if you have to spill, you'll end up at worst with the same code
we already have to have, ie the code will end up something like

lock ; decl mem
seta %al <- spill comparison to %al
..
testb %al,%al <- re-do comparison test later
jne ..

> Actually what can be feasible is to make asm statement set flags and follow
> it by store flag instruction that will be used in the conditional. Later
> the combine pass should be able to get it connected.

That sounds pretty ideal - have some way of telling gcc to add a "seta
%reg", while at the same time telling gcc that if it can elide the "seta"
and use a direct jump instead, do so..

> The design of asm statements should be IMO re-tought. I think it has been
> mistake to make them so low level and allow user to write constraints directly,
> so perhaps we can think about big change for future gcc...

I don't see many alternatives. The fact is, asm's end up being used
exactly when gcc simply doesn't know what to do, so gcc doesn't know what
the constraints are either.

Linus

2001-11-20 08:34:18

by Richard Henderson

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

On Sat, Nov 17, 2001 at 06:48:22PM -0800, Linus Torvalds wrote:
> That sounds pretty ideal - have some way of telling gcc to add a "seta
> %reg", while at the same time telling gcc that if it can elide the "seta"
> and use a direct jump instead, do so..

Hmm. It appears to be easy to do with machine-dependent builtins. E.g.

int x;
__asm__ __volatile__(LOCK "subl %1,%0"
: "=m"(v->counter) : "ir"(i) : "memory");
x = __builtin_ia32_sete();
if (x) {
...
}

Now, you'd have to be careful in where that __builtin_ia32_sete
gets placed, but I'd guess that immediately after an asm would
be relatively safe. No 100% guarantees on that, unfortunately.

And the sete _ought_ to get merged with the if test by combine
or cse with no extra code.

It wouldn't take too much effort to try this out either...

r~

2001-11-20 13:01:38

by Jan Hubicka

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

> On Sat, Nov 17, 2001 at 06:48:22PM -0800, Linus Torvalds wrote:
> > That sounds pretty ideal - have some way of telling gcc to add a "seta
> > %reg", while at the same time telling gcc that if it can elide the "seta"
> > and use a direct jump instead, do so..
>
> Hmm. It appears to be easy to do with machine-dependent builtins. E.g.
>
> int x;
> __asm__ __volatile__(LOCK "subl %1,%0"
> : "=m"(v->counter) : "ir"(i) : "memory");
> x = __builtin_ia32_sete();
> if (x) {
> ...
> }
>
> Now, you'd have to be careful in where that __builtin_ia32_sete
> gets placed, but I'd guess that immediately after an asm would
> be relatively safe. No 100% guarantees on that, unfortunately.
>
> And the sete _ought_ to get merged with the if test by combine
> or cse with no extra code.
>
> It wouldn't take too much effort to try this out either...
True. Only obstackle I see is how to make visible that the flags
are set by the asm statement. I guess we need to replace the clobber
we have by set. Do you have any idea for nice syntax for this?
Or just to do that by default, as asms are mostly non-single-set anyway?

Honza

>
>
> r~

2001-11-20 14:49:19

by Martin Dalecki

[permalink] [raw]

Subject: PATCH 2.4.15-pre6 idt compilation and proc_misc cleanup.

diff -ur linux-mdcki/fs/proc/proc_misc.c linux-mdcki-new/fs/proc/proc_misc.c
--- linux-mdcki/fs/proc/proc_misc.c Sun Nov 18 15:09:57 2001
+++ linux-mdcki-new/fs/proc/proc_misc.c Tue Nov 20 02:46:18 2001
@@ -50,11 +50,6 @@
* have a way to deal with that gracefully. Right now I used straightforward
* wrappers, but this needs further analysis wrt potential overflows.
*/
-extern int get_hardware_list(char *);
-extern int get_stram_list(char *);
-#ifdef CONFIG_DEBUG_MALLOC
-extern int get_malloc(char * buffer);
-#endif
#ifdef CONFIG_MODULES
extern int get_module_list(char *);
#endif
@@ -151,19 +146,10 @@
si_swapinfo(&i);
pg_size = atomic_read(&page_cache_size) - i.bufferram ;

- len = sprintf(page, " total: used: free: shared: buffers: cached:\n"
- "Mem: %8Lu %8Lu %8Lu %8Lu %8Lu %8Lu\n"
- "Swap: %8Lu %8Lu %8Lu\n",
- B(i.totalram), B(i.totalram-i.freeram), B(i.freeram),
- B(i.sharedram), B(i.bufferram),
- B(pg_size), B(i.totalswap),
- B(i.totalswap-i.freeswap), B(i.freeswap));
/*
* Tagged format, for easy grepping and expansion.
- * The above will go away eventually, once the tools
- * have been updated.
*/
- len += sprintf(page+len,
+ len = sprintf(page,
"MemTotal: %8lu kB\n"
"MemFree: %8lu kB\n"
"MemShared: %8lu kB\n"
@@ -222,33 +208,6 @@
release: seq_release,
};

-#ifdef CONFIG_PROC_HARDWARE
-static int hardware_read_proc(char *page, char **start, off_t off,
- int count, int *eof, void *data)
-{
- int len = get_hardware_list(page);
- return proc_calc_metrics(page, start, off, count, eof, len);
-}
-#endif
-
-#ifdef CONFIG_STRAM_PROC
-static int stram_read_proc(char *page, char **start, off_t off,
- int count, int *eof, void *data)
-{
- int len = get_stram_list(page);
- return proc_calc_metrics(page, start, off, count, eof, len);
-}
-#endif
-
-#ifdef CONFIG_DEBUG_MALLOC
-static int malloc_read_proc(char *page, char **start, off_t off,
- int count, int *eof, void *data)
-{
- int len = get_malloc(page);
- return proc_calc_metrics(page, start, off, count, eof, len);
-}
-#endif
-
#ifdef CONFIG_MODULES
static int modules_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
@@ -541,15 +500,6 @@
{"uptime", uptime_read_proc},
{"meminfo", meminfo_read_proc},
{"version", version_read_proc},
-#ifdef CONFIG_PROC_HARDWARE
- {"hardware", hardware_read_proc},
-#endif
-#ifdef CONFIG_STRAM_PROC
- {"stram", stram_read_proc},
-#endif
-#ifdef CONFIG_DEBUG_MALLOC
- {"malloc", malloc_read_proc},
-#endif
#ifdef CONFIG_MODULES
{"modules", modules_read_proc},
#endif

Attachments:

compilefix-idt77252.patch (486.00 B)
clean-proc_misc.patch (2.55 kB)
Download all attachments

2001-11-20 17:17:57

by Linus Torvalds

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

In article <[email protected]>,
Richard Henderson <[email protected]> wrote:
>
>Hmm. It appears to be easy to do with machine-dependent builtins. E.g.
>
> int x;
> __asm__ __volatile__(LOCK "subl %1,%0"
> : "=m"(v->counter) : "ir"(i) : "memory");
> x = __builtin_ia32_sete();
> if (x) {

This would obviously be more than useful.

However, at the same time I worry that the syntax of having things
as separate expressions would be a total nightmare to support in the
long run for gcc - making sure that they never split up by mistake
during parsing/tree-forming/CSE/whatever. That makes for a nasty special
case that just sounds like a maintainance headache.

It _sounds_ like you prototyped something like the above to test it
out? If so, how hard would it be to just change the syntax slightly, and
move the "builting_ia32_sete()" syntactically into the __asm__, even if
it as an implementation then gets split out again for now.

That would make it less of a special case - or at least it would be an
_internal_ special case rather than one exported to the user.

The simplest syntactic extension would obviously be to add a fourth set
of flags, and make the above look somehting like

char flag;
__asm__ __volatile__(LOCK "subl %1,%0"
:"=m" (v->counter) /* Inputs */
:"ir" (i) /* Outputs */
:"memory" /* Clobbers */
:"=Z" (x) /* Flags (Z/E=equal, A=above, C=carry etc etc*/
);
if (x) {
...

which looks like a reasonable syntax to me. It has the advantage that it
should be _very_ easy and natural to use this syntax on predicate-based
machines like ia64, where the "flags" are trivially predicates. On such
machines I bet that the need to export the predicates is even bigger
than the need to export eflags on x86.

In fact, for predicate architectures it might be reasonable to have both
input and output predicates, which is why I did the "=Z" syntax (so that
if you find it useful to do _input_ predicates, you might have fields 4
and 5 look something like

:"=p" (is_zero)
:"p" (a == 7))

and have support for using something like "%p0" and "%!p1" etc for
specifying predicates in the assembly string. I don't know if you
already do something like this on ia64, or if gcc/ia64 even considers
the predicate bits to be independent registers.

I don't know how much inline-asm has been written for ia64, but I
suspect that it could come in handy to let gcc select predicates too,
and not have to hardcode and clobber them (or whatever it is ia64 asms
do).

Linus

2001-11-20 23:15:29

by Richard Henderson

[permalink] [raw]

Subject: Re: i386 flags register clober in inline assembly

On Tue, Nov 20, 2001 at 02:00:59PM +0100, Jan Hubicka wrote:
> True. Only obstackle I see is how to make visible that the flags
> are set by the asm statement.

Well, that's the Z constraint. Trickier is getting the backend
to _not_ add the clobber in that instance.

r~

2001-11-23 22:27:11

by Roman Zippel

[permalink] [raw]

Subject: Re: PATCH 2.4.15-pre6 idt compilation and proc_misc cleanup.

Hi,

Martin Dalecki wrote:

> 2. Killing some code which is dead since ages in proc_misc.c
> [..]
> -#ifdef CONFIG_PROC_HARDWARE
> - {"hardware", hardware_read_proc},
> -#endif
> -#ifdef CONFIG_STRAM_PROC
> - {"stram", stram_read_proc},
> -#endif

Unfortunately I see this only now, but this isn't dead at all, it's just
not used on i386, please grep more carefully next time.

bye, Roman