2009-09-27 22:34:51

by Linus Torvalds

[permalink] [raw]
Subject: Linux 2.6.32-rc1


It's been two weeks (and then some - but with last week being LinuxCon and
Plumbers conf I extended it by a few days), and as usual that means that
the merge window is all over and done with. 2.6.32-rc1 is out, so give it
a whirl.

What can I say? 67% drivers (the bulk of which is from 'staging', but
there's driver changes all over), 10% firmware, 10% arch updates
(dominated by arm, but MIPS, POWER, SH and x86 updates are there too,
along with the new 'SCore' architecture), 5% Documentation, and a random
smattering of other things (ie the normal filesystem, kernel, networking
etc updates).

For a change, I don't think we have a single new filesystem this time
around, but we do have updates to existing ones (ocfs2, btrfs, nfs, nilfs,
xfs, gfs2, ext4 - you name it).

Some of the more interesting changes (but perhaps that's just me) are some
of the VM updates (ZERO_PAGE is back!) and the writeback work by Jens and
others to spread out writeback by backing store.

Go wild, test it out, and let us know about any regressions you find,

Linus


2009-09-27 23:45:05

by Stephen Rothwell

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Hi Linus,

On Sun, 27 Sep 2009 15:34:53 -0700 (PDT) Linus Torvalds <[email protected]> wrote:
>
> It's been two weeks (and then some - but with last week being LinuxCon and
> Plumbers conf I extended it by a few days), and as usual that means that
> the merge window is all over and done with. 2.6.32-rc1 is out, so give it
> a whirl.

Of course, you realise that you set EXTRAVERSION to -rc2 :-(
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2009-09-27 23:48:12

by Diego Calleja

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Monday 28 September 2009 01:44:58 Stephen Rothwell wrote:

> Of course, you realise that you set EXTRAVERSION to -rc2 :-(

Not a big problem, I guess, except that the kernel.org files are "-rc1"

2009-09-27 23:52:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Mon, 28 Sep 2009, Stephen Rothwell wrote:

> Hi Linus,
>
> On Sun, 27 Sep 2009 15:34:53 -0700 (PDT) Linus Torvalds <[email protected]> wrote:
> >
> > It's been two weeks (and then some - but with last week being LinuxCon and
> > Plumbers conf I extended it by a few days), and as usual that means that
> > the merge window is all over and done with. 2.6.32-rc1 is out, so give it
> > a whirl.
>
> Of course, you realise that you set EXTRAVERSION to -rc2 :-(

Oh nnooooooooooooo-(takes breath)-ooooooo...

Damn. I hadn't realized. I'm a moron.

Ok, so it's an extra-special -rc1. It's the "short bus" kind of special
-rc1 release. But quite frankly, I'll just let it be. The git tags etc
look to be ok, it's just my editing of the Makefile itself that was too
complicated for my little mind.

I'll try not to do that again,

Linus

2009-09-28 00:17:29

by Stephen Rothwell

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Sun, 27 Sep 2009 16:52:56 -0700 (PDT) Linus Torvalds <[email protected]> wrote:
>
> Ok, so it's an extra-special -rc1. It's the "short bus" kind of special
> -rc1 release. But quite frankly, I'll just let it be. The git tags etc
> look to be ok, it's just my editing of the Makefile itself that was too
> complicated for my little mind.

I was going to apply a patch to -next to set it to -rc1, but I won't
do that either. This way all bug reports are consistently confusing :-)

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2009-09-28 07:07:28

by Eric Dumazet

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Linus Torvalds a écrit :
>
> Go wild, test it out, and let us know about any regressions you find,
>
>
Hi Linus

Something seems wrong with process time accounting.

The following program should consume 10*8 seconds of cpu on an 8 cpu machine, but
with 2.6.32-rc1 the numbers are crazy.

$ gcc -O2 -o process process.c -lpthread
$ ./process
PID TTY STAT TIME COMMAND
5532 pts/1 - 0:08 ./process
- - Sl+ 0:00 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:09 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:01 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 0:16 ./process
- - Sl+ 0:00 -
- - Rl+ 0:14 -
- - Rl+ 0:06 -
- - Rl+ 0:14 -
- - Rl+ 0:10 -
- - Rl+ 0:06 -
- - Rl+ 0:06 -
- - Rl+ 0:10 -
- - Rl+ 0:02 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 0:24 ./process
- - Sl+ 0:00 -
- - Rl+ 0:20 -
- - Rl+ 0:11 -
- - Rl+ 0:24 -
- - Rl+ 0:15 -
- - Rl+ 0:07 -
- - Rl+ 0:07 -
- - Rl+ 0:24 -
- - Rl+ 0:03 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 0:32 ./process
- - Sl+ 0:00 -
- - Rl+ 0:29 -
- - Rl+ 0:12 -
- - Rl+ 0:29 -
- - Rl+ 0:21 -
- - Rl+ 0:08 -
- - Rl+ 0:08 -
- - Rl+ 0:29 -
- - Rl+ 0:04 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 0:40 ./process
- - Sl+ 0:00 -
- - Rl+ 0:35 -
- - Rl+ 0:13 -
- - Rl+ 0:35 -
- - Rl+ 0:26 -
- - Rl+ 0:09 -
- - Rl+ 0:09 -
- - Rl+ 0:35 -
- - Rl+ 0:05 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 0:48 ./process
- - Sl+ 0:00 -
- - Rl+ 0:40 -
- - Rl+ 0:14 -
- - Rl+ 0:40 -
- - Rl+ 0:31 -
- - Rl+ 0:10 -
- - Rl+ 0:10 -
- - Rl+ 0:40 -
- - Rl+ 0:06 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 0:56 ./process
- - Sl+ 0:00 -
- - Rl+ 0:45 -
- - Rl+ 0:15 -
- - Rl+ 0:45 -
- - Rl+ 0:37 -
- - Rl+ 0:11 -
- - Rl+ 0:11 -
- - Rl+ 0:45 -
- - Rl+ 0:07 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 1:05 ./process
- - Sl+ 0:00 -
- - Rl+ 0:51 -
- - Rl+ 0:21 -
- - Rl+ 0:55 -
- - Rl+ 0:46 -
- - Rl+ 0:12 -
- - Rl+ 0:12 -
- - Rl+ 0:59 -
- - Rl+ 0:08 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 1:13 ./process
- - Sl+ 0:00 -
- - Rl+ 0:56 -
- - Rl+ 0:22 -
- - Rl+ 1:00 -
- - Rl+ 0:52 -
- - Rl+ 0:13 -
- - Rl+ 0:13 -
- - Rl+ 1:04 -
- - Rl+ 0:09 -
PID TTY STAT TIME COMMAND
5532 pts/1 - 5:13 ./process
- - S+ 5:13 -


$ cat process.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>     /* system() */
#include <time.h>
#include <unistd.h>     /* getpid(), sleep() */

/*
 * Burn cpu cycles for about 10 seconds
 */
static void *do_unit(void *arg)
{
    time_t t0, t1;

    t0 = time(NULL);
    do {
        t1 = time(NULL);
    } while (t1 - t0 <= 10);
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tids[8];
    int i;
    char cmd[128];

    sprintf(cmd, "ps m -p %d", getpid());

    /* start 8 cpu burners, then sample the per-thread times once a second */
    for (i = 0; i < 8; i++)
        pthread_create(&tids[i], NULL, do_unit, NULL);
    for (i = 0; i < 9; i++) {
        sleep(1);
        system(cmd);
    }
    for (i = 0; i < 8; i++)
        pthread_join(tids[i], NULL);
    system(cmd);
    return 0;
}
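
As a possible cross-check (not part of the original report; the program name and
build line are illustrative): each thread can read its own CPU time via
clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...), so the per-thread numbers come
straight from the kernel's accounting rather than from parsing ps output. On a
correctly accounting kernel every worker should report roughly 10-11 seconds.

/* Build: gcc -O2 -o thread_time thread_time.c -lpthread
 * (older glibc may also need -lrt for clock_gettime) */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static void *do_unit(void *arg)
{
    struct timespec ts;
    time_t t0 = time(NULL), t1;

    /* burn cpu for ~10 wall-clock seconds, like process.c above */
    do {
        t1 = time(NULL);
    } while (t1 - t0 <= 10);

    /* ask the kernel how much CPU time this thread was charged */
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    printf("thread %d: %ld.%03ld s of cpu\n", (int)(long)arg,
           (long)ts.tv_sec, ts.tv_nsec / 1000000);
    return NULL;
}

int main(void)
{
    pthread_t tids[8];
    long i;

    for (i = 0; i < 8; i++)
        pthread_create(&tids[i], NULL, do_unit, (void *)i);
    for (i = 0; i < 8; i++)
        pthread_join(tids[i], NULL);
    return 0;
}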

2009-09-28 14:35:08

by Wolfgang Erig

[permalink] [raw]
Subject: Linux 2.6.32-rc1 compile error

------------ snip ------------
CC kernel/time/tick-oneshot.o
CC kernel/time/tick-sched.o
LD kernel/time/built-in.o
CC kernel/futex.o
CC kernel/rtmutex.o
CC kernel/dma.o
CC kernel/smp.o
CC kernel/spinlock.o
CC kernel/uid16.o
CC kernel/module.o
kernel/module.c:1995: warning: type defaults to ‘int’ in declaration of ‘Elf_Hdr’
kernel/module.c:1995: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
kernel/module.c: In function ‘load_module’:
kernel/module.c:2203: error: ‘strmap’ undeclared (first use in this function)
kernel/module.c:2203: error: (Each undeclared identifier is reported only once
kernel/module.c:2203: error: for each function it appears in.)
kernel/module.c:2239: error: ‘symoffs’ undeclared (first use in this function)
kernel/module.c:2239: error: implicit declaration of function ‘layout_symtab’
kernel/module.c:2240: error: ‘stroffs’ undeclared (first use in this function)
make[1]: *** [kernel/module.o] Fehler 1
make: *** [kernel] Fehler 2

Seemed to be a strange .config, which is attached.

Wolfgang



2009-09-28 15:10:35

by Jaswinder Singh Rajput

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1 compile error

Hello Wolfgang,

Can you please check whether the following patch fixes your problem?
(This is untested.)

On Mon, 2009-09-28 at 16:34 +0200, Wolfgang Erig wrote:
> ------------ snip ------------
> CC kernel/time/tick-oneshot.o
> CC kernel/time/tick-sched.o
> LD kernel/time/built-in.o
> CC kernel/futex.o
> CC kernel/rtmutex.o
> CC kernel/dma.o
> CC kernel/smp.o
> CC kernel/spinlock.o
> CC kernel/uid16.o
> CC kernel/module.o
> kernel/module.c:1995: warning: type defaults to ‘int’ in declaration of ‘Elf_Hdr’
> kernel/module.c:1995: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
> kernel/module.c: In function ‘load_module’:
> kernel/module.c:2203: error: ‘strmap’ undeclared (first use in this function)
> kernel/module.c:2203: error: (Each undeclared identifier is reported only once
> kernel/module.c:2203: error: for each function it appears in.)
> kernel/module.c:2239: error: ‘symoffs’ undeclared (first use in this function)
> kernel/module.c:2239: error: implicit declaration of function ‘layout_symtab’
> kernel/module.c:2240: error: ‘stroffs’ undeclared (first use in this function)
> make[1]: *** [kernel/module.o] Fehler 1
> make: *** [kernel] Fehler 2
>
> seemed to be a strange .config, which is attached.
>

From: Jaswinder Singh Rajput <[email protected]>
Date: Mon, 28 Sep 2009 20:14:37 +0530
Subject: [PATCH] module: fix compilation errors

Fix following compilation errors :
kernel/module.c:1995: warning: type defaults to ‘int’ in declaration of ‘Elf_Hdr’
kernel/module.c:1995: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
kernel/module.c: In function ‘load_module’:
kernel/module.c:2203: error: ‘strmap’ undeclared (first use in this function)
kernel/module.c:2203: error: (Each undeclared identifier is reported only once
kernel/module.c:2203: error: for each function it appears in.)
kernel/module.c:2239: error: ‘symoffs’ undeclared (first use in this function)
kernel/module.c:2239: error: implicit declaration of function ‘layout_symtab’
kernel/module.c:2240: error: ‘stroffs’ undeclared (first use in this function)
make[1]: *** [kernel/module.o] Fehler 1
make: *** [kernel] Fehler 2

Reported-by: Wolfgang Erig <[email protected]>
Signed-off-by: Jaswinder Singh Rajput <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Rusty Russell <[email protected]>
---
kernel/module.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 5a29397..0034621 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1992,7 +1992,7 @@ static inline unsigned long layout_symtab(struct module *mod,
Elf_Shdr *sechdrs,
unsigned int symindex,
unsigned int strindex,
- const Elf_Hdr *hdr,
+ const Elf_Ehdr *hdr,
const char *secstrings,
unsigned long *pstroffs,
unsigned long *strmap)
@@ -2081,9 +2081,8 @@ static noinline struct module *load_module(void __user *umod,
struct module *mod;
long err = 0;
void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
-#ifdef CONFIG_KALLSYMS
unsigned long symoffs, stroffs, *strmap;
-#endif
+
mm_segment_t old_fs;

DEBUGP("load_module: umod=%p, len=%lu, uargs=%p\n",
--
1.6.0.6


2009-09-28 15:35:30

by Wolfgang Erig

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1 compile error

Hello Jaswinder,

On Mon, Sep 28, 2009 at 08:40:09PM +0530, Jaswinder Singh Rajput wrote:
>
> Can you please check, following patch fix your problem. (This is
> untested)

Compilation is fixed, the kernel is running.

Thx, Wolfgang

2009-09-28 15:40:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1




On Mon, 28 Sep 2009, Eric Dumazet wrote:
> Linus Torvalds a écrit :
> >
> > Go wild, test it out, and let us know about any regressions you find,
>
> Something seems wrong with process time accounting.
>
> Following program should consume 10*8 seconds of cpu on a 8 cpu machine, but
> with 2.6.32-rc1 numbers are crazy.

Ok, so the top-level process time looks sane _while_ the thing is running
(sum of all threads), but the per-thread times look broken (as if they had
randomly had the times of the other threads added into them - looks to me
like they on average had half the other threads' time accounted into
them).

And then at the end, it looks like the time of the threads (which was
over-accounted) gets re-accounted back into the main process, so the final
time is indeed wildly inflated.

IOW, looks like we're adding thread times multiple times to other threads
(and then finally to the parent).

I'm adding the usual suspects to the cc, and leaving your results and
test-program quoted here for them.. Thomas, Martin, John - if you're not
the people I should blame for this, let me know.

Linus

---
> $ gcc -O2 -o process process.c -lpthread
> $ ./process
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 0:08 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:09 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:01 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 0:16 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:14 -
> - - Rl+ 0:06 -
> - - Rl+ 0:14 -
> - - Rl+ 0:10 -
> - - Rl+ 0:06 -
> - - Rl+ 0:06 -
> - - Rl+ 0:10 -
> - - Rl+ 0:02 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 0:24 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:20 -
> - - Rl+ 0:11 -
> - - Rl+ 0:24 -
> - - Rl+ 0:15 -
> - - Rl+ 0:07 -
> - - Rl+ 0:07 -
> - - Rl+ 0:24 -
> - - Rl+ 0:03 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 0:32 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:29 -
> - - Rl+ 0:12 -
> - - Rl+ 0:29 -
> - - Rl+ 0:21 -
> - - Rl+ 0:08 -
> - - Rl+ 0:08 -
> - - Rl+ 0:29 -
> - - Rl+ 0:04 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 0:40 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:35 -
> - - Rl+ 0:13 -
> - - Rl+ 0:35 -
> - - Rl+ 0:26 -
> - - Rl+ 0:09 -
> - - Rl+ 0:09 -
> - - Rl+ 0:35 -
> - - Rl+ 0:05 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 0:48 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:40 -
> - - Rl+ 0:14 -
> - - Rl+ 0:40 -
> - - Rl+ 0:31 -
> - - Rl+ 0:10 -
> - - Rl+ 0:10 -
> - - Rl+ 0:40 -
> - - Rl+ 0:06 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 0:56 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:45 -
> - - Rl+ 0:15 -
> - - Rl+ 0:45 -
> - - Rl+ 0:37 -
> - - Rl+ 0:11 -
> - - Rl+ 0:11 -
> - - Rl+ 0:45 -
> - - Rl+ 0:07 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 1:05 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:51 -
> - - Rl+ 0:21 -
> - - Rl+ 0:55 -
> - - Rl+ 0:46 -
> - - Rl+ 0:12 -
> - - Rl+ 0:12 -
> - - Rl+ 0:59 -
> - - Rl+ 0:08 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 1:13 ./process
> - - Sl+ 0:00 -
> - - Rl+ 0:56 -
> - - Rl+ 0:22 -
> - - Rl+ 1:00 -
> - - Rl+ 0:52 -
> - - Rl+ 0:13 -
> - - Rl+ 0:13 -
> - - Rl+ 1:04 -
> - - Rl+ 0:09 -
> PID TTY STAT TIME COMMAND
> 5532 pts/1 - 5:13 ./process
> - - S+ 5:13 -
>
>
> $ cat process.c
> #include <pthread.h>
> #include <time.h>
> #include <stdio.h>
>
> /*
> * Burn cpu cycles for about 10 seconds
> */
> static void *do_unit(void *arg)
> {
> time_t t0,t1;
>
> t0 = time(NULL);
> do {
> t1 = time(NULL);
> } while (t1 - t0 <= 10);
> return NULL;
> }
>
>
>
>
> int main(int argc, char *argv[])
> {
> pthread_t tids[8];
> int i;
> char cmd[128];
>
> sprintf(cmd, "ps m -p %d", getpid());
> for (i = 0; i < 8 ; i++)
> pthread_create(&tids[i], NULL, do_unit, NULL);
> for (i = 0; i < 9; i++) {
> sleep(1);
> system(cmd);
> }
> for (i = 0; i < 8 ; i++)
> pthread_join(tids[i], NULL);
> system(cmd);
> return 0;
> }
>

2009-09-28 16:26:14

by Randy Dunlap

[permalink] [raw]
Subject: [PATCH] isdn: fix netjet/isdnhdlc build errors

From: Randy Dunlap <[email protected]>

Commit cb3824bade2549d7ad059d5802da43312540fdee didn't fix this problem.

Fix build errors in netjet, using isdnhdlc module:

drivers/built-in.o: In function `mode_tiger':
netjet.c:(.text+0x1ca0c7): undefined reference to `isdnhdlc_rcv_init'
netjet.c:(.text+0x1ca0d4): undefined reference to `isdnhdlc_out_init'
drivers/built-in.o: In function `fill_dma':
netjet.c:(.text+0x1ca2bd): undefined reference to `isdnhdlc_encode'
drivers/built-in.o: In function `read_dma':
netjet.c:(.text+0x1ca614): undefined reference to `isdnhdlc_decode'
drivers/built-in.o: In function `nj_irq':
netjet.c:(.text+0x1cb07a): undefined reference to `isdnhdlc_encode'

drivers/built-in.o: In function `isdnhdlc_decode':
(.text+0x1c2088): undefined reference to `crc_ccitt_table'
drivers/built-in.o: In function `isdnhdlc_encode':
(.text+0x1c2339): undefined reference to `crc_ccitt_table'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Karsten Keil <[email protected]>
---
drivers/isdn/hardware/mISDN/Kconfig | 1 +
drivers/isdn/i4l/Kconfig | 3 +--
2 files changed, 2 insertions(+), 2 deletions(-)

--- lnx-2632-rc1.orig/drivers/isdn/hardware/mISDN/Kconfig
+++ lnx-2632-rc1/drivers/isdn/hardware/mISDN/Kconfig
@@ -78,6 +78,7 @@ config MISDN_NETJET
depends on PCI
select MISDN_IPAC
select ISDN_HDLC
+ select ISDN_I4L
help
Enable support for Traverse Technologies NETJet PCI cards.

--- lnx-2632-rc1.orig/drivers/isdn/i4l/Kconfig
+++ lnx-2632-rc1/drivers/isdn/i4l/Kconfig
@@ -141,8 +141,7 @@ endmenu
endif

config ISDN_HDLC
- tristate
- depends on HISAX_ST5481
+ tristate
select CRC_CCITT
select BITREVERSE

2009-09-28 17:15:09

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Mon, 28 Sep 2009 08:39:31 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

> On Mon, 28 Sep 2009, Eric Dumazet wrote:
> > Linus Torvalds a écrit :
> > >
> > > Go wild, test it out, and let us know about any regressions you find,
> >
> > Something seems wrong with process time accounting.
> >
> > Following program should consume 10*8 seconds of cpu on a 8 cpu machine, but
> > with 2.6.32-rc1 numbers are crazy.
>
> Ok, so the top-level process time looks sane _while_ the thing is running
> (sum of all threads), but the per-thread times look broken (as if they had
> randomly had the times of the other threads added into them - looks to me
> like they on average had half the other threads' time accounted into
> them).
>
> And then at the end, it looks like the time of the threads (which was
> over-accounted) get re-accounted back into the main process, so the final
> time is indeed wildly inflated.
>
> IOW, looks like we're adding thread times multiple times to other threads
> (and then finally to the parent).
>
> I'm adding the usual suspects to the cc, and leaving your results and
> test-program quoted here for them.. Thomas, Martin, John - if you're not
> the people I should blame for this, let me know.

Uh-oh.. usual suspects, eh?

For starters I run the test program on an s390 with
VIRT_CPU_ACCOUNTING=y:

time ./burn-cpu
PID TTY STAT TIME COMMAND
2979 pts/0 - 0:08 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:01 -
- - Rl+ 0:01 -
- - Rl+ 0:01 -
- - Rl+ 0:01 -
- - Rl+ 0:01 -
- - Rl+ 0:01 -
- - Rl+ 0:01 -
- - Rl+ 0:01 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 0:16 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:02 -
- - Rl+ 0:02 -
- - Rl+ 0:02 -
- - Rl+ 0:02 -
- - Rl+ 0:02 -
- - Rl+ 0:02 -
- - Rl+ 0:02 -
- - Rl+ 0:02 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 0:25 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:03 -
- - Rl+ 0:03 -
- - Rl+ 0:03 -
- - Rl+ 0:03 -
- - Rl+ 0:03 -
- - Rl+ 0:03 -
- - Rl+ 0:03 -
- - Rl+ 0:03 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 0:33 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:04 -
- - Rl+ 0:04 -
- - Rl+ 0:04 -
- - Rl+ 0:04 -
- - Rl+ 0:04 -
- - Rl+ 0:04 -
- - Rl+ 0:04 -
- - Rl+ 0:04 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 0:41 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
- - Rl+ 0:05 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 0:50 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:06 -
- - Rl+ 0:06 -
- - Rl+ 0:06 -
- - Rl+ 0:06 -
- - Rl+ 0:06 -
- - Rl+ 0:06 -
- - Rl+ 0:06 -
- - Rl+ 0:06 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 0:58 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:07 -
- - Rl+ 0:07 -
- - Rl+ 0:07 -
- - Rl+ 0:07 -
- - Rl+ 0:07 -
- - Rl+ 0:07 -
- - Rl+ 0:07 -
- - Rl+ 0:07 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 1:07 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:08 -
- - Rl+ 0:08 -
- - Rl+ 0:08 -
- - Rl+ 0:08 -
- - Rl+ 0:08 -
- - Rl+ 0:08 -
- - Rl+ 0:08 -
- - Rl+ 0:08 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 1:15 ./burn-cpu
- - Sl+ 0:00 -
- - Rl+ 0:09 -
- - Rl+ 0:09 -
- - Rl+ 0:09 -
- - Rl+ 0:09 -
- - Rl+ 0:09 -
- - Rl+ 0:09 -
- - Rl+ 0:09 -
- - Rl+ 0:09 -
PID TTY STAT TIME COMMAND
2979 pts/0 - 1:25 ./burn-cpu
- - S+ 1:25 -

real 0m10.708s
user 1m25.051s
sys 0m0.174s

looks better, gives an average of 10.63 seconds per thread and the per
thread numbers are fine. I'll see what happens with the test case on my
pc@home.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-09-28 18:42:14

by Eric Dumazet

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Martin Schwidefsky a écrit :
> On Mon, 28 Sep 2009 08:39:31 -0700 (PDT)
> Linus Torvalds <[email protected]> wrote:
>
>> On Mon, 28 Sep 2009, Eric Dumazet wrote:
>>> Linus Torvalds a écrit :
>>>> Go wild, test it out, and let us know about any regressions you find,
>>> Something seems wrong with process time accounting.
>>>
>>> Following program should consume 10*8 seconds of cpu on a 8 cpu machine, but
>>> with 2.6.32-rc1 numbers are crazy.
>> Ok, so the top-level process time looks sane _while_ the thing is running
>> (sum of all threads), but the per-thread times look broken (as if they had
>> randomly had the times of the other threads added into them - looks to me
>> like they on average had half the other threads' time accounted into
>> them).
>>
>> And then at the end, it looks like the time of the threads (which was
>> over-accounted) get re-accounted back into the main process, so the final
>> time is indeed wildly inflated.
>>
>> IOW, looks like we're adding thread times multiple times to other threads
>> (and then finally to the parent).
>>
>> I'm adding the usual suspects to the cc, and leaving your results and
>> test-program quoted here for them.. Thomas, Martin, John - if you're not
>> the people I should blame for this, let me know.
>
> Uh-oh.. usual suspects, eh?
>
> For starters I run the test program on an s390 with
> VIRT_CPU_ACCOUNTING=y:
>
> time ./burn-cpu
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 0:08 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:01 -
> - - Rl+ 0:01 -
> - - Rl+ 0:01 -
> - - Rl+ 0:01 -
> - - Rl+ 0:01 -
> - - Rl+ 0:01 -
> - - Rl+ 0:01 -
> - - Rl+ 0:01 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 0:16 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:02 -
> - - Rl+ 0:02 -
> - - Rl+ 0:02 -
> - - Rl+ 0:02 -
> - - Rl+ 0:02 -
> - - Rl+ 0:02 -
> - - Rl+ 0:02 -
> - - Rl+ 0:02 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 0:25 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:03 -
> - - Rl+ 0:03 -
> - - Rl+ 0:03 -
> - - Rl+ 0:03 -
> - - Rl+ 0:03 -
> - - Rl+ 0:03 -
> - - Rl+ 0:03 -
> - - Rl+ 0:03 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 0:33 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:04 -
> - - Rl+ 0:04 -
> - - Rl+ 0:04 -
> - - Rl+ 0:04 -
> - - Rl+ 0:04 -
> - - Rl+ 0:04 -
> - - Rl+ 0:04 -
> - - Rl+ 0:04 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 0:41 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> - - Rl+ 0:05 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 0:50 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:06 -
> - - Rl+ 0:06 -
> - - Rl+ 0:06 -
> - - Rl+ 0:06 -
> - - Rl+ 0:06 -
> - - Rl+ 0:06 -
> - - Rl+ 0:06 -
> - - Rl+ 0:06 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 0:58 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:07 -
> - - Rl+ 0:07 -
> - - Rl+ 0:07 -
> - - Rl+ 0:07 -
> - - Rl+ 0:07 -
> - - Rl+ 0:07 -
> - - Rl+ 0:07 -
> - - Rl+ 0:07 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 1:07 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:08 -
> - - Rl+ 0:08 -
> - - Rl+ 0:08 -
> - - Rl+ 0:08 -
> - - Rl+ 0:08 -
> - - Rl+ 0:08 -
> - - Rl+ 0:08 -
> - - Rl+ 0:08 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 1:15 ./burn-cpu
> - - Sl+ 0:00 -
> - - Rl+ 0:09 -
> - - Rl+ 0:09 -
> - - Rl+ 0:09 -
> - - Rl+ 0:09 -
> - - Rl+ 0:09 -
> - - Rl+ 0:09 -
> - - Rl+ 0:09 -
> - - Rl+ 0:09 -
> PID TTY STAT TIME COMMAND
> 2979 pts/0 - 1:25 ./burn-cpu
> - - S+ 1:25 -
>
> real 0m10.708s
> user 1m25.051s
> sys 0m0.174s
>
> looks better, gives an average of 10.63 seconds per thread and the per
> thread numbers are fine. I'll see what happens with the test case on my
> pc@home.
>


I did a bisection and found commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
was the origin of the problem on my x86_32 machine.

def0a9b2573e00ab0b486cb5382625203ab4c4a6 is first bad commit
commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
Author: Peter Zijlstra <[email protected]>
Date: Fri Sep 18 20:14:01 2009 +0200

sched_clock: Make it NMI safe

Arjan complained about the suckyness of TSC on modern machines, and
asked if we could do something about that for PERF_SAMPLE_TIME.

Make cpu_clock() NMI safe by removing the spinlock and using
cmpxchg. This also makes it smaller and more robust.

Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64
and x86.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>

git bisect start
# bad: [17d857be649a21ca90008c6dc425d849fa83db5c] Linux 2.6.32-rc1
git bisect bad 17d857be649a21ca90008c6dc425d849fa83db5c
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# good: [5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3] Staging: vt6656: Integrate vt6656 into build system.
git bisect good 5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3
# good: [5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3] Staging: vt6656: Integrate vt6656 into build system.
git bisect good 5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3
# bad: [c720f5655df159a630fa0290a0bd67c93e92b0bf] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
git bisect bad c720f5655df159a630fa0290a0bd67c93e92b0bf
# good: [a03fdb7612874834d6847107198712d18b5242c7] Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good a03fdb7612874834d6847107198712d18b5242c7
# good: [3530c1886291df061e3972c55590777ef1cb67f8] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
git bisect good 3530c1886291df061e3972c55590777ef1cb67f8
# good: [84d6ae431f315e8973aac3c3fe1d550fc9240ef3] V4L/DVB (13033): pt1: Don't use a deprecated DMA_BIT_MASK macro
git bisect good 84d6ae431f315e8973aac3c3fe1d550fc9240ef3
# bad: [ebc79c4f8da0f92efa968e0328f32334a2ce80cf] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/linux-2.6
git bisect bad ebc79c4f8da0f92efa968e0328f32334a2ce80cf
# good: [7bd032dc2793afcbaf4a350056768da84cdbd89b] USB serial: update the console driver
git bisect good 7bd032dc2793afcbaf4a350056768da84cdbd89b
# good: [1281a49b7aa865a1f0d60e2b28410e6234fc686b] perf trace: Sample timestamp and cpu when using record flag
git bisect good 1281a49b7aa865a1f0d60e2b28410e6234fc686b
# bad: [e11c675ede0d42a405ae595528bf0b29ce1ae56f] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
git bisect bad e11c675ede0d42a405ae595528bf0b29ce1ae56f
# bad: [6161352142d5fed4cd753b32e5ccde66e705b14e] tracing, perf: Convert the power tracer into an event tracer
git bisect bad 6161352142d5fed4cd753b32e5ccde66e705b14e
# bad: [929bf0d0156562ce631728b6fa53d68004d456d2] Merge branch 'linus' into perfcounters/core
git bisect bad 929bf0d0156562ce631728b6fa53d68004d456d2
# good: [5622f295b53fb60dbf9bed3e2c89d182490a8b7f] x86, perf_counter, bts: Optimize BTS overflow handling
git bisect good 5622f295b53fb60dbf9bed3e2c89d182490a8b7f
# bad: [def0a9b2573e00ab0b486cb5382625203ab4c4a6] sched_clock: Make it NMI safe
git bisect bad def0a9b2573e00ab0b486cb5382625203ab4c4a6


My .config file is attached to this mail



2009-09-28 19:47:40

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] isdn: fix netjet/isdnhdlc build errors

From: Randy Dunlap <[email protected]>
Date: Mon, 28 Sep 2009 09:25:20 -0700

> From: Randy Dunlap <[email protected]>
>
> Commit cb3824bade2549d7ad059d5802da43312540fdee didn't fix this problem.
>
> Fix build errors in netjet, using isdnhdlc module:
...
> Signed-off-by: Randy Dunlap <[email protected]>

Since Karsten has been unresponsive to ISDN stuff for weeks,
I'm going to apply this directly.

Thanks!

2009-09-28 20:56:28

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Mon, 28 Sep 2009 20:41:41 +0200
Eric Dumazet <[email protected]> wrote:

> I did a bisection and found commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
> was the origin of the problem on my x86_32 machine.
>
> def0a9b2573e00ab0b486cb5382625203ab4c4a6 is first bad commit
> commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Sep 18 20:14:01 2009 +0200
>
> sched_clock: Make it NMI safe
>
> Arjan complained about the suckyness of TSC on modern machines, and
> asked if we could do something about that for PERF_SAMPLE_TIME.
>
> Make cpu_clock() NMI safe by removing the spinlock and using
> cmpxchg. This also makes it smaller and more robust.
>
> Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64
> and x86.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> LKML-Reference: <new-submission>
> Signed-off-by: Ingo Molnar <[email protected]>

Confirmed. The bisect run on my machine gave me the same bad commit.
The new logic in sched_clock_remote() seems racy: the old code got the
locks for the sched_clock_data of the local and the remote cpu before
it changed any value. The new code tries to get to the same result with
a single cmpxchg. Bad things happen if two cpus try to update the clock
values crosswise, no?

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-09-28 22:10:28

by Roland

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

>and as usual that means that the merge window is all over and done with

so, all the "heavy" stuff is in now - but no DRBD ?





2009-09-29 20:43:27

by Eric Dumazet

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Eric Dumazet a écrit :
> Martin Schwidefsky a écrit :
>> On Mon, 28 Sep 2009 08:39:31 -0700 (PDT)
>> Linus Torvalds <[email protected]> wrote:
>>
>>> On Mon, 28 Sep 2009, Eric Dumazet wrote:
>>>> Linus Torvalds a écrit :
>>>>> Go wild, test it out, and let us know about any regressions you find,
>>>> Something seems wrong with process time accounting.
>>>>
>>>> Following program should consume 10*8 seconds of cpu on a 8 cpu machine, but
>>>> with 2.6.32-rc1 numbers are crazy.
>>> Ok, so the top-level process time looks sane _while_ the thing is running
>>> (sum of all threads), but the per-thread times look broken (as if they had
>>> randomly had the times of the other threads added into them - looks to me
>>> like they on average had half the other threads' time accounted into
>>> them).
>>>
>>> And then at the end, it looks like the time of the threads (which was
>>> over-accounted) get re-accounted back into the main process, so the final
>>> time is indeed wildly inflated.
>>>
>>> IOW, looks like we're adding thread times multiple times to other threads
>>> (and then finally to the parent).
>>>
>>> I'm adding the usual suspects to the cc, and leaving your results and
>>> test-program quoted here for them.. Thomas, Martin, John - if you're not
>>> the people I should blame for this, let me know.
>> Uh-oh.. usual suspects, eh?
>>
>> For starters I run the test program on an s390 with
>> VIRT_CPU_ACCOUNTING=y:
>>
>> time ./burn-cpu
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 0:08 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:01 -
>> - - Rl+ 0:01 -
>> - - Rl+ 0:01 -
>> - - Rl+ 0:01 -
>> - - Rl+ 0:01 -
>> - - Rl+ 0:01 -
>> - - Rl+ 0:01 -
>> - - Rl+ 0:01 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 0:16 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:02 -
>> - - Rl+ 0:02 -
>> - - Rl+ 0:02 -
>> - - Rl+ 0:02 -
>> - - Rl+ 0:02 -
>> - - Rl+ 0:02 -
>> - - Rl+ 0:02 -
>> - - Rl+ 0:02 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 0:25 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:03 -
>> - - Rl+ 0:03 -
>> - - Rl+ 0:03 -
>> - - Rl+ 0:03 -
>> - - Rl+ 0:03 -
>> - - Rl+ 0:03 -
>> - - Rl+ 0:03 -
>> - - Rl+ 0:03 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 0:33 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:04 -
>> - - Rl+ 0:04 -
>> - - Rl+ 0:04 -
>> - - Rl+ 0:04 -
>> - - Rl+ 0:04 -
>> - - Rl+ 0:04 -
>> - - Rl+ 0:04 -
>> - - Rl+ 0:04 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 0:41 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:05 -
>> - - Rl+ 0:05 -
>> - - Rl+ 0:05 -
>> - - Rl+ 0:05 -
>> - - Rl+ 0:05 -
>> - - Rl+ 0:05 -
>> - - Rl+ 0:05 -
>> - - Rl+ 0:05 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 0:50 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:06 -
>> - - Rl+ 0:06 -
>> - - Rl+ 0:06 -
>> - - Rl+ 0:06 -
>> - - Rl+ 0:06 -
>> - - Rl+ 0:06 -
>> - - Rl+ 0:06 -
>> - - Rl+ 0:06 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 0:58 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:07 -
>> - - Rl+ 0:07 -
>> - - Rl+ 0:07 -
>> - - Rl+ 0:07 -
>> - - Rl+ 0:07 -
>> - - Rl+ 0:07 -
>> - - Rl+ 0:07 -
>> - - Rl+ 0:07 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 1:07 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:08 -
>> - - Rl+ 0:08 -
>> - - Rl+ 0:08 -
>> - - Rl+ 0:08 -
>> - - Rl+ 0:08 -
>> - - Rl+ 0:08 -
>> - - Rl+ 0:08 -
>> - - Rl+ 0:08 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 1:15 ./burn-cpu
>> - - Sl+ 0:00 -
>> - - Rl+ 0:09 -
>> - - Rl+ 0:09 -
>> - - Rl+ 0:09 -
>> - - Rl+ 0:09 -
>> - - Rl+ 0:09 -
>> - - Rl+ 0:09 -
>> - - Rl+ 0:09 -
>> - - Rl+ 0:09 -
>> PID TTY STAT TIME COMMAND
>> 2979 pts/0 - 1:25 ./burn-cpu
>> - - S+ 1:25 -
>>
>> real 0m10.708s
>> user 1m25.051s
>> sys 0m0.174s
>>
>> looks better, gives an average of 10.63 seconds per thread and the per
>> thread numbers are fine. I'll see what happens with the test case on my
>> pc@home.
>>
>
>
> I did a bisection and found commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
> was the origin of the problem on my x86_32 machine.
>
> def0a9b2573e00ab0b486cb5382625203ab4c4a6 is first bad commit
> commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Sep 18 20:14:01 2009 +0200
>
> sched_clock: Make it NMI safe
>
> Arjan complained about the suckyness of TSC on modern machines, and
> asked if we could do something about that for PERF_SAMPLE_TIME.
>
> Make cpu_clock() NMI safe by removing the spinlock and using
> cmpxchg. This also makes it smaller and more robust.
>
> Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64
> and x86.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> LKML-Reference: <new-submission>
> Signed-off-by: Ingo Molnar <[email protected]>
>

Pretty calm lkml these days... must be some kind of event in the states ? :)

Checking this commit, I believe the problem comes from cmpxchg(), which doesn't
handle 64 bit on X86_32 (no compilation error, and a null operation :( )

We have two (or three) choices:

1) Use cmpxchg64()

2) Fix cmpxchg() to handle 64bit as well (or issue a compilation error -
see the sketch below)

3) Revert Peter's patch :(
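
As a sketch of what the "issue a compilation error" half of option 2 could look
like (user-space code with GCC __sync builtins and made-up names, not the real
asm/cmpxchg_32.h): dispatch on sizeof() and route any unsupported size to a
symbol that is declared but never defined, so a 64-bit cmpxchg on a 32-bit build
fails at link time instead of silently doing nothing.

#include <stdint.h>

/* Declared but deliberately never defined: any reference that survives
 * compilation turns into a link error. Relies on the compiler dropping
 * the branch that cannot be taken, which is reliable at -O2. */
extern void cmpxchg_wrong_size(void);

#define my_cmpxchg(ptr, old, new)                                   \
    (sizeof(*(ptr)) == 4                                            \
        ? __sync_val_compare_and_swap((ptr), (old), (new))          \
        : (cmpxchg_wrong_size(), (__typeof__(*(ptr)))0))

int main(void)
{
    uint32_t v = 1;

    my_cmpxchg(&v, 1u, 2u);    /* 4-byte case: links and swaps */
    /*
     * uint64_t w = 1;
     * my_cmpxchg(&w, 1ULL, 2ULL);  -- a 64-bit use on a 32-bit build
     *                                 would now fail to link, not no-op
     */
    return v == 2 ? 0 : 1;
}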

Here is the patch I used to get proper operation.

[PATCH] sched_clock: Use cmpxchg64()

Commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
(sched_clock: Make it NMI safe) assumed cmpxchg() of 64bit values was available on X86_32

Signed-off-by: Eric Dumazet <[email protected]>
---

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index ac2e1dc..479ce56 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -127,7 +127,7 @@ again:
clock = wrap_max(clock, min_clock);
clock = wrap_min(clock, max_clock);

- if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
+ if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
goto again;

return clock;
@@ -163,7 +163,7 @@ again:
val = remote_clock;
}

- if (cmpxchg(ptr, old_val, val) != old_val)
+ if (cmpxchg64(ptr, old_val, val) != old_val)
goto again;

return val;

2009-09-29 21:19:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Tue, 29 Sep 2009, Eric Dumazet wrote:
>
> Checking this commit, I believe problem comes from cmpxchg(), which doesnt
> handle 64 bit on X86_32 (no compilation error, and null operation :( )

Wow. That's broken. Very nasty silent failure.

> 1) Use cmpxchg64()

Clearly better than what we have now, although cmpxchg64 does result in
some really disgusting code. We will use a bare CMPXCHG64 only if you
compile for PAE right now - so even if you tell Kconfig that you want to
compile for a modern CPU, we won't be doing that whole cmpxchg8b thing
directly, we'll inline some really disgusting code.

So we really need to fix that up before it would be acceptable to use
cmpxchg64().

And regardless, we should fix the silent cmpxchg failure, even if it's
just a link-time failure or something. 64-bit things in the kernel used to
be rare and special, but they aren't any more.

Ingo, Peter, comments?

Linus

2009-09-29 21:22:36

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Tue, 29 Sep 2009 14:17:58 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Tue, 29 Sep 2009, Eric Dumazet wrote:
> >
> > Checking this commit, I believe problem comes from cmpxchg(), which
> > doesnt handle 64 bit on X86_32 (no compilation error, and null
> > operation :( )
>
> Wow. That's broken. Very nasty silent failure.
>
> > 1) Use cmpxchg64()
>
> Clearly better than what we have now, although cmpxchg64 does result
> is some really disgusting code. We will use a bare CMPXCHG64 only if
> you compile for PAE right now - so even if you tell Kconfig that you
> want to compile for a modern CPU, we won't be doing that whole
> cmpxchg64b thing directly, we'll inline some really disgusting code.

can't we use alternatives() for this, to patch cmpxchg64 in ?
I mean.. it'll be commonly supported nowadays.. the fallback to it not
being supported could be a bit slower by now...


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-29 21:57:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Tue, 29 Sep 2009, Arjan van de Ven wrote:
>
> can't we use alternatives() for this, to patch cmpxchg64 in ?
> I mean.. it'll be commonly supported nowadays.. the fallback to it not
> being supported could be a bit slower by now...

Yes, we could. It would limit us to some fixed address format, probably

cmpxchg8b (%esi)

or something. Use something like this as a starting point, perhaps?

NOTE! Totally untested! And you'd actually need to implement the
"cmpxchg8b_emu" function that takes its arguments in %eax:%edx, %ebx:%ecx
and %esi and doesn't trash any other registers..

Linus

---
arch/x86/include/asm/cmpxchg_32.h | 27 ++++++++++++++-------------
1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..d16a486 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -312,19 +312,20 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,

extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);

-#define cmpxchg64(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) __ret; \
- if (likely(boot_cpu_data.x86 > 4)) \
- __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- else \
- __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- __ret; \
-})
+#define cmpxchg64(ptr, o, n) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (o); \
+ __typeof__(*(ptr)) __new = (n); \
+ alternative_io(LOCK_PREFIX "cmpxchg8b (%%esi)", \
+ "call cmpxchg8b_emu", \
+ X86_FEATURE_CMPXCHG8B, \
+ "=A" (__ret), \
+ "m" (*(ptr)), "0" (__old), \
+ "b" ((unsigned int)__new), \
+ "c" ((unsigned int)(__new>>32)) ); \
+ __ret; })
+
#define cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \

2009-09-30 15:07:40

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Tue, 29 Sep 2009 14:56:28 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Tue, 29 Sep 2009, Arjan van de Ven wrote:
> >
> > can't we use alternatives() for this, to patch cmpxchg64 in ?
> > I mean.. it'll be commonly supported nowadays.. the fallback to it
> > not being supported could be a bit slower by now...
>
> Yes, we could. It would limit us to some fixed address format,
> probably
>
> cmpxchg8b (%esi)
>
> or something. Use something like this as a starting point, perhaps?
>
> NOTE! Totally untested! And you'd actually need to implement the
> "cmpxchg8b_emu" function that takes it's arguments in %eax:%edx,
> %ebx:%ecx and %esi and doesn't trash any other registers..

so I debugged this guy (had a few bugs ;-)

patch, including a new cmpxchg8b_emu below:

From 5a76986c5dd272ea16a9b8abb7349ff3d6791c2b Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <[email protected]>
Date: Wed, 30 Sep 2009 17:04:35 +0200
Subject: [PATCH] x86: Provide an alternative() based cmpxchg64()

Based on Linus' patch, this patch provides cmpxchg64() using
the alternative() infrastructure.

Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe.

Signed-off-by: Arjan van de Ven <[email protected]>
---
arch/x86/include/asm/cmpxchg_32.h | 29 ++++++++++--------
arch/x86/kernel/i386_ksyms_32.c | 3 ++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/cmpxchg8b_emu.S | 61 +++++++++++++++++++++++++++++++++++++
4 files changed, 81 insertions(+), 14 deletions(-)
create mode 100644 arch/x86/lib/cmpxchg8b_emu.S

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..3b21afa 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -312,19 +312,22 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,

extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);

-#define cmpxchg64(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) __ret; \
- if (likely(boot_cpu_data.x86 > 4)) \
- __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- else \
- __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- __ret; \
-})
+#define cmpxchg64(ptr, o, n) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (o); \
+ __typeof__(*(ptr)) __new = (n); \
+ alternative_io("call cmpxchg8b_emu", \
+ "lock; cmpxchg8b (%%esi)" , \
+ X86_FEATURE_CX8, \
+ "=A" (__ret), \
+ "S" ((ptr)), "0" (__old), \
+ "b" ((unsigned int)__new), \
+ "c" ((unsigned int)(__new>>32))); \
+ __ret; })
+
+
+
#define cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 43cec6b..f17930e 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -10,6 +10,9 @@
EXPORT_SYMBOL(mcount);
#endif

+extern void cmpxchg8b_emu(void); /* dummy proto */
+EXPORT_SYMBOL(cmpxchg8b_emu);
+
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 9e60920..3e549b8 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o
+ lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o

lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
new file mode 100644
index 0000000..b8af4c7
--- /dev/null
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -0,0 +1,61 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
+#include <asm/dwarf2.h>
+
+
+.text
+
+/*
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ */
+ENTRY(cmpxchg8b_emu)
+ CFI_STARTPROC
+
+ push %edi
+ push %ebx
+ push %ecx
+ /* disable interrupts */
+ pushf
+ pop %edi
+ cli
+
+ cmpl %edx, 4(%esi)
+ jne 1f
+ cmpl %eax, (%esi)
+ jne 1f
+
+ xchg (%esi), %ebx
+ xchg 4(%esi), %ecx
+ mov %ebx, %eax
+ mov %ecx, %edx
+
+2:
+ /* restore interrupts */
+ push %edi
+ popf
+
+ pop %ecx
+ pop %ebx
+ pop %edi
+ ret
+1:
+ mov (%esi), %eax
+ mov 4(%esi), %edx
+ jmp 2b
+ CFI_ENDPROC
+ENDPROC(cmpxchg8b_emu)
+
--
1.6.2.5

2009-09-30 15:28:09

by Eric Dumazet

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Arjan van de Ven a écrit :
> On Tue, 29 Sep 2009 14:56:28 -0700 (PDT)
> Linus Torvalds <[email protected]> wrote:
>
>>
>> On Tue, 29 Sep 2009, Arjan van de Ven wrote:
>>> can't we use alternatives() for this, to patch cmpxchg64 in ?
>>> I mean.. it'll be commonly supported nowadays.. the fallback to it
>>> not being supported could be a bit slower by now...
>> Yes, we could. It would limit us to some fixed address format,
>> probably
>>
>> cmpxchg8b (%esi)
>>
>> or something. Use something like this as a starting point, perhaps?
>>
>> NOTE! Totally untested! And you'd actually need to implement the
>> "cmpxchg8b_emu" function that takes it's arguments in %eax:%edx,
>> %ebx:%ecx and %esi and doesn't trash any other registers..
>
> so I debugged this guy (had a few bugs ;-)
>
> patch, including a new cmpxchg8b_emu below:
>
> From 5a76986c5dd272ea16a9b8abb7349ff3d6791c2b Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <[email protected]>
> Date: Wed, 30 Sep 2009 17:04:35 +0200
> Subject: [PATCH] x86: Provide an alternative() based cmpxchg64()
>
> Based on Linus' patch, this patch provides cmpxchg64() using
> the alternative() infrastructure.
>
> Note: the fallback is NOT smp safe, just like the current fallback
> is not SMP safe.
>
> Signed-off-by: Arjan van de Ven <[email protected]>
> ---
> arch/x86/include/asm/cmpxchg_32.h | 29 ++++++++++--------
> arch/x86/kernel/i386_ksyms_32.c | 3 ++
> arch/x86/lib/Makefile | 2 +-
> arch/x86/lib/cmpxchg8b_emu.S | 61 +++++++++++++++++++++++++++++++++++++
> 4 files changed, 81 insertions(+), 14 deletions(-)
> create mode 100644 arch/x86/lib/cmpxchg8b_emu.S
>
> diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
> index 82ceb78..3b21afa 100644
> --- a/arch/x86/include/asm/cmpxchg_32.h
> +++ b/arch/x86/include/asm/cmpxchg_32.h
> @@ -312,19 +312,22 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
>
> extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);
>
> -#define cmpxchg64(ptr, o, n) \
> -({ \
> - __typeof__(*(ptr)) __ret; \
> - if (likely(boot_cpu_data.x86 > 4)) \
> - __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
> - (unsigned long long)(o), \
> - (unsigned long long)(n)); \
> - else \
> - __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
> - (unsigned long long)(o), \
> - (unsigned long long)(n)); \
> - __ret; \
> -})
> +#define cmpxchg64(ptr, o, n) \
> +({ \
> + __typeof__(*(ptr)) __ret; \
> + __typeof__(*(ptr)) __old = (o); \
> + __typeof__(*(ptr)) __new = (n); \
> + alternative_io("call cmpxchg8b_emu", \
> + "lock; cmpxchg8b (%%esi)" , \
> + X86_FEATURE_CX8, \
> + "=A" (__ret), \
> + "S" ((ptr)), "0" (__old), \
> + "b" ((unsigned int)__new), \
> + "c" ((unsigned int)(__new>>32))); \
> + __ret; })
> +
> +
> +
> #define cmpxchg64_local(ptr, o, n) \
> ({ \
> __typeof__(*(ptr)) __ret; \
> diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
> index 43cec6b..f17930e 100644
> --- a/arch/x86/kernel/i386_ksyms_32.c
> +++ b/arch/x86/kernel/i386_ksyms_32.c
> @@ -10,6 +10,9 @@
> EXPORT_SYMBOL(mcount);
> #endif
>
> +extern void cmpxchg8b_emu(void); /* dummy proto */
> +EXPORT_SYMBOL(cmpxchg8b_emu);
> +
> /* Networking helper routines. */
> EXPORT_SYMBOL(csum_partial_copy_generic);
>
> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> index 9e60920..3e549b8 100644
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
> obj-y += atomic64_32.o
> lib-y += checksum_32.o
> lib-y += strstr_32.o
> - lib-y += semaphore_32.o string_32.o
> + lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o
>
> lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
> else
> diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
> new file mode 100644
> index 0000000..b8af4c7
> --- /dev/null
> +++ b/arch/x86/lib/cmpxchg8b_emu.S
> @@ -0,0 +1,61 @@
> +/*
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.
> + *
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/alternative-asm.h>
> +#include <asm/frame.h>
> +#include <asm/dwarf2.h>
> +
> +
> +.text
> +
> +/*
> + * Inputs:
> + * %esi : memory location to compare
> + * %eax : low 32 bits of old value
> + * %edx : high 32 bits of old value
> + * %ebx : low 32 bits of new value
> + * %ecx : high 32 bits of new value
> + */
> +ENTRY(cmpxchg8b_emu)
> + CFI_STARTPROC
> +
> + push %edi
> + push %ebx
> + push %ecx
> + /* disable interrupts */
> + pushf

> + pop %edi
Why do you pop the flags into edi, only to re-push them later?

> + cli
> +
> + cmpl %edx, 4(%esi)
> + jne 1f
> + cmpl %eax, (%esi)
> + jne 1f
> +

So we don't have irqs for this cpu, ok.

And other cpus are still allowed, and xchg implies a lock prefix, ok.


> + xchg (%esi), %ebx
> + xchg 4(%esi), %ecx
How is this sequence guaranteed to be atomic with respect to other cpus?

If it is a !SMP implementation, then you could replace xchg by mov instructions.

mov %ebx,(%esi)
mov %ecx,4(%esi)

> + mov %ebx, %eax
> + mov %ecx, %edx
and avoid these of course


> +
> +2:
> + /* restore interrupts */
> + push %edi
> + popf
> +
> + pop %ecx
> + pop %ebx
> + pop %edi
> + ret
> +1:
> + mov (%esi), %eax
> + mov 4(%esi), %edx
> + jmp 2b
> + CFI_ENDPROC
> +ENDPROC(cmpxchg8b_emu)
> +


So I suggest :


ENTRY(cmpxchg8b_emu)
CFI_STARTPROC

/* disable interrupts */
pushf
cli

cmpl %eax,(%esi)
jne 1f
cmpl %edx,4(%esi)
jne 2f

mov %ebx,(%esi)
mov %ecx,4(%esi)

/* restore interrupts */
popf
ret
1:
mov (%esi), %eax
2:
mov 4(%esi), %edx
popf
ret
CFI_ENDPROC
ENDPROC(cmpxchg8b_emu)

2009-09-30 15:30:42

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Wed, 30 Sep 2009 17:27:05 +0200
Eric Dumazet <[email protected]> wrote:
>
> > + pop %edi
> Why do you pop flags in edi, to later re-push them ?
>
> > + cli

because here I disable interrupts

(basically this is local_irq_save() )
>
>
> > + xchg (%esi), %ebx
> > + xchg 4(%esi), %ecx
> How this sequence is guaranteed to be atomic with other cpus ?

it is not. this is the 486 implementation which is !SMP
(just like the current cmpxchg64() fallback)

>
> If it is a !SMP implementation, then you could replace xchg by mov
> instructions.

that is not equivalent. I need to also store the old values
and return them....


>
> So I suggest :
>
>
> ENTRY(cmpxchg8b_emu)
> CFI_STARTPROC
>
> /* disable interrupts */
> pushf
> cli
>
> cmpl %eax,(%esi)
> jne 1f
> cmpl %edx,4(%esi)
> jne 2f
>
> mov %ebx,(%esi)
> mov %ecx,4(%esi)

this is not equivalent since you don't return the "prev" value.
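
To make the calling convention explicit (a deliberately non-atomic user-space
sketch with made-up names, not kernel code): a cmpxchg64-style helper always
returns the memory's previous contents, and callers like the retry loop in
kernel/sched_clock.c compare that return value against the old value they
passed in to decide whether to retry.

#include <stdint.h>

/* Sketch of the cmpxchg64 contract only - this version is NOT atomic.
 * It always returns the prior contents of *ptr, whether or not the
 * store happened. */
static uint64_t cmpxchg64_sketch(uint64_t *ptr, uint64_t old, uint64_t new)
{
    uint64_t prev = *ptr;

    if (prev == old)
        *ptr = new;
    return prev;
}

/* Typical caller: retry until the value read is still the value in memory
 * at the moment of the swap (the pattern used in kernel/sched_clock.c). */
static void set_clock(uint64_t *clock, uint64_t val)
{
    uint64_t old;

    do {
        old = *clock;
    } while (cmpxchg64_sketch(clock, old, val) != old);
}

int main(void)
{
    uint64_t clock = 100;

    set_clock(&clock, 250);
    return clock == 250 ? 0 : 1;
}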



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-30 15:58:18

by Eric Dumazet

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Arjan van de Ven a écrit :
> On Tue, 29 Sep 2009 14:56:28 -0700 (PDT)
> Linus Torvalds <[email protected]> wrote:
>
>>
>> On Tue, 29 Sep 2009, Arjan van de Ven wrote:
>>> can't we use alternatives() for this, to patch cmpxchg64 in ?
>>> I mean.. it'll be commonly supported nowadays.. the fallback to it
>>> not being supported could be a bit slower by now...
>> Yes, we could. It would limit us to some fixed address format,
>> probably
>>
>> cmpxchg8b (%esi)
>>
>> or something. Use something like this as a starting point, perhaps?
>>
>> NOTE! Totally untested! And you'd actually need to implement the
>> "cmpxchg8b_emu" function that takes it's arguments in %eax:%edx,
>> %ebx:%ecx and %esi and doesn't trash any other registers..
>
> so I debugged this guy (had a few bugs ;-)
>
> patch, including a new cmpxchg8b_emu below:
>
>>From 5a76986c5dd272ea16a9b8abb7349ff3d6791c2b Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <[email protected]>
> Date: Wed, 30 Sep 2009 17:04:35 +0200
> Subject: [PATCH] x86: Provide an alternative() based cmpxchg64()
>
> Based on Linus' patch, this patch provides cmpxchg64() using
> the alternative() infrastructure.
>
> Note: the fallback is NOT smp safe, just like the current fallback
> is not SMP safe.
>
> Signed-off-by: Arjan van de Ven <[email protected]>

> +#define cmpxchg64(ptr, o, n) \
> +({ \
> + __typeof__(*(ptr)) __ret; \
> + __typeof__(*(ptr)) __old = (o); \
> + __typeof__(*(ptr)) __new = (n); \
> + alternative_io("call cmpxchg8b_emu", \
> + "lock; cmpxchg8b (%%esi)" , \
> + X86_FEATURE_CX8, \
> + "=A" (__ret), \
> + "S" ((ptr)), "0" (__old), \
> + "b" ((unsigned int)__new), \
> + "c" ((unsigned int)(__new>>32))); \


Note:

lock; cmpxchg8b (%%esi)

gives a 4-byte opcode: f0 0f c7 0e
Because the alternative (call cmpxchg8b_emu) uses 5 bytes, a nop will be added.

Choosing ".byte 0xf0, 0x0f, 0xc7, 0x4e, 0x00" aka "lock cmpxchg8b 0x0(%esi)" is a little bit better?

2009-09-30 16:13:47

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Wed, 30 Sep 2009 17:57:25 +0200
Eric Dumazet <[email protected]> wrote:
>
> Note:
>
> lock; cmpxchg8b (%%esi)
>
> gives 4 bytes opcode : f0 0f c7 0e
> Because alternative (call cmpxchg8b_emu) uses 5 bytes, a nop will be
> added.
>
> Choosing ".byte 0xf0, 0x0f, 0xc7, 0x4e, 0x00" aka "lock cmpxchg8b
> 0x0(%esi)" is a litle bit better ?

doesn't matter normally.. nops fall out quickly in the execution path ;)



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-30 16:14:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Wed, 30 Sep 2009, Eric Dumazet wrote:
>
> lock; cmpxchg8b (%%esi)
>
> gives 4 bytes opcode : f0 0f c7 0e
> Because alternative (call cmpxchg8b_emu) uses 5 bytes, a nop will be added.
>
> Choosing ".byte 0xf0, 0x0f, 0xc7, 0x4e, 0x00" aka "lock cmpxchg8b 0x0(%esi)" is a litle bit better ?

And if you want to be really clever, you would want to handle the non-SMP
case too, where you have just "cmpxchg8b (%%esi)" (three bytes) without the
lock prefix.

However, at this point I think Arjan's patch is already way superior to
what we have now (feel free to take a look at what we generate on 32-bit
without PAE today - just have a barf-bag handy), so all I'd really want is
a few "tested-by"s to make me feel happier about it, and a few more people
looking at the emulation routine to all say "ok, looks sane, ACK".

And at that point we can then either make "cmpxchg()" just do the 8-byte
case natively, or just take your patch to change sched_clock.c to now use
the no-longer-entirely-disgusting cmpxchg64().

Ingo - I suspect both those patches should just go through you. You do
both x86 and scheduler, so I'll happily pull the end result.

Linus

2009-09-30 16:14:50

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

[Eric Dumazet - Wed, Sep 30, 2009 at 05:57:25PM +0200]
...
|
| > +#define cmpxchg64(ptr, o, n) \
| > +({ \
| > + __typeof__(*(ptr)) __ret; \
| > + __typeof__(*(ptr)) __old = (o); \
| > + __typeof__(*(ptr)) __new = (n); \
| > + alternative_io("call cmpxchg8b_emu", \
| > + "lock; cmpxchg8b (%%esi)" , \
| > + X86_FEATURE_CX8, \
| > + "=A" (__ret), \
| > + "S" ((ptr)), "0" (__old), \
| > + "b" ((unsigned int)__new), \
| > + "c" ((unsigned int)(__new>>32))); \
|
|
| Note:
|
| lock; cmpxchg8b (%%esi)
|
| gives 4 bytes opcode : f0 0f c7 0e
| Because alternative (call cmpxchg8b_emu) uses 5 bytes, a nop will be added.
|
| Choosing ".byte 0xf0, 0x0f, 0xc7, 0x4e, 0x00" aka "lock cmpxchg8b 0x0(%esi)" is a litle bit better ?
|

Just curious: why not "nop; lock; cmpxchg8b (%esi)"? lock itself is an expensive
instruction which causes write buffers to flush data back, and the NOP itself will
be discarded by the CPU internals, so I suppose this form should be better. Though
I could be missing something, and OTOH it's not a big deal. But still curious :)

-- Cyrill

2009-09-30 16:56:21

by Stefan Richter

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Arjan van de Ven wrote:
> Subject: [PATCH] x86: Provide an alternative() based cmpxchg64()
>
> Based on Linus' patch, this patch provides cmpxchg64() using
> the alternative() infrastructure.
>
> Note: the fallback is NOT smp safe, just like the current fallback
> is not SMP safe.

As long as it can't be turned into an atomic access, shouldn't cmpxchg64
be hidden from the common kernel coder? Almost everybody will assume
that it is an atomic operation and happily use it in unsafe places.
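Concretely, the worry is a caller like this hypothetical one - a lock-free 64-bit sequence counter that looks fine but silently relies on cmpxchg64() being SMP-atomic, which the emulated fallback is not:

#include <linux/types.h>	/* u64; cmpxchg64() comes from the header patched above */

static u64 event_seq;		/* hypothetical shared counter */

static u64 next_event_seq(void)
{
	u64 old, new;

	do {
		old = event_seq;
		new = old + 1;
		/* correct with native cmpxchg8b; racy with the fallback on SMP */
	} while (cmpxchg64(&event_seq, old, new) != old);

	return new;
}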
--
Stefan Richter
-=====-==--= =--= ====-
http://arcgraph.de/sr/

2009-09-30 17:10:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Wed, 30 Sep 2009, Stefan Richter wrote:
>
> As long as it can't be turned into an atomic access, shouldn't cmpxchg64
> be hidden from the common kernel coder? Almost everybody will assume
> that it is an atomic operation and happily use it in unsafe places.

But it _is_ atomic. We don't support SMP on the platforms that we have to
do it with emulation on.

There's a theoretical problem with NMI, but it's not one we can solve or
that is really all that interesting, so might as well ignore it.

Linus

2009-09-30 17:41:30

by Stefan Richter

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Linus Torvalds wrote:
> On Wed, 30 Sep 2009, Stefan Richter wrote:
>> As long as it can't be turned into an atomic access, shouldn't cmpxchg64
>> be hidden from the common kernel coder? Almost everybody will assume
>> that it is an atomic operation and happily use it in unsafe places.
>
> But it _is_ atomic. We don't support SMP on the platforms that we have to
> do it with emulation on.
>
> There's a theoretical problem with NMI, but it's not one we can solve or
> that is really all that interesting, so might as well ignore it.

Ah, right, I could have taken a hint from /* disable interrupts */ in
Arjan's cmpxchg8b_emu.
--
Stefan Richter
-=====-==--= =--= ====-
http://arcgraph.de/sr/

2009-09-30 18:53:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1


* Linus Torvalds <[email protected]> wrote:

> On Wed, 30 Sep 2009, Eric Dumazet wrote:
> >
> > lock; cmpxchg8b (%%esi)
> >
> > gives 4 bytes opcode : f0 0f c7 0e
> > Because alternative (call cmpxchg8b_emu) uses 5 bytes, a nop will be added.
> >
> > Choosing ".byte 0xf0, 0x0f, 0xc7, 0x4e, 0x00" aka "lock cmpxchg8b 0x0(%esi)" is a litle bit better ?
>
> And if you want to be really clever, you would want to handle the
> non-SMP case too, where you have just "cmpxchg8b (%%esi)" (three bytes)
> without the lock prefix.
>
> However, at this point I think Arjan's patch is already way superior
> to what we have now (feel free to take a look at what we generate on
> 32-bit without PAE today - just have a barf-bag handy), so all I'd
> really want is a few "tested-by"s to make me feel happier about it,
> and a few more people looking at the emulation routine to all say "ok,
> looks sane, ACK".
>
> And at that point we can then either make "cmpxchg()" just do the
> 8-byte case natively, or just take your patch to change sched_clock.c
> to now use the no-longer-entirely-disgusting cmpxchg64().
>
> Ingo - I suspect both those patches should just go through you. You do
> both x86 and scheduler, so I'll happily pull the end result.

Yeah - working on it - just got back from a trip. These are two risky patches,
and if that place breaks everyone will be affected, so I'll probably send
the pull request tomorrow.

Ingo

2009-09-30 19:32:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1


* Arjan van de Ven <[email protected]> wrote:

> From 5a76986c5dd272ea16a9b8abb7349ff3d6791c2b Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <[email protected]>
> Date: Wed, 30 Sep 2009 17:04:35 +0200
> Subject: [PATCH] x86: Provide an alternative() based cmpxchg64()
>
> Based on Linus' patch, this patch provides cmpxchg64() using
> the alternative() infrastructure.
>
> Note: the fallback is NOT smp safe, just like the current fallback
> is not SMP safe.
>
> Signed-off-by: Arjan van de Ven <[email protected]>
> ---
> arch/x86/include/asm/cmpxchg_32.h | 29 ++++++++++--------
> arch/x86/kernel/i386_ksyms_32.c | 3 ++
> arch/x86/lib/Makefile | 2 +-
> arch/x86/lib/cmpxchg8b_emu.S | 61 +++++++++++++++++++++++++++++++++++++
> 4 files changed, 81 insertions(+), 14 deletions(-)
> create mode 100644 arch/x86/lib/cmpxchg8b_emu.S

The two patches cause hangs on 32-bit, in sched_clock_cpu(). I attached
the hang serial log plus the two patches as I applied them.

Bug in cmpxchg8b_emu()?

Ingo

[ 45.800000] i2c-core: driver [tw9910] registered
[ 45.810000] initcall tw9910_module_init+0x0/0x2e returned 0 after 9765 usecs
[ 45.810000] BUG: spinlock lockup on CPU#1, swapper/1, 7be09ec0
[ 45.810000] Pid: 1, comm: swapper Tainted: G W 2.6.32-rc2-tip-00990-g6139d57-dirty #19124
[ 45.810000] Call Trace:
[ 45.810000] [<79ebce42>] ? printk+0x22/0x35
[ 45.810000] [<794356c7>] _raw_spin_lock+0x103/0x13f
[ 45.810000] [<79ebf84d>] _spin_lock+0x3c/0x55
[ 45.810000] [<790673e0>] ? scheduler_tick+0x44/0x20e
[ 45.810000] [<790673e0>] scheduler_tick+0x44/0x20e
[ 45.810000] [<790793fc>] update_process_times+0x4a/0x68
[ 45.810000] [<7909354c>] tick_periodic+0x7a/0x8d
[ 45.810000] [<79093588>] tick_handle_periodic+0x29/0x91
[ 45.810000] [<7909398f>] tick_do_broadcast+0x42/0x7d
[ 45.810000] [<79093b17>] tick_do_periodic_broadcast+0x3c/0x59
[ 45.810000] [<79093fb0>] tick_handle_periodic_broadcast+0x20/0x70
[ 45.810000] [<7902fac1>] timer_interrupt+0x4d/0x85
[ 45.810000] [<790ab402>] handle_IRQ_event+0x65/0x13b
[ 45.810000] [<790ad1fc>] handle_edge_irq+0xbe/0x111
[ 45.810000] [<7902f553>] handle_irq+0x2f/0x46
[ 45.810000] [<7902ee8b>] do_IRQ+0x51/0xb8
[ 45.810000] [<7902d435>] common_interrupt+0x35/0x40
[ 45.810000] [<7906afe2>] ? vprintk+0x3a8/0x3fa
[ 45.810000] [<79090020>] ? ntp_clear+0x22/0x7d
[ 45.810000] [<79ebdf6c>] ? __mutex_unlock_slowpath+0x10e/0x12f
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<79ebce42>] printk+0x22/0x35
[ 45.810000] [<790010ed>] do_one_initcall+0xc4/0x183
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<7a562474>] do_basic_setup+0x50/0x72
[ 45.810000] [<7a562509>] kernel_init+0x73/0xc5
[ 45.810000] [<7a562496>] ? kernel_init+0x0/0xc5
[ 45.810000] [<7902d997>] kernel_thread_helper+0x7/0x10
[ 45.810000] sending NMI to all CPUs:
[ 45.810000] NMI backtrace for cpu 1
[ 45.810000]
[ 45.810000] Pid: 1, comm: swapper Tainted: G W (2.6.32-rc2-tip-00990-g6139d57-dirty #19124) System Product Name
[ 45.810000] EIP: 0060:[<79097b38>] EFLAGS: 00000046 CPU: 1
[ 45.810000] EIP is at trace_hardirqs_off_caller+0x3f/0xbd
[ 45.810000] EAX: 82f26dd9 EBX: b78c8000 ECX: 00000000 EDX: 790412fc
[ 45.810000] ESI: 790412fc EDI: 00000002 EBP: b78c3d10 ESP: b78c3d04
[ 45.810000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 45.810000] CR0: 8005003b CR2: 00000000 CR3: 02664000 CR4: 00000690
[ 45.810000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 45.810000] DR6: ffff0ff0 DR7: 00000400
[ 45.810000] Call Trace:
[ 45.810000] [<79097bcf>] trace_hardirqs_off+0x19/0x2c
[ 45.810000] [<790412fc>] default_send_IPI_mask_logical+0x8f/0xb1
[ 45.810000] [<79041084>] default_send_IPI_all+0x35/0x87
[ 45.810000] [<790416bc>] arch_trigger_all_cpu_backtrace+0x40/0x73
[ 45.810000] [<794356cc>] _raw_spin_lock+0x108/0x13f
[ 45.810000] [<79ebf84d>] _spin_lock+0x3c/0x55
[ 45.810000] [<790673e0>] ? scheduler_tick+0x44/0x20e
[ 45.810000] [<790673e0>] scheduler_tick+0x44/0x20e
[ 45.810000] [<790793fc>] update_process_times+0x4a/0x68
[ 45.810000] [<7909354c>] tick_periodic+0x7a/0x8d
[ 45.810000] [<79093588>] tick_handle_periodic+0x29/0x91
[ 45.810000] [<7909398f>] tick_do_broadcast+0x42/0x7d
[ 45.810000] [<79093b17>] tick_do_periodic_broadcast+0x3c/0x59
[ 45.810000] [<79093fb0>] tick_handle_periodic_broadcast+0x20/0x70
[ 45.810000] [<7902fac1>] timer_interrupt+0x4d/0x85
[ 45.810000] [<790ab402>] handle_IRQ_event+0x65/0x13b
[ 45.810000] [<790ad1fc>] handle_edge_irq+0xbe/0x111
[ 45.810000] [<7902f553>] handle_irq+0x2f/0x46
[ 45.810000] [<7902ee8b>] do_IRQ+0x51/0xb8
[ 45.810000] [<7902d435>] common_interrupt+0x35/0x40
[ 45.810000] [<7906afe2>] ? vprintk+0x3a8/0x3fa
[ 45.810000] [<79090020>] ? ntp_clear+0x22/0x7d
[ 45.810000] [<79ebdf6c>] ? __mutex_unlock_slowpath+0x10e/0x12f
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<79ebce42>] printk+0x22/0x35
[ 45.810000] [<790010ed>] do_one_initcall+0xc4/0x183
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<7a562474>] do_basic_setup+0x50/0x72
[ 45.810000] [<7a562509>] kernel_init+0x73/0xc5
[ 45.810000] [<7a562496>] ? kernel_init+0x0/0xc5
[ 45.810000] [<7902d997>] kernel_thread_helper+0x7/0x10
[ 45.810000] Pid: 1, comm: swapper Tainted: G W 2.6.32-rc2-tip-00990-g6139d57-dirty #19124
[ 45.810000] Call Trace:
[ 45.810000] [<7902bce1>] ? show_regs+0x34/0x4b
[ 45.810000] [<7904188d>] nmi_watchdog_tick+0xa4/0x181
[ 45.810000] [<7902e941>] default_do_nmi+0x64/0x21e
[ 45.810000] [<790412fc>] ? default_send_IPI_mask_logical+0x8f/0xb1
[ 45.810000] [<7902eb5d>] do_nmi+0x62/0xad
[ 45.810000] [<79ec0080>] nmi_stack_correct+0x2f/0x34
[ 45.810000] [<790412fc>] ? default_send_IPI_mask_logical+0x8f/0xb1
[ 45.810000] [<790412fc>] ? default_send_IPI_mask_logical+0x8f/0xb1
[ 45.810000] [<79097b38>] ? trace_hardirqs_off_caller+0x3f/0xbd
[ 45.810000] [<79097bcf>] trace_hardirqs_off+0x19/0x2c
[ 45.810000] [<790412fc>] default_send_IPI_mask_logical+0x8f/0xb1
[ 45.810000] [<79041084>] default_send_IPI_all+0x35/0x87
[ 45.810000] [<790416bc>] arch_trigger_all_cpu_backtrace+0x40/0x73
[ 45.810000] [<794356cc>] _raw_spin_lock+0x108/0x13f
[ 45.810000] [<79ebf84d>] _spin_lock+0x3c/0x55
[ 45.810000] [<790673e0>] ? scheduler_tick+0x44/0x20e
[ 45.810000] [<790673e0>] scheduler_tick+0x44/0x20e
[ 45.810000] [<790793fc>] update_process_times+0x4a/0x68
[ 45.810000] [<7909354c>] tick_periodic+0x7a/0x8d
[ 45.810000] [<79093588>] tick_handle_periodic+0x29/0x91
[ 45.810000] [<7909398f>] tick_do_broadcast+0x42/0x7d
[ 45.810000] [<79093b17>] tick_do_periodic_broadcast+0x3c/0x59
[ 45.810000] [<79093fb0>] tick_handle_periodic_broadcast+0x20/0x70
[ 45.810000] [<7902fac1>] timer_interrupt+0x4d/0x85
[ 45.810000] [<790ab402>] handle_IRQ_event+0x65/0x13b
[ 45.810000] [<790ad1fc>] handle_edge_irq+0xbe/0x111
[ 45.810000] [<7902f553>] handle_irq+0x2f/0x46
[ 45.810000] [<7902ee8b>] do_IRQ+0x51/0xb8
[ 45.810000] [<7902d435>] common_interrupt+0x35/0x40
[ 45.810000] [<7906afe2>] ? vprintk+0x3a8/0x3fa
[ 45.810000] [<79090020>] ? ntp_clear+0x22/0x7d
[ 45.810000] [<79ebdf6c>] ? __mutex_unlock_slowpath+0x10e/0x12f
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<79ebce42>] printk+0x22/0x35
[ 45.810000] [<790010ed>] do_one_initcall+0xc4/0x183
[ 45.810000] [<7a5ab658>] ? tw9910_module_init+0x0/0x2e
[ 45.810000] [<7a562474>] do_basic_setup+0x50/0x72
[ 45.810000] [<7a562509>] kernel_init+0x73/0xc5
[ 45.810000] [<7a562496>] ? kernel_init+0x0/0xc5
[ 45.810000] [<7902d997>] kernel_thread_helper+0x7/0x10
[ 45.810000] NMI backtrace for cpu 0
[ 45.810000]
[ 45.810000] Pid: 0, comm: swapper Tainted: G W (2.6.32-rc2-tip-00990-g6139d57-dirty #19124) System Product Name
[ 45.810000] EIP: 0060:[<7908c649>] EFLAGS: 00000086 CPU: 0
[ 45.810000] EIP is at sched_clock_cpu+0x120/0x159
[ 45.810000] EAX: aa7d20b2 EBX: aa7d20b2 ECX: 0000000a EDX: 0000000a
[ 45.810000] ESI: 7be0a450 EDI: 00000001 EBP: 7a42de1c ESP: 7a42ddd8
[ 45.810000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 45.810000] CR0: 8005003b CR2: ffcff000 CR3: 02664000 CR4: 00000690
[ 45.810000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 45.810000] DR6: ffff0ff0 DR7: 00000400
[ 45.810000] Call Trace:
[ 45.810000] [<7905f54b>] update_rq_clock+0x24/0x45
[ 45.810000] [<7905f5d3>] double_rq_lock+0x67/0x7d
[ 45.810000] [<79066bc4>] load_balance+0x116/0x355
[ 45.810000] [<79098b07>] ? trace_hardirqs_on_caller+0x106/0x159
[ 45.810000] [<79066ec6>] rebalance_domains+0xc3/0x134
[ 45.810000] [<79066f78>] run_rebalance_domains+0x41/0xc2
[ 45.810000] [<79071154>] __do_softirq+0xd2/0x1a0
[ 45.810000] [<79071260>] do_softirq+0x3e/0x68
[ 45.810000] [<79071423>] irq_exit+0x4b/0x9f
[ 45.810000] [<790409e3>] smp_apic_timer_interrupt+0x81/0xa0
[ 45.810000] [<7902d7f6>] apic_timer_interrupt+0x36/0x40
[ 45.810000] [<7903377c>] ? test_ti_thread_flag+0x1/0x30
[ 45.810000] [<79033878>] ? need_resched+0x27/0x42
[ 45.810000] [<790338c2>] poll_idle+0x2f/0x76
[ 45.810000] [<7902bd98>] cpu_idle+0xa0/0xd4
[ 45.810000] [<79e2b392>] rest_init+0x7a/0x8d
[ 45.810000] [<7a562ab2>] start_kernel+0x33a/0x350
[ 45.810000] [<7a56209f>] i386_start_kernel+0x9f/0xb5
[ 45.810000] Pid: 0, comm: swapper Tainted: G W 2.6.32-rc2-tip-00990-g6139d57-dirty #19124
[ 45.810000] Call Trace:
[ 45.810000] [<7902bce1>] ? show_regs+0x34/0x4b
[ 45.810000] [<7904188d>] nmi_watchdog_tick+0xa4/0x181
[ 45.810000] [<7902e941>] default_do_nmi+0x64/0x21e
[ 45.810000] [<7902eb5d>] do_nmi+0x62/0xad
[ 45.810000] [<79ec0080>] nmi_stack_correct+0x2f/0x34
[ 45.810000] [<7908c649>] ? sched_clock_cpu+0x120/0x159
[ 45.810000] [<7905f54b>] update_rq_clock+0x24/0x45
[ 45.810000] [<7905f5d3>] double_rq_lock+0x67/0x7d
[ 45.810000] [<79066bc4>] load_balance+0x116/0x355
[ 45.810000] [<79098b07>] ? trace_hardirqs_on_caller+0x106/0x159
[ 45.810000] [<79066ec6>] rebalance_domains+0xc3/0x134
[ 45.810000] [<79066f78>] run_rebalance_domains+0x41/0xc2
[ 45.810000] [<79071154>] __do_softirq+0xd2/0x1a0
[ 45.810000] [<79071260>] do_softirq+0x3e/0x68
[ 45.810000] [<79071423>] irq_exit+0x4b/0x9f
[ 45.810000] [<790409e3>] smp_apic_timer_interrupt+0x81/0xa0
[ 45.810000] [<7902d7f6>] apic_timer_interrupt+0x36/0x40
[ 45.810000] [<7903377c>] ? test_ti_thread_flag+0x1/0x30
[ 45.810000] [<79033878>] ? need_resched+0x27/0x42
[ 45.810000] [<790338c2>] poll_idle+0x2f/0x76
[ 45.810000] [<7902bd98>] cpu_idle+0xa0/0xd4
[ 45.810000] [<79e2b392>] rest_init+0x7a/0x8d
[ 45.810000] [<7a562ab2>] start_kernel+0x33a/0x350
[ 45.810000] [<7a56209f>] i386_start_kernel+0x9f/0xb5

commit 1cb9955464ad248c79c48e9c8be6669020fe178c
Author: Arjan van de Ven <[email protected]>
Date: Wed Sep 30 20:36:19 2009 +0200

sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.

That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps that can be off by up to ~4 seconds.

Two symptoms are known right now:

- interactivity problems seen by Arjan: up to 600 msecs
latencies instead of the expected 20-40 msecs. These
latencies are very visible on the desktop.

- incorrect CPU stats: occasionally too high percentages in 'top',
and crazy CPU usage stats.

Reported-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index ac2e1dc..479ce56 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -127,7 +127,7 @@ again:
clock = wrap_max(clock, min_clock);
clock = wrap_min(clock, max_clock);

- if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
+ if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
goto again;

return clock;
@@ -163,7 +163,7 @@ again:
val = remote_clock;
}

- if (cmpxchg(ptr, old_val, val) != old_val)
+ if (cmpxchg64(ptr, old_val, val) != old_val)
goto again;

return val;

commit 8da752098d4e584fbcc149046e5c716f5d356db4
Author: Arjan van de Ven <[email protected]>
Date: Wed Sep 30 17:07:54 2009 +0200

x86: Provide an alternative() based cmpxchg64()

cmpxchg64() today generates, to quote Linus, "barf bag" code.

cmpxchg64() is about to get used in the scheduler to fix a bug there,
but it's a prerequisite that cmpxchg64() first be made non-sucking.

This patch turns cmpxchg64() into an efficient implementation that
uses the alternative() mechanism to just use the raw instruction on
all modern systems.

Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe. (Interested parties with i486 based SMP systems
are welcome to submit fix patches for that.)

Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..3b21afa 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -312,19 +312,22 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,

extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);

-#define cmpxchg64(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) __ret; \
- if (likely(boot_cpu_data.x86 > 4)) \
- __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- else \
- __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- __ret; \
-})
+#define cmpxchg64(ptr, o, n) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (o); \
+ __typeof__(*(ptr)) __new = (n); \
+ alternative_io("call cmpxchg8b_emu", \
+ "lock; cmpxchg8b (%%esi)" , \
+ X86_FEATURE_CX8, \
+ "=A" (__ret), \
+ "S" ((ptr)), "0" (__old), \
+ "b" ((unsigned int)__new), \
+ "c" ((unsigned int)(__new>>32))); \
+ __ret; })
+
+
+
#define cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 43cec6b..1736c5a 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -10,6 +10,14 @@
EXPORT_SYMBOL(mcount);
#endif

+/*
+ * Note, this is a prototype to get at the symbol for
+ * the export, but dont use it from C code, it is used
+ * by assembly code and is not using C calling convention!
+ */
+extern void cmpxchg8b_emu(void);
+EXPORT_SYMBOL(cmpxchg8b_emu);
+
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 9e60920..3e549b8 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o
+ lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o

lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
new file mode 100644
index 0000000..b8af4c7
--- /dev/null
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -0,0 +1,61 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
+#include <asm/dwarf2.h>
+
+
+.text
+
+/*
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ */
+ENTRY(cmpxchg8b_emu)
+ CFI_STARTPROC
+
+ push %edi
+ push %ebx
+ push %ecx
+ /* disable interrupts */
+ pushf
+ pop %edi
+ cli
+
+ cmpl %edx, 4(%esi)
+ jne 1f
+ cmpl %eax, (%esi)
+ jne 1f
+
+ xchg (%esi), %ebx
+ xchg 4(%esi), %ecx
+ mov %ebx, %eax
+ mov %ecx, %edx
+
+2:
+ /* restore interrupts */
+ push %edi
+ popf
+
+ pop %ecx
+ pop %ebx
+ pop %edi
+ ret
+1:
+ mov (%esi), %eax
+ mov 4(%esi), %edx
+ jmp 2b
+ CFI_ENDPROC
+ENDPROC(cmpxchg8b_emu)
+


Attachments:
(No filename) (17.32 kB)
config (76.30 kB)
Download all attachments

2009-09-30 19:35:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1


* Ingo Molnar <[email protected]> wrote:

> > 4 files changed, 81 insertions(+), 14 deletions(-)
> > create mode 100644 arch/x86/lib/cmpxchg8b_emu.S
>
> The two patches cause hangs on 32-bit, in sched_clock_cpu(). I
> attached the hang serial log plus the two patches as i applied them.
>
> Bug in cmpxchg8b_emu()?

... on this CPU we should be using CMPXCHG8B, so cmpxchg8b_emu() shouldn't
be called.
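One way to make that reasoning visible (a hypothetical debug aid, not part of any patch in this thread; boot_cpu_has() is the usual feature test):

#include <linux/init.h>
#include <linux/kernel.h>
#include <asm/cpufeature.h>

/* After apply_alternatives(), CX8 being set means the "call cmpxchg8b_emu"
 * site has been patched to the native "lock cmpxchg8b". */
static int __init report_cmpxchg64_path(void)
{
	if (boot_cpu_has(X86_FEATURE_CX8))
		printk(KERN_DEBUG "cmpxchg64: native lock cmpxchg8b\n");
	else
		printk(KERN_DEBUG "cmpxchg64: cmpxchg8b_emu fallback\n");
	return 0;
}
late_initcall(report_cmpxchg64_path);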

Ingo

2009-09-30 19:38:32

by Arjan van de Ven

[permalink] [raw]
Subject: [tip:sched/urgent] x86: Provide an alternative() based cmpxchg64()

Commit-ID: 8da752098d4e584fbcc149046e5c716f5d356db4
Gitweb: http://git.kernel.org/tip/8da752098d4e584fbcc149046e5c716f5d356db4
Author: Arjan van de Ven <[email protected]>
AuthorDate: Wed, 30 Sep 2009 17:07:54 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 20:58:26 +0200

x86: Provide an alternative() based cmpxchg64()

cmpxchg64() today generates, to quote Linus, "barf bag" code.

cmpxchg64() is about to get used in the scheduler to fix a bug there,
but it's a prerequisite that cmpxchg64() first be made non-sucking.

This patch turns cmpxchg64() into an efficient implementation that
uses the alternative() mechanism to just use the raw instruction on
all modern systems.

Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe. (Interested parties with i486 based SMP systems
are welcome to submit fix patches for that.)

Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/cmpxchg_32.h | 29 ++++++++++--------
arch/x86/kernel/i386_ksyms_32.c | 8 +++++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/cmpxchg8b_emu.S | 61 +++++++++++++++++++++++++++++++++++++
4 files changed, 86 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..3b21afa 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -312,19 +312,22 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,

extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);

-#define cmpxchg64(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) __ret; \
- if (likely(boot_cpu_data.x86 > 4)) \
- __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- else \
- __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- __ret; \
-})
+#define cmpxchg64(ptr, o, n) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (o); \
+ __typeof__(*(ptr)) __new = (n); \
+ alternative_io("call cmpxchg8b_emu", \
+ "lock; cmpxchg8b (%%esi)" , \
+ X86_FEATURE_CX8, \
+ "=A" (__ret), \
+ "S" ((ptr)), "0" (__old), \
+ "b" ((unsigned int)__new), \
+ "c" ((unsigned int)(__new>>32))); \
+ __ret; })
+
+
+
#define cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 43cec6b..1736c5a 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -10,6 +10,14 @@
EXPORT_SYMBOL(mcount);
#endif

+/*
+ * Note, this is a prototype to get at the symbol for
+ * the export, but dont use it from C code, it is used
+ * by assembly code and is not using C calling convention!
+ */
+extern void cmpxchg8b_emu(void);
+EXPORT_SYMBOL(cmpxchg8b_emu);
+
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 9e60920..3e549b8 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o
+ lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o

lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
new file mode 100644
index 0000000..b8af4c7
--- /dev/null
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -0,0 +1,61 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
+#include <asm/dwarf2.h>
+
+
+.text
+
+/*
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ */
+ENTRY(cmpxchg8b_emu)
+ CFI_STARTPROC
+
+ push %edi
+ push %ebx
+ push %ecx
+ /* disable interrupts */
+ pushf
+ pop %edi
+ cli
+
+ cmpl %edx, 4(%esi)
+ jne 1f
+ cmpl %eax, (%esi)
+ jne 1f
+
+ xchg (%esi), %ebx
+ xchg 4(%esi), %ecx
+ mov %ebx, %eax
+ mov %ecx, %edx
+
+2:
+ /* restore interrupts */
+ push %edi
+ popf
+
+ pop %ecx
+ pop %ebx
+ pop %edi
+ ret
+1:
+ mov (%esi), %eax
+ mov 4(%esi), %edx
+ jmp 2b
+ CFI_ENDPROC
+ENDPROC(cmpxchg8b_emu)
+

2009-09-30 19:38:39

by Arjan van de Ven

[permalink] [raw]
Subject: [tip:sched/urgent] sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit-ID: 1cb9955464ad248c79c48e9c8be6669020fe178c
Gitweb: http://git.kernel.org/tip/1cb9955464ad248c79c48e9c8be6669020fe178c
Author: Arjan van de Ven <[email protected]>
AuthorDate: Wed, 30 Sep 2009 20:36:19 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 20:58:27 +0200

sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.

That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps that can be off by up to ~4 seconds.

Two symptoms are known right now:

- interactivity problems seen by Arjan: up to 600 msecs
latencies instead of the expected 20-40 msecs. These
latencies are very visible on the desktop.

- incorrect CPU stats: occasionally too high percentages in 'top',
and crazy CPU usage stats.

Reported-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/sched_clock.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index ac2e1dc..479ce56 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -127,7 +127,7 @@ again:
clock = wrap_max(clock, min_clock);
clock = wrap_min(clock, max_clock);

- if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
+ if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
goto again;

return clock;
@@ -163,7 +163,7 @@ again:
val = remote_clock;
}

- if (cmpxchg(ptr, old_val, val) != old_val)
+ if (cmpxchg64(ptr, old_val, val) != old_val)
goto again;

return val;

2009-09-30 19:40:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:sched/urgent] sched_clock: Fix atomicity/continuity bug by using cmpxchg64()


* tip-bot for Arjan van de Ven <[email protected]> wrote:

> Commit-ID: 1cb9955464ad248c79c48e9c8be6669020fe178c
> Gitweb: http://git.kernel.org/tip/1cb9955464ad248c79c48e9c8be6669020fe178c
> Author: Arjan van de Ven <[email protected]>

Note: I just fixed this to be Eric.

Ingo

2009-09-30 19:41:18

by Eric Dumazet

[permalink] [raw]
Subject: [tip:sched/urgent] sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit-ID: 54f2adad1d808252110e3f494a5beb25a4fc7b84
Gitweb: http://git.kernel.org/tip/54f2adad1d808252110e3f494a5beb25a4fc7b84
Author: Eric Dumazet <[email protected]>
AuthorDate: Wed, 30 Sep 2009 20:36:19 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 21:37:58 +0200

sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.

That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps that can be off by up to ~4 seconds.

Two symptoms are known right now:

- interactivity problems seen by Arjan: up to 600 msecs
latencies instead of the expected 20-40 msecs. These
latencies are very visible on the desktop.

- incorrect CPU stats: occasionally too high percentages in 'top',
and crazy CPU usage stats.

Reported-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/sched_clock.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index ac2e1dc..479ce56 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -127,7 +127,7 @@ again:
clock = wrap_max(clock, min_clock);
clock = wrap_min(clock, max_clock);

- if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
+ if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
goto again;

return clock;
@@ -163,7 +163,7 @@ again:
val = remote_clock;
}

- if (cmpxchg(ptr, old_val, val) != old_val)
+ if (cmpxchg64(ptr, old_val, val) != old_val)
goto again;

return val;

2009-09-30 20:16:42

by Eric Dumazet

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Ingo Molnar wrote:
> * Arjan van de Ven <[email protected]> wrote:
>
>> From 5a76986c5dd272ea16a9b8abb7349ff3d6791c2b Mon Sep 17 00:00:00 2001
>> From: Arjan van de Ven <[email protected]>
>> Date: Wed, 30 Sep 2009 17:04:35 +0200
>> Subject: [PATCH] x86: Provide an alternative() based cmpxchg64()
>>
>> Based on Linus' patch, this patch provides cmpxchg64() using
>> the alternative() infrastructure.
>>
>> Note: the fallback is NOT smp safe, just like the current fallback
>> is not SMP safe.
>>
>> Signed-off-by: Arjan van de Ven <[email protected]>
>> ---
>> arch/x86/include/asm/cmpxchg_32.h | 29 ++++++++++--------
>> arch/x86/kernel/i386_ksyms_32.c | 3 ++
>> arch/x86/lib/Makefile | 2 +-
>> arch/x86/lib/cmpxchg8b_emu.S | 61 +++++++++++++++++++++++++++++++++++++
>> 4 files changed, 81 insertions(+), 14 deletions(-)
>> create mode 100644 arch/x86/lib/cmpxchg8b_emu.S
>
> The two patches cause hangs on 32-bit, in sched_clock_cpu(). I attached
> the hang serial log plus the two patches as i applied them.
>
> Bug in cmpxchg8b_emu()?
>
> Ingo
>
> [ ~180 lines of the serial hang log snipped - identical to the log quoted in full in Ingo's mail above ]
>
> commit 1cb9955464ad248c79c48e9c8be6669020fe178c
> Author: Arjan van de Ven <[email protected]>
> Date: Wed Sep 30 20:36:19 2009 +0200
>
> sched_clock: Fix atomicity/continuity bug by using cmpxchg64()
>
> Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
> cmpxchg() of 64bit values was available on X86_32.
>
> That is not so - and causes some subtle scheduler misbehavior due
> to incorrect timestamps that can be off by up to ~4 seconds.
>
> Two symptoms are known right now:
>
> - interactivity problems seen by Arjan: up to 600 msecs
> latencies instead of the expected 20-40 msecs. These
> latencies are very visible on the desktop.
>
> - incorrect CPU stats: occasionally too high percentages in 'top',
> and crazy CPU usage stats.
>
> Reported-by: Martin Schwidefsky <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> Signed-off-by: Arjan van de Ven <[email protected]>
> Acked-by: Linus Torvalds <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
> index ac2e1dc..479ce56 100644
> --- a/kernel/sched_clock.c
> +++ b/kernel/sched_clock.c
> @@ -127,7 +127,7 @@ again:
> clock = wrap_max(clock, min_clock);
> clock = wrap_min(clock, max_clock);
>
> - if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
> + if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
> goto again;
>
> return clock;
> @@ -163,7 +163,7 @@ again:
> val = remote_clock;
> }
>
> - if (cmpxchg(ptr, old_val, val) != old_val)
> + if (cmpxchg64(ptr, old_val, val) != old_val)
> goto again;
>
> return val;
>
> commit 8da752098d4e584fbcc149046e5c716f5d356db4
> Author: Arjan van de Ven <[email protected]>
> Date: Wed Sep 30 17:07:54 2009 +0200
>
> x86: Provide an alternative() based cmpxchg64()
>
> cmpxchg64() today generates, to quote Linus, "barf bag" code.
>
> cmpxchg64() is about to get used in the scheduler to fix a bug there,
> but it's a prerequisite that cmpxchg64() first be made non-sucking.
>
> This patch turns cmpxchg64() into an efficient implementation that
> uses the alternative() mechanism to just use the raw instruction on
> all modern systems.
>
> Note: the fallback is NOT smp safe, just like the current fallback
> is not SMP safe. (Interested parties with i486 based SMP systems
> are welcome to submit fix patches for that.)
>
> Signed-off-by: Arjan van de Ven <[email protected]>
> Acked-by: Linus Torvalds <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Martin Schwidefsky <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
> index 82ceb78..3b21afa 100644
> --- a/arch/x86/include/asm/cmpxchg_32.h
> +++ b/arch/x86/include/asm/cmpxchg_32.h
> @@ -312,19 +312,22 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
>
> extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);
>
> -#define cmpxchg64(ptr, o, n) \
> -({ \
> - __typeof__(*(ptr)) __ret; \
> - if (likely(boot_cpu_data.x86 > 4)) \
> - __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
> - (unsigned long long)(o), \
> - (unsigned long long)(n)); \
> - else \
> - __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
> - (unsigned long long)(o), \
> - (unsigned long long)(n)); \
> - __ret; \
> -})
> +#define cmpxchg64(ptr, o, n) \
> +({ \
> + __typeof__(*(ptr)) __ret; \
> + __typeof__(*(ptr)) __old = (o); \
> + __typeof__(*(ptr)) __new = (n); \
> + alternative_io("call cmpxchg8b_emu", \
> + "lock; cmpxchg8b (%%esi)" , \
> + X86_FEATURE_CX8, \
> + "=A" (__ret), \
> + "S" ((ptr)), "0" (__old), \
> + "b" ((unsigned int)__new), \
> + "c" ((unsigned int)(__new>>32))); \

probably a "memory" clobber is needed, alternative_io() doesnt provide it.



> + __ret; })
> +
> +
> +
> #define cmpxchg64_local(ptr, o, n) \
> ({ \
> __typeof__(*(ptr)) __ret; \
> diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
> index 43cec6b..1736c5a 100644
> --- a/arch/x86/kernel/i386_ksyms_32.c
> +++ b/arch/x86/kernel/i386_ksyms_32.c
> @@ -10,6 +10,14 @@
> EXPORT_SYMBOL(mcount);
> #endif
>
> +/*
> + * Note, this is a prototype to get at the symbol for
> + * the export, but dont use it from C code, it is used
> + * by assembly code and is not using C calling convention!
> + */
> +extern void cmpxchg8b_emu(void);
> +EXPORT_SYMBOL(cmpxchg8b_emu);
> +
> /* Networking helper routines. */
> EXPORT_SYMBOL(csum_partial_copy_generic);
>
> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> index 9e60920..3e549b8 100644
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
> obj-y += atomic64_32.o
> lib-y += checksum_32.o
> lib-y += strstr_32.o
> - lib-y += semaphore_32.o string_32.o
> + lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o
>
> lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
> else
> diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
> new file mode 100644
> index 0000000..b8af4c7
> --- /dev/null
> +++ b/arch/x86/lib/cmpxchg8b_emu.S
> @@ -0,0 +1,61 @@
> +/*
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.
> + *
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/alternative-asm.h>
> +#include <asm/frame.h>
> +#include <asm/dwarf2.h>
> +
> +
> +.text
> +
> +/*
> + * Inputs:
> + * %esi : memory location to compare
> + * %eax : low 32 bits of old value
> + * %edx : high 32 bits of old value
> + * %ebx : low 32 bits of new value
> + * %ecx : high 32 bits of new value
> + */
> +ENTRY(cmpxchg8b_emu)
> + CFI_STARTPROC
> +
> + push %edi
> + push %ebx
> + push %ecx
> + /* disable interrupts */
> + pushf
> + pop %edi
> + cli
> +
> + cmpl %edx, 4(%esi)
> + jne 1f
> + cmpl %eax, (%esi)
> + jne 1f
> +
> + xchg (%esi), %ebx
> + xchg 4(%esi), %ecx
> + mov %ebx, %eax
> + mov %ecx, %edx
> +
> +2:
> + /* restore interrupts */
> + push %edi
> + popf
> +
> + pop %ecx
> + pop %ebx
> + pop %edi
> + ret
> +1:
> + mov (%esi), %eax
> + mov 4(%esi), %edx
> + jmp 2b
> + CFI_ENDPROC
> +ENDPROC(cmpxchg8b_emu)
> +
>

2009-09-30 20:20:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Wed, 30 Sep 2009, Arjan van de Ven wrote:
> +ENTRY(cmpxchg8b_emu)
> + CFI_STARTPROC
> +
> + push %edi
> + push %ebx
> + push %ecx
> + /* disable interrupts */
> + pushf
> + pop %edi
> + cli
> +
> + cmpl %edx, 4(%esi)
> + jne 1f
> + cmpl %eax, (%esi)
> + jne 1f
> +
> + xchg (%esi), %ebx
> + xchg 4(%esi), %ecx
> + mov %ebx, %eax
> + mov %ecx, %edx

Ok, so why do you do this? You've just checked that the 8 bytes at esi are
the same as edx:eax, why do you do that odd xchg?

Why isn't this part just

mov %ebx,(%esi)
mov %ecx,4(%esi)

and leave ebx/ecx alone?

I also don't see why you play games with eflags and %edi. Just leave the
flags on the stack. So it all would look like

#
# Emulate 'cmpxchg8b (%esi)' except we don't
# set the whole ZF thing (caller will just
# compare eax:edx with the expected value)
#
cmpxchg8b_emu:
pushfl
cli
cmpl (%esi),%eax
jne not_same
cmpl 4(%esi),%edx
jne not_same
movl %ebx,(%esi)
movl %ecx,4(%esi)
popfl
ret
not_same:
movl (%esi),%eax
movl 4(%esi),%edx
popfl
ret

I dunno. You _could_ use edi/ebp as a temporary for 'dest' to only do the
load once (and add push/pop to save the registers), but it's not like the
above is really atomic anyway, and it very much depends on 'cli' just
making sure that nothing else happens. So the above seems to be the
simplest emulation, considering the interfaces..

UNTESTED! I wrote this in an email editor. I haven't assembled it or
verified the semantics or workingness in any way, shape or form.

Linus

2009-09-30 20:21:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Wed, 30 Sep 2009, Eric Dumazet wrote:
> > +#define cmpxchg64(ptr, o, n) \
> > +({ \
> > + __typeof__(*(ptr)) __ret; \
> > + __typeof__(*(ptr)) __old = (o); \
> > + __typeof__(*(ptr)) __new = (n); \
> > + alternative_io("call cmpxchg8b_emu", \
> > + "lock; cmpxchg8b (%%esi)" , \
> > + X86_FEATURE_CX8, \
> > + "=A" (__ret), \
> > + "S" ((ptr)), "0" (__old), \
> > + "b" ((unsigned int)__new), \
> > + "c" ((unsigned int)(__new>>32))); \
>
> probably a "memory" clobber is needed, alternative_io() doesnt provide it.

Yeah, good catch.

Linus

2009-09-30 20:22:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1


* Linus Torvalds <[email protected]> wrote:

> On Wed, 30 Sep 2009, Eric Dumazet wrote:
> > > +#define cmpxchg64(ptr, o, n) \
> > > +({ \
> > > + __typeof__(*(ptr)) __ret; \
> > > + __typeof__(*(ptr)) __old = (o); \
> > > + __typeof__(*(ptr)) __new = (n); \
> > > + alternative_io("call cmpxchg8b_emu", \
> > > + "lock; cmpxchg8b (%%esi)" , \
> > > + X86_FEATURE_CX8, \
> > > + "=A" (__ret), \
> > > + "S" ((ptr)), "0" (__old), \
> > > + "b" ((unsigned int)__new), \
> > > + "c" ((unsigned int)(__new>>32))); \
> >
> > probably a "memory" clobber is needed, alternative_io() doesn't provide it.
>
> Yeah, good catch.

Yeah - I'm testing this right now; it looks like something that could
explain the hang.

Ingo

2009-09-30 20:25:44

by Eric Dumazet

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Linus Torvalds wrote:
>
> On Wed, 30 Sep 2009, Arjan van de Ven wrote:
>> +ENTRY(cmpxchg8b_emu)
>> + CFI_STARTPROC
>> +
>> + push %edi
>> + push %ebx
>> + push %ecx
>> + /* disable interrupts */
>> + pushf
>> + pop %edi
>> + cli
>> +
>> + cmpl %edx, 4(%esi)
>> + jne 1f
>> + cmpl %eax, (%esi)
>> + jne 1f
>> +
>> + xchg (%esi), %ebx
>> + xchg 4(%esi), %ecx
>> + mov %ebx, %eax
>> + mov %ecx, %edx
>
> Ok, so why do you do this? You've just checked that the 8 bytes at esi are
> the same as edx:eax, why do you do that odd xchg?
>
> Why isn't this part just
>
> mov %ebx,(%esi)
> mov %ecx,4(%esi)
>
> and leave ebx/ecx alone?
>
> I also don't see why you play games with eflags and %edi. Just leave the
> flags on the stack. So it all would look like
>
> #
> # Emulate 'cmpxchg8b (%esi)' except we don't
> # set the whole ZF thing (caller will just
> # compare eax:edx with the expected value)
> #
> cmpxchg8b_emu:
> pushfl
> cli
> cmpl (%esi),%eax
> jne not_same
> cmpl 4(%esi),%edx
> jne not_same
> movl %ebx,(%esi)
> movl %ecx,4(%esi)
> popfl
> ret
> not_same:
> movl (%esi),%eax
> movl 4(%esi),%edx
> popfl
> ret
>
> I dunno. You _could_ use edi/ebp as a temporary for 'dest' to only do the
> load once (and add push/pop to save the registers), but it's not like the
> above is really atomic anyway, and it very much depends on 'cli' just
> making sure that nothing else happens. So the above seems to be the
> simplest emulation, considering the interfaces..
>
> UNTESTED! I wrote this in an email editor. I haven't assembled it or
> verified the semantics or workingness in any way, shape or form.
>
> Linus

Yes, I provided the following version in my first answer, but apparently it was too complex :)

ENTRY(cmpxchg8b_emu)
CFI_STARTPROC

/* disable interrupts */
pushf
cli

cmpl %eax,(%esi)
jne 1f
cmpl %edx,4(%esi)
jne 2f

mov %ebx,(%esi)
mov %ecx,4(%esi)

/* restore interrupts */
popf
ret
1:
mov (%esi), %eax
2:
mov 4(%esi), %edx
popf
ret
CFI_ENDPROC
ENDPROC(cmpxchg8b_emu)

2009-09-30 20:38:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1


* Ingo Molnar <[email protected]> wrote:

> * Linus Torvalds <[email protected]> wrote:
>
> > On Wed, 30 Sep 2009, Eric Dumazet wrote:
> > > > +#define cmpxchg64(ptr, o, n) \
> > > > +({ \
> > > > + __typeof__(*(ptr)) __ret; \
> > > > + __typeof__(*(ptr)) __old = (o); \
> > > > + __typeof__(*(ptr)) __new = (n); \
> > > > + alternative_io("call cmpxchg8b_emu", \
> > > > + "lock; cmpxchg8b (%%esi)" , \
> > > > + X86_FEATURE_CX8, \
> > > > + "=A" (__ret), \
> > > > + "S" ((ptr)), "0" (__old), \
> > > > + "b" ((unsigned int)__new), \
> > > > + "c" ((unsigned int)(__new>>32))); \
> > >
> > > probably a "memory" clobber is needed, alternative_io() doesn't provide it.
> >
> > Yeah, good catch.
>
> Yeah - i'm testing this right now, it looks like something that could
> explain the hang.

Yeah - that was the cause of the hang.

I've pushed out the updated patches - they should show up in this thread
soon. I also updated the emulation implementation to your simpler
variant. (will test that too)

Ingo

2009-09-30 20:41:19

by Arjan van de Ven

[permalink] [raw]
Subject: [tip:sched/urgent] x86: Provide an alternative() based cmpxchg64()

Commit-ID: e87aaa7b6d80474ec07a2d2be3f50ef574ffe36c
Gitweb: http://git.kernel.org/tip/e87aaa7b6d80474ec07a2d2be3f50ef574ffe36c
Author: Arjan van de Ven <[email protected]>
AuthorDate: Wed, 30 Sep 2009 17:07:54 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 22:35:44 +0200

x86: Provide an alternative() based cmpxchg64()

cmpxchg64() today generates, to quote Linus, "barf bag" code.

cmpxchg64() is about to get used in the scheduler to fix a bug there,
but it's a prerequisite that cmpxchg64() first be made non-sucking.

This patch turns cmpxchg64() into an efficient implementation that
uses the alternative() mechanism to just use the raw instruction on
all modern systems.

Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe. (Interested parties with i486 based SMP systems
are welcome to submit fix patches for that.)

Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
[ fixed asm constraint bug ]
Fixed-by: Eric Dumazet <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/cmpxchg_32.h | 30 +++++++++++--------
arch/x86/kernel/i386_ksyms_32.c | 8 +++++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/cmpxchg8b_emu.S | 56 +++++++++++++++++++++++++++++++++++++
4 files changed, 82 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..ee1931b 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -312,19 +312,23 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,

extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);

-#define cmpxchg64(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) __ret; \
- if (likely(boot_cpu_data.x86 > 4)) \
- __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- else \
- __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- __ret; \
-})
+#define cmpxchg64(ptr, o, n) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (o); \
+ __typeof__(*(ptr)) __new = (n); \
+ alternative_io("call cmpxchg8b_emu", \
+ "lock; cmpxchg8b (%%esi)" , \
+ X86_FEATURE_CX8, \
+ "=A" (__ret), \
+ "S" ((ptr)), "0" (__old), \
+ "b" ((unsigned int)__new), \
+ "c" ((unsigned int)(__new>>32)) \
+ : "memory"); \
+ __ret; })
+
+
+
#define cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 43cec6b..1736c5a 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -10,6 +10,14 @@
EXPORT_SYMBOL(mcount);
#endif

+/*
+ * Note, this is a prototype to get at the symbol for
+ * the export, but dont use it from C code, it is used
+ * by assembly code and is not using C calling convention!
+ */
+extern void cmpxchg8b_emu(void);
+EXPORT_SYMBOL(cmpxchg8b_emu);
+
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 9e60920..3e549b8 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o
+ lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o

lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
new file mode 100644
index 0000000..0c5e5fa
--- /dev/null
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -0,0 +1,56 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
+#include <asm/dwarf2.h>
+
+
+.text
+
+/*
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ */
+ENTRY(cmpxchg8b_emu)
+CFI_STARTPROC
+
+#
+# Emulate 'cmpxchg8b (%esi)' except we don't
+# set the whole ZF thing (caller will just
+# compare eax:edx with the expected value)
+#
+cmpxchg8b_emu:
+ pushfl
+ cli
+
+ cmpl (%esi), %eax
+ jne not_same
+ cmpl 4(%esi), %edx
+ jne not_same
+
+ movl %ebx, (%esi)
+ movl %ecx, 4(%esi)
+
+ popfl
+ ret
+
+ not_same:
+ movl (%esi), %eax
+ movl 4(%esi), %edx
+
+ popfl
+ ret
+
+CFI_ENDPROC
+ENDPROC(cmpxchg8b_emu)

2009-09-30 20:41:30

by Eric Dumazet

[permalink] [raw]
Subject: [tip:sched/urgent] sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit-ID: 623c4f89c47cca0088014388211d8bc0a5358f0a
Gitweb: http://git.kernel.org/tip/623c4f89c47cca0088014388211d8bc0a5358f0a
Author: Eric Dumazet <[email protected]>
AuthorDate: Wed, 30 Sep 2009 20:36:19 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 22:36:12 +0200

sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.

That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps off by up to ~4 seconds.

Two symptoms are known right now:

- interactivity problems seen by Arjan: up to 600 msecs
latencies instead of the expected 20-40 msecs. These
latencies are very visible on the desktop.

- incorrect CPU stats: occasionally too high percentages in 'top',
and crazy CPU usage stats.

Reported-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/sched_clock.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index ac2e1dc..479ce56 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -127,7 +127,7 @@ again:
clock = wrap_max(clock, min_clock);
clock = wrap_min(clock, max_clock);

- if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
+ if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
goto again;

return clock;
@@ -163,7 +163,7 @@ again:
val = remote_clock;
}

- if (cmpxchg(ptr, old_val, val) != old_val)
+ if (cmpxchg64(ptr, old_val, val) != old_val)
goto again;

return val;

2009-09-30 20:42:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Wed, 30 Sep 2009, Eric Dumazet wrote:
>
> Yes, I provided following version in my first answer but apparently it
> was too complex :)

It wasn't too complex, but YOU QUOTED THE WHOL F*CKING EMAIL.

I don't know about anybody else, but if I see a couple of screenfuls of
quotes, with no actual interesting new data, I just move on. I suspect
everybody else did too.

I really don't understand people who can't edit down their answer and only
quote the relevant parts. It makes it impossible to read the reply,
because it's hidden by all the old quoting.

Please spend the time to edit quoting levels, because otherwise people
won't spend the time to look at your new text.

Linus

2009-09-30 20:50:09

by Ingo Molnar

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1


* Linus Torvalds <[email protected]> wrote:

> On Wed, 30 Sep 2009, Eric Dumazet wrote:
> >
> > Yes, I provided following version in my first answer but apparently
> > it was too complex :)
>
> It wasn't too complex, but YOU QUOTED THE WHOL F*CKING EMAIL.
>
> I don't know about anybody else, but if I see a couple of screenfuls
> of quotes, with no actual interesting new data, I just move on. I
> suspect everybody else did too.

Yeah, i too missed it while reading through the thread.

Ingo

2009-09-30 20:54:31

by Eric Dumazet

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

Ingo Molnar wrote:
>> On Wed, 30 Sep 2009, Eric Dumazet wrote:
>>> Yes, I provided following version in my first answer but apparently
>>> it was too complex :)
>
> Yeah, i too missed it while reading through the thread.
>
> Ingo

Well, I was referring to Arjan's answer, not Linus's or Ingo's, and Arjan
did read my answer and said it was buggy... It's not a problem, I
don't have a 486 here anyway :)

You are all obviously right, I should edit mails more carefully
before hitting the 'Envoyer' (Send) button.

Thanks

2009-09-30 20:54:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1



On Wed, 30 Sep 2009, Linus Torvalds wrote:
>
> It wasn't too complex, but YOU QUOTED THE WHOL F*CKING EMAIL.

Sorry for shouting, but it's a pet peeve.

Linus

2009-09-30 20:55:53

by Arjan van de Ven

[permalink] [raw]
Subject: [tip:sched/urgent] x86: Provide an alternative() based cmpxchg64()

Commit-ID: c2aaf6d1323b356d21ed71654a54c80c0c332756
Gitweb: http://git.kernel.org/tip/c2aaf6d1323b356d21ed71654a54c80c0c332756
Author: Arjan van de Ven <[email protected]>
AuthorDate: Wed, 30 Sep 2009 17:07:54 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 22:51:08 +0200

x86: Provide an alternative() based cmpxchg64()

cmpxchg64() today generates, to quote Linus, "barf bag" code.

cmpxchg64() is about to get used in the scheduler to fix a bug there,
but it's a prerequisite that cmpxchg64() first be made non-sucking.

This patch turns cmpxchg64() into an efficient implementation that
uses the alternative() mechanism to just use the raw instruction on
all modern systems.

Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe. (Interested parties with i486 based SMP systems
are welcome to submit fix patches for that.)

Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
[ fixed asm constraint bug ]
Fixed-by: Eric Dumazet <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/cmpxchg_32.h | 30 +++++++++++--------
arch/x86/kernel/i386_ksyms_32.c | 8 +++++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/cmpxchg8b_emu.S | 56 +++++++++++++++++++++++++++++++++++++
4 files changed, 82 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..ee1931b 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -312,19 +312,23 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,

extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);

-#define cmpxchg64(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) __ret; \
- if (likely(boot_cpu_data.x86 > 4)) \
- __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- else \
- __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- __ret; \
-})
+#define cmpxchg64(ptr, o, n) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (o); \
+ __typeof__(*(ptr)) __new = (n); \
+ alternative_io("call cmpxchg8b_emu", \
+ "lock; cmpxchg8b (%%esi)" , \
+ X86_FEATURE_CX8, \
+ "=A" (__ret), \
+ "S" ((ptr)), "0" (__old), \
+ "b" ((unsigned int)__new), \
+ "c" ((unsigned int)(__new>>32)) \
+ : "memory"); \
+ __ret; })
+
+
+
#define cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 43cec6b..1736c5a 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -10,6 +10,14 @@
EXPORT_SYMBOL(mcount);
#endif

+/*
+ * Note, this is a prototype to get at the symbol for
+ * the export, but dont use it from C code, it is used
+ * by assembly code and is not using C calling convention!
+ */
+extern void cmpxchg8b_emu(void);
+EXPORT_SYMBOL(cmpxchg8b_emu);
+
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 9e60920..3e549b8 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o
+ lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o

lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
new file mode 100644
index 0000000..0c5e5fa
--- /dev/null
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -0,0 +1,56 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
+#include <asm/dwarf2.h>
+
+
+.text
+
+/*
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ */
+ENTRY(cmpxchg8b_emu)
+CFI_STARTPROC
+
+#
+# Emulate 'cmpxchg8b (%esi)' except we don't
+# set the whole ZF thing (caller will just
+# compare eax:edx with the expected value)
+#
+cmpxchg8b_emu:
+ pushfl
+ cli
+
+ cmpl (%esi), %eax
+ jne not_same
+ cmpl 4(%esi), %edx
+ jne not_same
+
+ movl %ebx, (%esi)
+ movl %ecx, 4(%esi)
+
+ popfl
+ ret
+
+ not_same:
+ movl (%esi), %eax
+ movl 4(%esi), %edx
+
+ popfl
+ ret
+
+CFI_ENDPROC
+ENDPROC(cmpxchg8b_emu)

2009-09-30 20:56:05

by Eric Dumazet

[permalink] [raw]
Subject: [tip:sched/urgent] sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit-ID: c7eca501d2ac092f2f58452a6f962249e2ab5dbe
Gitweb: http://git.kernel.org/tip/c7eca501d2ac092f2f58452a6f962249e2ab5dbe
Author: Eric Dumazet <[email protected]>
AuthorDate: Wed, 30 Sep 2009 20:36:19 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 22:51:51 +0200

sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.

That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps off by up to ~4 seconds.

Two symptoms are known right now:

- interactivity problems seen by Arjan: up to 600 msecs
latencies instead of the expected 20-40 msecs. These
latencies are very visible on the desktop.

- incorrect CPU stats: occasionally too high percentages in 'top',
and crazy CPU usage stats.

Reported-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/sched_clock.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index ac2e1dc..479ce56 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -127,7 +127,7 @@ again:
clock = wrap_max(clock, min_clock);
clock = wrap_min(clock, max_clock);

- if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
+ if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
goto again;

return clock;
@@ -163,7 +163,7 @@ again:
val = remote_clock;
}

- if (cmpxchg(ptr, old_val, val) != old_val)
+ if (cmpxchg64(ptr, old_val, val) != old_val)
goto again;

return val;

2009-09-30 20:58:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:sched/urgent] x86: Provide an alternative() based cmpxchg64()


* tip-bot for Arjan van de Ven <[email protected]> wrote:

> Commit-ID: e87aaa7b6d80474ec07a2d2be3f50ef574ffe36c
> Gitweb: http://git.kernel.org/tip/e87aaa7b6d80474ec07a2d2be3f50ef574ffe36c
> Author: Arjan van de Ven <[email protected]>
> AuthorDate: Wed, 30 Sep 2009 17:07:54 +0200
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Wed, 30 Sep 2009 22:35:44 +0200
>
> x86: Provide an alternative() based cmpxchg64()
>
> cmpxchg64() today generates, to quote Linus, "barf bag" code.
>
> cmpxchg64() is about to get used in the scheduler to a bug there,
> but it's a prerequisite that cmpxchg64() first be made non-sucking.
>
> This patch turns cmpxchg64() into an efficient implementation that
> uses the alternative() mechanism to just use the raw instruction on
> all modern systens

The changelog above has 3 typos so i fixed that. The push will generate
the hopefully-final version of these patches.

Note, i slightly updated Linus's version of the old-CPU emu code with
Eric's micro-optimization.

Ingo

2009-09-30 21:01:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1


* Eric Dumazet <[email protected]> wrote:

> Ingo Molnar wrote:
> >> On Wed, 30 Sep 2009, Eric Dumazet wrote:
> >>> Yes, I provided following version in my first answer but apparently
> >>> it was too complex :)
> >
> > Yeah, i too missed it while reading through the thread.
> >
> > Ingo
>
> Well, I was referring to Arjan's answer, not Linus's or Ingo's, and Arjan
> did read my answer and said it was buggy... It's not a problem, I don't
> have a 486 here anyway :)

I picked up the improvement that is in your version nevertheless. Even
if it doesn't matter in practice, we don't drop good code :)

> You are all obviously right, I should edit mails more carefully before
> hitting the 'Envoyer' (Send) button.

It happens. Since email on lkml is the precursor to real code, one
should generally treat it with the same high level of taste as the
final code.

Ingo

2009-09-30 21:01:48

by Arjan van de Ven

[permalink] [raw]
Subject: [tip:sched/urgent] x86: Provide an alternative() based cmpxchg64()

Commit-ID: 79e1dd05d1a22e95ab6d54d21836f478b3b56976
Gitweb: http://git.kernel.org/tip/79e1dd05d1a22e95ab6d54d21836f478b3b56976
Author: Arjan van de Ven <[email protected]>
AuthorDate: Wed, 30 Sep 2009 17:07:54 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 22:55:59 +0200

x86: Provide an alternative() based cmpxchg64()

cmpxchg64() today generates, to quote Linus, "barf bag" code.

cmpxchg64() is about to get used in the scheduler to fix a bug there,
but it's a prerequisite that cmpxchg64() first be made non-sucking.

This patch turns cmpxchg64() into an efficient implementation that
uses the alternative() mechanism to just use the raw instruction on
all modern systems.

Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe. (Interested parties with i486 based SMP systems
are welcome to submit fix patches for that.)

Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
[ fixed asm constraint bug ]
Fixed-by: Eric Dumazet <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/cmpxchg_32.h | 30 +++++++++++--------
arch/x86/kernel/i386_ksyms_32.c | 8 +++++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/cmpxchg8b_emu.S | 57 +++++++++++++++++++++++++++++++++++++
4 files changed, 83 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..ee1931b 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -312,19 +312,23 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,

extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);

-#define cmpxchg64(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) __ret; \
- if (likely(boot_cpu_data.x86 > 4)) \
- __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- else \
- __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- __ret; \
-})
+#define cmpxchg64(ptr, o, n) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (o); \
+ __typeof__(*(ptr)) __new = (n); \
+ alternative_io("call cmpxchg8b_emu", \
+ "lock; cmpxchg8b (%%esi)" , \
+ X86_FEATURE_CX8, \
+ "=A" (__ret), \
+ "S" ((ptr)), "0" (__old), \
+ "b" ((unsigned int)__new), \
+ "c" ((unsigned int)(__new>>32)) \
+ : "memory"); \
+ __ret; })
+
+
+
#define cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 43cec6b..1736c5a 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -10,6 +10,14 @@
EXPORT_SYMBOL(mcount);
#endif

+/*
+ * Note, this is a prototype to get at the symbol for
+ * the export, but dont use it from C code, it is used
+ * by assembly code and is not using C calling convention!
+ */
+extern void cmpxchg8b_emu(void);
+EXPORT_SYMBOL(cmpxchg8b_emu);
+
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 9e60920..3e549b8 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o
+ lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o

lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
new file mode 100644
index 0000000..828cb71
--- /dev/null
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -0,0 +1,57 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
+#include <asm/dwarf2.h>
+
+
+.text
+
+/*
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ */
+ENTRY(cmpxchg8b_emu)
+CFI_STARTPROC
+
+#
+# Emulate 'cmpxchg8b (%esi)' on UP except we don't
+# set the whole ZF thing (caller will just compare
+# eax:edx with the expected value)
+#
+cmpxchg8b_emu:
+ pushfl
+ cli
+
+ cmpl (%esi), %eax
+ jne not_same
+ cmpl 4(%esi), %edx
+ jne half_same
+
+ movl %ebx, (%esi)
+ movl %ecx, 4(%esi)
+
+ popfl
+ ret
+
+ not_same:
+ movl (%esi), %eax
+ half_same:
+ movl 4(%esi), %edx
+
+ popfl
+ ret
+
+CFI_ENDPROC
+ENDPROC(cmpxchg8b_emu)

2009-09-30 21:01:49

by Eric Dumazet

[permalink] [raw]
Subject: [tip:sched/urgent] sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit-ID: 152f9d0710a62708710161bce1b29fa8292c8c11
Gitweb: http://git.kernel.org/tip/152f9d0710a62708710161bce1b29fa8292c8c11
Author: Eric Dumazet <[email protected]>
AuthorDate: Wed, 30 Sep 2009 20:36:19 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 30 Sep 2009 22:56:10 +0200

sched_clock: Fix atomicity/continuity bug by using cmpxchg64()

Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.

That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps off by up to ~4 seconds.

Two symptoms are known right now:

- interactivity problems seen by Arjan: up to 600 msecs
latencies instead of the expected 20-40 msecs. These
latencies are very visible on the desktop.

- incorrect CPU stats: occasionally too high percentages in 'top',
and crazy CPU usage stats.

Reported-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/sched_clock.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index ac2e1dc..479ce56 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -127,7 +127,7 @@ again:
clock = wrap_max(clock, min_clock);
clock = wrap_min(clock, max_clock);

- if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
+ if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
goto again;

return clock;
@@ -163,7 +163,7 @@ again:
val = remote_clock;
}

- if (cmpxchg(ptr, old_val, val) != old_val)
+ if (cmpxchg64(ptr, old_val, val) != old_val)
goto again;

return val;

2009-09-30 21:53:34

by David Miller

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

From: Linus Torvalds <[email protected]>
Date: Wed, 30 Sep 2009 13:41:26 -0700 (PDT)

> Please spend the time to edit quoting levels, because otherwise people
> won't spend the time to look at your new text.

Indeed, Eric, please do. This has been bugging me in networking land
for quite some time as well.

If you notice, when I comment on patches I kill all of the patch from
the reply except for the specific hunks I want to comment on.

If one has to scroll multiple screens just to see what you have to
say, that's a burden on every person you might get feedback
from.

2009-09-30 22:04:11

by Ingo Molnar

[permalink] [raw]
Subject: [GIT PULL] scheduler fixes


* Ingo Molnar <[email protected]> wrote:

> * Linus Torvalds <[email protected]> wrote:
>
> > Ingo - I suspect both those patches should just go through you. You
> > do both x86 and scheduler, so I'll happily pull the end result.
>
> Yeah - working on it - just got back from a trip. It's two risky
> patches and if that place breaks everyone will be affected so i'll
> probably send the pull request tomorrow.

Ok, it now looks good in all tests here so Linus please pull the latest
sched-fixes-for-linus git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git sched-fixes-for-linus

Thanks,

Ingo

------------------>
Arjan van de Ven (1):
x86: Provide an alternative() based cmpxchg64()

Eric Dumazet (1):
sched_clock: Fix atomicity/continuity bug by using cmpxchg64()


arch/x86/include/asm/cmpxchg_32.h | 30 +++++++++++--------
arch/x86/kernel/i386_ksyms_32.c | 8 +++++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/cmpxchg8b_emu.S | 57 +++++++++++++++++++++++++++++++++++++
kernel/sched_clock.c | 4 +-
5 files changed, 85 insertions(+), 16 deletions(-)
create mode 100644 arch/x86/lib/cmpxchg8b_emu.S

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..ee1931b 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -312,19 +312,23 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,

extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);

-#define cmpxchg64(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) __ret; \
- if (likely(boot_cpu_data.x86 > 4)) \
- __ret = (__typeof__(*(ptr)))__cmpxchg64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- else \
- __ret = (__typeof__(*(ptr)))cmpxchg_486_u64((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
- __ret; \
-})
+#define cmpxchg64(ptr, o, n) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (o); \
+ __typeof__(*(ptr)) __new = (n); \
+ alternative_io("call cmpxchg8b_emu", \
+ "lock; cmpxchg8b (%%esi)" , \
+ X86_FEATURE_CX8, \
+ "=A" (__ret), \
+ "S" ((ptr)), "0" (__old), \
+ "b" ((unsigned int)__new), \
+ "c" ((unsigned int)(__new>>32)) \
+ : "memory"); \
+ __ret; })
+
+
+
#define cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 43cec6b..1736c5a 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -10,6 +10,14 @@
EXPORT_SYMBOL(mcount);
#endif

+/*
+ * Note, this is a prototype to get at the symbol for
+ * the export, but dont use it from C code, it is used
+ * by assembly code and is not using C calling convention!
+ */
+extern void cmpxchg8b_emu(void);
+EXPORT_SYMBOL(cmpxchg8b_emu);
+
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 9e60920..3e549b8 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,7 +15,7 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o
+ lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o

lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
new file mode 100644
index 0000000..828cb71
--- /dev/null
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -0,0 +1,57 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
+#include <asm/dwarf2.h>
+
+
+.text
+
+/*
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ */
+ENTRY(cmpxchg8b_emu)
+CFI_STARTPROC
+
+#
+# Emulate 'cmpxchg8b (%esi)' on UP except we don't
+# set the whole ZF thing (caller will just compare
+# eax:edx with the expected value)
+#
+cmpxchg8b_emu:
+ pushfl
+ cli
+
+ cmpl (%esi), %eax
+ jne not_same
+ cmpl 4(%esi), %edx
+ jne half_same
+
+ movl %ebx, (%esi)
+ movl %ecx, 4(%esi)
+
+ popfl
+ ret
+
+ not_same:
+ movl (%esi), %eax
+ half_same:
+ movl 4(%esi), %edx
+
+ popfl
+ ret
+
+CFI_ENDPROC
+ENDPROC(cmpxchg8b_emu)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index ac2e1dc..479ce56 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -127,7 +127,7 @@ again:
clock = wrap_max(clock, min_clock);
clock = wrap_min(clock, max_clock);

- if (cmpxchg(&scd->clock, old_clock, clock) != old_clock)
+ if (cmpxchg64(&scd->clock, old_clock, clock) != old_clock)
goto again;

return clock;
@@ -163,7 +163,7 @@ again:
val = remote_clock;
}

- if (cmpxchg(ptr, old_val, val) != old_val)
+ if (cmpxchg64(ptr, old_val, val) != old_val)
goto again;

return val;

2009-10-01 00:47:58

by Yuhong Bao

[permalink] [raw]
Subject: RE: Linux 2.6.32-rc1






----------------------------------------
> Date: Wed, 30 Sep 2009 17:31:02 +0200
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; a.p.zijlstra@c
> Subject: Re: Linux 2.6.32-rc1
>
> On Wed, 30 Sep 2009 17:27:05 +0200
> Eric Dumazet wrote:
>>
>>> + pop %edi
>> Why do you pop flags in edi, to later re-push them ?
>>
>>> + cli
>
> because here I disable interrupts
But popping the flags to edi to later repush them should not be necessary
unless you are switching esp.

Getting rid of this will allow the push/pop edi at the beginning/end to
be eliminated, reducing code size as well as increasing the speed.
> (basically this is local_irq_save() )
>>
>>
>>> + xchg (%esi), %ebx
>>> + xchg 4(%esi), %ecx
>> How this sequence is guaranteed to be atomic with other cpus ?
>
> it is not. this is the 486 implementation which is !SMP
> (just like the current cmpxchg64() fallback)

BTW, NT4 had a SMP version of this emulation that used a simple spinlock:
http://www.geoffchappell.com/viewer.htm?doc=studies/windows/km/cpu/cx8.htm

Yuhong Bao
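
Purely as an illustration of the approach Yuhong mentions (this is not code
from the thread, and all names below are invented): an SMP-safe fallback can
serialize every emulated 64-bit cmpxchg on one spinlock, which is only
correct if all writers of the location go through the same path.

#include <linux/spinlock.h>
#include <linux/types.h>

static DEFINE_SPINLOCK(cmpxchg64_emu_lock);	/* hypothetical global lock */

static u64 cmpxchg64_emu_smp(volatile u64 *ptr, u64 old, u64 new)
{
	unsigned long flags;
	u64 prev;

	spin_lock_irqsave(&cmpxchg64_emu_lock, flags);
	prev = *ptr;			/* read the current 64-bit value */
	if (prev == old)
		*ptr = new;		/* store only if it matches the expected value */
	spin_unlock_irqrestore(&cmpxchg64_emu_lock, flags);

	return prev;			/* caller compares against 'old' for success */
}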

2009-10-01 00:43:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes



On Thu, 1 Oct 2009, Ingo Molnar wrote:
>
> Ok, it now looks good in all tests here so Linus please pull the latest
> sched-fixes-for-linus git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git sched-fixes-for-linus

Pulled. I'd also like to see something that actually fails gracefully on
cmpxchg() of an unsupported size, though. The silent failure still annoys
me.

Big thanks to Eric for noticing it, it's that dangerous kind of "but it
looks perfectly sane in C code" thing.

Linus
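
Peter Zijlstra's patch later in this thread does exactly this; the core
idiom, shown here as a stripped-down sketch with made-up macro names, is to
send the unsupported size to an extern function that is deliberately never
defined. Since sizeof() is a compile-time constant, the default branch is
eliminated for supported sizes, and an unsupported size turns into a link
error instead of a silent no-op:

extern void __cmpxchg_wrong_size(void);	/* intentionally never defined */

#define cmpxchg_checked(ptr, old, new)					\
({									\
	__typeof__(*(ptr)) __ret = (old);				\
	switch (sizeof(*(ptr))) {					\
	case 1: case 2: case 4:						\
		/* the real inline cmpxchg asm goes here */		\
		break;							\
	default:							\
		__cmpxchg_wrong_size();	/* only emitted for bad sizes */ \
	}								\
	__ret;								\
})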

2009-10-01 00:58:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes



On Wed, 30 Sep 2009, Linus Torvalds wrote:
>
> Pulled. I'd also like to see something that actually fails gracefully on
> cmpxchg() of an unsupported size, though. The silent failure still annoys
> me.

Oh, and we probably should try to avoid the 'alternates()' code when we
can statically determine that cmpxchg8b is fine. We already have that
CONFIG_x86_CMPXCHG64 (enabled by PAE support), and we could easily also
enable it for some of the CPU cases.

[ If I recall correctly, cmpxchg8b support detection was something that
was totally messed up in WNT, and I have this memory of TMTA _not_
claiming to support CX8 in the cpuid bits, but happily supporting the
instruction itself - because otherwise WNT would blue-screen on boot or
something silly like that.

As a result, the patch below only adds CMPXCHG8B for the obvious Intel
CPU's, not for others. There was something really messy about cmpxchg8b
and clone CPU's, but I'm not 100% sure about the details, so take this
patch with a pinch of salt, and some thinking ]

If we avoid that asm-alternative thing when we can assume the instruction
exists, we'll generate less support crud, and we'll avoid the whole issue
with that extra 'nop' for padding instruction sizes etc.

Linus

---
arch/x86/Kconfig.cpu | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 527519b..f2824fb 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -400,7 +400,7 @@ config X86_TSC

config X86_CMPXCHG64
def_bool y
- depends on X86_PAE || X86_64
+ depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM

# this should be set for all -march=.. options where the compiler
# generates cmov.
@@ -412,6 +412,7 @@ config X86_MINIMUM_CPU_FAMILY
int
default "64" if X86_64
default "6" if X86_32 && X86_P6_NOP
+ default "5" if X86_32 && X86_CMPXCHG64
default "4" if X86_32 && (X86_XADD || X86_CMPXCHG || X86_BSWAP || X86_WP_WORKS_OK)
default "3"


2009-10-01 05:31:05

by Eric Dumazet

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes

Linus Torvalds wrote:
>
> On Wed, 30 Sep 2009, Linus Torvalds wrote:
> Oh, and we probably should try to avoid the 'alternates()' code when we
> can statically determine that cmpxchg8b is fine. We already have that
> CONFIG_x86_CMPXCHG64 (enabled by PAE support), and we could easily also
> enable it for some of the CPU cases.
>

Agreed.

Please find the following patch to conditionally compile cmpxchg8b_emu.o and
EXPORT_SYMBOL(cmpxchg8b_emu).

This patch doesn't depend on yours.

Thanks

[PATCH] x86: Don't generate cmpxchg8b_emu if CONFIG_X86_CMPXCHG64=y

The cmpxchg8b_emu helper won't be used if CONFIG_X86_CMPXCHG64=y

Signed-off-by: Eric Dumazet <[email protected]>
---
arch/x86/kernel/i386_ksyms_32.c | 2 ++
arch/x86/lib/Makefile | 6 ++++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 1736c5a..9c3bd4a 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -15,8 +15,10 @@ EXPORT_SYMBOL(mcount);
* the export, but dont use it from C code, it is used
* by assembly code and is not using C calling convention!
*/
+#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
EXPORT_SYMBOL(cmpxchg8b_emu);
+#endif

/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 3e549b8..c309b82 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,8 +15,10 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o
-
+ lib-y += semaphore_32.o string_32.o
+ifeq ($(CONFIG_X86_CMPXCHG64),n)
+ lib-y += cmpxchg8b_emu.o
+endif
lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
obj-y += io_64.o iomap_copy_64.o

2009-10-01 06:05:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes


* Linus Torvalds <[email protected]> wrote:

> On Thu, 1 Oct 2009, Ingo Molnar wrote:
> >
> > Ok, it now looks good in all tests here so Linus please pull the latest
> > sched-fixes-for-linus git tree from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git sched-fixes-for-linus
>
> Pulled. I'd also like to see something that actually fails gracefully
> on cmpxchg() of an unsupported size, though. The silent failure still
> annoys me.

Yeah. It even works fine on 64-bit, which makes it doubly dangerous.

Ingo
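
A toy model of why the bug hides on 64-bit (standalone user-space C, names
invented here, not kernel code): there the size dispatch has a native
8-byte case, while the old 32-bit __cmpxchg() fell out of its switch for
sizeof() == 8 and returned 'old' untouched, so callers such as the
sched_clock loop saw an apparent success and never retried:

#include <stdio.h>

static unsigned long toy_cmpxchg(volatile void *ptr, unsigned long old,
				 unsigned long new, int size)
{
	unsigned long prev;

	switch (size) {
	case 4:			/* the real code inlines 'lock; cmpxchgl' here */
		prev = *(volatile unsigned long *)ptr;
		if (prev == old)
			*(volatile unsigned long *)ptr = new;
		return prev;
	/* cases 1 and 2 omitted for brevity */
	}
	return old;		/* size 8 on 32-bit: nothing written, looks successful */
}

int main(void)
{
	unsigned long long v = 1;

	/* sizeof(v) == 8: the compare-and-swap never happens, yet ... */
	if (toy_cmpxchg(&v, 1, 2, sizeof(v)) == 1)
		printf("reported success, but v is still %llu\n", v);
	return 0;
}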

2009-10-01 06:11:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes


* Eric Dumazet <[email protected]> wrote:

> [PATCH] x86: Dont generate cmpxchg8b_emu if CONFIG_X86_CMPXCHG64=y
>
> cmpxchg8b_emu helper wont be used if CONFIG_X86_CMPXCHG64=y
>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> arch/x86/kernel/i386_ksyms_32.c | 2 ++
> arch/x86/lib/Makefile | 6 ++++--
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
> index 1736c5a..9c3bd4a 100644
> --- a/arch/x86/kernel/i386_ksyms_32.c
> +++ b/arch/x86/kernel/i386_ksyms_32.c
> @@ -15,8 +15,10 @@ EXPORT_SYMBOL(mcount);
> * the export, but dont use it from C code, it is used
> * by assembly code and is not using C calling convention!
> */
> +#ifndef CONFIG_X86_CMPXCHG64
> extern void cmpxchg8b_emu(void);
> EXPORT_SYMBOL(cmpxchg8b_emu);
> +#endif
>
> /* Networking helper routines. */
> EXPORT_SYMBOL(csum_partial_copy_generic);
> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> index 3e549b8..c309b82 100644
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -15,8 +15,10 @@ ifeq ($(CONFIG_X86_32),y)
> obj-y += atomic64_32.o
> lib-y += checksum_32.o
> lib-y += strstr_32.o
> - lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o
> -
> + lib-y += semaphore_32.o string_32.o
> +ifeq ($(CONFIG_X86_CMPXCHG64),n)
> + lib-y += cmpxchg8b_emu.o
> +endif
> lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
> else
> obj-y += io_64.o iomap_copy_64.o

That's not enough - the inline assembly code in
arch/x86/include/asm/cmpxchg_32.h should also not reference
cmpxchg8b_emu in that case.

Ingo

2009-10-01 06:19:23

by Eric Dumazet

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes

Ingo Molnar wrote:
>
> That's not enough - the inline assembly code in
> arch/x86/include/asm/cmpxchg_32.h should also not reference
> cmpxchg8b_emu in that case.
>

I believe it is OK as is, since cmpxchg8b_emu is referenced only
once, in a section guarded by :
#ifndef CONFIG_X86_CMPXCHG64

(But I'll re-check in a couple of hours)

extract from arch/x86/include/asm/cmpxchg_32.h :


#ifndef CONFIG_X86_CMPXCHG64


#define cmpxchg64(ptr, o, n)					\
({								\
	__typeof__(*(ptr)) __ret;				\
	__typeof__(*(ptr)) __old = (o);				\
	__typeof__(*(ptr)) __new = (n);				\
	alternative_io("call cmpxchg8b_emu",			\
		       "lock; cmpxchg8b (%%esi)" ,		\
		       X86_FEATURE_CX8,				\
		       "=A" (__ret),				\
		       "S" ((ptr)), "0" (__old),		\
		       "b" ((unsigned int)__new),		\
		       "c" ((unsigned int)(__new>>32))		\
		       : "memory");				\
	__ret; })
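
(For completeness: the other arm of that #ifndef, visible as context further
down in Peter Zijlstra's patch, is roughly the following, so nothing can
reference cmpxchg8b_emu once CONFIG_X86_CMPXCHG64=y.)

#ifdef CONFIG_X86_CMPXCHG64
#define cmpxchg64(ptr, o, n)						\
	((__typeof__(*(ptr)))__cmpxchg64((ptr), (unsigned long long)(o), \
					 (unsigned long long)(n)))
#define cmpxchg64_local(ptr, o, n)					\
	((__typeof__(*(ptr)))__cmpxchg64_local((ptr), (unsigned long long)(o), \
					       (unsigned long long)(n)))
#endif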

2009-10-01 06:41:31

by Linus Torvalds

[permalink] [raw]
Subject: [tip:x86/urgent] x86: Optimize cmpxchg64() at build-time some more

Commit-ID: 982d007a6eec4a0abb404d2355eeec2c041c61ea
Gitweb: http://git.kernel.org/tip/982d007a6eec4a0abb404d2355eeec2c041c61ea
Author: Linus Torvalds <[email protected]>
AuthorDate: Wed, 30 Sep 2009 17:57:27 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 1 Oct 2009 08:01:08 +0200

x86: Optimize cmpxchg64() at build-time some more

Try to avoid the 'alternates()' code when we can statically
determine that cmpxchg8b is fine. We already have that
CONFIG_x86_CMPXCHG64 (enabled by PAE support), and we could easily
also enable it for some of the CPU cases.

Note, this patch only adds CMPXCHG8B for the obvious Intel CPU's,
not for others. (There was something really messy about cmpxchg8b
and clone CPU's, so if you enable it on other CPUs later, do it
carefully.)

If we avoid that asm-alternative thing when we can assume the
instruction exists, we'll generate less support crud, and we'll
avoid the whole issue with that extra 'nop' for padding instruction
sizes etc.

LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/Kconfig.cpu | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 527519b..f2824fb 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -400,7 +400,7 @@ config X86_TSC

config X86_CMPXCHG64
def_bool y
- depends on X86_PAE || X86_64
+ depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM

# this should be set for all -march=.. options where the compiler
# generates cmov.
@@ -412,6 +412,7 @@ config X86_MINIMUM_CPU_FAMILY
int
default "64" if X86_64
default "6" if X86_32 && X86_P6_NOP
+ default "5" if X86_32 && X86_CMPXCHG64
default "4" if X86_32 && (X86_XADD || X86_CMPXCHG || X86_BSWAP || X86_WP_WORKS_OK)
default "3"

2009-10-01 06:42:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes


* Eric Dumazet <[email protected]> wrote:

> Ingo Molnar wrote:
> >
> > That's not enough - the inline assembly code in
> > arch/x86/include/asm/cmpxchg_32.h should also not reference
> > cmpxchg8b_emu in that case.
> >
>
> I believe it is OK as is, since cmpxchg8b_emu is referenced only
> once, in a section guarded by :
> #ifndef CONFIG_X86_CMPXCHG64

yeah. But this bit is wrong:

> +++ b/arch/x86/lib/Makefile
> @@ -15,8 +15,10 @@ ifeq ($(CONFIG_X86_32),y)
> obj-y += atomic64_32.o
> lib-y += checksum_32.o
> lib-y += strstr_32.o
> - lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o
> -
> + lib-y += semaphore_32.o string_32.o
> +ifeq ($(CONFIG_X86_CMPXCHG64),n)
> + lib-y += cmpxchg8b_emu.o
> +endif
> lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
> else
> obj-y += io_64.o iomap_copy_64.o

The way to check for a disabled option in a Make rule is:

ifneq ($(CONFIG_X86_CMPXCHG64),y)

As in the disabled case the config variable will be undefined. (and wont
have a value of 'n'). I've done the fix below on your patch.

Ingo

Index: linux2/arch/x86/lib/Makefile
===================================================================
--- linux2.orig/arch/x86/lib/Makefile
+++ linux2/arch/x86/lib/Makefile
@@ -16,7 +16,7 @@ ifeq ($(CONFIG_X86_32),y)
lib-y += checksum_32.o
lib-y += strstr_32.o
lib-y += semaphore_32.o string_32.o
-ifeq ($(CONFIG_X86_CMPXCHG64),n)
+ifneq ($(CONFIG_X86_CMPXCHG64),y)
lib-y += cmpxchg8b_emu.o
endif
lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o

2009-10-01 06:49:37

by Eric Dumazet

[permalink] [raw]
Subject: [tip:x86/urgent] x86: Don't generate cmpxchg8b_emu if CONFIG_X86_CMPXCHG64=y

Commit-ID: 04edbdef02ec4386a2ae38cab7ae7d166e17a756
Gitweb: http://git.kernel.org/tip/04edbdef02ec4386a2ae38cab7ae7d166e17a756
Author: Eric Dumazet <[email protected]>
AuthorDate: Thu, 1 Oct 2009 07:30:38 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 1 Oct 2009 08:42:24 +0200

x86: Don't generate cmpxchg8b_emu if CONFIG_X86_CMPXCHG64=y

Conditionally compile cmpxchg8b_emu.o and EXPORT_SYMBOL(cmpxchg8b_emu).

This reduces the kernel size a bit.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/kernel/i386_ksyms_32.c | 2 ++
arch/x86/lib/Makefile | 6 ++++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c
index 1736c5a..9c3bd4a 100644
--- a/arch/x86/kernel/i386_ksyms_32.c
+++ b/arch/x86/kernel/i386_ksyms_32.c
@@ -15,8 +15,10 @@ EXPORT_SYMBOL(mcount);
* the export, but dont use it from C code, it is used
* by assembly code and is not using C calling convention!
*/
+#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
EXPORT_SYMBOL(cmpxchg8b_emu);
+#endif

/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_generic);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 3e549b8..85f5db9 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -15,8 +15,10 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o
-
+ lib-y += semaphore_32.o string_32.o
+ifneq ($(CONFIG_X86_CMPXCHG64),y)
+ lib-y += cmpxchg8b_emu.o
+endif
lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
obj-y += io_64.o iomap_copy_64.o

2009-10-01 07:00:11

by Eric Dumazet

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes

Ingo Molnar wrote:


> The way to check for a disabled option in a Make rule is:
>
> ifneq ($(CONFIG_X86_CMPXCHG64),y)
>

Ah OK, thanks for the explanation. My test 486 build (before your change) just failed with:

arch/x86/built-in.o:(__ksymtab+0xa8): undefined reference to `cmpxchg8b_emu'
kernel/built-in.o: In function `sched_clock_local':
/opt/src/linux-2.6/kernel/sched_clock.c:130: undefined reference to `cmpxchg8b_emu'
kernel/built-in.o: In function `sched_clock_remote':
/opt/src/linux-2.6/kernel/sched_clock.c:166: undefined reference to `cmpxchg8b_emu'
make: *** [.tmp_vmlinux1] Error 1



> As in the disabled case the config variable will be undefined. (and wont
> have a value of 'n'). I've done the fix below on your patch.
>
> Ingo
>
> Index: linux2/arch/x86/lib/Makefile
> ===================================================================
> --- linux2.orig/arch/x86/lib/Makefile
> +++ linux2/arch/x86/lib/Makefile
> @@ -16,7 +16,7 @@ ifeq ($(CONFIG_X86_32),y)
> lib-y += checksum_32.o
> lib-y += strstr_32.o
> lib-y += semaphore_32.o string_32.o
> -ifeq ($(CONFIG_X86_CMPXCHG64),n)
> +ifneq ($(CONFIG_X86_CMPXCHG64),y)
> lib-y += cmpxchg8b_emu.o
> endif
> lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o


That's perfect, this fixed my 486 build, thanks again.

Looking at disassembly I still see atomic64_cmpxchg(),
atomic64_xchg(), atomic64_add_return(), and friends being
included... (but not used)

I'll check if a similar patch could be done for the atomic64 functions as well.

2009-10-01 21:51:31

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Wed, 30 Sep 2009 22:38:18 +0200
Ingo Molnar <[email protected]> wrote:

> > > > probably a "memory" clobber is needed, alternative_io() doesnt
> > > > provide it.
> > >
> > > Yeah, good catch.
> >
> > Yeah - i'm testing this right now, it looks like something that
> > could explain the hang.
>
> Yeah - that was the cause of the hang

I woke up at 2am realizing that this was missing.. but you beat me to
it ;)


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-10-01 07:29:03

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [GIT PULL] scheduler fixes

On Thu, Oct 01, 2009 at 08:42:09AM +0200, Ingo Molnar wrote:
>
> * Eric Dumazet <[email protected]> wrote:
>
> > Ingo Molnar wrote:
> > >
> > > That's not enough - the inline assembly code in
> > > arch/x86/include/asm/cmpxchg_32.h should also not reference
> > > cmpxchg8b_emu in that case.
> > >
> >
> > I believe it is OK as is, since cmpxchg8b_emu is referenced only
> > once, in a section guarded by :
> > #ifndef CONFIG_X86_CMPXCHG64
>
> yeah. But this bit is wrong:
>
> > +++ b/arch/x86/lib/Makefile
> > @@ -15,8 +15,10 @@ ifeq ($(CONFIG_X86_32),y)
> > obj-y += atomic64_32.o
> > lib-y += checksum_32.o
> > lib-y += strstr_32.o
> > - lib-y += semaphore_32.o string_32.o cmpxchg8b_emu.o
> > -
> > + lib-y += semaphore_32.o string_32.o
> > +ifeq ($(CONFIG_X86_CMPXCHG64),n)
> > + lib-y += cmpxchg8b_emu.o
> > +endif
> > lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
> > else
> > obj-y += io_64.o iomap_copy_64.o
>
> The way to check for a disabled option in a Make rule is:
>
> ifneq ($(CONFIG_X86_CMPXCHG64),y)
or
ifeq ($(CONFIG_X86_CMPXCHG64),)

Both variants are suboptimal - but I have not been
able to come up with something readable in this situation.

Looking at the above, the first is the better version, as
it is a bit more obvious what we test for (assuming the reader
knows this is a boolean value and not a tristate).

Sam
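
(Illustration with a made-up CONFIG_FOO: a disabled bool simply does not
appear in the generated config fragment that kbuild includes, so the make
variable is empty rather than set to "n".)

# CONFIG_FOO=y		when the option is enabled
# <absent>		when it is disabled -- never CONFIG_FOO=n

ifneq ($(CONFIG_FOO),y)		# Ingo's form: matches "anything but y"
	lib-y += foo_fallback.o
endif

# Sam's alternative spelling, matching only the empty/unset case:
# ifeq ($(CONFIG_FOO),)
#	lib-y += foo_fallback.o
# endif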

2009-10-01 12:48:44

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Wed, Sep 30, 2009 at 01:41:26PM -0700, Linus Torvalds wrote:
> On Wed, 30 Sep 2009, Eric Dumazet wrote:
> >
> > Yes, I provided following version in my first answer but apparently it
> > was too complex :)
>
> It wasn't too complex, but YOU QUOTED THE WHOL F*CKING EMAIL.
>
> I don't know about anybody else, but if I see a couple of screenfuls of
> quotes, with no actual interesting new data, I just move on. I suspect
> everybody else did too.

I do most of the time. But if everyone did that all the time, Linux
development would become almost defunct. Full quotes are spreading like
a cancer, mostly from people using Thunderbird, Evolution and similar
crap.

2009-10-01 16:09:06

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Thu, 01 Oct 2009 08:48:32 EDT, Christoph Hellwig said:

> I do most of the time. But if everyone did all the time Linux
> development would become almost defunct. Fullquotes are spreading like
> a cancer, mostly by people using thunderbird, evolution and similar
> crap.

What? There's mainline graphic MUAs that *don't* do "select big chunk of text
with mouse drag, hit delete"?

No, I suspect people are just lazy. Just like the ones that say "Sorry for
top-posting, MUA XYZ makes me" - in the time it took them to type that, they
could have scrolled down to where they should reply and start typing *there*.



2009-10-02 16:40:50

by Yuhong Bao

[permalink] [raw]
Subject: RE: [GIT PULL] scheduler fixes





> [ If I recall correctly, cmpxchg8b support detection was something that
> was totally messed up in WNT, and I have this memory of TMTA _not_
> claiming to support CX8 in the cpuid bits, but happily supporting the
> instruction itself - because otherwise WNT would blue-screen on boot or
> something silly like that.

Yep, Geoff Chappell describes this bug excellently:
http://www.geoffchappell.com/viewer.htm?doc=studies/windows/km/cpu/cx8.htm

> As a result, the patch below only adds CMPXCHG8B for the obvious Intel
> CPU's, not for others. There was something really messy about cmpxchg8b
> and clone CPU's, but I'm not 100% sure about the details, so take this
> patch with a pinch of salt, and some thinking ]

Basically, pre-SP4 NT 4 would crash if the CX8 bit was set in CPUID and the
vendor was not GenuineIntel, AuthenticAMD, or CyrixInstead, due to a bug in
the CPU detection; XP had to work around it too when it began requiring CX8.

Yuhong Bao

2009-10-05 14:36:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Linux 2.6.32-rc1

On Thu, 2009-10-01 at 12:08 -0400, [email protected] wrote:
> What? There's mainline graphic MUAs that *don't* do "select big chunk of text
> with mouse drag, hit delete"?

Better yet, using evo, if you select part of the msg using your mouse
and then hit reply-all, it will only quote the selected bit.

2009-10-05 15:58:12

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] x86: Generate cmpxchg build failures

On Tue, 2009-09-29 at 14:17 -0700, Linus Torvalds wrote:

> And regardless, we should fix the silent cmpxchg failure, even if it's
> just a link-time failure or something.

Something like the below?

Seems to build defconfig-i386, defconfig-x86_64 and generates a build
failure on i386 with the sched_clock cmpxchg64 thing undone.

Signed-off-by: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/cmpxchg_32.h | 274 ++++++++++++++++++----------------
arch/x86/include/asm/cmpxchg_64.h | 296 +++++++++++++++++++-----------------
2 files changed, 298 insertions(+), 272 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index ee1931b..45ed563 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -8,14 +8,50 @@
* you need to test for the feature in boot_cpu_data.
*/

-#define xchg(ptr, v) \
- ((__typeof__(*(ptr)))__xchg((unsigned long)(v), (ptr), sizeof(*(ptr))))
+extern void __xchg_wrong_size(void);
+
+/*
+ * Note: no "lock" prefix even on SMP: xchg always implies lock anyway
+ * Note 2: xchg has side effect, so that attribute volatile is necessary,
+ * but generally the primitive is invalid, *ptr is output argument. --ANK
+ */

struct __xchg_dummy {
unsigned long a[100];
};
#define __xg(x) ((struct __xchg_dummy *)(x))

+#define __xchg(x, ptr, size) \
+({ \
+ __typeof(*(ptr)) __x = (x); \
+ switch (size) { \
+ case 1: \
+ asm volatile("xchgb %b0,%1" \
+ : "=q" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("xchgw %w0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("xchgl %0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ default: \
+ __xchg_wrong_size(); \
+ } \
+ __x; \
+})
+
+#define xchg(ptr, v) \
+ __xchg((v), (ptr), sizeof(*ptr))
+
/*
* The semantics of XCHGCMP8B are a bit strange, this is why
* there is a loop and the loading of %%eax and %%edx has to
@@ -71,57 +107,119 @@ static inline void __set_64bit_var(unsigned long long *ptr,
(unsigned int)((value) >> 32)) \
: __set_64bit(ptr, ll_low((value)), ll_high((value))))

-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway
- * Note 2: xchg has side effect, so that attribute volatile is necessary,
- * but generally the primitive is invalid, *ptr is output argument. --ANK
- */
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
- int size)
-{
- switch (size) {
- case 1:
- asm volatile("xchgb %b0,%1"
- : "=q" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 2:
- asm volatile("xchgw %w0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 4:
- asm volatile("xchgl %0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- }
- return x;
-}
+extern void __cmpxchg_wrong_size(void);

/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*/
+#define __cmpxchg(ptr, old, new, size) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile(LOCK_PREFIX "cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile(LOCK_PREFIX "cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile(LOCK_PREFIX "cmpxchgl %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})
+
+/*
+ * Always use locked operations when touching memory shared with a
+ * hypervisor, since the system may be SMP even if the guest kernel
+ * isn't.
+ */
+#define __sync_cmpxchg(ptr, old, new, size) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile("lock; cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("lock; cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("lock; cmpxchgl %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})
+
+#define __cmpxchg_local(ptr, old, new, size) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile("cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("cmpxchgl %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})

#ifdef CONFIG_X86_CMPXCHG
#define __HAVE_ARCH_CMPXCHG 1
-#define cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define sync_cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__sync_cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define cmpxchg_local(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg_local((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+
+#define cmpxchg(ptr, old, new) \
+ __cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new) \
+ __sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new) \
+ __cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
#endif

#ifdef CONFIG_X86_CMPXCHG64
@@ -133,94 +231,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
(unsigned long long)(n)))
#endif

-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile(LOCK_PREFIX "cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile(LOCK_PREFIX "cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile(LOCK_PREFIX "cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
-/*
- * Always use locked operations when touching memory shared with a
- * hypervisor, since the system may be SMP even if the guest kernel
- * isn't.
- */
-static inline unsigned long __sync_cmpxchg(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("lock; cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("lock; cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("lock; cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
-static inline unsigned long __cmpxchg_local(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
static inline unsigned long long __cmpxchg64(volatile void *ptr,
unsigned long long old,
unsigned long long new)
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 52de72e..df20c6e 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -3,9 +3,6 @@

#include <asm/alternative.h> /* Provides LOCK_PREFIX */

-#define xchg(ptr, v) ((__typeof__(*(ptr)))__xchg((unsigned long)(v), \
- (ptr), sizeof(*(ptr))))
-
#define __xg(x) ((volatile long *)(x))

static inline void set_64bit(volatile unsigned long *ptr, unsigned long val)
@@ -15,167 +12,186 @@ static inline void set_64bit(volatile unsigned long *ptr, unsigned long val)

#define _set_64bit set_64bit

+extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
+
/*
* Note: no "lock" prefix even on SMP: xchg always implies lock anyway
* Note 2: xchg has side effect, so that attribute volatile is necessary,
* but generally the primitive is invalid, *ptr is output argument. --ANK
*/
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
- int size)
-{
- switch (size) {
- case 1:
- asm volatile("xchgb %b0,%1"
- : "=q" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 2:
- asm volatile("xchgw %w0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 4:
- asm volatile("xchgl %k0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 8:
- asm volatile("xchgq %0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- }
- return x;
-}
+#define __xchg(x, ptr, size) \
+({ \
+ __typeof(*(ptr)) __x = (x); \
+ switch (size) { \
+ case 1: \
+ asm volatile("xchgb %b0,%1" \
+ : "=q" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("xchgw %w0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("xchgl %k0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 8: \
+ asm volatile("xchgq %0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ default: \
+ __xchg_wrong_size(); \
+ } \
+ __x; \
+})
+
+#define xchg(ptr, v) \
+ __xchg((v), (ptr), sizeof(*ptr))

/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*/
-
-#define __HAVE_ARCH_CMPXCHG 1
-
-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile(LOCK_PREFIX "cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile(LOCK_PREFIX "cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile(LOCK_PREFIX "cmpxchgl %k1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 8:
- asm volatile(LOCK_PREFIX "cmpxchgq %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define __cmpxchg(ptr, old, new, size) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile(LOCK_PREFIX "cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile(LOCK_PREFIX "cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile(LOCK_PREFIX "cmpxchgl %k1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 8: \
+ asm volatile(LOCK_PREFIX "cmpxchgq %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})

/*
* Always use locked operations when touching memory shared with a
* hypervisor, since the system may be SMP even if the guest kernel
* isn't.
*/
-static inline unsigned long __sync_cmpxchg(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("lock; cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("lock; cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("lock; cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define __sync_cmpxchg(ptr, old, new, size) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile("lock; cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("lock; cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("lock; cmpxchgl %k1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 8: \
+ asm volatile("lock; cmpxchgq %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})

-static inline unsigned long __cmpxchg_local(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("cmpxchgl %k1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 8:
- asm volatile("cmpxchgq %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define __cmpxchg_local(ptr, old, new, size) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile("cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("cmpxchgl %k1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 8: \
+ asm volatile("cmpxchgq %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})
+
+#define __HAVE_ARCH_CMPXCHG 1
+
+#define cmpxchg(ptr, old, new) \
+ __cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new) \
+ __sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new) \
+ __cmpxchg_local((ptr), (old), (new), sizeof(*ptr))

-#define cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), sizeof(*(ptr))))
#define cmpxchg64(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
cmpxchg((ptr), (o), (n)); \
})
-#define cmpxchg_local(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg_local((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define sync_cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__sync_cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+
#define cmpxchg64_local(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \

2009-10-05 18:52:25

by Maciej Żenczykowski

[permalink] [raw]
Subject: Re: [PATCH] x86: Generate cmpxchg build failures

Strictly speaking, isn't there a cmpxchg8b on anything newer than ancient CPUs?

I'm guessing that it's deliberately not supported/used?

Maciej

2009-10-05 19:18:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86: Generate cmpxchg build failures



On Mon, 5 Oct 2009, Peter Zijlstra wrote:

> On Tue, 2009-09-29 at 14:17 -0700, Linus Torvalds wrote:
>
> > And regardless, we should fix the silent cmpxchg failure, even if it's
> > just a link-time failure or something.
>
> Something like the below?

Looks good to me. This is also one of the cases where a macro will do
better than an inline function, since the return value depends on the size
of the pointer, and it doesn't do the whole 'unsigned long' thing any
more.

It should also be fairly easy to make this now just do a cmpxchg8b for the
64-bit case (again, that wouldn't have worked sanely due to the fixed type
in the inline function version - expanding that size to 64-bit would have
been insane). But that's a separate issue (and maybe we don't want to do
it, due to the whole "it's not DMA-atomic" etc issue - we may be better
off with a build failure, and forcing people who really want 64-bit
accesses to use the explicit cmpxchg64 thing)

That said, I think that you should merge the insane three versions of this
macro into one. Having separate versions for "__[sync_|local_|]cmpxchg()"
is disgusting. I bet you can do it with a single macro, and just pass in
the LOCK_PREFIX (or empty, or "lock;") to that as needed. Rather than
duplicating it three times.

Linus
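
A minimal, self-contained sketch of the two points above: a statement-expression
macro preserves the pointed-to type via __typeof__, and an undefined extern
helper turns an unsupported operand size into a link-time failure. The names
my_xchg() and my_xchg_wrong_size() are illustrative only (this is not the
kernel code); it assumes GCC extensions and, like the kernel build, needs
optimization enabled (gcc -O2) so the dead default branch is discarded:

#include <stdio.h>

extern void my_xchg_wrong_size(void);   /* deliberately never defined */

#define my_xchg(ptr, v)                                                 \
({                                                                      \
        __typeof__(*(ptr)) __new = (v);                                 \
        __typeof__(*(ptr)) __old = *(ptr);                              \
        switch (sizeof(*(ptr))) {                                       \
        case 1: case 2: case 4: case 8:                                 \
                /* a real implementation would use xchg here */         \
                *(ptr) = __new;                                         \
                break;                                                  \
        default:                                                        \
                my_xchg_wrong_size();                                   \
        }                                                               \
        __old;  /* result has the type of *ptr, not unsigned long */    \
})

int main(void)
{
        int x = 1;
        int old = my_xchg(&x, 2);          /* size 4: links fine, old == 1 */
        printf("%d %d\n", old, x);         /* prints "1 2" */

        /*
         * struct odd { char b[3]; } o = {{0}};
         * my_xchg(&o, o);                 // size 3: compiles, but the link
         *                                 // fails with an undefined
         *                                 // reference to my_xchg_wrong_size
         */
        return 0;
}

For a supported size the compiler drops the unreachable default branch and the
undefined symbol is never referenced; for anything else the reference survives
and the link fails, which is the "build failure" the patch title promises.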

2009-10-05 19:34:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] x86: Generate cmpxchg build failures

On Mon, 2009-10-05 at 12:16 -0700, Linus Torvalds wrote:
>
> On Mon, 5 Oct 2009, Peter Zijlstra wrote:
>
> > On Tue, 2009-09-29 at 14:17 -0700, Linus Torvalds wrote:
> >
> > > And regardless, we should fix the silent cmpxchg failure, even if it's
> > > just a link-time failure or something.
> >
> > Something like the below?
>
> Looks good to me. This is also one of the cases where a macro will do
> better than an inline function, since the return value depends on the size
> of the pointer, and it doesn't do the whole 'unsigned long' thing any
> more.

Indeed, I was glad to see that go.

> It should also be fairly easy to make this now just do a cmpxchg8b for the
> 64-bit case (again, that wouldn't have worked sanely due to the fixed type
> in the inline function version - expanding that size to 64-bit would have
> been insane). But that's a separate issue (and maybe we don't want to do
> it, due to the whole "it's not DMA-atomic" etc issue - we may be better
> off with a build failure, and forcing people who really want 64-bit
> accesses to use the explicit cmpxchg64 thing)

Right, that would be a second patch if we think it's a sane thing to do.

> That said, I think that you should merge the insane three versions of this
> macro into one. Having separate versions for "__[sync_|local_|]cmpxchg()"
> is disgusting. I bet you can do it with a single macro, and just pass in
> the LOCK_PREFIX (or empty, or "lock;") to that as needed. Rather than
> duplicating it three times.

You're right, what was I thinking doing all this copy/paste stuff.

Here's a new one, still seems to build i386 and x86_64 defconfigs.


Signed-off-by: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/cmpxchg_32.h | 218 ++++++++++++++---------------------
arch/x86/include/asm/cmpxchg_64.h | 234 ++++++++++++++----------------------
2 files changed, 177 insertions(+), 275 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index ee1931b..720c3d4 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -8,14 +8,50 @@
* you need to test for the feature in boot_cpu_data.
*/

-#define xchg(ptr, v) \
- ((__typeof__(*(ptr)))__xchg((unsigned long)(v), (ptr), sizeof(*(ptr))))
+extern void __xchg_wrong_size(void);
+
+/*
+ * Note: no "lock" prefix even on SMP: xchg always implies lock anyway
+ * Note 2: xchg has side effect, so that attribute volatile is necessary,
+ * but generally the primitive is invalid, *ptr is output argument. --ANK
+ */

struct __xchg_dummy {
unsigned long a[100];
};
#define __xg(x) ((struct __xchg_dummy *)(x))

+#define __xchg(x, ptr, size) \
+({ \
+ __typeof(*(ptr)) __x = (x); \
+ switch (size) { \
+ case 1: \
+ asm volatile("xchgb %b0,%1" \
+ : "=q" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("xchgw %w0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("xchgl %0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ default: \
+ __xchg_wrong_size(); \
+ } \
+ __x; \
+})
+
+#define xchg(ptr, v) \
+ __xchg((v), (ptr), sizeof(*ptr))
+
/*
* The semantics of XCHGCMP8B are a bit strange, this is why
* there is a loop and the loading of %%eax and %%edx has to
@@ -71,57 +107,63 @@ static inline void __set_64bit_var(unsigned long long *ptr,
(unsigned int)((value) >> 32)) \
: __set_64bit(ptr, ll_low((value)), ll_high((value))))

-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway
- * Note 2: xchg has side effect, so that attribute volatile is necessary,
- * but generally the primitive is invalid, *ptr is output argument. --ANK
- */
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
- int size)
-{
- switch (size) {
- case 1:
- asm volatile("xchgb %b0,%1"
- : "=q" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 2:
- asm volatile("xchgw %w0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 4:
- asm volatile("xchgl %0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- }
- return x;
-}
+extern void __cmpxchg_wrong_size(void);

/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*/
+#define __raw_cmpxchg(ptr, old, new, size, lock) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile(lock "cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile(lock "cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile(lock "cmpxchgl %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})
+
+#define __cmpxchg(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
+
+#define __sync_cmpxchg(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
+
+#define __cmpxchg_local(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), "")

#ifdef CONFIG_X86_CMPXCHG
#define __HAVE_ARCH_CMPXCHG 1
-#define cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define sync_cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__sync_cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define cmpxchg_local(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg_local((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+
+#define cmpxchg(ptr, old, new) \
+ __cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new) \
+ __sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new) \
+ __cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
#endif

#ifdef CONFIG_X86_CMPXCHG64
@@ -133,94 +175,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
(unsigned long long)(n)))
#endif

-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile(LOCK_PREFIX "cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile(LOCK_PREFIX "cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile(LOCK_PREFIX "cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
-/*
- * Always use locked operations when touching memory shared with a
- * hypervisor, since the system may be SMP even if the guest kernel
- * isn't.
- */
-static inline unsigned long __sync_cmpxchg(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("lock; cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("lock; cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("lock; cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
-static inline unsigned long __cmpxchg_local(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
static inline unsigned long long __cmpxchg64(volatile void *ptr,
unsigned long long old,
unsigned long long new)
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 52de72e..d565670 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -3,9 +3,6 @@

#include <asm/alternative.h> /* Provides LOCK_PREFIX */

-#define xchg(ptr, v) ((__typeof__(*(ptr)))__xchg((unsigned long)(v), \
- (ptr), sizeof(*(ptr))))
-
#define __xg(x) ((volatile long *)(x))

static inline void set_64bit(volatile unsigned long *ptr, unsigned long val)
@@ -15,167 +12,118 @@ static inline void set_64bit(volatile unsigned long *ptr, unsigned long val)

#define _set_64bit set_64bit

+extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
+
/*
* Note: no "lock" prefix even on SMP: xchg always implies lock anyway
* Note 2: xchg has side effect, so that attribute volatile is necessary,
* but generally the primitive is invalid, *ptr is output argument. --ANK
*/
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
- int size)
-{
- switch (size) {
- case 1:
- asm volatile("xchgb %b0,%1"
- : "=q" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 2:
- asm volatile("xchgw %w0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 4:
- asm volatile("xchgl %k0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 8:
- asm volatile("xchgq %0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- }
- return x;
-}
+#define __xchg(x, ptr, size) \
+({ \
+ __typeof(*(ptr)) __x = (x); \
+ switch (size) { \
+ case 1: \
+ asm volatile("xchgb %b0,%1" \
+ : "=q" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("xchgw %w0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("xchgl %k0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 8: \
+ asm volatile("xchgq %0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ default: \
+ __xchg_wrong_size(); \
+ } \
+ __x; \
+})
+
+#define xchg(ptr, v) \
+ __xchg((v), (ptr), sizeof(*ptr))
+
+#define __HAVE_ARCH_CMPXCHG 1

/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*/
+#define __raw_cmpxchg(ptr, old, new, size, lock) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile(lock "cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile(lock "cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile(lock "cmpxchgl %k1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 8: \
+ asm volatile(lock "cmpxchgq %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})

-#define __HAVE_ARCH_CMPXCHG 1
+#define __cmpxchg(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)

-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile(LOCK_PREFIX "cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile(LOCK_PREFIX "cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile(LOCK_PREFIX "cmpxchgl %k1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 8:
- asm volatile(LOCK_PREFIX "cmpxchgq %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define __sync_cmpxchg(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), "lock; ")

-/*
- * Always use locked operations when touching memory shared with a
- * hypervisor, since the system may be SMP even if the guest kernel
- * isn't.
- */
-static inline unsigned long __sync_cmpxchg(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("lock; cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("lock; cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("lock; cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define __cmpxchg_local(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), "")

-static inline unsigned long __cmpxchg_local(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("cmpxchgl %k1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 8:
- asm volatile("cmpxchgq %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define cmpxchg(ptr, old, new) \
+ __cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new) \
+ __sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new) \
+ __cmpxchg_local((ptr), (old), (new), sizeof(*ptr))

-#define cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), sizeof(*(ptr))))
#define cmpxchg64(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
cmpxchg((ptr), (o), (n)); \
})
-#define cmpxchg_local(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg_local((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define sync_cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__sync_cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+
#define cmpxchg64_local(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \

2009-10-05 20:55:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86: Generate cmpxchg build failures



On Mon, 5 Oct 2009, Peter Zijlstra wrote:
>
> Here's a new one, still seems to build i386 and x86_64 defconfigs.

Ack. Looks good to me.

Linus
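
A quick aside on the design Linus asked for and Peter implemented above: the
single __raw_cmpxchg() macro can take its lock prefix as an argument because
adjacent string literals are concatenated at compile time, so "", "lock; " or
LOCK_PREFIX can simply be pasted in front of the instruction mnemonic in the
asm template. A minimal standalone sketch, assuming GCC inline asm on x86 and
using xadd instead of cmpxchg to keep it short; MY_RAW_XADD() and my_xadd()
are hypothetical names, not kernel interfaces:

#include <stdio.h>

/* 32-bit ints only in this sketch; the kernel additionally switches on size */
#define MY_RAW_XADD(ptr, val, lock)                             \
({                                                              \
        __typeof__(*(ptr)) __v = (val);                         \
        /* lock is "", "lock; ", ...; it is concatenated */     \
        /* with the "xaddl" template string at compile time */  \
        asm volatile(lock "xaddl %0,%1"                         \
                     : "+r" (__v), "+m" (*(ptr))                \
                     : : "memory");                             \
        __v;    /* previous value of *(ptr) */                  \
})

#define my_xadd(ptr, val)        MY_RAW_XADD((ptr), (val), "lock; ")
#define my_xadd_local(ptr, val)  MY_RAW_XADD((ptr), (val), "")

int main(void)
{
        int counter = 10;
        int prev = my_xadd(&counter, 5);     /* locked add, returns old value */
        printf("%d %d\n", prev, counter);    /* prints "10 15" */
        prev = my_xadd_local(&counter, 1);   /* same template, no lock prefix */
        printf("%d %d\n", prev, counter);    /* prints "15 16" */
        return 0;
}

The kernel version does the same thing with LOCK_PREFIX, which also expands to
a string literal, so cmpxchg(), sync_cmpxchg() and cmpxchg_local() all collapse
into one implementation.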

2009-10-09 14:24:20

by Peter Zijlstra

[permalink] [raw]
Subject: [tip:x86/asm] x86: Generate cmpxchg build failures

Commit-ID: f3834b9ef68067199486740b31f691afb14dbdf5
Gitweb: http://git.kernel.org/tip/f3834b9ef68067199486740b31f691afb14dbdf5
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 9 Oct 2009 10:12:46 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Oct 2009 15:57:00 +0200

x86: Generate cmpxchg build failures

Rework the x86 cmpxchg() implementation to generate build failures
when used on improper types.

Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
LKML-Reference: <1254771187.21044.22.camel@laptop>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/cmpxchg_32.h | 218 ++++++++++++++---------------------
arch/x86/include/asm/cmpxchg_64.h | 234 ++++++++++++++----------------------
2 files changed, 177 insertions(+), 275 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 82ceb78..5371174 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -8,14 +8,50 @@
* you need to test for the feature in boot_cpu_data.
*/

-#define xchg(ptr, v) \
- ((__typeof__(*(ptr)))__xchg((unsigned long)(v), (ptr), sizeof(*(ptr))))
+extern void __xchg_wrong_size(void);
+
+/*
+ * Note: no "lock" prefix even on SMP: xchg always implies lock anyway
+ * Note 2: xchg has side effect, so that attribute volatile is necessary,
+ * but generally the primitive is invalid, *ptr is output argument. --ANK
+ */

struct __xchg_dummy {
unsigned long a[100];
};
#define __xg(x) ((struct __xchg_dummy *)(x))

+#define __xchg(x, ptr, size) \
+({ \
+ __typeof(*(ptr)) __x = (x); \
+ switch (size) { \
+ case 1: \
+ asm volatile("xchgb %b0,%1" \
+ : "=q" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("xchgw %w0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("xchgl %0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ default: \
+ __xchg_wrong_size(); \
+ } \
+ __x; \
+})
+
+#define xchg(ptr, v) \
+ __xchg((v), (ptr), sizeof(*ptr))
+
/*
* The semantics of XCHGCMP8B are a bit strange, this is why
* there is a loop and the loading of %%eax and %%edx has to
@@ -71,57 +107,63 @@ static inline void __set_64bit_var(unsigned long long *ptr,
(unsigned int)((value) >> 32)) \
: __set_64bit(ptr, ll_low((value)), ll_high((value))))

-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway
- * Note 2: xchg has side effect, so that attribute volatile is necessary,
- * but generally the primitive is invalid, *ptr is output argument. --ANK
- */
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
- int size)
-{
- switch (size) {
- case 1:
- asm volatile("xchgb %b0,%1"
- : "=q" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 2:
- asm volatile("xchgw %w0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 4:
- asm volatile("xchgl %0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- }
- return x;
-}
+extern void __cmpxchg_wrong_size(void);

/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*/
+#define __raw_cmpxchg(ptr, old, new, size, lock) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile(lock "cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile(lock "cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile(lock "cmpxchgl %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})
+
+#define __cmpxchg(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
+
+#define __sync_cmpxchg(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
+
+#define __cmpxchg_local(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), "")

#ifdef CONFIG_X86_CMPXCHG
#define __HAVE_ARCH_CMPXCHG 1
-#define cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define sync_cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__sync_cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define cmpxchg_local(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg_local((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+
+#define cmpxchg(ptr, old, new) \
+ __cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new) \
+ __sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new) \
+ __cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
#endif

#ifdef CONFIG_X86_CMPXCHG64
@@ -133,94 +175,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
(unsigned long long)(n)))
#endif

-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile(LOCK_PREFIX "cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile(LOCK_PREFIX "cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile(LOCK_PREFIX "cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
-/*
- * Always use locked operations when touching memory shared with a
- * hypervisor, since the system may be SMP even if the guest kernel
- * isn't.
- */
-static inline unsigned long __sync_cmpxchg(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("lock; cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("lock; cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("lock; cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
-static inline unsigned long __cmpxchg_local(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
-
static inline unsigned long long __cmpxchg64(volatile void *ptr,
unsigned long long old,
unsigned long long new)
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 52de72e..485ae41 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -3,9 +3,6 @@

#include <asm/alternative.h> /* Provides LOCK_PREFIX */

-#define xchg(ptr, v) ((__typeof__(*(ptr)))__xchg((unsigned long)(v), \
- (ptr), sizeof(*(ptr))))
-
#define __xg(x) ((volatile long *)(x))

static inline void set_64bit(volatile unsigned long *ptr, unsigned long val)
@@ -15,167 +12,118 @@ static inline void set_64bit(volatile unsigned long *ptr, unsigned long val)

#define _set_64bit set_64bit

+extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
+
/*
* Note: no "lock" prefix even on SMP: xchg always implies lock anyway
* Note 2: xchg has side effect, so that attribute volatile is necessary,
* but generally the primitive is invalid, *ptr is output argument. --ANK
*/
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr,
- int size)
-{
- switch (size) {
- case 1:
- asm volatile("xchgb %b0,%1"
- : "=q" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 2:
- asm volatile("xchgw %w0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 4:
- asm volatile("xchgl %k0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- case 8:
- asm volatile("xchgq %0,%1"
- : "=r" (x)
- : "m" (*__xg(ptr)), "0" (x)
- : "memory");
- break;
- }
- return x;
-}
+#define __xchg(x, ptr, size) \
+({ \
+ __typeof(*(ptr)) __x = (x); \
+ switch (size) { \
+ case 1: \
+ asm volatile("xchgb %b0,%1" \
+ : "=q" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile("xchgw %w0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile("xchgl %k0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ case 8: \
+ asm volatile("xchgq %0,%1" \
+ : "=r" (__x) \
+ : "m" (*__xg(ptr)), "0" (__x) \
+ : "memory"); \
+ break; \
+ default: \
+ __xchg_wrong_size(); \
+ } \
+ __x; \
+})
+
+#define xchg(ptr, v) \
+ __xchg((v), (ptr), sizeof(*ptr))
+
+#define __HAVE_ARCH_CMPXCHG 1

/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*/
+#define __raw_cmpxchg(ptr, old, new, size, lock) \
+({ \
+ __typeof__(*(ptr)) __ret; \
+ __typeof__(*(ptr)) __old = (old); \
+ __typeof__(*(ptr)) __new = (new); \
+ switch (size) { \
+ case 1: \
+ asm volatile(lock "cmpxchgb %b1,%2" \
+ : "=a"(__ret) \
+ : "q"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 2: \
+ asm volatile(lock "cmpxchgw %w1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 4: \
+ asm volatile(lock "cmpxchgl %k1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ case 8: \
+ asm volatile(lock "cmpxchgq %1,%2" \
+ : "=a"(__ret) \
+ : "r"(__new), "m"(*__xg(ptr)), "0"(__old) \
+ : "memory"); \
+ break; \
+ default: \
+ __cmpxchg_wrong_size(); \
+ } \
+ __ret; \
+})

-#define __HAVE_ARCH_CMPXCHG 1
+#define __cmpxchg(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)

-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile(LOCK_PREFIX "cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile(LOCK_PREFIX "cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile(LOCK_PREFIX "cmpxchgl %k1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 8:
- asm volatile(LOCK_PREFIX "cmpxchgq %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define __sync_cmpxchg(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), "lock; ")

-/*
- * Always use locked operations when touching memory shared with a
- * hypervisor, since the system may be SMP even if the guest kernel
- * isn't.
- */
-static inline unsigned long __sync_cmpxchg(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("lock; cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("lock; cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("lock; cmpxchgl %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define __cmpxchg_local(ptr, old, new, size) \
+ __raw_cmpxchg((ptr), (old), (new), (size), "")

-static inline unsigned long __cmpxchg_local(volatile void *ptr,
- unsigned long old,
- unsigned long new, int size)
-{
- unsigned long prev;
- switch (size) {
- case 1:
- asm volatile("cmpxchgb %b1,%2"
- : "=a"(prev)
- : "q"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 2:
- asm volatile("cmpxchgw %w1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 4:
- asm volatile("cmpxchgl %k1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- case 8:
- asm volatile("cmpxchgq %1,%2"
- : "=a"(prev)
- : "r"(new), "m"(*__xg(ptr)), "0"(old)
- : "memory");
- return prev;
- }
- return old;
-}
+#define cmpxchg(ptr, old, new) \
+ __cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new) \
+ __sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new) \
+ __cmpxchg_local((ptr), (old), (new), sizeof(*ptr))

-#define cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), sizeof(*(ptr))))
#define cmpxchg64(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
cmpxchg((ptr), (o), (n)); \
})
-#define cmpxchg_local(ptr, o, n) \
- ((__typeof__(*(ptr)))__cmpxchg_local((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-#define sync_cmpxchg(ptr, o, n) \
- ((__typeof__(*(ptr)))__sync_cmpxchg((ptr), (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+
#define cmpxchg64_local(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \