2007-05-08 19:52:04

by Ken Moffat

Subject: Lockup after logging out of X

This is a resend, with a better title and slightly more
clarification. Originally sent yesterday evening, but I can see no
evidence that it got beyond my isp's mailserver. Apologies to the
Cc's if you did get the original.

Using Linus' tree pulled on Sunday afternoon UK time. Running an
amd64 ('pure64') desktop using gdm for graphical login, on Xorg 7.2
with a radeon 9200se. No problems until I log out of the desktop
and go back to gdm. Before that, the text consoles seem to be
working fine. After I go back to gdm, the display is corrupted and
only MagicSysRQ works. Mostly, the keyboard LEDs flash, but the
only thing that made it to the logs was this exceedingly incomplete
oops report:

May 7 21:02:54 bluesbreaker kernel: [ 46.549615] [drm] writeback test succeeded in 1 usecs
May 7 21:03:24 bluesbreaker kernel: [ 61.552793] Unable to handle kernel paging request at ffff81003befd3e8 RIP:
May 7 21:03:24 bluesbreaker kernel: [ 61.552798] [<ffffffff80271576>] fasync_helper+0x52/0xf0
May 7 21:03:24 bluesbreaker kernel: [ 61.552805] PGD 8063 PUD 9063 PMD 800000003bef11e3 BAD
May 7 21:03:24 bluesbreaker kernel: [ 61.552811] Oops: 0009 [1] PREEMPT
May 7 21:04:18 bluesbreaker syslogd 1.4.1: restart.
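
For anyone reading the fragment above: the number after "Oops:" is the x86 page-fault error code, printed in hex. The following is a minimal sketch of how its bits break down, assuming the standard x86 error-code layout. The struct and function names are hypothetical and exist only for illustration; this is a reading aid, not a diagnosis of the bug.

```c
#include <stdint.h>

/* Standard x86 page-fault error code bits (names prefixed to mark them
   as part of this illustration, not kernel identifiers). */
#define SKETCH_PF_PROT  (1u << 0) /* 1: page was present, 0: page not present */
#define SKETCH_PF_WRITE (1u << 1) /* 1: write access, 0: read access */
#define SKETCH_PF_USER  (1u << 2) /* 1: fault taken in user mode, 0: kernel mode */
#define SKETCH_PF_RSVD  (1u << 3) /* 1: reserved bit set in a paging entry */
#define SKETCH_PF_INSTR (1u << 4) /* 1: fault on an instruction fetch */

/* Hypothetical container for the decoded flags. */
struct sketch_pf_info {
	int present;
	int write;
	int user;
	int reserved_bit;
	int instr_fetch;
};

/* Split a page-fault error code into its individual flags. */
static struct sketch_pf_info sketch_decode_pf(uint32_t code)
{
	struct sketch_pf_info info;

	info.present      = (code & SKETCH_PF_PROT)  != 0;
	info.write        = (code & SKETCH_PF_WRITE) != 0;
	info.user         = (code & SKETCH_PF_USER)  != 0;
	info.reserved_bit = (code & SKETCH_PF_RSVD)  != 0;
	info.instr_fetch  = (code & SKETCH_PF_INSTR) != 0;
	return info;
}
```

Calling `sketch_decode_pf(0x0009)` reports a kernel-mode read that hit a present mapping with a reserved paging-entry bit set.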

After trying git-bisect, it tells me:
0dbf7028c0c1f266c9631139450a1502d3cd457e is first bad commit
commit 0dbf7028c0c1f266c9631139450a1502d3cd457e
Author: Vivek Goyal <[email protected]>
Date: Wed May 2 19:27:07 2007 +0200

[PATCH] x86: __pa and __pa_symbol address space separation

Currently __pa_symbol is for use with symbols in the kernel address
map and __pa is for use with pointers into the physical memory map.
But the code is implemented so you can usually interchange the two.

__pa which is much more common can be implemented much more cheaply
if it doesn't have to worry about any other kernel address
spaces. This is especially true with a relocatable kernel as
__pa_symbol needs to perform an extra variable read to resolve
the address.

[snip the rest of the commit message on the resend]
This time, I'll gzip the config in case it was a size problem for
the list.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce


Attachments:
config.amd64.gz (9.29 kB)

2007-05-08 20:22:44

by Valdis Klētnieks

Subject: Re: Lockup after logging out of X

On Tue, 08 May 2007 20:51:42 BST, Ken Moffat said:

> After trying git-bisect, it tells me:
> 0dbf7028c0c1f266c9631139450a1502d3cd457e is first bad commit
> commit 0dbf7028c0c1f266c9631139450a1502d3cd457e
> Author: Vivek Goyal <[email protected]>
> Date: Wed May 2 19:27:07 2007 +0200
>
> [PATCH] x86: __pa and __pa_symbol address space separation
>
> Currently __pa_symbol is for use with symbols in the kernel address
> map and __pa is for use with pointers into the physical memory map.
> But the code is implemented so you can usually interchange the two.

I saw this same exact problem on my box back at 21-rc5-mm4, but didn't
report it here because Christoph Hellwig thinks such things are off-topic (as
I was seeing it with the NVidia driver).

http://marc.info/?l=linux-kernel&m=117579249300455&w=2
http://www.nvnews.net/vbulletin/showthread.php?t=89444

Sorry to see you hit a problem with the exact same patch with a Radeon card
a month later.






2007-05-08 20:48:58

by Ken Moffat

Subject: Re: Lockup after logging out of X

On Tue, May 08, 2007 at 04:21:51PM -0400, [email protected] wrote:
> On Tue, 08 May 2007 20:51:42 BST, Ken Moffat said:
>
> > After trying git-bisect, it tells me:
> > 0dbf7028c0c1f266c9631139450a1502d3cd457e is first bad commit
> > commit 0dbf7028c0c1f266c9631139450a1502d3cd457e
> > Author: Vivek Goyal <[email protected]>
> > Date: Wed May 2 19:27:07 2007 +0200
> >
> > [PATCH] x86: __pa and __pa_symbol address space separation
[...]
>
> I saw this same exact problem on my box back at 21-rc5-mm4, but didn't
> report it here because Christoph Hellwig thinks such things are off-topic (as
> I was seeing it with the NVidia driver).
>
> http://marc.info/?l=linux-kernel&m=117579249300455&w=2
> http://www.nvnews.net/vbulletin/showthread.php?t=89444
>
> Sorry to see you hit a problem with the exact same patch with a Radeon card
> a month later.
>
OK, in view of the minimal oops output that made it to my log
(I forgot to say in today's repost that it was from one of the bad
bisect builds), I'd better point out that I don't have the ATI
module (fglrx or whatever it is called); this is purely Linus'
tree.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce

2007-05-08 22:47:20

by Michal Piotrowski

Subject: Re: Lockup after logging out of X

Hi Ken,

On 08/05/07, Ken Moffat <[email protected]> wrote:
> This is a resend, with a better title and slightly more
> clarification. Originally sent yesterday evening, but I can see no
> evidence that it got beyond my isp's mailserver. Apologies to the
> Cc's if you did get the original.
>
> Using Linus' tree pulled on Sunday afternoon UK time. Running an
> amd64 ('pure64') desktop using gdm for graphical login, on Xorg 7.2
> with a radeon 9200se. No problems until I log out of the desktop
> and go back to gdm. Before that, the text consoles seem to be
> working fine. After I go back to gdm, the display is corrupted and
> only MagicSysRQ works. Mostly, the keyboard LEDs flash, but the
> only thing that made it to the logs was this exceedingly incomplete
> oops report:
>
> May 7 21:02:54 bluesbreaker kernel: [ 46.549615] [drm] writeback test succeeded in 1 usecs
> May 7 21:03:24 bluesbreaker kernel: [ 61.552793] Unable to handle kernel paging request at ffff81003befd3e8 RIP:
> May 7 21:03:24 bluesbreaker kernel: [ 61.552798] [<ffffffff80271576>] fasync_helper+0x52/0xf0
> May 7 21:03:24 bluesbreaker kernel: [ 61.552805] PGD 8063 PUD 9063 PMD 800000003bef11e3 BAD
> May 7 21:03:24 bluesbreaker kernel: [ 61.552811] Oops: 0009 [1] PREEMPT
> May 7 21:04:18 bluesbreaker syslogd 1.4.1: restart.
>
> After trying git-bisect, it tells me:
> 0dbf7028c0c1f266c9631139450a1502d3cd457e is first bad commit
> commit 0dbf7028c0c1f266c9631139450a1502d3cd457e
> Author: Vivek Goyal <[email protected]>
> Date: Wed May 2 19:27:07 2007 +0200
>
> [PATCH] x86: __pa and __pa_symbol address space separation
>
> Currently __pa_symbol is for use with symbols in the kernel address
> map and __pa is for use with pointers into the physical memory map.
> But the code is implemented so you can usually interchange the two.
>
> __pa which is much more common can be implemented much more cheaply
> if it doesn't have to worry about any other kernel address
> spaces. This is especially true with a relocatable kernel as
> __pa_symbol needs to perform an extra variable read to resolve
> the address.
>
> [snip the rest of the commit message on the resend]
> This time, I'll gzip the config in case it was a size problem for
> the list.

Please check the latest -git.

31 hours ago Linus Torvalds Revert "[PATCH] x86: __pa and
__pa_symbol address space ...

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e3ebadd95cb621e2c7436f3d3646447ac9d5c16d

>
> Ken
> --
> das eine Mal als Tragödie, das andere Mal als Farce
>
>

Regards,
Michal

--
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)

2007-05-08 23:01:54

by Andi Kleen

Subject: Re: Lockup after logging out of X

On Tue, May 08, 2007 at 08:51:42PM +0100, Ken Moffat wrote:
> This is a resend, with a better title and slightly more
> clarification. Originally sent yesterday evening, but I can see no
> evidence that it got beyond my isp's mailserver. Apologies to the
> Cc's if you did get the original.
>
> Using Linus' tree pulled on Sunday afternoon UK time. Running an
> amd64 ('pure64') desktop using gdm for graphical login, on Xorg 7.2
> with a radeon 9200se. No problems until I log out of the desktop
> and go back to gdm. Before that, the text consoles seem to be
> working fine. After I go back to gdm, the display is corrupted and
> only MagicSysRQ works. Mostly, the keyboard LEDs flash, but the
> only thing that made it to the logs was this exceedingly incomplete
> oops report:
>
> May 7 21:02:54 bluesbreaker kernel: [ 46.549615] [drm] writeback test succeeded in 1 usecs
> May 7 21:03:24 bluesbreaker kernel: [ 61.552793] Unable to handle kernel paging request at ffff81003befd3e8 RIP:
> May 7 21:03:24 bluesbreaker kernel: [ 61.552798] [<ffffffff80271576>] fasync_helper+0x52/0xf0
> May 7 21:03:24 bluesbreaker kernel: [ 61.552805] PGD 8063 PUD 9063 PMD 800000003bef11e3 BAD
> May 7 21:03:24 bluesbreaker kernel: [ 61.552811] Oops: 0009 [1] PREEMPT
> May 7 21:04:18 bluesbreaker syslogd 1.4.1: restart.
>
> After trying git-bisect, it tells me:
> 0dbf7028c0c1f266c9631139450a1502d3cd457e is first bad commit
> commit 0dbf7028c0c1f266c9631139450a1502d3cd457e

Already known, although it is still unclear what the bug actually is.
Can you run with the appended patch please (from Eric Biederman)
and post any backtraces the WARN_ON in there spews out?

Also do you use swiotlb?

Thanks

-Andi

diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
index b17fc16..e6a4d1e 100644
--- a/include/asm-x86_64/page.h
+++ b/include/asm-x86_64/page.h
@@ -105,7 +105,18 @@ extern unsigned long phys_base;

 /* Note: __pa(&symbol_visible_to_c) should be always replaced with __pa_symbol.
    Otherwise you risk miscompilation. */
-#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET)
+#define __pa(x) \
+({ \
+	unsigned long v; \
+	asm("" : "=r" (v) : "0" ((unsigned long)(x))); \
+	WARN_ON(v >= __START_KERNEL_map); \
+	if (likely(v < __START_KERNEL_map)) \
+		v -= PAGE_OFFSET; \
+	else \
+		v = (v - __START_KERNEL_map) + phys_base; \
+	v; \
+})
+
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
 #define __pa_symbol(x) \
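
The address arithmetic in the debugging patch above can be tried outside the kernel. What follows is a minimal user-space sketch, not kernel code. The two layout constants are assumptions: they are the x86_64 values believed to apply to kernels of this era, and they match the addresses in the oops. The function names are hypothetical, and phys_base is passed as a parameter rather than read from a kernel variable. The empty asm statement from the patch, which only hides the value from the compiler's optimizer, is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants, for illustration only. */
#define SKETCH_PAGE_OFFSET      UINT64_C(0xffff810000000000) /* start of the direct mapping */
#define SKETCH_START_KERNEL_MAP UINT64_C(0xffffffff80000000) /* start of the kernel text mapping */

/* Returns 1 when v lies in the kernel text mapping. This is the case
   the WARN_ON in the patch is meant to report. */
static int sketch_is_kernel_map(uint64_t v)
{
	return v >= SKETCH_START_KERNEL_MAP;
}

/* User-space model of the patched __pa(): translate a kernel virtual
   address to a physical address, accepting both address ranges. */
static uint64_t sketch_pa(uint64_t v, uint64_t phys_base)
{
	if (!sketch_is_kernel_map(v)) {
		/* Direct-map pointer: anything lower would underflow. */
		assert(v >= SKETCH_PAGE_OFFSET);
		return v - SKETCH_PAGE_OFFSET;
	}
	/* Kernel-symbol address: offset from the text mapping, plus the
	   physical load address of the kernel. */
	return (v - SKETCH_START_KERNEL_MAP) + phys_base;
}
```

With a phys_base of 0, `sketch_pa(0xffff81003befd3e8, 0)` takes the direct-map branch, while an address such as 0xffffffff80271576 takes the kernel-map branch and is the kind of caller the WARN_ON would flag.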


2007-05-08 23:04:19

by Ken Moffat

Subject: Re: Lockup after logging out of X

On Wed, May 09, 2007 at 12:47:02AM +0200, Michal Piotrowski wrote:
>
> Please check the latest -git.
>
> 31 hours ago Linus Torvalds Revert "[PATCH] x86: __pa and
> __pa_symbol address space ...
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e3ebadd95cb621e2c7436f3d3646447ac9d5c16d

Thanks.
>
> Regards,
> Michal
>
> --
> Michal K. K. Piotrowski
> Kernel Monkeys
> (http://kernel.wikidot.com/start)

--
das eine Mal als Tragödie, das andere Mal als Farce

2007-05-08 23:42:31

by Valdis Klētnieks

Subject: Re: Lockup after logging out of X

On Wed, 09 May 2007 01:01:34 +0200, Andi Kleen said:
> Already known, although it is still unclear what the bug actually is.
> Can you run with the appended patch please (from Eric Biederman)
> and post any backtraces the WARN_ON in there spews out?

> diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
> index b17fc16..e6a4d1e 100644

> + if (likely(v < __START_KERNEL_map)) \
> + v -= PAGE_OFFSET; \
> + else \
> + v = (v - __START_KERNEL_map) + phys_base; \

ERROR: "phys_base" [net/sunrpc/sunrpc.ko] undefined!
ERROR: "phys_base" [net/sunrpc/auth_gss/rpcsec_gss_spkm3.ko] undefined!
ERROR: "phys_base" [net/sunrpc/auth_gss/auth_rpcgss.ko] undefined!
ERROR: "phys_base" [net/mac80211/mac80211.ko] undefined!
ERROR: "phys_base" [net/ieee80211/ieee80211_crypt_wep.ko] undefined!
ERROR: "phys_base" [net/ieee80211/ieee80211_crypt_tkip.ko] undefined!
ERROR: "phys_base" [fs/nfsd/nfsd.ko] undefined!
ERROR: "phys_base" [fs/nfs/nfs.ko] undefined!
ERROR: "phys_base" [fs/jbd2/jbd2.ko] undefined!
ERROR: "phys_base" [fs/ecryptfs/ecryptfs.ko] undefined!
ERROR: "phys_base" [drivers/net/ppp_mppe.ko] undefined!
ERROR: "phys_base" [drivers/char/agp/intel-agp.ko] undefined!
make[1]: *** [__modpost] Error 1

Looks like a missing EXPORT_SYMBOL. Unfortunately, the definition is here:

arch/x86_64/kernel/head.S:ENTRY(phys_base)

Do I need to go the old symtabs.c route and stick the EXPORT_SYMBOL in some
other .c file?



2007-05-09 00:15:17

by Ken Moffat

Subject: Re: Lockup after logging out of X

On Wed, May 09, 2007 at 01:01:34AM +0200, Andi Kleen wrote:
>
> Already known, although it is still unclear what the bug actually is.
> Can you run with the appended patch please (from Eric Biederman)
> and post any backtraces the WARN_ON in there spews out?
>
> Also do you use swiotlb?
>
> Thanks
>
> -Andi
>
Had to turn off modules to get it to build. It half-logged another
oops, but no backtraces, and SysRq+r does nothing. Nothing else
unusual in the log.

May 9 00:47:12 bluesbreaker gconfd (ken-2833): Resolved address "xml:readonly:/etc/gnome/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
May 9 00:52:59 bluesbreaker kernel: [ 275.667737] Unable to handle kernel paging request at ffff81003b4ac3e8 RIP:
May 9 00:52:59 bluesbreaker kernel: [ 275.667742] [<ffffffff8027134a>] fasync_helper+0x52/0xf0
May 9 00:52:59 bluesbreaker kernel: [ 275.667749] PGD 8063 PUD 9063 PMD 800000003b4a11e3 BAD
May 9 00:52:59 bluesbreaker kernel: [ 275.667754] Oops: 0009 [1] PREEMPT
May 9 00:54:26 bluesbreaker syslogd 1.4.1: restart.
May 9 00:54:26 bluesbreaker bootlog: Starting system log daemon... [ OK ]

Apparently I do use swiotlb - I didn't know that, and can't see
where it gets asked in menuconfig, but I can see
CONFIG_SWIOTLB=y

Let me know if there is anything else I can test (probably pm
tomorrow), otherwise I'll go back to -head.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce

2007-05-09 01:48:31

by Dave Young

Subject: Re: Lockup after logging out of X

Hi,
It seems I have the same problem. Sometimes the whole system locks up;
sometimes the keyboard is still active but everything else is dead.

It has not happened since I started running a silly script in the
background to log the kernel messages:
-----------------
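#!/bin/sh
# Snapshot the kernel log every 3 seconds and flush it to disk,
# so the most recent messages survive a lockup.
while true; do
	sleep 3
	dmesg >/tmp/dmesg.txt
	sync
done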
-----------------


My video card is a Radeon X600; the fglrx driver is not used.

2007-05-09 09:08:29

by Andi Kleen

Subject: Re: Lockup after logging out of X

On Wed, May 09, 2007 at 01:14:59AM +0100, Ken Moffat wrote:
> On Wed, May 09, 2007 at 01:01:34AM +0200, Andi Kleen wrote:
> >
> > Already known, although it is still unclear what the bug actually is.
> > Can you run with the appended patch please (from Eric Biederman)
> > and post any backtraces the WARN_ON in there spews out?
> >
> > Also do you use swiotlb?
> >
> > Thanks
> >
> > -Andi
> >
> Had to turn off modules to get it to build. It half-logged another
> oops, but no backtraces, and SysRq+r does nothing. Nothing else
> unusual in the log.

Hmm, not good.
>
> May 9 00:47:12 bluesbreaker gconfd (ken-2833): Resolved address "xml:readonly:/etc/gnome/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
> May 9 00:52:59 bluesbreaker kernel: [ 275.667737] Unable to handle kernel paging request at ffff81003b4ac3e8 RIP:
> May 9 00:52:59 bluesbreaker kernel: [ 275.667742] [<ffffffff8027134a>] fasync_helper+0x52/0xf0
> May 9 00:52:59 bluesbreaker kernel: [ 275.667749] PGD 8063 PUD 9063 PMD 800000003b4a11e3 BAD
> May 9 00:52:59 bluesbreaker kernel: [ 275.667754] Oops: 0009 [1] PREEMPT
> May 9 00:54:26 bluesbreaker syslogd 1.4.1: restart.
> May 9 00:54:26 bluesbreaker bootlog: Starting system log daemon... [ OK ]
>
> Apparently I do use swiotlb - I didn't know that, and can't see
> where it gets asked in menuconfig, but I can see
> CONFIG_SWIOTLB=y
>
> Let me know if there is anything else I can test (probably pm
> tomorrow), otherwise I'll go back to -head.

Yes, testing head would be a good idea. It has a different bug fix
from Linus that is supposed to address this too. Unfortunately, it is
still not known why it actually breaks; I had hoped to find that out
using the debugging patch.

-Andi