2003-09-12 18:18:31

by Pat LaVarre

[permalink] [raw]
Subject: console lost to Ctrl+Alt+F$n in 2.6.0-test5


> ... always ... an oops ... Must be fixed.

Once upon a time Ctrl+Alt+F1 gave me a plain text console, Ctrl+Alt+F7
returned me to me X Windows console.

Much has changed, the last thing I changed was upgrading to 2.6.0-test5
from 2.6.0-test4, and now I find that toggling back and forth a few
times leaves my display permanently dark. Recovered from my ext3
journal are the following two examples of `cat /proc/kmsg | tee ...`
output.

This report differs slightly, e.g. by severity, repeatability, and
mention of handle_vm86_fault, from much of:
http://groups.google.com/groups?q=__might_sleep&scoring=d

Example #1:

...
<4>sr0: scsi3-mmc drive: 0x/48x writer cd/rw xa/form2 cdda tray
<4>sr0: scsi3-mmc maybe not writeable
<6>Uniform CD-ROM driver Revision: 3.12
<7>Attached scsi CD-ROM sr0 at scsi0, channel 0, id 0, lun 0
<4>sr1: scsi3-mmc writable profile: 0x0002
<7>Attached scsi CD-ROM sr1 at scsi1, channel 0, id 0, lun 0
<3>Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
<4>Call Trace:
<4> [<c0121f16>] __might_sleep+0x5f/0x72
<4> [<c010e76a>] save_v86_state+0x6a/0x20f
<4> [<c010f32d>] handle_vm86_fault+0xa7/0x8fb
<4> [<c010cc8f>] do_general_protection+0x0/0x93
<4> [<c010bf49>] error_code+0x2d/0x38
<4> [<c010b4bf>] syscall_call+0x7/0xb
<4>

Example #2:

...
<3>Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
<4>Call Trace:
<4> [<c0121f16>] __might_sleep+0x5f/0x72
<4> [<c010e76a>] save_v86_state+0x6a/0x20f
<4> [<c010f32d>] handle_vm86_fault+0xa7/0x8fb
<4> [<c02323aa>] ipi_handler+0x0/0x7
<4> [<c010cc8f>] do_general_protection+0x0/0x93
<4> [<c010bf49>] error_code+0x2d/0x38
<4> [<c010b4bf>] syscall_call+0x7/0xb
<4>
...

Pat LaVarre




2003-09-12 23:06:57

by Matt Mackall

[permalink] [raw]
Subject: Re: console lost to Ctrl+Alt+F$n in 2.6.0-test5

On Fri, Sep 12, 2003 at 12:19:28PM -0600, Pat LaVarre wrote:
>
> > ... always ... an oops ... Must be fixed.
>
> Once upon a time Ctrl+Alt+F1 gave me a plain text console, Ctrl+Alt+F7
> returned me to me X Windows console.
>
> Much has changed, the last thing I changed was upgrading to 2.6.0-test5
> from 2.6.0-test4, and now I find that toggling back and forth a few
> times leaves my display permanently dark. Recovered from my ext3
> journal are the following two examples of `cat /proc/kmsg | tee ...`
> output.
>
> This report differs slightly, e.g. by severity, repeatability, and
> mention of handle_vm86_fault, from much of:
> http://groups.google.com/groups?q=__might_sleep&scoring=d
>
> Example #1:
>
> ...
> <4>sr0: scsi3-mmc drive: 0x/48x writer cd/rw xa/form2 cdda tray
> <4>sr0: scsi3-mmc maybe not writeable
> <6>Uniform CD-ROM driver Revision: 3.12
> <7>Attached scsi CD-ROM sr0 at scsi0, channel 0, id 0, lun 0
> <4>sr1: scsi3-mmc writable profile: 0x0002
> <7>Attached scsi CD-ROM sr1 at scsi1, channel 0, id 0, lun 0
> <3>Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
> <4>Call Trace:
> <4> [<c0121f16>] __might_sleep+0x5f/0x72
> <4> [<c010e76a>] save_v86_state+0x6a/0x20f
> <4> [<c010f32d>] handle_vm86_fault+0xa7/0x8fb
> <4> [<c010cc8f>] do_general_protection+0x0/0x93
> <4> [<c010bf49>] error_code+0x2d/0x38
> <4> [<c010b4bf>] syscall_call+0x7/0xb
> <4>
>
> Example #2:
>
> ...
> <3>Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
> <4>Call Trace:
> <4> [<c0121f16>] __might_sleep+0x5f/0x72
> <4> [<c010e76a>] save_v86_state+0x6a/0x20f
> <4> [<c010f32d>] handle_vm86_fault+0xa7/0x8fb
> <4> [<c02323aa>] ipi_handler+0x0/0x7
> <4> [<c010cc8f>] do_general_protection+0x0/0x93
> <4> [<c010bf49>] error_code+0x2d/0x38
> <4> [<c010b4bf>] syscall_call+0x7/0xb
> <4>
> ...
>
> Pat LaVarre

I'm working on this, it's rather messy. Your lockup might be caused by
printk spew during console switch, see if it still locks up with the
sleep debugging turned off.

--
Matt Mackall : http://www.selenic.com : of or relating to the moon

2003-09-13 00:48:12

by Pat LaVarre

[permalink] [raw]
Subject: Re: console lost to Ctrl+Alt+F$n in 2.6.0-test5


> I'm working on this, it's rather messy. Your lockup might be caused by
> printk spew during console switch, see if it still locks up with the
> sleep debugging turned off.

Yes, thank you, Ctrl+Alt+F$n now works if only I
CONFIG_DEBUG_SPINLOCK_SLEEP=n.

Also `sudo cat /proc/kmsg | tee ...` also suddenly starts working.

I wonder if somehow /proc/kmsg now working is a clue? Back with =y, my
`dmesg` was clean but via /proc/kmsg I was seeing garbage like

mmae t itbl

or:

mmae t itle

for what now again is such reassuring chatter as:

<6>scsi2 : SCSI emulation for USB Mass Storage devices
<5> Vendor: Iomega Model: RRD Rev: 23.D
<5> Type: CD-ROM ANSI SCSI revision: 02
<7>WARNING: USB Mass Storage data integrity not assured
<7>USB Mass Storage device found at 3
<4>sr1: scsi3-mmc drive: 125x/125x caddy
<4>sr1: scsi3-mmc maybe not writeable
<4>sr1: scsi3-mmc writable profile: 0x0002
<7>Attached scsi CD-ROM sr1 at scsi2, channel 0, id 0, lun 0

Pat LaVarre

P.S. I could easily check to see if =y kills an ssh session or just the
display, if that helps.

P.P.S.

Tentatively I conclude "sleep debugging ... off" meant this .config
change because I see:

$ make defconfig
...
$ grep -i sleep .config
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
$ vi +/DEBUG_SPINLOCK_SLEEP .config
...
Kernel hacking ...

Sleep-inside-spinlock checking (DEBUG_SPINLOCK_SLEEP)

If you say Y here, various routines which may sleep will become very
noisy if they are called with a spinlock held.
...

Google isn't quickly confirming/ denying this tentative conclusion of
mine.



2003-09-13 01:57:54

by Matt Mackall

[permalink] [raw]
Subject: Re: console lost to Ctrl+Alt+F$n in 2.6.0-test5

On Fri, Sep 12, 2003 at 06:49:08PM -0600, Pat LaVarre wrote:
>
> > I'm working on this, it's rather messy. Your lockup might be caused by
> > printk spew during console switch, see if it still locks up with the
> > sleep debugging turned off.
>
> Yes, thank you, Ctrl+Alt+F$n now works if only I
> CONFIG_DEBUG_SPINLOCK_SLEEP=n.
>
> Also `sudo cat /proc/kmsg | tee ...` also suddenly starts working.
>
> I wonder if somehow /proc/kmsg now working is a clue? Back with =y, my
> `dmesg` was clean but via /proc/kmsg I was seeing garbage like
>
> mmae t itbl
>
> or:
>
> mmae t itle

What video are you using? I'm guessing you've got a framebuffer console?
VESA by any chance?

> for what now again is such reassuring chatter as:
>
> <6>scsi2 : SCSI emulation for USB Mass Storage devices
> <5> Vendor: Iomega Model: RRD Rev: 23.D
> <5> Type: CD-ROM ANSI SCSI revision: 02
> <7>WARNING: USB Mass Storage data integrity not assured
> <7>USB Mass Storage device found at 3
> <4>sr1: scsi3-mmc drive: 125x/125x caddy
> <4>sr1: scsi3-mmc maybe not writeable
> <4>sr1: scsi3-mmc writable profile: 0x0002
> <7>Attached scsi CD-ROM sr1 at scsi2, channel 0, id 0, lun 0
>
> Pat LaVarre
>
> P.S. I could easily check to see if =y kills an ssh session or just the
> display, if that helps.

That might help track down a bug in the console, sure. Not sure what's
going on with /proc/kmsg though.

> P.P.S.
>
> Tentatively I conclude "sleep debugging ... off" meant this .config
> change because I see:

Yep, that's the one.

--
Matt Mackall : http://www.selenic.com : of or relating to the moon

2003-09-13 13:37:38

by Pat LaVarre

[permalink] [raw]
Subject: Re: console lost to Ctrl+Alt+F$n in 2.6.0-test5


> What video are you using?
> I'm guessing you've got a framebuffer console?
> VESA by any chance?

I do not yet know how to answer such questions confidently.

I see (redhat-config-xfree86 --> tab Advanced) reports:

Video Card
Video Card Type = Intel 865
Memory Size = 16 megabytes
Driver = i810
Enable Hardware 3D Acceleration = no

> > Yes, thank you, Ctrl+Alt+F$n now works
> > if only I CONFIG_DEBUG_SPINLOCK_SLEEP=n.

Please allow me to disavow that first impression.

With CONFIG_DEBUG_SPINLOCK_SLEEP=y, I've now been counting keystrokes
til crash. I count each of Ctrl+Alt+F5 and Ctrl+Alt+F7 as one stroke.
Sometimes I crash, sometimes I do not. I began logging life more
carefully when first I saw a few strokes cause a crash, and thereafter,
per boot:

8 strokes crashed.

60 strokes did not crash, so I gave up and rebooted to try again.

4 strokes crashed. The first 2 seeming had logged me out, killing my
cat /proc/kmsg process.

8 strokes crashed.

26 strokes crashed.

...

The only consistency I see is that always an even number of strokes
cause a crash i.e. always the Ctrl+Alt+F7 switch back to my X console,
not the switch to a text console.

To prepare to crash, I only know of: sync umount ext3. For me as yet
"Checking ... filesystem..." wastes less than three minutes per crash,
and I haven't yet perceptibly lost a disk.

> > I wonder if somehow /proc/kmsg now working is a clue?

Meanwhile, whether `sudo cat /proc/kmsg | tee ...` displays printk
intact or not also varies, without clearly correlating with whether a
crash will or will not occur.

So far, with CONFIG_DEBUG_SPINLOCK_SLEEP=y, trying `sudo cat /proc/kmsg
| tee ...` has never run well enough to capture the cause of the crash.

> > I could easily check ... ssh ...

Remote ssh freezes and remote ping starts losing all packets.

Pat LaVarre



2003-09-13 14:51:27

by Michael Frank

[permalink] [raw]
Subject: Re: console lost to Ctrl+Alt+F$n in 2.6.0-test5

Don't think this is a keyboard problem and have seen
this several times related to video drivers in particular
when switching back to X.

The script below switches every $wait (5) seconds between
VT $vt (1) and X @vt$x (7). It logs to $log (log=/tmp/_vt)
syncs and waits before doing the switch to allow data to
reach the disk. So if it dies, you should know quite
accurately when and where and if it also happens wo KB.

Put this into a file and make it executable. It must
be run as root unless chvt is in sudoers or suid root.

#!/bin/bash

cycle=1
log=/tmp/_vt.log
vt=1
x=7
wait=5
#rm -f $log
echo Starting VT <> X test
while ((1)); do
echo Cycle $cycle switching to VT $vt >> $log
sync
sleep $wait
chvt $vt
echo Cycle $cycle switching to X >> $log
sync
sleep $wait
chvt $x
echo Cycle $cycle
((cycle += 1))
done
;;


On Saturday 13 September 2003 21:38, Pat LaVarre wrote:
>
> > What video are you using?
> > I'm guessing you've got a framebuffer console?
> > VESA by any chance?
>
> I do not yet know how to answer such questions confidently.
>
> I see (redhat-config-xfree86 --> tab Advanced) reports:
>
> Video Card
> Video Card Type = Intel 865
> Memory Size = 16 megabytes
> Driver = i810
> Enable Hardware 3D Acceleration = no
>
> > > Yes, thank you, Ctrl+Alt+F$n now works
> > > if only I CONFIG_DEBUG_SPINLOCK_SLEEP=n.
>
> Please allow me to disavow that first impression.
>
> With CONFIG_DEBUG_SPINLOCK_SLEEP=y, I've now been counting keystrokes
> til crash. I count each of Ctrl+Alt+F5 and Ctrl+Alt+F7 as one stroke.
> Sometimes I crash, sometimes I do not. I began logging life more
> carefully when first I saw a few strokes cause a crash, and thereafter,
> per boot:
>
> 8 strokes crashed.
>
> 60 strokes did not crash, so I gave up and rebooted to try again.
>
> 4 strokes crashed. The first 2 seeming had logged me out, killing my
> cat /proc/kmsg process.
>
> 8 strokes crashed.
>
> 26 strokes crashed.
>
> ...
>
> The only consistency I see is that always an even number of strokes
> cause a crash i.e. always the Ctrl+Alt+F7 switch back to my X console,
> not the switch to a text console.
>
> To prepare to crash, I only know of: sync umount ext3. For me as yet
> "Checking ... filesystem..." wastes less than three minutes per crash,
> and I haven't yet perceptibly lost a disk.
>
> > > I wonder if somehow /proc/kmsg now working is a clue?
>
> Meanwhile, whether `sudo cat /proc/kmsg | tee ...` displays printk
> intact or not also varies, without clearly correlating with whether a
> crash will or will not occur.
>
> So far, with CONFIG_DEBUG_SPINLOCK_SLEEP=y, trying `sudo cat /proc/kmsg
> | tee ...` has never run well enough to capture the cause of the crash.
>
> > > I could easily check ... ssh ...
>
> Remote ssh freezes and remote ping starts losing all packets.
>
> Pat LaVarre
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>


2003-09-13 16:11:33

by Michael Frank

[permalink] [raw]
Subject: Re: console lost to Ctrl+Alt+F$n in 2.6.0-test5

On Saturday 13 September 2003 22:49, Michael Frank wrote:
> Don't think this is a keyboard problem and have seen
> this several times related to video drivers in particular
> when switching back to X.

Used script with 2.6.0-test5 + pm2 patch

Kernel: VGA16FB

X4.3: Driver "vesa"
VendorName "Silicon Integrated Systems [SiS]"
BoardName "VESA driver (generic)"

200 cycles wo problems.

2003-09-13 23:11:34

by Pat LaVarre

[permalink] [raw]
Subject: Re: console lost to Ctrl+Alt+F$n in 2.6.0-test5


> script ... switches every $wait (5) seconds between VT
> $vt (1) and X @vt$x (7) ...

Yes, logging in via ssh to run this script does reliably crash my
2.6.0-test5, I think because of `chvt $x`, while I am not touching the
console keyboard or mouse.

Cycles required varies. Counting cycles completed before crashing, I
saw: 1 18 20 20 ... 4 16 ...

The script as was posted creates a small ./X and produces logs such as:

Cycle 1 switching to VT 1
Cycle 1 switching to X
...
Cycle 20 switching to VT 1
Cycle 20 switching to X
Cycle 21 switching to VT 1

This log could have been produced by crashing in `chvt $vt`, but I think
I saw it was produced by crashing in the `sleep $wait` that follows
`chvt $x`. That is, I think the $vt was the last non-blank display, not
the $x.

To increase my confidence, I ran with every command echoed, and indeed
via ssh I saw the last command echoed was `sleep 5`.

I ended by running the third variant script quoted below. Now my logs
comfortingly end with 'switching to X'. I presume I'm catching the
crash in the last sleep $wait.

Pat LaVarre

#!/bin/bash
cycle=1
log=/tmp/_vt.log
vt=1
x=7
wait=3
rm -f $log
echo 'Starting VT <> X test'
while ((1)); do
echo Cycle $cycle switching to VT $vt | tee -a $log
sync
sleep $wait
chvt $vt
sleep $wait
echo Cycle $cycle switching to X | tee -a $log
sync
sleep $wait
chvt $x
sleep $wait
((cycle += 1))
done
;;



2003-09-14 00:25:40

by Michael Frank

[permalink] [raw]
Subject: 2.6.0-test5: intermittent crash on chvt to X; was console lost to Ctrl+Alt+F$n in 2.6.0-test5


> I ended by running the third variant script quoted below. Now my logs
> comfortingly end with 'switching to X'. I presume I'm catching the
> crash in the last sleep $wait.

Makes sense because the crash may (in your case does) happen later
in the switching sequence at which time the "new" log already made
it to disk - I'll use your variant from now on ;)

The qestion now is whether this is kernel or X related.
Have you had this problem with an earlier 2.6 or 2.4 kernel?

If it is specific to -test5, post (as tar.bz2)

- lspci -v
- /var/log/dmesg
- X version and driver info from /var/log/XFree86.log
- .config

Regards
Michael

2003-09-14 19:00:29

by Pat LaVarre

[permalink] [raw]
Subject: RE: 2.6.0-test5: intermittent crash on chvt to X; was console lost to Ctrl+Alt+F$n in 2.6.0-test5


> If it is specific to -test5, post (as tar.bz2)

Definitely not specific to 2.6.0-test5: I have
repeatedly seen 2.6.0-test4 crash this way too.

Whether I have had this vulnerability for days or
weeks or months or years I am not yet sure, I do not
customarily make a point of trying chvt much with each
new kernel.

Unless I hear otherwise, hopefully within hours, when
next I manage to visit my console in person:

1) I will test to see if the script can crash:

2.6.0-test4 with my near default .config
2.4.22 with my near default .config
2.4.21-xfs Knoppix booted via cd

2) After the reboot following a crash of 2.6.0-test5 I
will sample:

lspci -v
/var/log/dmesg
X version and driver info from /var/log/XFree86.log
.config

3) I will report back here, attaching (2) as .tar.bz2.

Pat LaVarre



__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

2003-09-14 23:03:32

by Pat LaVarre

[permalink] [raw]
Subject: Re: 2.6.0-test5: intermittent crash on chvt to X; was console lost to Ctrl+Alt+F$n in 2.6.0-test5


> qestion now is whether this is kernel or X related.

Oh.

> > echo ... | tee -a $log
> > sync
> > sleep $wait
> > chvt ...
> > sleep $wait
>
> the crash may (in your case does) happen later

In repeating those five commands, is there any purpose to the first
sleep? I left the first sleep in place to match the original, but once
I assume sync leaves no writes unflushed then now I do not see what the
first sleep accomplishes, if anything.

> Have you had this problem with an earlier 2.6 or 2.4 kernel?

Today 2.4.21-xfs Knoppix booted via cd:
in over 75 cycles I saw no crash so I gave up.

Today 2.4.22 with a near default .config:
"in over 75 cycles I saw no crash so I gave up".

Today 2.6.0-test4 with a near default .config:
Counting cycles before crash per boot I saw: 3 2 16 ...

Yester/today 2.6.0-test5 with a near default .config:
"I saw: 1 18 20 20 ... 4 16 ..." 3 ...

> -test5, post (as tar.bz2) ...

After that last -test5 crash I rebooted and then produced the attached
via:

#!/bin/bash
# rm -r chvtx
mkdir chvtx

sudo /sbin/lspci -v >>chvtx/v.lspci
sudo cat /var/log/dmesg >>chvtx/var.log.dmesg
egrep -i 'version|release|driver' /var/log/XFree86.*.log >>chvtx/var.log.XFree86.log
# cp -ip /var/log/XFree86.*.log chvtx/
cp -ip .config chvtx/config

tar -c chvtx | bzip2 -zc >chvtx.2.6.0-test5.tar.bz2

# Pat LaVarre



Attachments:
chvtx.2.6.0-test5.tar.bz2 (12.22 kB)

2003-09-16 17:15:48

by Pavel Machek

[permalink] [raw]
Subject: Re: console lost to Ctrl+Alt+F_n in 2.6.0-test5

Hi!

> > Don't think this is a keyboard problem and have seen
> > this several times related to video drivers in particular
> > when switching back to X.
>
> Used script with 2.6.0-test5 + pm2 patch
>
> Kernel: VGA16FB
>
> X4.3: Driver "vesa"
> VendorName "Silicon Integrated Systems [SiS]"
> BoardName "VESA driver (generic)"
>
> 200 cycles wo problems.

VESA is special: its kernel who drives the hw => X can't crash it
so easily.

--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...