Hey all,
Last night I started chipping away at a (presumably xserver) bug[1] that's
been bothering me for some time now. With recent xservers, running
compiz sporadically causes the xserver to crash after it is sent SIGQUIT
by some unknown process. Looking further into the matter with
systemtap, I found that keventd is sending the signal, specifically
through this path:
send_signal: SIGQUIT was sent to X (pid:2787) by events/1 uid:0
0xffffffff8106b301 : T.649+0x1/0x2c0 [kernel]
0xffffffff8106b8f3 : __group_send_sig_info+0x13/0x20 [kernel]
0xffffffff8106c254 : group_send_sig_info+0x54/0x90 [kernel]
0xffffffff8106c428 : __kill_pgrp_info+0x48/0x80 [kernel]
0xffffffff8106c4a0 : kill_pgrp+0x40/0x60 [kernel]
0xffffffff812eab52 : n_tty_receive_buf+0x482/0x12e0 [kernel]
0xffffffff812ee373 : flush_to_ldisc+0x103/0x1d0 [kernel]
0xffffffff81070d0a : worker_thread+0x15a/0x280 [kernel]
0xffffffff81075cbe : kthread+0x9e/0xb0 [kernel]
0xffffffff8101312a : child_rip+0xa/0x20 [kernel]
0xffffffff81075c20 : kthread+0x0/0xb0 [kernel] (inexact)
0xffffffff81013120 : child_rip+0x0/0x20 [kernel] (inexact)
Strangely enough, the signal seems to be coming from the tty layer.
Looking at n_tty_receive_buf, I see only one statement that could raise
any sort of signal:
drivers/char/n_tty.c:
    if (!tty->icanon && (tty->read_cnt >= tty->minimum_to_wake)) {
        kill_fasync(&tty->fasync, SIGIO, POLL_IN);
        if (waitqueue_active(&tty->read_wait))
            wake_up_interruptible(&tty->read_wait);
    }
However, this sends SIGIO, not SIGQUIT, and I'm not certain why
n_tty_receive_buf appears as the direct caller of kill_pgrp in the
backtrace. Could this be the result of inlining? If so, shouldn't the
DWARF information be enough to recover the intermediate callers?
What could cause the tty layer to send a SIGQUIT to a process? Is there
any path through the code in this backtrace that could produce a
SIGQUIT?
Unfortunately, my knowledge of the tty layer is extremely limited, so
any and all advice you could provide would be greatly appreciated.
Thanks,
- Ben
[1] https://bugs.freedesktop.org/show_bug.cgi?id=22679
SIGQUIT is sent to the X server if the controlling tty of the X server
(probably its VT) receives the QUIT character (usually control-\, i.e.
0x1c).
Samuel
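Samuel's explanation can be checked from user space on a pseudo-terminal instead of a VT. The sketch below is Linux-only and rests on a few assumptions of its own: the helper name feed_quit_char is hypothetical, a pty stands in for /dev/tty7, and the TIOCSPTLCK/TIOCGPTN ioctls are used in place of unlockpt()/ptsname(). A child makes the pty slave its controlling tty with ISIG either set or cleared, and the parent writes 0x1c into the master, playing the role of the keyboard.

```c
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <termios.h>
#include <unistd.h>

#define QUIT_CHAR 0x1c /* control-\, the usual VQUIT character */

/*
 * Feed QUIT_CHAR into a pty whose slave is a child's controlling tty.
 * Returns the signal that killed the child, 0 if the child read the byte
 * as ordinary input, or -1 if anything in the setup failed.
 */
int feed_quit_char(int isig_on)
{
    int ready[2], unlock = 0;
    unsigned int ptn;
    char slave_name[32];

    /* Linux: open a new pty master, unlock it and look up its slave. */
    int master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (ioctl(master, TIOCSPTLCK, &unlock) < 0 ||
        ioctl(master, TIOCGPTN, &ptn) < 0 || pipe(ready) < 0) {
        close(master);
        return -1;
    }
    snprintf(slave_name, sizeof slave_name, "/dev/pts/%u", ptn);

    pid_t pid = fork();
    if (pid < 0) {
        close(master);
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: new session with the pty slave as controlling tty, which
           also makes our process group the tty's foreground group. */
        struct rlimit no_core = { 0, 0 };
        struct termios t;
        unsigned char c;
        int slave = -1;

        setrlimit(RLIMIT_CORE, &no_core); /* SIGQUIT would dump core */
        close(master);
        close(ready[0]);
        if (setsid() < 0 ||
            (slave = open(slave_name, O_RDWR | O_NOCTTY)) < 0 ||
            ioctl(slave, TIOCSCTTY, 0) < 0 || tcgetattr(slave, &t) < 0)
            _exit(2);

        t.c_cc[VQUIT] = QUIT_CHAR;
        t.c_lflag &= ~ICANON; /* hand each byte to read() immediately */
        t.c_cc[VMIN] = 1;
        t.c_cc[VTIME] = 0;
        if (isig_on)
            t.c_lflag |= ISIG;  /* signal characters active */
        else
            t.c_lflag &= ~ISIG; /* signal characters are plain data */
        if (tcsetattr(slave, TCSANOW, &t) < 0 || write(ready[1], "r", 1) != 1)
            _exit(2);

        /* With ISIG set, SIGQUIT should terminate us inside read(). */
        if (read(slave, &c, 1) == 1 && c == QUIT_CHAR)
            _exit(0);
        _exit(3);
    }

    /* Parent: wait until the child's tty is configured, then "type" ^\ . */
    const unsigned char quit = QUIT_CHAR;
    char ack;
    int status, fed = 0;
    close(ready[1]);
    if (read(ready[0], &ack, 1) == 1 && write(master, &quit, 1) == 1)
        fed = 1;
    else
        kill(pid, SIGKILL); /* setup failed: don't leave the child blocked */

    pid_t waited = waitpid(pid, &status, 0);
    close(ready[0]);
    close(master);
    if (waited != pid || !fed)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

On Linux one would expect feed_quit_char(1) to return SIGQUIT and feed_quit_char(0) to return 0, since with ISIG cleared the byte is simply delivered as input. The pty slave uses the same n_tty line discipline, so the signal should originate from the same receive path that shows up in the backtrace.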
On Thu, Jul 09, 2009 at 05:00:47PM +0200, Samuel Thibault wrote:
> SIGQUIT is sent to the X server if the controlling tty of the X server
> (probably its VT) receives the QUIT character (usually control-\, i.e.
> 0x1c)
This, however, would imply that something is sending the character and
this something is certainly not me. Where else might this character come
from? How might I trace who's writing to the tty?
- Ben
On Thu 09 Jul 2009 13:18:16 -0400, Ben Gamari wrote:
> On Thu, Jul 09, 2009 at 05:00:47PM +0200, Samuel Thibault wrote:
> > SIGQUIT is sent to the X server if the controlling tty of the X server
> > (probably its VT) receives the QUIT character (usually control-\, i.e.
> > 0x1c)
>
> This, however, would imply that something is sending the character and
> this something is certainly not me. Where else might this character come
> from? How might I trace who's writing to the tty?
Not writing to the tty, but producing input for the tty. Are you
using evdev or the legacy kbd driver? 0x1c is the keycode of the Enter
key; maybe your workload happens to restart the keyboard driver, which
temporarily re-enables signal keys.
Or maybe it's on another tty. Does
    lsof -p $(pidof Xorg) | grep CHR
show anything beyond /dev/mem, /dev/null, /dev/tty7, /dev/agpgart and
/dev/dri/card*?
Samuel
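The Enter-key remark rests on a numeric coincidence. In the console's medium-raw keyboard mode each key press reaches the tty as a one-byte keycode, and the Linux keycode for Enter (KEY_ENTER in <linux/input.h>) is 28, i.e. 0x1c. That is the same value as the default VQUIT character, which glibc's <sys/ttydefaults.h> names CQUIT (control-\, octal 034). The PC set-1 scancode for Enter, delivered in raw mode, is also 0x1c. A tiny check, assuming Linux and glibc headers; the function name is hypothetical:

```c
#include <linux/input.h>     /* KEY_ENTER: input-layer keycode for Enter */
#include <sys/ttydefaults.h> /* CQUIT: default VQUIT character, control-\ */

/*
 * Returns nonzero if an Enter key press, delivered to the tty as its raw
 * keycode byte, equals the default QUIT character. With ISIG set on that
 * tty, such a byte would make n_tty send SIGQUIT.
 */
int enter_press_is_quit_char(void)
{
    return KEY_ENTER == CQUIT;
}
```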
On Fri, Jul 10, 2009 at 08:33:46PM +0200, Samuel Thibault wrote:
> On Thu 09 Jul 2009 13:18:16 -0400, Ben Gamari wrote:
> > On Thu, Jul 09, 2009 at 05:00:47PM +0200, Samuel Thibault wrote:
> > > SIGQUIT is sent to the X server if the controlling tty of the X server
> > > (probably its VT) receives the QUIT character (usually control-\, i.e.
> > > 0x1c)
> >
> > This, however, would imply that something is sending the character and
> > this something is certainly not me. Where else might this character come
> > from? How might I trace who's writing to the tty?
>
> Not writing to the tty, but producing input for the tty. Are you
> using evdev or the legacy kbd driver? 0x1c is the keycode of the enter
> key, maybe your workload happens to restart the keyboard driver, which
> temporarily re-enables signal keys.
>
> Or maybe it's on another tty, do you have anything beyond /dev/mem,
> /dev/null, /dev/tty7, /dev/agpgart and /dev/dri/card* in
> lsof -p $(pidof Xorg) | grep CHR
> ?
This sounds familiar:
"a set of 'stty' calls in the init scripts, that (amazingly) reset the
isig flag on the current vt (which in our case is the X vt). For
anyone ignorant of the vile mess of consequences that means
(obviously) your X server gets a SIGQUIT when you press enter."
-- http://www.gnome.org/~michael/blog/2009-05-29.html
Marius Gedminas
On Sat 11 Jul 2009 14:16:59 +0300, Marius Gedminas wrote:
> "a set of 'stty' calls in the init scripts, that (amazingly) reset the
> isig flag on the current vt (which in our case is the X vt). For
> anyone ignorant of the vile mess of consequences that means
> (obviously) your X server gets a SIGQUIT when you press enter."
> -- http://www.gnome.org/~michael/blog/2009-05-29.html
Oh joy.
Samuel
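If init scripts are the culprit, the thing to check is whether ISIG is set on X's VT around the time of a crash. From the shell, GNU `stty -F /dev/tty7 -a` prints the flags (look for `isig` versus `-isig`). A minimal C equivalent follows; the helper name is hypothetical, and it assumes /dev/tty7 is X's VT (as in the lsof output in this thread) and that it runs as root.

```c
#include <fcntl.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/*
 * Report whether signal generation (ISIG) is enabled on the tty at `path`,
 * and optionally which byte currently acts as VQUIT.
 * Returns 1 if ISIG is set, 0 if it is clear, and -1 if `path` cannot be
 * opened or is not a terminal.
 */
int tty_signals_enabled(const char *path, unsigned char *quit_char)
{
    /* O_NOCTTY: inspecting a VT must not make it our controlling tty.
       O_NONBLOCK: don't wait on open for a device that might block. */
    int fd = open(path, O_RDONLY | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct termios t;
    int rc = tcgetattr(fd, &t); /* fails with ENOTTY on non-terminals */
    close(fd);
    if (rc < 0)
        return -1;

    if (quit_char != NULL)
        *quit_char = t.c_cc[VQUIT];
    return (t.c_lflag & ISIG) ? 1 : 0;
}
```

Polling tty_signals_enabled("/dev/tty7", &q) while compiz runs could show whether something flips ISIG back on before a crash.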
On Fri, Jul 10, 2009 at 08:33:46PM +0200, Samuel Thibault wrote:
> Not writing to the tty, but producing input for the tty. Are you
> using evdev or the legacy kbd driver?
evdev.
> 0x1c is the keycode of the enter key, maybe your workload happens to
> restart the keyboard driver, which temporarily re-enables signal keys.
What syscall would one use to do this? Perhaps I could trace it with
systemtap.
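For reference, a VT's keyboard mode is switched with ioctl(fd, KDSKBMODE, mode) and read back with KDGKBMODE (see console_ioctl(4)), while termios flags such as ISIG are changed through tcsetattr(), which on Linux issues one of the TCSETS ioctls. Both go through the ioctl() syscall, so that is a plausible probe point. A minimal sketch that reads a console's current keyboard mode, assuming the Linux <linux/kd.h> header; both helper names are hypothetical:

```c
#include <fcntl.h>
#include <linux/kd.h> /* KDGKBMODE and the K_* keyboard modes */
#include <sys/ioctl.h>
#include <unistd.h>

/* Human-readable name for a console keyboard mode. */
const char *kbd_mode_name(int mode)
{
    switch (mode) {
    case K_RAW:       return "raw (scancodes)";
    case K_XLATE:     return "xlate (translated characters)";
    case K_MEDIUMRAW: return "medium raw (keycodes)";
    case K_UNICODE:   return "unicode (UTF-8)";
#ifdef K_OFF
    case K_OFF:       return "off";
#endif
    default:          return "unknown";
    }
}

/* Keyboard mode of a console such as "/dev/tty7" (usually needs root).
   Returns a K_* value, or -1 if the device can't be opened or queried. */
int console_kbd_mode(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    int mode; /* the kernel stores the mode as an int */
    int rc = ioctl(fd, KDGKBMODE, &mode);
    close(fd);
    return rc < 0 ? -1 : mode;
}
```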
>
> Or maybe it's on another tty, do you have anything beyond /dev/mem,
> /dev/null, /dev/tty7, /dev/agpgart and /dev/dri/card* in
> lsof -p $(pidof Xorg) | grep CHR
> ?
[1038 ben@ben-laptop ~] $ sudo lsof -p $(pidof X) | grep CHR
lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/ben/.gvfs
Output information may be incomplete.
X 2787 root mem CHR 226,0 1191 /dev/dri/card0
X 2787 root 5w CHR 4,7 0t0 1150 /dev/tty7
X 2787 root 8u CHR 226,0 0t0 1191 /dev/dri/card0
X 2787 root 10u CHR 13,66 0t0 1464 /dev/input/event2
X 2787 root 11u CHR 13,67 0t0 1505 /dev/input/event3
X 2787 root 12u CHR 13,65 0t0 1450 /dev/input/event1
X 2787 root 13u CHR 13,68 0t0 1478 /dev/input/event4
X 2787 root 14u CHR 13,69 0t0 1471 /dev/input/event5
X 2787 root 15u CHR 13,72 0t0 4938 /dev/input/event8
X 2787 root 16u CHR 13,71 0t0 4905 /dev/input/event7
So other than a bunch of event devices, not too much. Thanks for your
input,
- Ben
On Sat, Jul 11, 2009 at 02:16:59PM +0300, Marius Gedminas wrote:
> This sounds familiar:
>
> "a set of 'stty' calls in the init scripts, that (amazingly) reset the
> isig flag on the current vt (which in our case is the X vt). For
> anyone ignorant of the vile mess of consequences that means
> (obviously) your X server gets a SIGQUIT when you press enter."
> -- http://www.gnome.org/~michael/blog/2009-05-29.html
Indeed it does. Unfortunately, my init scripts don't seem to reset isig
anywhere. I do seem to have an onlcr reset, but would that cause a
SIGQUIT?
[1101 ben@ben-laptop etc] $ sudo grep stty -R *
init.d/rc:stty onlcr 0>&1
readahead/boot:/bin/stty
Thanks again,
- Ben
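On the onlcr question: ONLCR is an output flag kept in c_oflag, whereas signal generation is governed by ISIG in c_lflag. stty reads the current settings of its standard input with tcgetattr(), changes what it was asked to, and writes the whole structure back with tcsetattr(), so a bare `stty onlcr` writes back whatever ISIG value it read. A minimal in-memory sketch of that edit, with hypothetical helper names and no real tty involved:

```c
#include <string.h>
#include <termios.h>

/* Roughly what `stty onlcr` changes between its tcgetattr() and
   tcsetattr() calls: one output flag, everything else as it was read. */
static void stty_onlcr_edit(struct termios *t)
{
    t->c_oflag |= ONLCR; /* output: map NL to CR-NL */
}

/* Start from a snapshot with ISIG on or off, apply the onlcr edit, and
   report whether ISIG (in c_lflag) is set afterwards. */
int isig_after_onlcr(int isig_before)
{
    struct termios t;
    memset(&t, 0, sizeof t);
    if (isig_before)
        t.c_lflag |= ISIG;
    stty_onlcr_edit(&t);
    return (t.c_lflag & ISIG) != 0;
}
```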