2003-02-05 00:26:52

by Rik van Riel

[permalink] [raw]
Subject: [PATCH][RESEND 3] disassociate_ctty SMP fix

Hi,

the following patch, against today's BK tree, fixes a small
SMP race in disassociate_ctty. This function gets called
from do_exit, without the BKL held.

However, it sets the *tty variable before grabbing the bkl,
then makes decisions on what the variable was set to before
the lock was grabbed, despite the fact that another process
could modify its ->tty pointer in this same function.

please apply,
thank you,

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://guru.conectiva.com/
Current spamtrap: <a href=mailto:"[email protected]">[email protected]</a>


===== drivers/char/tty_io.c 1.50 vs edited =====
--- 1.50/drivers/char/tty_io.c Sat Dec 7 16:23:16 2002
+++ edited/drivers/char/tty_io.c Sat Jan 11 11:37:34 2003
@@ -571,7 +571,7 @@
*/
void disassociate_ctty(int on_exit)
{
- struct tty_struct *tty = current->tty;
+ struct tty_struct *tty;
struct task_struct *p;
struct list_head *l;
struct pid *pid;
@@ -579,6 +579,7 @@

lock_kernel();

+ tty = current->tty;
if (tty) {
tty_pgrp = tty->pgrp;
if (on_exit && tty->driver.type != TTY_DRIVER_TYPE_PTY)


2003-02-05 00:58:28

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH][RESEND 3] disassociate_ctty SMP fix

On Tue, 4 Feb 2003, Rik van Riel wrote:

> the following patch, against today's BK tree, fixes a small
> SMP race in disassociate_ctty. This function gets called
> from do_exit, without the BKL held.

Here's a better one, this one does the same fix not only
in disassociate_ctty() but also in do_tty_hangup()

If we're lucky it might fix:
http://bugme.osdl.org/show_bug.cgi?id=54

Please apply. Thank you,

Rik

===== drivers/char/tty_io.c 1.55 vs edited =====
--- 1.55/drivers/char/tty_io.c Tue Jan 14 23:37:20 2003
+++ edited/drivers/char/tty_io.c Tue Feb 4 23:02:52 2003
@@ -425,19 +425,21 @@
*/
void do_tty_hangup(void *data)
{
- struct tty_struct *tty = (struct tty_struct *) data;
+ struct tty_struct *tty;
struct file * cons_filp = NULL;
struct task_struct *p;
struct list_head *l;
struct pid *pid;
int closecount = 0, n;

- if (!tty)
- return;
-
/* inuse_filps is protected by the single kernel lock */
lock_kernel();
-
+ tty = (struct tty_struct *) data;
+ if (!tty) {
+ unlock_kernel();
+ return;
+ }
+
check_tty_count(tty, "do_tty_hangup");
file_list_lock();
for (l = tty->tty_files.next; l != &tty->tty_files; l = l->next) {
@@ -571,7 +573,7 @@
*/
void disassociate_ctty(int on_exit)
{
- struct tty_struct *tty = current->tty;
+ struct tty_struct *tty;
struct task_struct *p;
struct list_head *l;
struct pid *pid;
@@ -579,6 +581,7 @@

lock_kernel();

+ tty = current->tty;
if (tty) {
tty_pgrp = tty->pgrp;
if (on_exit && tty->driver.type != TTY_DRIVER_TYPE_PTY)

2003-02-05 01:41:23

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH][RESEND 3] disassociate_ctty SMP fix

Rik van Riel <[email protected]> wrote:
>
> ===== drivers/char/tty_io.c 1.55 vs edited =====
> --- 1.55/drivers/char/tty_io.c Tue Jan 14 23:37:20 2003
> +++ edited/drivers/char/tty_io.c Tue Feb 4 23:02:52 2003
> @@ -425,19 +425,21 @@
> */
> void do_tty_hangup(void *data)
> {
> - struct tty_struct *tty = (struct tty_struct *) data;
> + struct tty_struct *tty;
> struct file * cons_filp = NULL;
> struct task_struct *p;
> struct list_head *l;
> struct pid *pid;
> int closecount = 0, n;
>
> - if (!tty)
> - return;
> -
> /* inuse_filps is protected by the single kernel lock */
> lock_kernel();
> -
> + tty = (struct tty_struct *) data;
> + if (!tty) {
> + unlock_kernel();
> + return;
> + }
> +
> check_tty_count(tty, "do_tty_hangup");
> file_list_lock();
> for (l = tty->tty_files.next; l != &tty->tty_files; l = l->next) {

This part is a no-op...

2003-02-05 01:49:34

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH][RESEND 3] disassociate_ctty SMP fix

On Tue, 4 Feb 2003, Andrew Morton wrote:
> Rik van Riel <[email protected]> wrote:
> >
> > ===== drivers/char/tty_io.c 1.55 vs edited =====
> > --- 1.55/drivers/char/tty_io.c Tue Jan 14 23:37:20 2003
> > +++ edited/drivers/char/tty_io.c Tue Feb 4 23:02:52 2003
> > @@ -425,19 +425,21 @@
> > */
> > void do_tty_hangup(void *data)
> > {
> > - struct tty_struct *tty = (struct tty_struct *) data;

> > - if (!tty)
> > - return;
> > -
> > /* inuse_filps is protected by the single kernel lock */
> > lock_kernel();

[ move stuff to after the lock_kernel() ]

> This part is a no-op...

Mmmm Doh, local variable indeed ;/

I guess it's time to fix the caller of this function then,
since something strange is going on here:

http://bugme.osdl.org/show_bug.cgi?id=54

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://guru.conectiva.com/
Current spamtrap: <a href=mailto:"[email protected]">[email protected]</a>

2003-02-05 09:41:48

by Russell King

[permalink] [raw]
Subject: Re: [PATCH][RESEND 3] disassociate_ctty SMP fix

On Tue, Feb 04, 2003 at 11:58:52PM -0200, Rik van Riel wrote:
> I guess it's time to fix the caller of this function then,
> since something strange is going on here:
>
> http://bugme.osdl.org/show_bug.cgi?id=54

I don't think the tty is null there. It'll be a filp->private_data
being null.

>From the trace, my guess is that a file descriptor is being left
with a null filp->private_data, yet its on the file descriptor list
for the tty, and its fops is set to tty_fops.

I'll see if I can reproduce it here and work out what's going on.
However, note the following (and I think this is the crux of the problem):
release_dev is _supposed_ to run under the BKL. So how the fsck are we
getting into tty_do_hangup's BKL protected region while also being in
release_dev's BKL protected region? Is it the BKL which has broken?

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2003-02-05 14:41:56

by Russell King

[permalink] [raw]
Subject: Re: [PATCH][RESEND 3] disassociate_ctty SMP fix

Ok, what seems to be going on is as follows:

1. you kill 'dd' as per the bug report
2. the kernel closes the file descriptors that 'dd' had open as part of
the do_exit() cleanup.
3. this calls tty_release(), which takes the BKL, and the tty layer
does its necessary cleanup, setting filp->private_data to NULL.
*note* the file descriptor is still on the tty->tty_files list.
4. tty_release() releases the BKL, at which point preemption occurs,
switching to the master sshd process.
5. sshd closes the master end of the pty device, which causes
tty_vhangup to be called.
6. do_tty_hangup scans the file descriptors attached to tty->tty_files,
and calls tty_fasync for each.
7. we come across the file descriptor that 'dd' had open (since it is
still on the tty->tty_files list).
8. tty_fasync looks at filp->private_data, and tty_paranoia_check finds
it to be NULL, and so complains.

So, where do we take the file descriptor off the tty list? __fput() -
list_del(&file->f_list);

This problem could still be present on non-preempt configurations,
although to a lesser degree. As the code currently stands in both
2.4 and 2.5, we generally do not see any reschedules between tty_release
and __fput which would allow the above scenario to occur. Preempt
just provides the extra bait for the bug to show itself.

As to the fix, umm, yea. I'll get to that in a little while. The good
news is that we know what's happening now.

(note: the "scheduling while atomic" below is caused by the method I
used to discover this problem, and shows the point at which the pesky
reschedule happened in the tty code. pid135 is the dd process on the
pty slave end, pid130 is the sshd on the pty master end.)

tty_release: (pid=135) filp = c131ecb4 filp->private_data = c12d3000
bad: scheduling while atomic!
[<c0227d10>] (dump_stack+0x0/0x14)
[<c023cde8>] (schedule+0x0/0x3b4)
[<c023d19c>] (preempt_schedule+0x0/0x50)
[<c02f17b8>] (tty_release+0x0/0x110)
[<c0276230>] (__fput+0x0/0x1a8)
[<c02761f8>] (fput+0x0/0x38)
[<c0274864>] (filp_close+0x0/0x110)
[<c0242b0c>] (put_files_struct+0x0/0x110)
[<c024357c>] (do_exit+0x0/0x528)
[<c024a724>] (sig_exit+0x0/0x74)
[<c02268f8>] (do_signal+0x0/0x470)
[<c0226d68>] (do_notify_resume+0x0/0x34)
tty_release: (pid=130) filp = c131e984 filp->private_data = c1264000
do_tty_hangup: (pid=130) filp = c131ecb4, filp->private_data = 00000000
Warning: null TTY for (88:01) in tty_fasync
tty_release: (pid=130) done
tty_release: (pid=135) done

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2003-02-05 16:40:06

by Russell King

[permalink] [raw]
Subject: Re: [PATCH][RESEND 3] disassociate_ctty SMP fix

Ok, here's my proposed fix, which appears to work with preempt. I haven't
tested on non-preempt, nor (obviously since its from me) SMP. However,
I forsee no problems caused by this change.

release_dev() sets filp->private_data to NULL when the tty layer has
done with the file descriptor. However, it remains on the tty_files
list until __fput completes.

--- orig/drivers/char/tty_io.c Fri Jan 17 10:39:10 2003
+++ linux/drivers/char/tty_io.c Wed Feb 5 16:38:23 2003
@@ -442,6 +442,13 @@
file_list_lock();
for (l = tty->tty_files.next; l != &tty->tty_files; l = l->next) {
struct file * filp = list_entry(l, struct file, f_list);
+ /*
+ * If this file descriptor has been closed, ignore it; it
+ * will be going away shortly. (We don't test filp->f_count
+ * for zero since that could open another race.) --rmk
+ */
+ if (filp->private_data == NULL)
+ continue;
if (IS_CONSOLE_DEV(filp->f_dentry->d_inode->i_rdev) ||
IS_SYSCONS_DEV(filp->f_dentry->d_inode->i_rdev)) {
cons_filp = filp;

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html