2006-02-13 20:13:55

by Paul Fulghum

[permalink] [raw]
Subject: [PATCH] tty reference count fix

Fix hole where tty structure can be released when reference
count is non zero. Existing code can sleep without tty_sem
protection between deciding to release the tty structure
(setting local variables tty_closing and otty_closing)
and setting TTY_CLOSING to prevent further opens.
An open can occur during this interval causing release_dev()
to free the tty structure while it is still referenced.

This should fix bugzilla.kernel.org
[Bug 6041] New: Unable to handle kernel paging request

In Bug 6041, tty_open() oopes on accessing the tty structure
it has successfully claimed. Bug was on SMP machine
with the same tty being opened and closed by
multiple processes, and DEBUG_PAGEALLOC enabled.

Signed-off-by: Paul Fulghum <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Jesper Juhl <[email protected]>

--- linux/drivers/char/tty_io.c 2006-02-10 15:54:00.000000000 -0600
+++ b/drivers/char/tty_io.c 2006-02-13 13:14:33.000000000 -0600
@@ -1841,7 +1841,6 @@ static void release_dev(struct file * fi
tty_closing = tty->count <= 1;
o_tty_closing = o_tty &&
(o_tty->count <= (pty_master ? 1 : 0));
- up(&tty_sem);
do_sleep = 0;

if (tty_closing) {
@@ -1869,6 +1868,7 @@ static void release_dev(struct file * fi

printk(KERN_WARNING "release_dev: %s: read/write wait queue "
"active!\n", tty_name(tty, buf));
+ up(&tty_sem);
schedule();
}

@@ -1877,8 +1877,6 @@ static void release_dev(struct file * fi
* both sides, and we've completed the last operation that could
* block, so it's safe to proceed with closing.
*/
-
- down(&tty_sem);
if (pty_master) {
if (--o_tty->count < 0) {
printk(KERN_WARNING "release_dev: bad pty slave count "
@@ -1892,7 +1890,6 @@ static void release_dev(struct file * fi
tty->count, tty_name(tty, buf));
tty->count = 0;
}
- up(&tty_sem);

/*
* We've decremented tty->count, so we need to remove this file
@@ -1937,6 +1934,8 @@ static void release_dev(struct file * fi
read_unlock(&tasklist_lock);
}

+ up(&tty_sem);
+
/* check whether both sides are closing ... */
if (!tty_closing || (o_tty && !o_tty_closing))
return;



2006-02-13 22:15:38

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] tty reference count fix

On 2/13/06, Paul Fulghum <[email protected]> wrote:
> Fix hole where tty structure can be released when reference
> count is non zero. Existing code can sleep without tty_sem
> protection between deciding to release the tty structure
> (setting local variables tty_closing and otty_closing)
> and setting TTY_CLOSING to prevent further opens.
> An open can occur during this interval causing release_dev()
> to free the tty structure while it is still referenced.
>
> This should fix bugzilla.kernel.org
> [Bug 6041] New: Unable to handle kernel paging request
>
> In Bug 6041, tty_open() oopes on accessing the tty structure
> it has successfully claimed. Bug was on SMP machine
> with the same tty being opened and closed by
> multiple processes, and DEBUG_PAGEALLOC enabled.
>
> Signed-off-by: Paul Fulghum <[email protected]>
> Cc: Alan Cox <[email protected]>
> Cc: Jesper Juhl <[email protected]>
>

I just applied the patch to 2.6.16-rc3 and booted the patched kernel.
Unfortunately I can't tell you if it fixes the bug since I never
successfully reproduced it, it just happened once out of the blue.
What I can tell you though is that the patched kernel seems to behave
just fine and doesn't seem to introduce any regressions on my system -
but my testing has been quite limited so far.

Not the best feedback, I know, but it's the best I can give you at the moment.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-13 22:43:08

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH] tty reference count fix


On Mon, 13 Feb 2006, Paul Fulghum wrote:

> Fix hole where tty structure can be released when reference
> count is non zero. Existing code can sleep without tty_sem
> protection between deciding to release the tty structure
> (setting local variables tty_closing and otty_closing)
> and setting TTY_CLOSING to prevent further opens.
> An open can occur during this interval causing release_dev()
> to free the tty structure while it is still referenced.
>
> This should fix bugzilla.kernel.org
> [Bug 6041] New: Unable to handle kernel paging request
>
> In Bug 6041, tty_open() oopes on accessing the tty structure
> it has successfully claimed. Bug was on SMP machine
> with the same tty being opened and closed by
> multiple processes, and DEBUG_PAGEALLOC enabled.
>

patch looks good to me. Or you could even drop the tty_sem completely from
the release_dev path (patch below) since lock_kernel() is held in both
tty_open() and the release_dev paths() (and there is no sleeping b/w
setting the tty_closing flag and setting the TTY_CLOSING bit).

-Jason


Signed-off-by: Jason Baron <[email protected]>

--- linux-2.6-working/drivers/char/tty_io.c.bak
+++ linux-2.6-working/drivers/char/tty_io.c
@@ -1829,14 +1829,9 @@ static void release_dev(struct file * fi
* each iteration we avoid any problems.
*/
while (1) {
- /* Guard against races with tty->count changes elsewhere and
- opens on /dev/tty */
-
- down(&tty_sem);
tty_closing = tty->count <= 1;
o_tty_closing = o_tty &&
(o_tty->count <= (pty_master ? 1 : 0));
- up(&tty_sem);
do_sleep = 0;

if (tty_closing) {
@@ -1873,7 +1868,6 @@ static void release_dev(struct file * fi
* block, so it's safe to proceed with closing.
*/

- down(&tty_sem);
if (pty_master) {
if (--o_tty->count < 0) {
printk(KERN_WARNING "release_dev: bad pty slave count "
@@ -1887,7 +1881,6 @@ static void release_dev(struct file * fi
tty->count, tty_name(tty, buf));
tty->count = 0;
}
- up(&tty_sem);

/*
* We've decremented tty->count, so we need to remove this file

2006-02-13 23:21:28

by Paul Fulghum

[permalink] [raw]
Subject: Re: [PATCH] tty reference count fix

Jesper Juhl wrote:
> I just applied the patch to 2.6.16-rc3 and booted the patched kernel.
> Unfortunately I can't tell you if it fixes the bug since I never
> successfully reproduced it, it just happened once out of the blue.
> What I can tell you though is that the patched kernel seems to behave
> just fine and doesn't seem to introduce any regressions on my system -
> but my testing has been quite limited so far.
>
> Not the best feedback, I know, but it's the best I can give you at the moment.

Any feedback is good feedback. I'm not suprised it has not
happened again. The necessary conditions are not common and
the timing window is small. As Andrew pointed out,
DEBUG_PAGEALLOC likely helped uncover this.

Decoding the 2 oops to specific lines of code strongly
suggests this is what happened to you. The decoding showed
the tty open and close (release) paths from different
processes going after the same tty structure at the same time.

Thanks,
Paul

--
Paul Fulghum
Microgate Systems, Ltd




2006-02-13 23:45:16

by Paul Fulghum

[permalink] [raw]
Subject: Re: [PATCH] tty reference count fix

Jason Baron wrote:
> patch looks good to me. Or you could even drop the tty_sem completely from
> the release_dev path (patch below) since lock_kernel() is held in both
> tty_open() and the release_dev paths()

The historical posts on locking in release_dev()
I looked at suggest that a move from BKL to more
fine grained synchronization (tty_sem) was a goal.

The code looks like it originally relied on the BKL.
The later addition of tty_sem was done in a way
that allowed sleeping between setting tty_closing
and TTY_CLOSING flag:

down(&tty_sem)
if (tty->count == 1)
tty_closing = 1;
up(&tty_sem)
...
down(&tty_sem) <<<< can sleep, open can increase count
tty->count--
up(&tty_sem)
...
if (tty_closing)
set_bit(TTY_CLOSING)

> (and there is no sleeping b/w setting the tty_closing
> flag and setting the TTY_CLOSING bit).

Your patch leaves the schedule() call at the bottom of
the while loop between setting tty_closing and
setting TTY_CLOSING flag.

Thanks,
Paul

--
Paul Fulghum
Microgate Systems, Ltd

2006-02-14 00:20:36

by Paul Fulghum

[permalink] [raw]
Subject: Re: [PATCH] tty reference count fix

Paul Fulghum wrote:
> Your patch leaves the schedule() call at the bottom of
> the while loop between setting tty_closing and
> setting TTY_CLOSING flag.

Nevermind. After schedule() the count is reread,
so you are right. Dropping tty_sem altogether
would work.

It is a matter of where you ultimately wish to
push the locking to BKL or tty_sem.

--
Paul Fulghum
Microgate Systems, Ltd

2006-02-14 21:46:40

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] tty reference count fix


>
> patch looks good to me. Or you could even drop the tty_sem completely from
> the release_dev path (patch below) since lock_kernel() is held in both
> tty_open() and the release_dev paths() (and there is no sleeping b/w
> setting the tty_closing flag and setting the TTY_CLOSING bit).


that's the wrong direction.. the idea is to first put new locking under
the BKL layer.. and at the end pull the BKL entirely.. The BKL isn't a
good lock.