2004-03-29 01:31:11

by Albert Cahalan

[permalink] [raw]
Subject: Subject: Re: NULL pointer in proc_pid_stat -- oops.

>> And from the oops trace output (that is attached), we can
>> see that %edx is 0x0; so we can easily see here why we're
>> crashing at least. After examining the C source, I see
>> that we're dying in the call to task_name() (inline) from
>> proc_pid_stat().
>
> Looks like this problem is same with BSD acct Oops.
>
> if (task->tty) {
> tty_pgrp = task->tty->pgrp;
> tty_nr = new_encode_dev(tty_devnum(task->tty));
> }
>
> Some place doesn't take the any lock for ->tty.
> I think we need to take the lock for ->tty.

Probably this isn't the thing for 2.6.xx, but how
about a special tty struct instead of the NULL?
Make it const, perhaps write-protected, etc.

Then the if(task->tty) test can be removed.



2004-03-29 14:19:33

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: Subject: Re: NULL pointer in proc_pid_stat -- oops.

Albert Cahalan <[email protected]> writes:

> > if (task->tty) {
> > tty_pgrp = task->tty->pgrp;
> > tty_nr = new_encode_dev(tty_devnum(task->tty));
> > }
> >
> > Some place doesn't take the any lock for ->tty.
> > I think we need to take the lock for ->tty.
>
> Probably this isn't the thing for 2.6.xx,

Ah, sorry for confusing. This is 2.6.x (maybe also 2.4.x).

e.g. the above take the task_lock(). But disassociate_ctty() just take
read_lock(&tasklist_lock), etc. etc. So looks like racy.
--
OGAWA Hirofumi <[email protected]>

2004-03-29 17:16:28

by Ricky Beam

[permalink] [raw]
Subject: Re: Subject: Re: NULL pointer in proc_pid_stat -- oops.

On Mon, 29 Mar 2004, OGAWA Hirofumi wrote:
>> > if (task->tty) {
>> > tty_pgrp = task->tty->pgrp;
>> > tty_nr = new_encode_dev(tty_devnum(task->tty));
>> > }
>> >
>> > Some place doesn't take the any lock for ->tty.
>> > I think we need to take the lock for ->tty.
>>
>> Probably this isn't the thing for 2.6.xx,
>
>Ah, sorry for confusing. This is 2.6.x (maybe also 2.4.x).
>
>e.g. the above take the task_lock(). But disassociate_ctty() just take
>read_lock(&tasklist_lock), etc. etc. So looks like racy.

Yes, there is a race condition. I never chased it back to the specifics.
I just did this:
===== tty.h 1.23 vs 1.24 =====
--- 1.23/include/linux/tty.h Wed Sep 24 02:15:15 2003
+++ 1.24/include/linux/tty.h Wed Feb 11 18:29:20 2004
@@ -404,7 +404,16 @@

static inline dev_t tty_devnum(struct tty_struct *tty)
{
- return MKDEV(tty->driver->major, tty->driver->minor_start) + tty->index;
+ int ret = 0;
+
+ if(!tty) {
+ printk(KERN_CRIT "tty_devnum(): NULL tty (%p)\n", tty);
+ } else if(!tty->driver) {
+ printk(KERN_CRIT "tty_devnum(): NULL tty->driver (%p)\n", tty->d
river);
+ } else
+ ret = MKDEV(tty->driver->major, tty->driver->minor_start) + tty->index;
+
+ return ret;
}

#endif /* __KERNEL__ */

We were seeing this with some application(s) that called the java keytool
a bit too often. (We moved the calls to directly calling the keytool
classes.)

"Works for me" :-)

--Ricky