2008-02-15 17:58:56

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] do_signal_stop: use signal_group_exit()

do_signal_stop() needs signal_group_exit() but checks sig->group_exit_task.
This (optimization) is correct, SIGNAL_STOP_DEQUEUED and SIGNAL_GROUP_EXIT
are mutually exclusive, but looks confusing. Use signal_group_exit(), this
is not fastpath, the code clarity is more important.

Signed-off-by: Oleg Nesterov <[email protected]>

--- 25/kernel/signal.c~6_DO_SIGNAL_STOP 2008-02-15 16:59:17.000000000 +0300
+++ 25/kernel/signal.c 2008-02-15 20:34:32.000000000 +0300
@@ -1718,7 +1718,7 @@ static int do_signal_stop(int signr)
struct task_struct *t;

if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
- unlikely(sig->group_exit_task))
+ unlikely(signal_group_exit(sig)))
return 0;
/*
* There is no group stop already in progress.


2008-02-16 03:39:30

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] do_signal_stop: use signal_group_exit()

ug. On about the fourth boot with the current -mm lineup I hit:


: security: permission sendto in class node not defined in policy
: security: permission dccp_recv in class netif not defined in policy
: security: permission dccp_send in class netif not defined in policy
: security: permission ingress in class netif not defined in policy
: security: permission egress in class netif not defined in policy
: security: permission setfcap in class capability not defined in policy
: security: permission forward_in in class packet not defined in policy
: security: permission forward_out in class packet not defined in policy
: SELinux: policy loaded with handle_unknown=deny
: type=1403 audit(1203124656.152:3): policy loaded auid=4294967295 ses=4294967295
: BUG: unable to handle kernel paging request at 0000000000200200
: IP: [<ffffffff802444f5>] free_pid+0x35/0x8e
: PGD 2574cb067 PUD 257561067 PMD 0
: Oops: 0002 [1] SMP
: last sysfs file: /sys/class/net/eth0/address
: CPU 2
: Modules linked in: ipv6 dm_mirror dm_multipath dm_mod sbs sbshc dock battery ac parport_pc lp parport snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq shpchp snd_seq_device sg snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer button i2c_i801 snd soundcore ide_cd_mod cdrom serio_raw i2c_core snd_page_alloc pcspkr ehci_hcd ohci_hcd uhci_hcd
: Pid: 3132, comm: ifup-eth Not tainted 2.6.25-rc2-mm1 #5
: RIP: 0010:[<ffffffff802444f5>] [<ffffffff802444f5>] free_pid+0x35/0x8e
: RSP: 0018:ffff81025754de58 EFLAGS: 00010046
: RAX: 0000000000000000 RBX: ffff81025f268840 RCX: ffff81025f263f08
: RDX: 0000000000200200 RSI: 0000000000000046 RDI: 0000000000000000
: RBP: ffff81025f263ec0 R08: ffff81025f268b18 R09: ffff81025f268b08
: R10: ffff81025f268b08 R11: 0000000000000000 R12: ffff810259853140
: R13: 0000000000000c78 R14: 0000000000000000 R15: 0000000000000000
: FS: 00007f8f9ba7d6f0(0000) GS:ffff81025f16f0c0(0000) knlGS:0000000000000000
: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
: CR2: 0000000000200200 CR3: 00000002598d0000 CR4: 00000000000006e0
: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
: Process ifup-eth (pid: 3132, threadinfo ffff81025754c000, task ffff81025d467620)
: Stack: ffff81025f268b08 ffff81025f268840 ffff81025994b660 ffffffff80237727
: ffff81025994b660 ffff81025994b660 0000000000000000 ffffffff80237f81
: 00000000000005d0 ffff810257561018 0000000000000000 00007fffa3aa9514
: Call Trace:
: [<ffffffff80237727>] ? release_task+0x152/0x2e5
: [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
: [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
: [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
: [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
: [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
:
:
: Code: 80 53 41 51 e8 83 d4 28 00 31 ff 48 89 c6 eb 2e 48 63 c7 48 c1 e0 05 48 8d 44 28 40 48 8d 48 08 48 8b 40 08 48 8b 51 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 c7 41 08 00 02 20 00 ff c7 3b 7d
: RIP [<ffffffff802444f5>] free_pid+0x35/0x8e
: RSP <ffff81025754de58>
: CR2: 0000000000200200
: ---[ end trace efde415d3f801416 ]---
: BUG: sleeping function called from invalid context at kernel/rwsem.c:21
: in_atomic():0, irqs_disabled():1
: Pid: 3132, comm: ifup-eth Tainted: G D 2.6.25-rc2-mm1 #5
:
: Call Trace:
: [<ffffffff804d0c0a>] down_read+0x15/0x23
: [<ffffffff802550cc>] acct_collect+0x40/0x180
: [<ffffffff802385fd>] do_exit+0x1f3/0x6ba
: [<ffffffff804d1ea1>] sync_regs+0x0/0x67
: [<ffffffff804d3c02>] do_page_fault+0x755/0x80e
: [<ffffffff804d1ae9>] error_exit+0x0/0x51
: [<ffffffff802444f5>] free_pid+0x35/0x8e
: [<ffffffff802444d3>] free_pid+0x13/0x8e
: [<ffffffff80237727>] release_task+0x152/0x2e5
: [<ffffffff80237f81>] do_wait+0x6c7/0xa1c
: [<ffffffff8022f4cc>] default_wake_function+0x0/0xe
: [<ffffffff8023e670>] sys_rt_sigaction+0x7a/0x98
: [<ffffffff80238360>] sys_wait4+0x8a/0xa1
: [<ffffffff8020be4b>] system_call_after_swapgs+0x7b/0x80
:
: eth0: no IPv6 routers present
:

and I don't have a clue which patch caused it and I won't be near this
machine again for over a week.

2008-02-16 13:58:54

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] do_signal_stop: use signal_group_exit()

On 02/15, Andrew Morton wrote:
>
> ug. On about the fourth boot with the current -mm lineup I hit:
>
> : BUG: unable to handle kernel paging request at 0000000000200200
^^^^^^^^^^^^^^^^
== LIST_POISON2

> : IP: [<ffffffff802444f5>] free_pid+0x35/0x8e

most probably == hlist_del_rcu(pid_chain)

> : PGD 2574cb067 PUD 257561067 PMD 0
> : Oops: 0002 [1] SMP
> : last sysfs file: /sys/class/net/eth0/address
> : CPU 2
> : Modules linked in: ipv6 dm_mirror dm_multipath dm_mod sbs sbshc dock battery ac parport_pc lp parport snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq shpchp snd_seq_device sg snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer button i2c_i801 snd soundcore ide_cd_mod cdrom serio_raw i2c_core snd_page_alloc pcspkr ehci_hcd ohci_hcd uhci_hcd
> : Pid: 3132, comm: ifup-eth Not tainted 2.6.25-rc2-mm1 #5
> : RIP: 0010:[<ffffffff802444f5>] [<ffffffff802444f5>] free_pid+0x35/0x8e
> : RSP: 0018:ffff81025754de58 EFLAGS: 00010046
> : RAX: 0000000000000000 RBX: ffff81025f268840 RCX: ffff81025f263f08
> : RDX: 0000000000200200 RSI: 0000000000000046 RDI: 0000000000000000
> : RBP: ffff81025f263ec0 R08: ffff81025f268b18 R09: ffff81025f268b08
> : R10: ffff81025f268b08 R11: 0000000000000000 R12: ffff810259853140
> : R13: 0000000000000c78 R14: 0000000000000000 R15: 0000000000000000
> : FS: 00007f8f9ba7d6f0(0000) GS:ffff81025f16f0c0(0000) knlGS:0000000000000000
> : CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> : CR2: 0000000000200200 CR3: 00000002598d0000 CR4: 00000000000006e0
> : DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> : DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> : Process ifup-eth (pid: 3132, threadinfo ffff81025754c000, task ffff81025d467620)
> : Stack: ffff81025f268b08 ffff81025f268840 ffff81025994b660 ffffffff80237727
> : ffff81025994b660 ffff81025994b660 0000000000000000 ffffffff80237f81
> : 00000000000005d0 ffff810257561018 0000000000000000 00007fffa3aa9514
> : Call Trace:
> : [<ffffffff80237727>] ? release_task+0x152/0x2e5
> : [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> : [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> : [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> : [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> : [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80

(Can't understand why there is no detach_pid() in this stack trace,
but it is the only possible caller of free_pid()).

So, detach_pid()->free_pid() hit an already unhashed pid. But this
is not possible?

This means we already did detach_pid(), but in that case the previous
detach_pid() has set task->pids[].pid = NULL, and we should OOPS earlier,
somewhere at "if (!hlist_empty(&pid->tasks[tmp]))".

> and I don't have a clue which patch caused it and I won't be near this
> machine again for over a week.

Definitely not this patch...

I'll try to think more about this, but I doubt very much I'll find the
reason :(

Oleg.

2008-02-16 15:06:18

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] free_pidmap: turn it into free_pidmap(struct upid *)

The callers of free_pidmap() pass 2 members of "struct upid", we can just pass
"struct upid *" instead. Shaves off 10 bytes from pid.o.

Also, simplify the alloc_pid's "out_free:" error path a little bit. This way it
looks more clear which subset of pid->numbers[] we are freeing.

Signed-off-by: Oleg Nesterov <[email protected]>

--- 25/kernel/pid.c~ 2008-02-15 16:59:17.000000000 +0300
+++ 25/kernel/pid.c 2008-02-16 17:51:17.000000000 +0300
@@ -111,10 +111,11 @@ EXPORT_SYMBOL(is_container_init);

static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock);

-static void free_pidmap(struct pid_namespace *pid_ns, int pid)
+static void free_pidmap(struct upid *upid)
{
- struct pidmap *map = pid_ns->pidmap + pid / BITS_PER_PAGE;
- int offset = pid & BITS_PER_PAGE_MASK;
+ int nr = upid->nr;
+ struct pidmap *map = upid->ns->pidmap + nr / BITS_PER_PAGE;
+ int offset = nr & BITS_PER_PAGE_MASK;

clear_bit(offset, map->page);
atomic_inc(&map->nr_free);
@@ -232,7 +233,7 @@ void free_pid(struct pid *pid)
spin_unlock_irqrestore(&pidmap_lock, flags);

for (i = 0; i <= pid->level; i++)
- free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
+ free_pidmap(pid->numbers + i);

call_rcu(&pid->rcu, delayed_put_pid);
}
@@ -278,8 +279,8 @@ out:
return pid;

out_free:
- for (i++; i <= ns->level; i++)
- free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
+ while (++i <= ns->level)
+ free_pidmap(pid->numbers + i);

kmem_cache_free(ns->pid_cachep, pid);
pid = NULL;

2008-02-17 23:11:29

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] do_signal_stop: use signal_group_exit()

On 02/16, Oleg Nesterov wrote:
>
> On 02/15, Andrew Morton wrote:
> >
> > ug. On about the fourth boot with the current -mm lineup I hit:
> >
> > : BUG: unable to handle kernel paging request at 0000000000200200
> ^^^^^^^^^^^^^^^^
> == LIST_POISON2
>
> > : IP: [<ffffffff802444f5>] free_pid+0x35/0x8e
>
> most probably == hlist_del_rcu(pid_chain)
>
> > : Call Trace:
> > : [<ffffffff80237727>] ? release_task+0x152/0x2e5
> > : [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> > : [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> > : [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> > : [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> > : [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
>
> (Can't understand why there is no detach_pid() in this stack trace,
> but it is the only possible caller of free_pid()).
>
> So, detach_pid()->free_pid() hit an already unhashed pid. But this
> is not possible?
>
> This means we already did detach_pid(), but in that case the previous
> detach_pid() has set task->pids[].pid = NULL, and we should OOPS earlier,
> somewhere at "if (!hlist_empty(&pid->tasks[tmp]))".

Yes, this is not possible.

But what possible is: we have the unbalanced put_pid(pid) which frees the
live pid. The next alloc_pid() gets the same memory, and initializes all
pid->tasks[] lists.

Now, if that pid was used as PIDTYPE_PGID/PIDTYPE_SID, the next detach_pid()
from this pgrp/sid sees that the pid is not used (all lists are hlist_empty),
frees this pid again, and the bug manifests itself this way.

tiocspgrp:

put_pid(real_tty->pgrp);
// ------ WINDOW ------
real_tty->pgrp = get_pid(pgrp);

When bash spawns the command, both parent and child do ioctl(TIOCSPGRP,child),
and it is possible that both do put_pid() on the same parent's pid.

Damn, when you know what the bug is, the test case is trivial:

$ while true; do perl -e0; done

The kernel crashes.

>From 2.6.25-rc2-mm1.bz2 patch:
>
> - .ioctl = tty_ioctl,
> + .unlocked_ioctl = tty_ioctl,

and this is why this didn't happen before, I guess.

> I'll try to think more about this, but I doubt very much I'll find the
> reason :(

Ohhh... 7 (!!!) hours of hacking + some vodka did the trick.

(Kamalesh, I think you hit the same bug).

Oleg.

2008-02-18 04:20:06

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] do_signal_stop: use signal_group_exit()

Oleg Nesterov <[email protected]> writes:

> On 02/16, Oleg Nesterov wrote:
>>
>> On 02/15, Andrew Morton wrote:
>> >
>> > ug. On about the fourth boot with the current -mm lineup I hit:
>> >
>> > : BUG: unable to handle kernel paging request at 0000000000200200
>> ^^^^^^^^^^^^^^^^
>> == LIST_POISON2
>>
>> > : IP: [<ffffffff802444f5>] free_pid+0x35/0x8e
>>
>> most probably == hlist_del_rcu(pid_chain)
>>
>> > : Call Trace:
>> > : [<ffffffff80237727>] ? release_task+0x152/0x2e5
>> > : [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
>> > : [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
>> > : [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
>> > : [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
>> > : [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
>>
>> (Can't understand why there is no detach_pid() in this stack trace,
>> but it is the only possible caller of free_pid()).

That is just because of tail call optimization. There is no need
to return to detach_pid so we just jumped to free_pid.

>
>> So, detach_pid()->free_pid() hit an already unhashed pid. But this
>> is not possible?
>>
>> This means we already did detach_pid(), but in that case the previous
>> detach_pid() has set task->pids[].pid = NULL, and we should OOPS earlier,
>> somewhere at "if (!hlist_empty(&pid->tasks[tmp]))".
>
> Yes, this is not possible.
>
> But what possible is: we have the unbalanced put_pid(pid) which frees the
> live pid. The next alloc_pid() gets the same memory, and initializes all
> pid->tasks[] lists.
>
> Now, if that pid was used as PIDTYPE_PGID/PIDTYPE_SID, the next detach_pid()
> from this pgrp/sid sees that the pid is not used (all lists are hlist_empty),
> frees this pid again, and the bug manifests itself this way.
>
> tiocspgrp:
>
> put_pid(real_tty->pgrp);
> // ------ WINDOW ------
> real_tty->pgrp = get_pid(pgrp);
>
> When bash spawns the command, both parent and child do ioctl(TIOCSPGRP,child),
> and it is possible that both do put_pid() on the same parent's pid.
>
> Damn, when you know what the bug is, the test case is trivial:
>
> $ while true; do perl -e0; done
>
> The kernel crashes.
>
>>From 2.6.25-rc2-mm1.bz2 patch:
>>
>> - .ioctl = tty_ioctl,
>> + .unlocked_ioctl = tty_ioctl,
>
> and this is why this didn't happen before, I guess.
>
>> I'll try to think more about this, but I doubt very much I'll find the
>> reason :(
>
> Ohhh... 7 (!!!) hours of hacking + some vodka did the trick.
>
> (Kamalesh, I think you hit the same bug).

Thanks. Looks like we need to grab a lock there.
At a quick skim I think we need the tty lock.


Eric

2008-02-19 23:42:14

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH] do_signal_stop: use signal_group_exit()

On Sun, 17 Feb 2008 21:11:14 MST, Eric W. Biederman said:
> Oleg Nesterov <[email protected]> writes:
> > On 02/16, Oleg Nesterov wrote:
> >> On 02/15, Andrew Morton wrote:
> >> > : BUG: unable to handle kernel paging request at 0000000000200200

> >> > : Call Trace:
> >> > : [<ffffffff80237727>] ? release_task+0x152/0x2e5
> >> > : [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> >> > : [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> >> > : [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> >> > : [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> >> > : [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80

> Thanks. Looks like we need to grab a lock there.
> At a quick skim I think we need the tty lock.

*ping* - Any further activity on this one? I got bit by it as well on
the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
single-user and go multi-user.


Attachments:
(No filename) (226.00 B)

2008-02-20 02:40:19

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] do_signal_stop: use signal_group_exit()

Oleg Nesterov <[email protected]> writes:

>
>>From 2.6.25-rc2-mm1.bz2 patch:
>>
>> - .ioctl = tty_ioctl,
>> + .unlocked_ioctl = tty_ioctl,
>
> and this is why this didn't happen before, I guess.

Right. Does anyone know what kind of audit was made of the tty code
to ensure everything would be fine?

It it was pretty thorough and it was just this little corner case
it makes sense to get the struct tty locking for pids correct.

Otherwise since the tty layer is historically not especially good
with it's locking. We should just revert the change above, and save
attacking this until someone has time to do a thorough review.

Eric

2008-02-20 16:33:55

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] (for -mm only) put_pid: make sure we don't free the live pid

[PATCH] (for -mm only) put_pid: make sure we don't free the live pid

Add the temporary (for -mm only) debugging code to catch the unbalanced
put_pid()'s. At least those which can free the "live" pid.

Signed-off-by: Oleg Nesterov <[email protected]>

--- MM/kernel/pid.c~ 2008-02-20 18:29:40.000000000 +0300
+++ MM/kernel/pid.c 2008-02-20 18:35:15.000000000 +0300
@@ -208,6 +208,10 @@ void put_pid(struct pid *pid)
ns = pid->numbers[pid->level].ns;
if ((atomic_read(&pid->count) == 1) ||
atomic_dec_and_test(&pid->count)) {
+ int type = PIDTYPE_MAX;
+ while (--type >= 0)
+ if (WARN_ON(!hlist_empty(&pid->tasks[type])))
+ return;
kmem_cache_free(ns->pid_cachep, pid);
put_pid_ns(ns);
}

2008-02-20 16:36:32

by Oleg Nesterov

[permalink] [raw]
Subject: tty && pid problems

(Change the subject, cc Alan)

On 02/19, [email protected] wrote:
>
> On Sun, 17 Feb 2008 21:11:14 MST, Eric W. Biederman said:
> > Oleg Nesterov <[email protected]> writes:
> > > On 02/16, Oleg Nesterov wrote:
> > >> On 02/15, Andrew Morton wrote:
> > >> > : BUG: unable to handle kernel paging request at 0000000000200200
>
> > >> > : Call Trace:
> > >> > : [<ffffffff80237727>] ? release_task+0x152/0x2e5
> > >> > : [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> > >> > : [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> > >> > : [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> > >> > : [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> > >> > : [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
>
> > Thanks. Looks like we need to grab a lock there.
> > At a quick skim I think we need the tty lock.
>
> *ping* - Any further activity on this one? I got bit by it as well on
> the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
> single-user and go multi-user.

Valdis, any chance you can try the
"[PATCH] (for -mm only) put_pid: make sure we don't free the live pid"
I sent? just to make sure we don't have other problems here.

Oleg.

2008-02-20 16:39:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: tty && pid problems

(sorry, the previous message was not finished)

On 02/20, Oleg Nesterov wrote:
>
> (Change the subject, cc Alan)
>
> On 02/19, [email protected] wrote:
> >
> > On Sun, 17 Feb 2008 21:11:14 MST, Eric W. Biederman said:
> > > Oleg Nesterov <[email protected]> writes:
> > > > On 02/16, Oleg Nesterov wrote:
> > > >> On 02/15, Andrew Morton wrote:
> > > >> > : BUG: unable to handle kernel paging request at 0000000000200200
> >
> > > >> > : Call Trace:
> > > >> > : [<ffffffff80237727>] ? release_task+0x152/0x2e5
> > > >> > : [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> > > >> > : [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> > > >> > : [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> > > >> > : [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> > > >> > : [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
> >
> > > Thanks. Looks like we need to grab a lock there.
> > > At a quick skim I think we need the tty lock.
> >
> > *ping* - Any further activity on this one? I got bit by it as well on
> > the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
> > single-user and go multi-user.
>
> Valdis, any chance you can try the
> "[PATCH] (for -mm only) put_pid: make sure we don't free the live pid"
> I sent? just to make sure we don't have other problems here.

I think you can revert the tty-bkl-pushdown.patch. Or, as Eric suggested, just
revert this

@@ -1222,7 +1221,7 @@ static const struct file_operations tty_
.read = tty_read,
.write = tty_write,
.poll = tty_poll,
- .ioctl = tty_ioctl,
+ .unlocked_ioctl = tty_ioctl,
.compat_ioctl = tty_compat_ioctl,
.open = tty_open,
.release = tty_release,
@@ -1235,7 +1234,7 @@ static const struct file_operations ptmx
.read = tty_read,
.write = tty_write,
.poll = tty_poll,
- .ioctl = tty_ioctl,
+ .unlocked_ioctl = tty_ioctl,
.compat_ioctl = tty_compat_ioctl,
.open = ptmx_open,
.release = tty_release,
@@ -1248,7 +1247,7 @@ static const struct file_operations cons
.read = tty_read,
.write = redirected_tty_write,
.poll = tty_poll,
- .ioctl = tty_ioctl,
+ .unlocked_ioctl = tty_ioctl,
.compat_ioctl = tty_compat_ioctl,
.open = tty_open,
.release = tty_release,
@@ -1260,7 +1259,7 @@ static const struct file_operations hung
.read = hung_up_tty_read,
.write = hung_up_tty_write,
.poll = hung_up_tty_poll,
- .ioctl = hung_up_tty_ioctl,
+ .unlocked_ioctl = hung_up_tty_ioctl,
.compat_ioctl = hung_up_tty_compat_ioctl,
.release = tty_release,
};

chunk. I'd prefer to know what Alan's opinion.


HOWEVER. We have another bug here, and this bug is old.

When tiocspgrp() does put_pid(real_tty->pgrp), it is possible that real_tty
has the last reference, and the pid will be actually freed. This means that
tiocgpgrp() and do_task_stat() are not safe (rcu_read_lock() can't help).
We can read the freed/reused memory, and since pid_nr_ns() is not trivial
the kernel can crash. Unlikely, but possible. We need the lock.

Oleg.

2008-02-20 16:40:24

by Alan

[permalink] [raw]
Subject: Re: tty && pid problems

> > *ping* - Any further activity on this one? I got bit by it as well on
> > the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
> > single-user and go multi-user.
>
> Valdis, any chance you can try the
> "[PATCH] (for -mm only) put_pid: make sure we don't free the live pid"
> I sent? just to make sure we don't have other problems here.

There is some other iffy locking of the pid objects ever since they were
changed from pid_t to ref counted structs. Whoever did that didn't add
any locking for it, and the old code knew it was "safe" not to.

I've added locks in my test tree and now I've finally got -mm to build
will do some testing then push more stuff upstream

2008-02-20 19:00:59

by Jiri Slaby

[permalink] [raw]
Subject: Re: tty && pid problems

On 02/20/2008 05:28 PM, Oleg Nesterov wrote:
> I think you can revert the tty-bkl-pushdown.patch. Or, as Eric suggested, just
> revert this
>
> @@ -1222,7 +1221,7 @@ static const struct file_operations tty_
> .read = tty_read,
> .write = tty_write,
> .poll = tty_poll,
> - .ioctl = tty_ioctl,
> + .unlocked_ioctl = tty_ioctl,
> .compat_ioctl = tty_compat_ioctl,
> .open = tty_open,
> .release = tty_release,
> @@ -1235,7 +1234,7 @@ static const struct file_operations ptmx
> .read = tty_read,
> .write = tty_write,
> .poll = tty_poll,
> - .ioctl = tty_ioctl,
> + .unlocked_ioctl = tty_ioctl,
> .compat_ioctl = tty_compat_ioctl,
> .open = ptmx_open,
> .release = tty_release,
> @@ -1248,7 +1247,7 @@ static const struct file_operations cons
> .read = tty_read,
> .write = redirected_tty_write,
> .poll = tty_poll,
> - .ioctl = tty_ioctl,
> + .unlocked_ioctl = tty_ioctl,
> .compat_ioctl = tty_compat_ioctl,
> .open = tty_open,
> .release = tty_release,
> @@ -1260,7 +1259,7 @@ static const struct file_operations hung
> .read = hung_up_tty_read,
> .write = hung_up_tty_write,
> .poll = hung_up_tty_poll,
> - .ioctl = hung_up_tty_ioctl,
> + .unlocked_ioctl = hung_up_tty_ioctl,
> .compat_ioctl = hung_up_tty_compat_ioctl,
> .release = tty_release,
> };
>
> chunk.

This would result in unpredictable behaviour. If I left locking apart, ioctl
prototype != unlocked_ioctl prototype.

2008-02-22 01:42:23

by Eric W. Biederman

[permalink] [raw]
Subject: Re: tty && pid problems

Alan Cox <[email protected]> writes:

>> > *ping* - Any further activity on this one? I got bit by it as well on
>> > the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
>> > single-user and go multi-user.
>>
>> Valdis, any chance you can try the
>> "[PATCH] (for -mm only) put_pid: make sure we don't free the live pid"
>> I sent? just to make sure we don't have other problems here.
>
> There is some other iffy locking of the pid objects ever since they were
> changed from pid_t to ref counted structs. Whoever did that didn't add
> any locking for it, and the old code knew it was "safe" not to.
>
> I've added locks in my test tree and now I've finally got -mm to build
> will do some testing then push more stuff upstream

Thanks. At the tty layer that was probably me.
Most of the instances already appear to be nested in some other kind of
locking, but that doesn't make no additional locking correct or ensure
that it will give a uniform result.

Eric

2008-02-22 09:48:53

by Alan

[permalink] [raw]
Subject: Re: tty && pid problems

> > I've added locks in my test tree and now I've finally got -mm to build
> > will do some testing then push more stuff upstream
>
> Thanks. At the tty layer that was probably me.
> Most of the instances already appear to be nested in some other kind of
> locking, but that doesn't make no additional locking correct or ensure
> that it will give a uniform result.

Fortunately your pid struct is ref counted so not too hard to sort out.
Need to look at procfs but at worst tty needs to export a function which
returns a reference bumped pid struct to people who stick their nose in
from outside.

Alan