2013-07-27 09:53:59

by Toralf Förster

[permalink] [raw]
Subject: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
while fuzz tested with trinity if the victim files are located on a NFS share.

The back trace of the core dumps always looks like the attached.

To bisect it is hard. However after few attempts in the last weeks the following
commit is either the first bad commit or at least the upper limit.


commit 8aac62706adaaf0fab02c4327761561c8bda9448
Author: Oleg Nesterov <[email protected]>
Date: Fri Jun 14 21:09:49 2013 +0200

move exit_task_namespaces() outside of exit_notify()



tfoerste@n22 ~/devel/linux $ gdb --core=/mnt/ramdisk/core /home/tfoerste/devel/linux/linux -n -batch -ex bt
[New LWP 20802]
Core was generated by `/home/tfoerste/devel/linux/linux earlyprintk ubda=/home/tfoerste/virtual/uml/tr'.
Program terminated with signal 6, Aborted.
#0 0xb778e424 in __kernel_vsyscall ()
#0 0xb778e424 in __kernel_vsyscall ()
#1 0x08396175 in kill ()
#2 0x0807155d in uml_abort () at arch/um/os-Linux/util.c:93
#3 0x08071845 in os_dump_core () at arch/um/os-Linux/util.c:138
#4 0x08061197 in panic_exit (self=0x8591518 <panic_exit_notifier>, unused1=0, unused2=0x85c5d60 <buf.12251>) at arch/um/kernel/um_arch.c:240
#5 0x0809daf8 in notifier_call_chain (nl=0x0, val=0, v=0x85c5d60 <buf.12251>, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
#6 0x0809dc43 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:182
#7 atomic_notifier_call_chain (nh=0x85c5d44 <panic_notifier_list>, val=0, v=0x85c5d60 <buf.12251>) at kernel/notifier.c:191
#8 0x083f34f8 in panic (fmt=0x0) at kernel/panic.c:127
#9 0x08060b5e in segv (fi=<incomplete type>, ip=136527369, is_user=0, regs=0x858f85c <cpu0_irqstack+30812>) at arch/um/kernel/trap.c:209
#10 0x08060e13 in segv_handler (sig=11, unused_si=0x858fb0c <cpu0_irqstack+31500>, regs=0x858f85c <cpu0_irqstack+30812>) at arch/um/kernel/trap.c:185
#11 0x080706a8 in sig_handler_common (sig=11, si=0x858fb0c <cpu0_irqstack+31500>, mc=0x858fba0 <cpu0_irqstack+31648>) at arch/um/os-Linux/signal.c:44
#12 0x080707ed in sig_handler (sig=0, si=0x858fb0c <cpu0_irqstack+31500>, mc=0x858fba0 <cpu0_irqstack+31648>) at arch/um/os-Linux/signal.c:231
#13 0x0807033b in hard_handler (sig=6, si=0x858fb0c <cpu0_irqstack+31500>, p=0x858fba0 <cpu0_irqstack+31648>) at arch/um/os-Linux/signal.c:165
#14 <signal handler called>
#15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131
#16 0x08234892 in nlmclnt_proc (host=0x48e18860, cmd=7, fl=0x48f27c8c) at fs/lockd/clntproc.c:170
#17 0x081d91ae in nfs_proc_lock (filp=0x0, cmd=0, fl=0x0) at fs/nfs/proc.c:667
#18 0x081ca386 in do_unlk (filp=0x48fbe0c0, cmd=7, fl=0x48f27c8c, is_local=0) at fs/nfs/file.c:773
#19 0x081ca572 in nfs_flock (filp=0x48fbe0c0, cmd=7, fl=0x0) at fs/nfs/file.c:902
#20 0x0813ee6e in locks_remove_flock (filp=0x48fbe0c0) at fs/locks.c:2074
#21 0x080fe438 in __fput (file=0x48fbe0c0) at fs/file_table.c:240
#22 0x080fe55b in ____fput (work=0x48fbe0c0) at fs/file_table.c:285
#23 0x08095f3e in task_work_run () at kernel/task_work.c:87
#24 0x08080c9d in exit_task_work (task=<optimized out>) at include/linux/task_work.h:21
#25 do_exit (code=1224150016) at kernel/exit.c:798
#26 0x080812a7 in do_group_exit (exit_code=11) at kernel/exit.c:931
#27 0x0808bc2d in get_signal_to_deliver (info=0x48f27e34, return_ka=0x48f27eb4, regs=0x48db31d4, cookie=0x0) at kernel/signal.c:2370
#28 0x0805f6ec in kern_do_signal (regs=0x48db31d4) at arch/um/kernel/signal.c:77
#29 0x0805f7ed in do_signal () at arch/um/kernel/signal.c:123
#30 0x0805e6b7 in interrupt_end () at arch/um/kernel/process.c:107
#31 0x08073c1b in userspace (regs=0x48db31d4) at arch/um/os-Linux/skas/process.c:464
#32 0x0805e44c in fork_handler () at arch/um/kernel/process.c:160
#33 0x5a5a5a5a in ?? ()


--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3


2013-07-29 15:43:24

by Andrei Vagin

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

2013/7/29 Oleg Nesterov <[email protected]>:
> On 07/29, Andrew Vagin wrote:
>>
>> I don't have objections against this patch. All my tests work find.
>
> Great, thanks for confirmation. Can I translate this into your
> acked-by or tested-by if I send this patch?

Yes, you can.
Acked-by: Andrew Vagin <[email protected]>

>
> (there is another fix from Eric, it is not clear to me if I
> should send it right now).
>
> Oleg.
>

2013-07-30 21:20:43

by Myklebust, Trond

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131 (nfs in a netns utsns problems?)

T24gVHVlLCAyMDEzLTA3LTMwIGF0IDE3OjEyIC0wNDAwLCBKLiBCcnVjZSBGaWVsZHMgd3JvdGU6
DQo+IE9uIE1vbiwgSnVsIDI5LCAyMDEzIGF0IDEwOjQyOjU3QU0gLTA3MDAsIEVyaWMgVy4gQmll
ZGVybWFuIHdyb3RlOg0KPiA+IA0KPiA+IEFkZGluZyBzb21lIHBlb3BsZSB3aG8gcGF5IG1vcmUg
YXR0ZW50aW9uIHRvIG5mcyBpbiBuZXR3b3JrIG5hbWVzcGFjZXMNCj4gPiB0aGFuIEkgdXN1YWxs
eSBkby4NCj4gPiANCj4gPiBPbGVnIE5lc3Rlcm92IDxvbGVnQHJlZGhhdC5jb20+IHdyaXRlczoN
Cj4gPiANCj4gPiA+IE9uIDA3LzI4LCBFcmljIFcuIEJpZWRlcm1hbiB3cm90ZToNCj4gPiA+Pg0K
PiA+ID4+ID4gVGhpcyB1bnRlc3RlZCBwYXRjaCBzaG91bGQgZml4IGl0IHdpdGhvdXQgYW55IG5l
ZWQgdG8gd29ycnkgYWJvdXQNCj4gPiA+PiA+IGR5bmFtaWMgYmVoYXZpb3IuDQo+ID4gPg0KPiA+
ID4gWWVlZXMgOykgSSB3YXMgdGhpbmtpbmcgYWJvdXQgdGhpcyBjaGFuZ2UgdG9vLCBidXQgSSBo
YXZlIG5vIGlkZWENCj4gPiA+IHdoYXQgdGhpcyAtPm5vZGVuYW1lIGFjdHVhbGx5IG1lYW5zIGZv
ciBuZnMuDQo+ID4gPg0KPiA+ID4gSWYgeW91IGFyZSBnb2luZyB0byBzZW5kIHRoaXMgcGF0Y2gg
LSBncmVhdCENCj4gPiANCj4gPiBKdXN0IGJhdHRpbmcgaXQgYXJvdW5kIGZvciBub3csIGFuZCBo
b3Bpbmcgd2UgaGF2ZSB0aGUgcmlnaHQgY29tYmluYXRpb24NCj4gPiBvZiBleWViYWxscyBsb29r
IGF0IGl0LiAgVGhlcmUgYXJlIG1vcmUgcGxhY2VzIGluIG5mcyB0aGF0IGhhdmUNCj4gPiBxdWVz
dGlvbmFibGUgdXNlcyBvZiB1dHNuYW1lKCkgaW5zdGVhZCBvZiBpbml0X3V0c25hbWUoKS4gIFNv
IEkgdGhpbmsgd2UNCj4gPiBwcm9iYWJseSBuZWVkIGEgbW9yZSBjb21wcmVoZW5zaXZlIHBhdGNo
IGF0IHRoZSB2ZXJ5IGxlYXN0Lg0KPiA+IA0KPiA+IG5mc2NsbnRfcmVjbGFpbSBpcyBuZXZlciBj
YWxsZWQgZnJvbSB1c2Vyc3BhY2UuDQo+IA0KPiBTbywgbG9va2luZyBqdXN0IGF0IHRoaXMgb25l
Li4uLg0KPiANCj4gTm90ZSBJIHRoaW5rIHlvdSBtZWFuIG5sbWNsbnRfcmVjbGFpbS4NCj4gDQo+
IFRoYXQncyBwYXJ0IG9mIHRoZSBjbGllbnQncyBoYW5kbGluZyBvZiBzZXJ2ZXIgcmVib290cy4g
IFRoZSBjbGllbnQNCj4ga25vd3MgdGhhdCBpdCBzaG91bGQgaG9sZCBzb21lIGZpbGUgbG9jaywg
YnV0IGtub3dzIHRoYXQgdGhlIHNlcnZlciBoYXMNCj4gbm93IGZvcmdvdHRlbiB0aGF0IGZhY3Qs
IGFuZCBuZWVkcyB0byBzZW5kIGEgInJlY2xhaW0iIHRvIGdldCB0aGUgbG9jaw0KPiBiYWNrLg0K
PiANCj4gVGhhdCByZWNsYWltIHdpbGwgZ2V0IGtpY2tlZCBvZmYgd2hlbiB0aGUgY2xpZW50J3Mg
bm90aWZpZWQgdGhhdCB0aGUNCj4gc2VydmVyIHJlYm9vdGVkLg0KPiANCj4gU28gd2UncmUgbm90
IGluIHRoZSBjb250ZXh0IG9mIHdob2V2ZXIgb3JpZ2luYWxseSBkaWQgdGhlDQo+IGZjbnRsKC4s
Rl9TRVRMSywuKSwgYW5kIHRyeWluZyB0byBnZXQgdGhlIG5hbWVzcGFjZSBvdXQgb2YgY3VycmVu
dCBpcw0KPiBjbGVhcmx5IHdyb25nLg0KPiANCj4gbmxtX2hvc3QgY3VycmVudGx5IGhhcyBhICJz
dHJ1Y3QgbmV0ICpuZXQiIGZpZWxkLiAgTWF5YmUgd2UgYWxzbyBuZWVkIHRvDQo+IHN0YXNoIGEg
InN0cnVjdCB1dHNfbmFtZXNwYWNlICoiLCBvciBqdXN0IGEgY29weSBvZiB0aGUgbm9kZW5hbWU/
DQoNClRoZSBzdHJ1Y3QgcnBjX2NsbnQgYWxyZWFkeSBoYXMgdGhhdCBpbmZvcm1hdGlvbiBpbiB0
aGUgY2xfbm9kZW5hbWUNCmZpZWxkLg0KDQpDaGVlcnMNCiAgVHJvbmQNCg0KLS0gDQpUcm9uZCBN
eWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15
a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0K

2013-07-29 18:17:12

by Eric W. Biederman

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131 (nfs in a netns utsns problems?)

Oleg Nesterov <[email protected]> writes:

> On 07/29, Eric W. Biederman wrote:
>>
>> So I really don't think using utsname() aka current->nsproxy->uts_ns
>> makes sense in nlmclnt_setlockargs.
>>
>> We most definitely have an inconsistency in nfs.
>
> I tend to agree, but can't really comment.

If I could justify another couple of hours I could write the patch and
justify it. I have cgroups exploding around my ears however.

>> > Yes. And perhaps the patch which moves exit_task_namespaces() after
>> > exit_task_work() makes sense anyway (the patch I showed).
>> >
>> > (but if you change nlmclnt_setlockargs() then it is not 3.11 material).
>> >
>> > The original motivation for 8aac62706 was the leak reported by Andrey,
>> > but that leak should be also fixed by e7b2c406. "Move exit_task_namespaces()
>> > from exit_notify() to do_exit()" is still fine imho, the reason for
>> > exit_task_namespaces() from the middle of exit_notify() has gone away.
>> >
>> > But perhaps it would be better if work->func() could use ->nsproxy even
>> > if the task is PF_EXITING.
>>
>> So far there is nothing in the nfs code that would suggest allowing
>> work->func() being able to use ->nsproxy would make this code any
>> better. I think that would just paper over the problem we are seeing
>> right now.
>
> I think you misunderstood my point.
>
> I fully agree if you change nlmclnt_setlockargs(). I am suggesting to
> move exit_task_namespaces() down after exit_task_work() as a separate
> change which perhaps makes sense by itself. Not to fix this problem,
> not for nfs, not for fput().
>
> Just to allow work->func() to play with ->nsproxy if needed. task_work
> has other users, not only fput().

So to clarify I see little evidence either way on the question of should
work->funk be able to use ->nsproxy if the task is PF_EXITING. What
little evidence I see suggests that we are actually better off not being
able to access ->nsproxy.

Eric




2013-07-29 06:32:07

by Andrew Vagin

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

On Sun, Jul 28, 2013 at 07:58:28PM +0200, Oleg Nesterov wrote:
> On 07/28, Toralf Förster wrote:
> >
> > The attached patch works - applied on top of current git -
> > at least the issue cannot be reproduced then.
>
> Thanks Toralf.
>
> I'll write the changelog and send the patch tomorrow.
>
> Andrey, any chance you can check that with this patch free_ipc_ns()
> doesn't have any problem with ->shm_file ?

kmemleak doesn't detect any leak, but I think this patch is incorrect.

According to my previous investigations exit_task_work should be called
after exit task namespaces
(http://comments.gmane.org/gmane.linux.kernel/1475123)

I applied the following patch:

@@ -11,8 +11,11 @@ task_work_add(struct task_struct *task, struct
callback_head *work, bool notify)

do {
head = ACCESS_ONCE(task->task_works);
- if (unlikely(head == &work_exited))
+ if (unlikely(head == &work_exited)) {
+ printk("%s:%d\n", __func__, __LINE__);
+ dump_stack();
return -ESRCH;
+ }
work->next = head;
} while (cmpxchg(&task->task_works, head, work) != head);


and I got a few backtraces in a kernel log

[ 151.513725] task_work_add:15
[ 151.514860] CPU: 1 PID: 15303 Comm: ipc Not tainted 3.11.0-rc2+ #75
[ 151.516743] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 151.518558] ffff880067bf0000 ffff88006922fba0 ffffffff81630dd5 ffff88006d9b2280
[ 151.521767] ffff88006922fbb0 ffffffff8107b478 ffff88006922fbd0 ffffffff8119ad43
[ 151.524587] ffff880079e81740 ffff88007a9035c8 ffff88006922fbe8 ffffffff81281ebd
[ 151.527785] Call Trace:
[ 151.528811] [<ffffffff81630dd5>] dump_stack+0x45/0x56
[ 151.530378] [<ffffffff8107b478>] task_work_add+0x78/0x80
[ 151.533219] [<ffffffff8119ad43>] fput+0x63/0xa0
[ 151.534884] [<ffffffff81281ebd>] shm_destroy+0x7d/0xb0
[ 151.536813] [<ffffffff81281f05>] do_shm_rmid+0x15/0x50
[ 151.539523] [<ffffffff81286572>] free_ipcs+0xa2/0xf0
[ 151.541595] [<ffffffff81286534>] ? free_ipcs+0x64/0xf0
[ 151.544188] [<ffffffff81281ef0>] ? shm_destroy+0xb0/0xb0
[ 151.546393] [<ffffffff81282990>] shm_exit_ns+0x20/0x30
[ 151.548675] [<ffffffff81286619>] put_ipc_ns+0x59/0x80
[ 151.552764] [<ffffffff81083afd>] free_nsproxy+0x3d/0x90
[ 151.560241] [<ffffffff81083d45>] switch_task_namespaces+0x45/0x50
[ 151.564211] [<ffffffff81083d60>] exit_task_namespaces+0x10/0x20
[ 151.566097] [<ffffffff8105d3fd>] do_exit+0x2ad/0xa20
[ 151.567744] [<ffffffff81307ec1>] ? do_raw_spin_lock+0x41/0x110
[ 151.570734] [<ffffffff8163965c>] ? _raw_spin_unlock_irq+0x2c/0x40
[ 151.573967] [<ffffffff8105dbf9>] do_group_exit+0x49/0xc0
[ 151.576773] [<ffffffff8106d593>] get_signal_to_deliver+0x293/0x640
[ 151.580575] [<ffffffff81002458>] do_signal+0x48/0x5a0
[ 151.582401] [<ffffffff811b83f6>] ? mntput_no_expire+0xd6/0x120
[ 151.584418] [<ffffffff8163a41e>] ? paranoid_userspace+0x39/0x5a
[ 151.588639] [<ffffffff810bc42d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 151.590809] [<ffffffff81002a18>] do_notify_resume+0x68/0x90
[ 151.604976] [<ffffffff8163a430>] paranoid_userspace+0x4b/0x5a


Thanks,
Andrey

>
> e7b2c406 should be enough to fix that leak, but it would be nice if
> you can confirm.
>
> > On 07/27/2013 07:00 PM, Oleg Nesterov wrote:
> > > On 07/27, Toralf Förster wrote:
> > >>
> > >> I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
> > >> while fuzz tested with trinity if the victim files are located on a NFS share.
> > >>
> > >> The back trace of the core dumps always looks like the attached.
> > >>
> > >> To bisect it is hard. However after few attempts in the last weeks the following
> > >> commit is either the first bad commit or at least the upper limit (less likely).
> > >>
> > >>
> > >> commit 8aac62706adaaf0fab02c4327761561c8bda9448
> > >> Author: Oleg Nesterov <[email protected]>
> > >> Date: Fri Jun 14 21:09:49 2013 +0200
> > >>
> > >> move exit_task_namespaces() outside of exit_notify()
> > >>
> > >> #15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131
> > >
> > > Thanks.
> > >
> > > So nlmclnt_setlockargs()->utsname() crashes and we probably need
> > > the patch below.
> > >
> > > But is it correct? I know _absolutely_ nothing about nfs/sunrpc/etc and
> > > I never looked into this code before, most probably I am wrong.
> > >
> > > But it seems that __nlm_async_call() relies on workqueues.
> > > nlmclnt_async_call() does rpc_wait_for_completion_task(), but what if
> > > the caller is killed?
> > >
> > > nlm_rqst can't go away, ->a_count was incremented. But can't the caller
> > > exit before call->name is used? In this case the memory it points to
> > > can be already freed.
> > >
> > > Oleg.
> > >
> > > --- x/kernel/exit.c
> > > +++ x/kernel/exit.c
> > > @@ -783,8 +783,8 @@ void do_exit(long code)
> > > exit_shm(tsk);
> > > exit_files(tsk);
> > > exit_fs(tsk);
> > > - exit_task_namespaces(tsk);
> > > exit_task_work(tsk);
> > > + exit_task_namespaces(tsk);
> > > check_stack_usage();
> > > exit_thread();
> > >
> > >
> > >
> >
> >
> > --
> > MfG/Sincerely
> > Toralf Förster
> > pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
>

2013-07-29 14:23:20

by Oleg Nesterov

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

On 07/28, Eric W. Biederman wrote:
>
> > This untested patch should fix it without any need to worry about
> > dynamic behavior.

Yeees ;) I was thinking about this change too, but I have no idea
what this ->nodename actually means for nfs.

If you are going to send this patch - great!

> Although I am wondering if we have a few other spots
> > where the dynamic behavior might be iffy.

Yes. And perhaps the patch which moves exit_task_namespaces() after
exit_task_work() makes sense anyway (the patch I showed).

(but if you change nlmclnt_setlockargs() then it is not 3.11 material).

The original motivation for 8aac62706 was the leak reported by Andrey,
but that leak should be also fixed by e7b2c406. "Move exit_task_namespaces()
from exit_notify() to do_exit()" is still fine imho, the reason for
exit_task_namespaces() from the middle of exit_notify() has gone away.

But perhaps it would be better if work->func() could use ->nsproxy even
if the task is PF_EXITING.

> > Serge do you remember any of this?
> >
> > On a good day I can follow the nfs code but it takes quite a while. I
> > feel the same way about filesystems locks so I am not really certain
> > what is going on.
> >
> > Eric
> >
> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > index 9760ecb..6643cfc 100644
> > --- a/fs/lockd/clntproc.c
> > +++ b/fs/lockd/clntproc.c
> > @@ -128,11 +128,11 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> >
> > nlmclnt_next_cookie(&argp->cookie);
> > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > - lock->caller = utsname()->nodename;
> > + lock->caller = init_utsname()->nodename;
> > lock->oh.data = req->a_owner;
> > lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > - utsname()->nodename);
> > + init_utsname()->nodename);
> > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > lock->fl.fl_start = fl->fl_start;
> > lock->fl.fl_end = fl->fl_end;
> >
> > Eric


2013-07-29 13:15:50

by Oleg Nesterov

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

On 07/29, Andrew Vagin wrote:
>
> On Sun, Jul 28, 2013 at 07:58:28PM +0200, Oleg Nesterov wrote:
> > On 07/28, Toralf F?rster wrote:
> > >
> > > The attached patch works - applied on top of current git -
> > > at least the issue cannot be reproduced then.
> >
> > Thanks Toralf.
> >
> > I'll write the changelog and send the patch tomorrow.
> >
> > Andrey, any chance you can check that with this patch free_ipc_ns()
> > doesn't have any problem with ->shm_file ?
>
> kmemleak doesn't detect any leak,

Good.

> but I think this patch is incorrect.
>
> According to my previous investigations exit_task_work should be called
> after exit task namespaces
> (http://comments.gmane.org/gmane.linux.kernel/1475123)
>
> I applied the following patch:
>
> @@ -11,8 +11,11 @@ task_work_add(struct task_struct *task, struct
> callback_head *work, bool notify)
>
> do {
> head = ACCESS_ONCE(task->task_works);
> - if (unlikely(head == &work_exited))
> + if (unlikely(head == &work_exited)) {
> + printk("%s:%d\n", __func__, __LINE__);
> + dump_stack();
> return -ESRCH;
> + }
> work->next = head;
> } while (cmpxchg(&task->task_works, head, work) != head);
>
>
> and I got a few backtraces in a kernel log
>
> [ 151.513725] task_work_add:15
> [ 151.514860] CPU: 1 PID: 15303 Comm: ipc Not tainted 3.11.0-rc2+ #75
> [ 151.516743] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 151.518558] ffff880067bf0000 ffff88006922fba0 ffffffff81630dd5 ffff88006d9b2280
> [ 151.521767] ffff88006922fbb0 ffffffff8107b478 ffff88006922fbd0 ffffffff8119ad43
> [ 151.524587] ffff880079e81740 ffff88007a9035c8 ffff88006922fbe8 ffffffff81281ebd
> [ 151.527785] Call Trace:
> [ 151.528811] [<ffffffff81630dd5>] dump_stack+0x45/0x56
> [ 151.530378] [<ffffffff8107b478>] task_work_add+0x78/0x80
> [ 151.533219] [<ffffffff8119ad43>] fput+0x63/0xa0

But this is fine?

Once again, we also have e7b2c406 "fput: task_work_add() can fail if the caller
has passed exit_task_work()" commit which should also fix this particulat problem.

Before this commit - yes, we had to call exit_task_work() after exit_namespaces().

void fput(struct file *file)
{
if (atomic_long_dec_and_test(&file->f_count)) {
struct task_struct *task = current;

file_sb_list_del(file);
if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
init_task_work(&file->f_u.fu_rcuhead, ____fput);
if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
return;
/*
* After this task has run exit_task_work(),
* task_work_add() will fail. free_ipc_ns()->
* shm_destroy() can do this. Fall through to delayed
* fput to avoid leaking *file.
*/
}

if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
schedule_work(&delayed_fput_work);
}
}

Please look at the code and the comment about task_work_add().

Or I misunderstood?

Oleg.


2013-07-28 15:26:26

by Toralf Förster

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

The attached patch works - applied on top of current git -
at least the issue cannot be reproduced then.

On 07/27/2013 07:00 PM, Oleg Nesterov wrote:
> On 07/27, Toralf Förster wrote:
>>
>> I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
>> while fuzz tested with trinity if the victim files are located on a NFS share.
>>
>> The back trace of the core dumps always looks like the attached.
>>
>> To bisect it is hard. However after few attempts in the last weeks the following
>> commit is either the first bad commit or at least the upper limit (less likely).
>>
>>
>> commit 8aac62706adaaf0fab02c4327761561c8bda9448
>> Author: Oleg Nesterov <[email protected]>
>> Date: Fri Jun 14 21:09:49 2013 +0200
>>
>> move exit_task_namespaces() outside of exit_notify()
>>
>> #15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131
>
> Thanks.
>
> So nlmclnt_setlockargs()->utsname() crashes and we probably need
> the patch below.
>
> But is it correct? I know _absolutely_ nothing about nfs/sunrpc/etc and
> I never looked into this code before, most probably I am wrong.
>
> But it seems that __nlm_async_call() relies on workqueues.
> nlmclnt_async_call() does rpc_wait_for_completion_task(), but what if
> the caller is killed?
>
> nlm_rqst can't go away, ->a_count was incremented. But can't the caller
> exit before call->name is used? In this case the memory it points to
> can be already freed.
>
> Oleg.
>
> --- x/kernel/exit.c
> +++ x/kernel/exit.c
> @@ -783,8 +783,8 @@ void do_exit(long code)
> exit_shm(tsk);
> exit_files(tsk);
> exit_fs(tsk);
> - exit_task_namespaces(tsk);
> exit_task_work(tsk);
> + exit_task_namespaces(tsk);
> check_stack_usage();
> exit_thread();
>
>
>


--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2013-07-29 00:11:05

by Eric W. Biederman

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

Oleg Nesterov <[email protected]> writes:

> On 07/27, Toralf Förster wrote:
>>
>> I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
>> while fuzz tested with trinity if the victim files are located on a NFS share.
>>
>> The back trace of the core dumps always looks like the attached.
>>
>> To bisect it is hard. However after few attempts in the last weeks the following
>> commit is either the first bad commit or at least the upper limit (less likely).
>>
>>
>> commit 8aac62706adaaf0fab02c4327761561c8bda9448
>> Author: Oleg Nesterov <[email protected]>
>> Date: Fri Jun 14 21:09:49 2013 +0200
>>
>> move exit_task_namespaces() outside of exit_notify()
>>
>> #15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131
>
> Thanks.
>
> So nlmclnt_setlockargs()->utsname() crashes and we probably need
> the patch below.
>
> But is it correct? I know _absolutely_ nothing about nfs/sunrpc/etc and
> I never looked into this code before, most probably I am wrong.
>
> But it seems that __nlm_async_call() relies on workqueues.
> nlmclnt_async_call() does rpc_wait_for_completion_task(), but what if
> the caller is killed?
>
> nlm_rqst can't go away, ->a_count was incremented. But can't the caller
> exit before call->name is used? In this case the memory it points to
> can be already freed.

I don't think anyone has ever looked into that. This was a flyby
conversion by Serge in 2006 when he originally did the uts namespace.


from commit e9ff3990f08e9a0c2839cc22808b01732ea5b3e4
[PATCH] namespaces: utsname: switch to using uts namespaces

Replace references to system_utsname to the per-process uts namespace
where appropriate. This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

Hmm. That credits with me with this mess. What was I thinking?
Perhaps I just said you missed a couple of spots.

This untested patch should fix it without any need to worry about
dynamic behavior. Although I am wondering if we have a few other spots
where the dynamic behavior might be iffy.

Serge do you remember any of this?

On a good day I can follow the nfs code but it takes quite a while. I
feel the same way about filesystems locks so I am not really certain
what is going on.

Eric

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 9760ecb..6643cfc 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -128,11 +128,11 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)

nlmclnt_next_cookie(&argp->cookie);
memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
- lock->caller = utsname()->nodename;
+ lock->caller = init_utsname()->nodename;
lock->oh.data = req->a_owner;
lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
(unsigned int)fl->fl_u.nfs_fl.owner->pid,
- utsname()->nodename);
+ init_utsname()->nodename);
lock->svid = fl->fl_u.nfs_fl.owner->pid;
lock->fl.fl_start = fl->fl_start;
lock->fl.fl_end = fl->fl_end;

Eric

2013-07-29 18:08:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131 (nfs in a netns utsns problems?)

On 07/29, Eric W. Biederman wrote:
>
> So I really don't think using utsname() aka current->nsproxy->uts_ns
> makes sense in nlmclnt_setlockargs.
>
> We most definitely have an inconsistency in nfs.

I tend to agree, but can't really comment.

> > Yes. And perhaps the patch which moves exit_task_namespaces() after
> > exit_task_work() makes sense anyway (the patch I showed).
> >
> > (but if you change nlmclnt_setlockargs() then it is not 3.11 material).
> >
> > The original motivation for 8aac62706 was the leak reported by Andrey,
> > but that leak should be also fixed by e7b2c406. "Move exit_task_namespaces()
> > from exit_notify() to do_exit()" is still fine imho, the reason for
> > exit_task_namespaces() from the middle of exit_notify() has gone away.
> >
> > But perhaps it would be better if work->func() could use ->nsproxy even
> > if the task is PF_EXITING.
>
> So far there is nothing in the nfs code that would suggest allowing
> work->func() being able to use ->nsproxy would make this code any
> better. I think that would just paper over the problem we are seeing
> right now.

I think you misunderstood my point.

I fully agree if you change nlmclnt_setlockargs(). I am suggesting to
move exit_task_namespaces() down after exit_task_work() as a separate
change which perhaps makes sense by itself. Not to fix this problem,
not for nfs, not for fput().

Just to allow work->func() to play with ->nsproxy if needed. task_work
has other users, not only fput().

Oleg.


2013-07-28 18:03:49

by Oleg Nesterov

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

On 07/28, Toralf F?rster wrote:
>
> The attached patch works - applied on top of current git -
> at least the issue cannot be reproduced then.

Thanks Toralf.

I'll write the changelog and send the patch tomorrow.

Andrey, any chance you can check that with this patch free_ipc_ns()
doesn't have any problem with ->shm_file ?

e7b2c406 should be enough to fix that leak, but it would be nice if
you can confirm.

> On 07/27/2013 07:00 PM, Oleg Nesterov wrote:
> > On 07/27, Toralf F?rster wrote:
> >>
> >> I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
> >> while fuzz tested with trinity if the victim files are located on a NFS share.
> >>
> >> The back trace of the core dumps always looks like the attached.
> >>
> >> To bisect it is hard. However after few attempts in the last weeks the following
> >> commit is either the first bad commit or at least the upper limit (less likely).
> >>
> >>
> >> commit 8aac62706adaaf0fab02c4327761561c8bda9448
> >> Author: Oleg Nesterov <[email protected]>
> >> Date: Fri Jun 14 21:09:49 2013 +0200
> >>
> >> move exit_task_namespaces() outside of exit_notify()
> >>
> >> #15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131
> >
> > Thanks.
> >
> > So nlmclnt_setlockargs()->utsname() crashes and we probably need
> > the patch below.
> >
> > But is it correct? I know _absolutely_ nothing about nfs/sunrpc/etc and
> > I never looked into this code before, most probably I am wrong.
> >
> > But it seems that __nlm_async_call() relies on workqueues.
> > nlmclnt_async_call() does rpc_wait_for_completion_task(), but what if
> > the caller is killed?
> >
> > nlm_rqst can't go away, ->a_count was incremented. But can't the caller
> > exit before call->name is used? In this case the memory it points to
> > can be already freed.
> >
> > Oleg.
> >
> > --- x/kernel/exit.c
> > +++ x/kernel/exit.c
> > @@ -783,8 +783,8 @@ void do_exit(long code)
> > exit_shm(tsk);
> > exit_files(tsk);
> > exit_fs(tsk);
> > - exit_task_namespaces(tsk);
> > exit_task_work(tsk);
> > + exit_task_namespaces(tsk);
> > check_stack_usage();
> > exit_thread();
> >
> >
> >
>
>
> --
> MfG/Sincerely
> Toralf F?rster
> pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3


2013-07-29 14:56:34

by Oleg Nesterov

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

On 07/29, Andrew Vagin wrote:
>
> I don't have objections against this patch. All my tests work find.

Great, thanks for confirmation. Can I translate this into your
acked-by or tested-by if I send this patch?

(there is another fix from Eric, it is not clear to me if I
should send it right now).

Oleg.


2013-07-29 17:43:15

by Eric W. Biederman

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131 (nfs in a netns utsns problems?)


Adding some people who pay more attention to nfs in network namespaces
than I usually do.

Oleg Nesterov <[email protected]> writes:

> On 07/28, Eric W. Biederman wrote:
>>
>> > This untested patch should fix it without any need to worry about
>> > dynamic behavior.
>
> Yeees ;) I was thinking about this change too, but I have no idea
> what this ->nodename actually means for nfs.
>
> If you are going to send this patch - great!

Just batting it around for now, and hoping we have the right combination
of eyeballs look at it. There are more places in nfs that have
questionable uses of utsname() instead of init_utsname(). So I think we
probably need a more comprehensive patch at the very least.

nfsclnt_reclaim is never called from userspace.
nfsclnt_cancel is called from a callback that seems to have no guarantee
of an approprate userspace context.
nfsclnt_proc is called from rpc_ops and I did not spend the time to see
if that could be a userspace context.

So I really don't think using utsname() aka current->nsproxy->uts_ns
makes sense in nlmclnt_setlockargs.

We most definitely have an inconsistency in nfs. I am a little hesitant
to suggest my patch because it is likely to have strange effects on nfs
in a network namespace. On the flip side the code is broken anyway, we
might as well at least use a version that is guarnateed not to cause
a null pointer dereference in the kernel.

This is the first time someone has seen a problem since late 2006 since
Serge updated most of the references to use system_utsname but I think
we have just been lucky.

>> Although I am wondering if we have a few other spots
>> > where the dynamic behavior might be iffy.
>
> Yes. And perhaps the patch which moves exit_task_namespaces() after
> exit_task_work() makes sense anyway (the patch I showed).
>
> (but if you change nlmclnt_setlockargs() then it is not 3.11 material).
>
> The original motivation for 8aac62706 was the leak reported by Andrey,
> but that leak should be also fixed by e7b2c406. "Move exit_task_namespaces()
> from exit_notify() to do_exit()" is still fine imho, the reason for
> exit_task_namespaces() from the middle of exit_notify() has gone away.
>
> But perhaps it would be better if work->func() could use ->nsproxy even
> if the task is PF_EXITING.

So far there is nothing in the nfs code that would suggest allowing
work->func() being able to use ->nsproxy would make this code any
better. I think that would just paper over the problem we are seeing
right now.

>> > Serge do you remember any of this?
>> >
>> > On a good day I can follow the nfs code but it takes quite a while. I
>> > feel the same way about filesystems locks so I am not really certain
>> > what is going on.
>> >
>> > Eric
>> >
>> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
>> > index 9760ecb..6643cfc 100644
>> > --- a/fs/lockd/clntproc.c
>> > +++ b/fs/lockd/clntproc.c
>> > @@ -128,11 +128,11 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
>> >
>> > nlmclnt_next_cookie(&argp->cookie);
>> > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
>> > - lock->caller = utsname()->nodename;
>> > + lock->caller = init_utsname()->nodename;
>> > lock->oh.data = req->a_owner;
>> > lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
>> > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
>> > - utsname()->nodename);
>> > + init_utsname()->nodename);
>> > lock->svid = fl->fl_u.nfs_fl.owner->pid;
>> > lock->fl.fl_start = fl->fl_start;
>> > lock->fl.fl_end = fl->fl_end;
>> >
>> > Eric

2013-07-27 17:06:08

by Oleg Nesterov

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

On 07/27, Toralf F?rster wrote:
>
> I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
> while fuzz tested with trinity if the victim files are located on a NFS share.
>
> The back trace of the core dumps always looks like the attached.
>
> To bisect it is hard. However after few attempts in the last weeks the following
> commit is either the first bad commit or at least the upper limit (less likely).
>
>
> commit 8aac62706adaaf0fab02c4327761561c8bda9448
> Author: Oleg Nesterov <[email protected]>
> Date: Fri Jun 14 21:09:49 2013 +0200
>
> move exit_task_namespaces() outside of exit_notify()
>
> #15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131

Thanks.

So nlmclnt_setlockargs()->utsname() crashes and we probably need
the patch below.

But is it correct? I know _absolutely_ nothing about nfs/sunrpc/etc and
I never looked into this code before, most probably I am wrong.

But it seems that __nlm_async_call() relies on workqueues.
nlmclnt_async_call() does rpc_wait_for_completion_task(), but what if
the caller is killed?

nlm_rqst can't go away, ->a_count was incremented. But can't the caller
exit before call->name is used? In this case the memory it points to
can be already freed.

Oleg.

--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -783,8 +783,8 @@ void do_exit(long code)
exit_shm(tsk);
exit_files(tsk);
exit_fs(tsk);
- exit_task_namespaces(tsk);
exit_task_work(tsk);
+ exit_task_namespaces(tsk);
check_stack_usage();
exit_thread();



2013-07-27 17:33:15

by Oleg Nesterov

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

(Fix Serge's email)

On 07/27, Oleg Nesterov wrote:
>
> On 07/27, Toralf F?rster wrote:
> >
> > I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
> > while fuzz tested with trinity if the victim files are located on a NFS share.
> >
> > The back trace of the core dumps always looks like the attached.
> >
> > To bisect it is hard. However after few attempts in the last weeks the following
> > commit is either the first bad commit or at least the upper limit (less likely).
> >
> >
> > commit 8aac62706adaaf0fab02c4327761561c8bda9448
> > Author: Oleg Nesterov <[email protected]>
> > Date: Fri Jun 14 21:09:49 2013 +0200
> >
> > move exit_task_namespaces() outside of exit_notify()
> >
> > #15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131
>
> Thanks.
>
> So nlmclnt_setlockargs()->utsname() crashes and we probably need
> the patch below.
>
> But is it correct? I know _absolutely_ nothing about nfs/sunrpc/etc and
> I never looked into this code before, most probably I am wrong.
>
> But it seems that __nlm_async_call() relies on workqueues.
> nlmclnt_async_call() does rpc_wait_for_completion_task(), but what if
> the caller is killed?
>
> nlm_rqst can't go away, ->a_count was incremented. But can't the caller
> exit before call->name is used?

I meant lock->caller, sorry.

> In this case the memory it points to
> can be already freed.

And of course I have no idea what lock->caller actually means. But note
that the final fput() can be called by another process from the different
namespace. Say, a task from the parent namespace looks at /proc/pid/fd.

But again. I do not understand this code at all.

> Oleg.
>
> --- x/kernel/exit.c
> +++ x/kernel/exit.c
> @@ -783,8 +783,8 @@ void do_exit(long code)
> exit_shm(tsk);
> exit_files(tsk);
> exit_fs(tsk);
> - exit_task_namespaces(tsk);
> exit_task_work(tsk);
> + exit_task_namespaces(tsk);
> check_stack_usage();
> exit_thread();
>


2013-07-29 01:23:19

by Eric W. Biederman

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131


Resending with Serge's current email address.

Eric

[email protected] (Eric W. Biederman) writes:

> Oleg Nesterov <[email protected]> writes:
>
>> On 07/27, Toralf Förster wrote:
>>>
>>> I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
>>> while fuzz tested with trinity if the victim files are located on a NFS share.
>>>
>>> The back trace of the core dumps always looks like the attached.
>>>
>>> To bisect it is hard. However after few attempts in the last weeks the following
>>> commit is either the first bad commit or at least the upper limit (less likely).
>>>
>>>
>>> commit 8aac62706adaaf0fab02c4327761561c8bda9448
>>> Author: Oleg Nesterov <[email protected]>
>>> Date: Fri Jun 14 21:09:49 2013 +0200
>>>
>>> move exit_task_namespaces() outside of exit_notify()
>>>
>>> #15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131
>>
>> Thanks.
>>
>> So nlmclnt_setlockargs()->utsname() crashes and we probably need
>> the patch below.
>>
>> But is it correct? I know _absolutely_ nothing about nfs/sunrpc/etc and
>> I never looked into this code before, most probably I am wrong.
>>
>> But it seems that __nlm_async_call() relies on workqueues.
>> nlmclnt_async_call() does rpc_wait_for_completion_task(), but what if
>> the caller is killed?
>>
>> nlm_rqst can't go away, ->a_count was incremented. But can't the caller
>> exit before call->name is used? In this case the memory it points to
>> can be already freed.
>
> I don't think anyone has ever looked into that. This was a flyby
> conversion by Serge in 2006 when he originally did the uts namespace.
>
>
> from commit e9ff3990f08e9a0c2839cc22808b01732ea5b3e4
> [PATCH] namespaces: utsname: switch to using uts namespaces
>
> Replace references to system_utsname to the per-process uts namespace
> where appropriate. This includes things like uname.
>
> Changes: Per Eric Biederman's comments, use the per-process uts namespace
> for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c
>
> Hmm. That credits with me with this mess. What was I thinking?
> Perhaps I just said you missed a couple of spots.
>
> This untested patch should fix it without any need to worry about
> dynamic behavior. Although I am wondering if we have a few other spots
> where the dynamic behavior might be iffy.
>
> Serge do you remember any of this?
>
> On a good day I can follow the nfs code but it takes quite a while. I
> feel the same way about filesystems locks so I am not really certain
> what is going on.
>
> Eric
>
> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> index 9760ecb..6643cfc 100644
> --- a/fs/lockd/clntproc.c
> +++ b/fs/lockd/clntproc.c
> @@ -128,11 +128,11 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
>
> nlmclnt_next_cookie(&argp->cookie);
> memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> - lock->caller = utsname()->nodename;
> + lock->caller = init_utsname()->nodename;
> lock->oh.data = req->a_owner;
> lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> - utsname()->nodename);
> + init_utsname()->nodename);
> lock->svid = fl->fl_u.nfs_fl.owner->pid;
> lock->fl.fl_start = fl->fl_start;
> lock->fl.fl_end = fl->fl_end;
>
> Eric

2013-07-30 21:12:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131 (nfs in a netns utsns problems?)

On Mon, Jul 29, 2013 at 10:42:57AM -0700, Eric W. Biederman wrote:
>
> Adding some people who pay more attention to nfs in network namespaces
> than I usually do.
>
> Oleg Nesterov <[email protected]> writes:
>
> > On 07/28, Eric W. Biederman wrote:
> >>
> >> > This untested patch should fix it without any need to worry about
> >> > dynamic behavior.
> >
> > Yeees ;) I was thinking about this change too, but I have no idea
> > what this ->nodename actually means for nfs.
> >
> > If you are going to send this patch - great!
>
> Just batting it around for now, and hoping we have the right combination
> of eyeballs look at it. There are more places in nfs that have
> questionable uses of utsname() instead of init_utsname(). So I think we
> probably need a more comprehensive patch at the very least.
>
> nfsclnt_reclaim is never called from userspace.

So, looking just at this one....

Note I think you mean nlmclnt_reclaim.

That's part of the client's handling of server reboots. The client
knows that it should hold some file lock, but knows that the server has
now forgotten that fact, and needs to send a "reclaim" to get the lock
back.

That reclaim will get kicked off when the client's notified that the
server rebooted.

So we're not in the context of whoever originally did the
fcntl(.,F_SETLK,.), and trying to get the namespace out of current is
clearly wrong.

nlm_host currently has a "struct net *net" field. Maybe we also need to
stash a "struct uts_namespace *", or just a copy of the nodename?

--b.

> nfsclnt_cancel is called from a callback that seems to have no guarantee
> of an approprate userspace context.
> nfsclnt_proc is called from rpc_ops and I did not spend the time to see
> if that could be a userspace context.
>
> So I really don't think using utsname() aka current->nsproxy->uts_ns
> makes sense in nlmclnt_setlockargs.
>
> We most definitely have an inconsistency in nfs. I am a little hesitant
> to suggest my patch because it is likely to have strange effects on nfs
> in a network namespace. On the flip side the code is broken anyway, we
> might as well at least use a version that is guarnateed not to cause
> a null pointer dereference in the kernel.
>
> This is the first time someone has seen a problem since late 2006 since
> Serge updated most of the references to use system_utsname but I think
> we have just been lucky.
>
> >> Although I am wondering if we have a few other spots
> >> > where the dynamic behavior might be iffy.
> >
> > Yes. And perhaps the patch which moves exit_task_namespaces() after
> > exit_task_work() makes sense anyway (the patch I showed).
> >
> > (but if you change nlmclnt_setlockargs() then it is not 3.11 material).
> >
> > The original motivation for 8aac62706 was the leak reported by Andrey,
> > but that leak should be also fixed by e7b2c406. "Move exit_task_namespaces()
> > from exit_notify() to do_exit()" is still fine imho, the reason for
> > exit_task_namespaces() from the middle of exit_notify() has gone away.
> >
> > But perhaps it would be better if work->func() could use ->nsproxy even
> > if the task is PF_EXITING.
>
> So far there is nothing in the nfs code that would suggest allowing
> work->func() being able to use ->nsproxy would make this code any
> better. I think that would just paper over the problem we are seeing
> right now.
>
> >> > Serge do you remember any of this?
> >> >
> >> > On a good day I can follow the nfs code but it takes quite a while. I
> >> > feel the same way about filesystems locks so I am not really certain
> >> > what is going on.
> >> >
> >> > Eric
> >> >
> >> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> >> > index 9760ecb..6643cfc 100644
> >> > --- a/fs/lockd/clntproc.c
> >> > +++ b/fs/lockd/clntproc.c
> >> > @@ -128,11 +128,11 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> >> >
> >> > nlmclnt_next_cookie(&argp->cookie);
> >> > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> >> > - lock->caller = utsname()->nodename;
> >> > + lock->caller = init_utsname()->nodename;
> >> > lock->oh.data = req->a_owner;
> >> > lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> >> > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> >> > - utsname()->nodename);
> >> > + init_utsname()->nodename);
> >> > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> >> > lock->fl.fl_start = fl->fl_start;
> >> > lock->fl.fl_end = fl->fl_end;
> >> >
> >> > Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-07-29 14:31:04

by Andrew Vagin

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

On Mon, Jul 29, 2013 at 03:10:31PM +0200, Oleg Nesterov wrote:
> On 07/29, Andrew Vagin wrote:
> >
> > On Sun, Jul 28, 2013 at 07:58:28PM +0200, Oleg Nesterov wrote:
> > > On 07/28, Toralf Förster wrote:
> > > >
> > > > The attached patch works - applied on top of current git -
> > > > at least the issue cannot be reproduced then.
> > >
> > > Thanks Toralf.
> > >
> > > I'll write the changelog and send the patch tomorrow.
> > >
> > > Andrey, any chance you can check that with this patch free_ipc_ns()
> > > doesn't have any problem with ->shm_file ?
> >
> > kmemleak doesn't detect any leak,
>
> Good.
>
> > but I think this patch is incorrect.
> >
> > According to my previous investigations exit_task_work should be called
> > after exit task namespaces
> > (http://comments.gmane.org/gmane.linux.kernel/1475123)
> >
> > I applied the following patch:
> >
> > @@ -11,8 +11,11 @@ task_work_add(struct task_struct *task, struct
> > callback_head *work, bool notify)
> >
> > do {
> > head = ACCESS_ONCE(task->task_works);
> > - if (unlikely(head == &work_exited))
> > + if (unlikely(head == &work_exited)) {
> > + printk("%s:%d\n", __func__, __LINE__);
> > + dump_stack();
> > return -ESRCH;
> > + }
> > work->next = head;
> > } while (cmpxchg(&task->task_works, head, work) != head);
> >
> >
> > and I got a few backtraces in a kernel log
> >
> > [ 151.513725] task_work_add:15
> > [ 151.514860] CPU: 1 PID: 15303 Comm: ipc Not tainted 3.11.0-rc2+ #75
> > [ 151.516743] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 151.518558] ffff880067bf0000 ffff88006922fba0 ffffffff81630dd5 ffff88006d9b2280
> > [ 151.521767] ffff88006922fbb0 ffffffff8107b478 ffff88006922fbd0 ffffffff8119ad43
> > [ 151.524587] ffff880079e81740 ffff88007a9035c8 ffff88006922fbe8 ffffffff81281ebd
> > [ 151.527785] Call Trace:
> > [ 151.528811] [<ffffffff81630dd5>] dump_stack+0x45/0x56
> > [ 151.530378] [<ffffffff8107b478>] task_work_add+0x78/0x80
> > [ 151.533219] [<ffffffff8119ad43>] fput+0x63/0xa0
>
> But this is fine?

Yes.

>
> Once again, we also have e7b2c406 "fput: task_work_add() can fail if the caller
> has passed exit_task_work()" commit which should also fix this particulat problem.

Sorry, I skipped e7b2c406, which explains why I don't see leak now.
Thanks.

I don't have objections against this patch. All my tests work find.

2013-09-22 17:03:48

by Toralf Förster

[permalink] [raw]
Subject: Re: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

On 07/27/2013 07:00 PM, Oleg Nesterov wrote:
> On 07/27, Toralf Förster wrote:
>>
>> I do have a user mode linux image (stable 32 bit Gentoo Linux ) which erratically crashes
>> while fuzz tested with trinity if the victim files are located on a NFS share.
>>
>> The back trace of the core dumps always looks like the attached.
>>
>> To bisect it is hard. However after few attempts in the last weeks the following
>> commit is either the first bad commit or at least the upper limit (less likely).
>>
>>
>> commit 8aac62706adaaf0fab02c4327761561c8bda9448
>> Author: Oleg Nesterov <[email protected]>
>> Date: Fri Jun 14 21:09:49 2013 +0200
>>
>> move exit_task_namespaces() outside of exit_notify()
>>
>> #15 nlmclnt_setlockargs (req=0x48e18860, fl=0x48f27c8c) at fs/lockd/clntproc.c:131
>
> Thanks.
>
> So nlmclnt_setlockargs()->utsname() crashes and we probably need
> the patch below.
>
> But is it correct? I know _absolutely_ nothing about nfs/sunrpc/etc and
> I never looked into this code before, most probably I am wrong.
>
> But it seems that __nlm_async_call() relies on workqueues.
> nlmclnt_async_call() does rpc_wait_for_completion_task(), but what if
> the caller is killed?
>
> nlm_rqst can't go away, ->a_count was incremented. But can't the caller
> exit before call->name is used? In this case the memory it points to
> can be already freed.
>
> Oleg.
>
> --- x/kernel/exit.c
> +++ x/kernel/exit.c
> @@ -783,8 +783,8 @@ void do_exit(long code)
> exit_shm(tsk);
> exit_files(tsk);
> exit_fs(tsk);
> - exit_task_namespaces(tsk);
> exit_task_work(tsk);
> + exit_task_namespaces(tsk);
> check_stack_usage();
> exit_thread();
>
>
>
/me wonders if/when this will go in the main kernel ?

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3