2014-07-12 10:32:13

by Toralf Förster

[permalink] [raw]
Subject: fuzz tested user mode linux crashed in NFS code path

A fuzz-tested a x86 32 bit user user mode linux guest using guest kernel v3.16-rc4-142-g47ea8dd
with trinity using NFSv4 victim files (both NFS server and NFS client run within the same UML).

The guest crashed :

Kernel panic - not syncing: BUG!
CPU: 0 PID: 1395 Comm: nfsd Tainted: G B 3.16.0-rc4-00142-g47ea8dd-dirty #68
Stack:
0859f770 0859f770 00000003 086c8547 00000000 858742a0 0cacd700 85447e1c
084e3dd2 00000000 85447df0 85447e44 084e0408 085ab1c4 08700980 0859c46a
85447e50 00000000 00000000 858742a0 0cacd700 85447e74 080ffc70 0859c46a
Call Trace:
[<084e3dd2>] dump_stack+0x26/0x28
[<084e0408>] panic+0x7a/0x194
[<080ffc70>] kfree+0x80/0x150
[<0822ba2e>] ? nfsd4_encode_stateid+0x3e/0x60
[<0822c0ab>] nfsd4_encode_lock+0x4b/0x60
[<08233aa3>] nfsd4_encode_operation+0xd3/0x1d0
[<0822afcf>] nfsd4_proc_compound+0x4bf/0x670
[<0809b999>] ? groups_alloc+0x39/0xe0
[<08219aba>] nfsd_dispatch+0xda/0x200
[<0846868f>] svc_process_common+0x31f/0x610
[<08469b4b>] svc_process+0x10b/0x140
[<0809d270>] ? default_wake_function+0x0/0x20
[<08219360>] ? nfsd+0x0/0x140
[<08219442>] nfsd+0xe2/0x140
[<08095806>] kthread+0xd6/0xe0
[<0809c60d>] ? finish_task_switch.isra.56+0x1d/0x70
[<0805f64b>] new_thread_handler+0x6b/0x90


FWIW "dirty" cames from 1 patch from Richard Weinberger for arch/um/kernel/tlb.c and arch/um/os-Linux/skas/process.c


The back trace of the core file gives:


Thread 1 (LWP 12687):
#0 0xb7797aec in __kernel_vsyscall ()
#1 0x08484e15 in kill () at ../sysdeps/unix/syscall-template.S:81
#2 0x0807253d in uml_abort () at arch/um/os-Linux/util.c:93
#3 0x08072885 in os_dump_core () at arch/um/os-Linux/util.c:148
#4 0x0806241d in panic_exit (self=0x86c95c0 <panic_exit_notifier>, unused1=0, unused2=0x8700980 <buf.19753>) at arch/um/kernel/um_arch.c:240
#5 0x08099706 in notifier_call_chain (nl=0x0, val=2235858240, v=0x6, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
#6 0x08099863 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:183
#7 atomic_notifier_call_chain (nh=0x8700944 <panic_notifier_list>, val=0, v=0x8700980 <buf.19753>) at kernel/notifier.c:193
#8 0x084e0424 in panic (fmt=0x0) at kernel/panic.c:133
#9 0x080ffc70 in kfree (x=0x194) at mm/slub.c:3380
#10 0x0822c0ab in nfsd4_encode_lock (resp=0x85871030, nfserr=0, lock=0x858742a0) at fs/nfsd/nfs4xdr.c:2910
#11 0x08233aa3 in nfsd4_encode_operation (resp=0x85871030, op=0x85874280) at fs/nfsd/nfs4xdr.c:3889
#12 0x0822afcf in nfsd4_proc_compound (rqstp=0x858750f0, args=0x85447d40, resp=0x85871030) at fs/nfsd/nfs4proc.c:1386
#13 0x08219aba in nfsd_dispatch (rqstp=0x858750f0, statp=0x832c1018) at fs/nfsd/nfssvc.c:686
#14 0x0846868f in svc_process_common (rqstp=0x858750f0, argv=0x85447d40, resv=0x8587526c) at net/sunrpc/svc.c:1206
#15 0x08469b4b in svc_process (rqstp=0x858750f0) at net/sunrpc/svc.c:1331
#16 0x08219442 in nfsd (vrqstp=0x858750f0) at fs/nfsd/nfssvc.c:609
#17 0x08095806 in kthread (_create=0x85377eb0) at kernel/kthread.c:207
#18 0x0805f64b in new_thread_handler () at arch/um/kernel/process.c:129
#19 0x00000000 in ?? ()



--
Toralf



2014-07-18 16:22:56

by Toralf Förster

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On 07/17/2014 10:27 PM, J. Bruce Fields wrote:
>> Toralf, is that crash reproduceable? If so, does replacing the above
>> kmalloc by a kcalloc also fix it?

Yes.

I could reproduce it today, it tooks about 2 hours using these trinity paramters :

-C 2 -N 100000 -x mremap -x munmap -q -v /mnt/nfsv4


The backtrace of the core file :

Thread 1 (LWP 17085):
#0 0xb770baec in __kernel_vsyscall ()
#1 0x08484e85 in kill () at ../sysdeps/unix/syscall-template.S:81
#2 0x0807253d in uml_abort () at arch/um/os-Linux/util.c:93
#3 0x08072885 in os_dump_core () at arch/um/os-Linux/util.c:148
#4 0x0806241d in panic_exit (self=0x86c95c0 <panic_exit_notifier>, unused1=0, unused2=0x8700980 <buf.19753>) at arch/um/kernel/um_arch.c:240
#5 0x08099706 in notifier_call_chain (nl=0x0, val=2231598400, v=0x6, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
#6 0x08099863 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:183
#7 atomic_notifier_call_chain (nh=0x8700944 <panic_notifier_list>, val=0, v=0x8700980 <buf.19753>) at kernel/notifier.c:193
#8 0x084e0494 in panic (fmt=0x0) at kernel/panic.c:133
#9 0x080ffc70 in kfree (x=0x218) at mm/slub.c:3380
#10 0x0822c0db in nfsd4_encode_lock (resp=0x85576120, nfserr=0, lock=0x855752d0) at fs/nfsd/nfs4xdr.c:2910
#11 0x08233ad3 in nfsd4_encode_operation (resp=0x85576120, op=0x855752b0) at fs/nfsd/nfs4xdr.c:3889
#12 0x0822afff in nfsd4_proc_compound (rqstp=0x85571030, args=0x85037d40, resp=0x85576120) at fs/nfsd/nfs4proc.c:1386
#13 0x08219aea in nfsd_dispatch (rqstp=0x85571030, statp=0x7fcc8018) at fs/nfsd/nfssvc.c:686
#14 0x084686ff in svc_process_common (rqstp=0x85571030, argv=0x85037d40, resv=0x855711ac) at net/sunrpc/svc.c:1206
#15 0x08469bbb in svc_process (rqstp=0x85571030) at net/sunrpc/svc.c:1331
#16 0x08219472 in nfsd (vrqstp=0x85571030) at fs/nfsd/nfssvc.c:609
#17 0x08095806 in kthread (_create=0x84f86eb0) at kernel/kthread.c:207
#18 0x0805f64b in new_thread_handler () at arch/um/kernel/process.c:129
#19 0x00000000 in ?? ()

and the UML guest command line :

...
Kernel panic - not syncing: BUG!
CPU: 0 PID: 1397 Comm: nfsd Tainted: G B 3.16.0-rc5-00152-g59ca9ee-dirty #76
Stack:
0859f770 0859f770 00000003 086c8547 00000000 855752d0 0c6c9700 85037e1c
084e3e42 00000000 85037df0 85037e44 084e0478 085ab1c4 08700980 0859c46a
85037e50 00000000 00000000 855752d0 0c6c9700 85037e74 080ffc70 0859c46a
Call Trace:
[<084e3e42>] dump_stack+0x26/0x28
[<084e0478>] panic+0x7a/0x194
[<080ffc70>] kfree+0x80/0x150
[<0822ba5e>] ? nfsd4_encode_stateid+0x3e/0x60
[<0822c0db>] nfsd4_encode_lock+0x4b/0x60
[<08233ad3>] nfsd4_encode_operation+0xd3/0x1d0
[<0822afff>] nfsd4_proc_compound+0x4bf/0x670
[<0809b999>] ? groups_alloc+0x39/0xe0
[<08219aea>] nfsd_dispatch+0xda/0x200
[<084686ff>] svc_process_common+0x31f/0x610
[<08469bbb>] svc_process+0x10b/0x140
[<0809d270>] ? default_wake_function+0x0/0x20
[<08219390>] ? nfsd+0x0/0x140
[<08219472>] nfsd+0xe2/0x140
[<08095806>] kthread+0xd6/0xe0
[<0809c60d>] ? finish_task_switch.isra.56+0x1d/0x70
[<0805f64b>] new_thread_handler+0x6b/0x90


I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ?


>
> Sorry, that should be kzalloc. We should probably fix that regardless.
>
> But I still don't understand how you hit this case....
>
> --b.


--
Toralf


2014-07-12 17:15:02

by Toralf Förster

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On 07/12/2014 02:31 PM, Kinglong Mee wrote:
> I think it is caused by kfree an uninitialized address.
> Can you test with the patch listed in following url,
> I have send some days before ?
>
> "[PATCH 1/4] NFSD: Fix memory leak in encoding denied lock"
> http://www.spinics.net/lists/linux-nfs/msg44719.html
>
> thanks,
> Kinglong Mee
>
That patch seems to fix the issue.



--
Toralf


2014-07-19 09:27:52

by Toralf Förster

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On 07/19/2014 05:23 AM, Kinglong Mee wrote:
> Lock file success, nfsd will copy stateid to the union, but the value
> also influence denied.
> If on x86_64 platform, only influence the len in xdr_netobj,
> but on i686 platform, will influence the len and the data in xdr_netobj.
> So, the problem only appears on i686 platform.

OT, but wouldn't this be a good example for the coverity people to check for it too ?


--
Toralf


2014-07-17 20:27:23

by J. Bruce Fields

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On Wed, Jul 16, 2014 at 02:57:24PM -0400, J. Bruce Fields wrote:
> On Sat, Jul 12, 2014 at 08:31:15PM +0800, Kinglong Mee wrote:
> > I think it is caused by kfree an uninitialized address.
> > Can you test with the patch listed in following url,
> > I have send some days before ?
> >
> > "[PATCH 1/4] NFSD: Fix memory leak in encoding denied lock"
> > http://www.spinics.net/lists/linux-nfs/msg44719.html
>
> I have this queued for 3.17, but if it causes a crash then it should go
> to 3.16 now.
>
> However, I'm confused: the only explicit initialization of lk_denied is
> in the case vfs_lock_file() returns -EAGAIN. Our usual tests (cthon,
> pynfs) do lots of succesful locks, so should have hit this before.
>
> OK, I see: this memory zeroed by a memset in svc_process_common():
>
> memset(rqstp->rq_argp, 0, procp->pc_argsize);
>
> *But* in the case of the NFSv4 compound operation, we only have enough
> space in rq_argp for 8 operations, anything more is allocated in
> fs/nfsd/nfs4xdr.c:nfsd4_decode_compound:
>
> if (argp->opcnt > ARRAY_SIZE(argp->iops)) {
> argp->ops = kmalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL);
> ...
>
> So, perhaps we got a compound with more than 8 operations, with the LOCK
> operation in the 9th or later position?
>
> But the Linux NFS client doesn't do that, so I don't understand how
> Toralf hit this.
>
> Am I missing anything here?
>
> Toralf, is that crash reproduceable? If so, does replacing the above
> kmalloc by a kcalloc also fix it?

Sorry, that should be kzalloc. We should probably fix that regardless.

But I still don't understand how you hit this case....

--b.

commit 5d6031ca742f
Author: J. Bruce Fields <[email protected]>
Date: Thu Jul 17 16:20:39 2014 -0400

nfsd4: zero op arguments beyond the 8th compound op

The first 8 ops of the compound are zeroed since they're a part of the
argument that's zeroed by the

memset(rqstp->rq_argp, 0, procp->pc_argsize);

in svc_process_common(). But we handle larger compounds by allocating
the memory on the fly in nfsd4_decode_compound(). Other than code
recently fixed by 01529e3f8179 "NFSD: Fix memory leak in encoding denied
lock", I don't know of any examples of code depending on this
initialization. But it definitely seems possible, and I'd rather be
safe.

Compounds this long are unusual so I'm much more worried about failure
in this poorly tested cases than about an insignificant performance hit.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 01023a595163..628b430e743e 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1635,7 +1635,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
goto xdr_error;

if (argp->opcnt > ARRAY_SIZE(argp->iops)) {
- argp->ops = kmalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL);
+ argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL);
if (!argp->ops) {
argp->ops = argp->iops;
dprintk("nfsd: couldn't allocate room for COMPOUND\n");

2014-07-23 14:59:19

by J. Bruce Fields

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On Wed, Jul 23, 2014 at 01:04:35PM +0800, Kinglong Mee wrote:
> On 7/21/2014 23:55, J. Bruce Fields wrote:
> > On Sat, Jul 19, 2014 at 11:23:59AM +0800, Kinglong Mee wrote:
> >> On Sat, Jul 19, 2014 at 12:50 AM, Toralf Förster <[email protected]> wrote:
> >>> On 07/18/2014 06:22 PM, Toralf Förster wrote:
> >>>> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ?
> >>>
> >>> Well, next crash (with kzalloc patch) happened after 20 minutes ...
> >>
> >> Maybe I have found the problem.
> >> The stateid and denied are defined as an union as,
> >> fs/nfsd/xdr4.h
> >> 145 struct nfsd4_lock_denied {
> >> 146 clientid_t ld_clientid;
> >> 147 struct xdr_netobj ld_owner;
> >> 148 u64 ld_start;
> >> 149 u64 ld_length;
> >> 150 u32 ld_type;
> >> 151 };
> >> 152
> >> 153 struct nfsd4_lock {
> >> ... ...
> >> 174 /* response */
> >> 175 union {
> >> 176 struct {
> >> 177 stateid_t stateid;
> >> 178 } ok;
> >> 179 struct nfsd4_lock_denied denied;
> >> 180 } u;
> >>
> >> 30 struct xdr_netobj {
> >> 31 unsigned int len;
> >> 32 u8 * data;
> >> 33 };
> >>
> >> sizeof(stateid_t) = 16, sizeof(clientid_t) = 8,
> >> sizeof(struct xdr_netobj) = 16, (on x86_x64 platform),
> >> sizeof(struct xdr_netobj) = 8, (on i686 platform)
> >>
> >> Lock file success, nfsd will copy stateid to the union, but the value
> >> also influence denied.
> >> If on x86_64 platform, only influence the len in xdr_netobj,
> >> but on i686 platform, will influence the len and the data in xdr_netobj.
> >> So, the problem only appears on i686 platform.
> >
> > Oh, great catch, thanks. Sounds like that would explain all of Toralf's
> > results.
> >
> > I'll include this explanation with your original patch and submit it for
> > 3.16.
>
> I saw the patch in your git-tree, there is one fault about the comments.
>
> > (Note that lock->lk_denied.ld_owner.data appears it should be zero here,
> > until you notice that it's one arm of a union the other arm of which is
> > written to in the succesful case by the
> >
> > memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> >
> > in nfsd4_open_confirm().
>
> It's,
> memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
> sizeof(stateid_t));
> in nfsd4_lock().

Thanks, fixed!

--b.

2014-07-12 12:31:48

by Kinglong Mee

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

I think it is caused by kfree an uninitialized address.
Can you test with the patch listed in following url,
I have send some days before ?

"[PATCH 1/4] NFSD: Fix memory leak in encoding denied lock"
http://www.spinics.net/lists/linux-nfs/msg44719.html

thanks,
Kinglong Mee

On 7/12/2014 18:32, Toralf Förster wrote:
> A fuzz-tested a x86 32 bit user user mode linux guest using guest kernel v3.16-rc4-142-g47ea8dd
> with trinity using NFSv4 victim files (both NFS server and NFS client run within the same UML).
>
> The guest crashed :
>
> Kernel panic - not syncing: BUG!
> CPU: 0 PID: 1395 Comm: nfsd Tainted: G B 3.16.0-rc4-00142-g47ea8dd-dirty #68
> Stack:
> 0859f770 0859f770 00000003 086c8547 00000000 858742a0 0cacd700 85447e1c
> 084e3dd2 00000000 85447df0 85447e44 084e0408 085ab1c4 08700980 0859c46a
> 85447e50 00000000 00000000 858742a0 0cacd700 85447e74 080ffc70 0859c46a
> Call Trace:
> [<084e3dd2>] dump_stack+0x26/0x28
> [<084e0408>] panic+0x7a/0x194
> [<080ffc70>] kfree+0x80/0x150
> [<0822ba2e>] ? nfsd4_encode_stateid+0x3e/0x60
> [<0822c0ab>] nfsd4_encode_lock+0x4b/0x60
> [<08233aa3>] nfsd4_encode_operation+0xd3/0x1d0
> [<0822afcf>] nfsd4_proc_compound+0x4bf/0x670
> [<0809b999>] ? groups_alloc+0x39/0xe0
> [<08219aba>] nfsd_dispatch+0xda/0x200
> [<0846868f>] svc_process_common+0x31f/0x610
> [<08469b4b>] svc_process+0x10b/0x140
> [<0809d270>] ? default_wake_function+0x0/0x20
> [<08219360>] ? nfsd+0x0/0x140
> [<08219442>] nfsd+0xe2/0x140
> [<08095806>] kthread+0xd6/0xe0
> [<0809c60d>] ? finish_task_switch.isra.56+0x1d/0x70
> [<0805f64b>] new_thread_handler+0x6b/0x90
>
>
> FWIW "dirty" cames from 1 patch from Richard Weinberger for arch/um/kernel/tlb.c and arch/um/os-Linux/skas/process.c
>
>
> The back trace of the core file gives:
>
>
> Thread 1 (LWP 12687):
> #0 0xb7797aec in __kernel_vsyscall ()
> #1 0x08484e15 in kill () at ../sysdeps/unix/syscall-template.S:81
> #2 0x0807253d in uml_abort () at arch/um/os-Linux/util.c:93
> #3 0x08072885 in os_dump_core () at arch/um/os-Linux/util.c:148
> #4 0x0806241d in panic_exit (self=0x86c95c0 <panic_exit_notifier>, unused1=0, unused2=0x8700980 <buf.19753>) at arch/um/kernel/um_arch.c:240
> #5 0x08099706 in notifier_call_chain (nl=0x0, val=2235858240, v=0x6, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
> #6 0x08099863 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:183
> #7 atomic_notifier_call_chain (nh=0x8700944 <panic_notifier_list>, val=0, v=0x8700980 <buf.19753>) at kernel/notifier.c:193
> #8 0x084e0424 in panic (fmt=0x0) at kernel/panic.c:133
> #9 0x080ffc70 in kfree (x=0x194) at mm/slub.c:3380
> #10 0x0822c0ab in nfsd4_encode_lock (resp=0x85871030, nfserr=0, lock=0x858742a0) at fs/nfsd/nfs4xdr.c:2910
> #11 0x08233aa3 in nfsd4_encode_operation (resp=0x85871030, op=0x85874280) at fs/nfsd/nfs4xdr.c:3889
> #12 0x0822afcf in nfsd4_proc_compound (rqstp=0x858750f0, args=0x85447d40, resp=0x85871030) at fs/nfsd/nfs4proc.c:1386
> #13 0x08219aba in nfsd_dispatch (rqstp=0x858750f0, statp=0x832c1018) at fs/nfsd/nfssvc.c:686
> #14 0x0846868f in svc_process_common (rqstp=0x858750f0, argv=0x85447d40, resv=0x8587526c) at net/sunrpc/svc.c:1206
> #15 0x08469b4b in svc_process (rqstp=0x858750f0) at net/sunrpc/svc.c:1331
> #16 0x08219442 in nfsd (vrqstp=0x858750f0) at fs/nfsd/nfssvc.c:609
> #17 0x08095806 in kthread (_create=0x85377eb0) at kernel/kthread.c:207
> #18 0x0805f64b in new_thread_handler () at arch/um/kernel/process.c:129
> #19 0x00000000 in ?? ()
>
>
>

2014-07-18 16:50:40

by Toralf Förster

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On 07/18/2014 06:22 PM, Toralf Förster wrote:
> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ?

Well, next crash (with kzalloc patch) happened after 20 minutes ...


--
Toralf


2014-07-19 03:24:01

by Kinglong Mee

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On Sat, Jul 19, 2014 at 12:50 AM, Toralf Förster <[email protected]> wrote:
> On 07/18/2014 06:22 PM, Toralf Förster wrote:
>> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ?
>
> Well, next crash (with kzalloc patch) happened after 20 minutes ...

Maybe I have found the problem.
The stateid and denied are defined as an union as,
fs/nfsd/xdr4.h
145 struct nfsd4_lock_denied {
146 clientid_t ld_clientid;
147 struct xdr_netobj ld_owner;
148 u64 ld_start;
149 u64 ld_length;
150 u32 ld_type;
151 };
152
153 struct nfsd4_lock {
... ...
174 /* response */
175 union {
176 struct {
177 stateid_t stateid;
178 } ok;
179 struct nfsd4_lock_denied denied;
180 } u;

30 struct xdr_netobj {
31 unsigned int len;
32 u8 * data;
33 };

sizeof(stateid_t) = 16, sizeof(clientid_t) = 8,
sizeof(struct xdr_netobj) = 16, (on x86_x64 platform),
sizeof(struct xdr_netobj) = 8, (on i686 platform)

Lock file success, nfsd will copy stateid to the union, but the value
also influence denied.
If on x86_64 platform, only influence the len in xdr_netobj,
but on i686 platform, will influence the len and the data in xdr_netobj.
So, the problem only appears on i686 platform.

thanks,
Kinglong Mee

2014-07-23 05:05:06

by Kinglong Mee

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On 7/21/2014 23:55, J. Bruce Fields wrote:
> On Sat, Jul 19, 2014 at 11:23:59AM +0800, Kinglong Mee wrote:
>> On Sat, Jul 19, 2014 at 12:50 AM, Toralf Förster <[email protected]> wrote:
>>> On 07/18/2014 06:22 PM, Toralf Förster wrote:
>>>> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ?
>>>
>>> Well, next crash (with kzalloc patch) happened after 20 minutes ...
>>
>> Maybe I have found the problem.
>> The stateid and denied are defined as an union as,
>> fs/nfsd/xdr4.h
>> 145 struct nfsd4_lock_denied {
>> 146 clientid_t ld_clientid;
>> 147 struct xdr_netobj ld_owner;
>> 148 u64 ld_start;
>> 149 u64 ld_length;
>> 150 u32 ld_type;
>> 151 };
>> 152
>> 153 struct nfsd4_lock {
>> ... ...
>> 174 /* response */
>> 175 union {
>> 176 struct {
>> 177 stateid_t stateid;
>> 178 } ok;
>> 179 struct nfsd4_lock_denied denied;
>> 180 } u;
>>
>> 30 struct xdr_netobj {
>> 31 unsigned int len;
>> 32 u8 * data;
>> 33 };
>>
>> sizeof(stateid_t) = 16, sizeof(clientid_t) = 8,
>> sizeof(struct xdr_netobj) = 16, (on x86_x64 platform),
>> sizeof(struct xdr_netobj) = 8, (on i686 platform)
>>
>> Lock file success, nfsd will copy stateid to the union, but the value
>> also influence denied.
>> If on x86_64 platform, only influence the len in xdr_netobj,
>> but on i686 platform, will influence the len and the data in xdr_netobj.
>> So, the problem only appears on i686 platform.
>
> Oh, great catch, thanks. Sounds like that would explain all of Toralf's
> results.
>
> I'll include this explanation with your original patch and submit it for
> 3.16.

I saw the patch in your git-tree, there is one fault about the comments.

> (Note that lock->lk_denied.ld_owner.data appears it should be zero here,
> until you notice that it's one arm of a union the other arm of which is
> written to in the succesful case by the
>
> memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
>
> in nfsd4_open_confirm().

It's,
memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
sizeof(stateid_t));
in nfsd4_lock().

> In the 32-bit case this overwrites ld_owner.data.)

thanks,
Kinglong Mee

2014-07-17 20:33:12

by Toralf Förster

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On 07/17/2014 10:27 PM, J. Bruce Fields wrote:
> On Wed, Jul 16, 2014 at 02:57:24PM -0400, J. Bruce Fields wrote:
>> On Sat, Jul 12, 2014 at 08:31:15PM +0800, Kinglong Mee wrote:
>>> I think it is caused by kfree an uninitialized address.
>>> Can you test with the patch listed in following url,
>>> I have send some days before ?
>>>
>>> "[PATCH 1/4] NFSD: Fix memory leak in encoding denied lock"
>>> http://www.spinics.net/lists/linux-nfs/msg44719.html
>>
>> I have this queued for 3.17, but if it causes a crash then it should go
>> to 3.16 now.
>>
>> However, I'm confused: the only explicit initialization of lk_denied is
>> in the case vfs_lock_file() returns -EAGAIN. Our usual tests (cthon,
>> pynfs) do lots of succesful locks, so should have hit this before.
>>
>> OK, I see: this memory zeroed by a memset in svc_process_common():
>>
>> memset(rqstp->rq_argp, 0, procp->pc_argsize);
>>
>> *But* in the case of the NFSv4 compound operation, we only have enough
>> space in rq_argp for 8 operations, anything more is allocated in
>> fs/nfsd/nfs4xdr.c:nfsd4_decode_compound:
>>
>> if (argp->opcnt > ARRAY_SIZE(argp->iops)) {
>> argp->ops = kmalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL);
>> ...
>>
>> So, perhaps we got a compound with more than 8 operations, with the LOCK
>> operation in the 9th or later position?
>>
>> But the Linux NFS client doesn't do that, so I don't understand how
>> Toralf hit this.
>>
>> Am I missing anything here?
>>
>> Toralf, is that crash reproduceable? If so, does replacing the above
>> kmalloc by a kcalloc also fix it?
>
> Sorry, that should be kzalloc. We should probably fix that regardless.
>
> But I still don't understand how you hit this case....
>
> --b.
>
> commit 5d6031ca742f
> Author: J. Bruce Fields <[email protected]>
> Date: Thu Jul 17 16:20:39 2014 -0400
>
> nfsd4: zero op arguments beyond the 8th compound op
>
> The first 8 ops of the compound are zeroed since they're a part of the
> argument that's zeroed by the
>
> memset(rqstp->rq_argp, 0, procp->pc_argsize);
>
> in svc_process_common(). But we handle larger compounds by allocating
> the memory on the fly in nfsd4_decode_compound(). Other than code
> recently fixed by 01529e3f8179 "NFSD: Fix memory leak in encoding denied
> lock", I don't know of any examples of code depending on this
> initialization. But it definitely seems possible, and I'd rather be
> safe.
>
> Compounds this long are unusual so I'm much more worried about failure
> in this poorly tested cases than about an insignificant performance hit.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 01023a595163..628b430e743e 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1635,7 +1635,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
> goto xdr_error;
>
> if (argp->opcnt > ARRAY_SIZE(argp->iops)) {
> - argp->ops = kmalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL);
> + argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL);
> if (!argp->ops) {
> argp->ops = argp->iops;
> dprintk("nfsd: couldn't allocate room for COMPOUND\n");
>
I'm trying to reproduce it, but due to the nature of fuzz testing ...


--
Toralf


2014-07-16 18:57:26

by J. Bruce Fields

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On Sat, Jul 12, 2014 at 08:31:15PM +0800, Kinglong Mee wrote:
> I think it is caused by kfree an uninitialized address.
> Can you test with the patch listed in following url,
> I have send some days before ?
>
> "[PATCH 1/4] NFSD: Fix memory leak in encoding denied lock"
> http://www.spinics.net/lists/linux-nfs/msg44719.html

I have this queued for 3.17, but if it causes a crash then it should go
to 3.16 now.

However, I'm confused: the only explicit initialization of lk_denied is
in the case vfs_lock_file() returns -EAGAIN. Our usual tests (cthon,
pynfs) do lots of succesful locks, so should have hit this before.

OK, I see: this memory zeroed by a memset in svc_process_common():

memset(rqstp->rq_argp, 0, procp->pc_argsize);

*But* in the case of the NFSv4 compound operation, we only have enough
space in rq_argp for 8 operations, anything more is allocated in
fs/nfsd/nfs4xdr.c:nfsd4_decode_compound:

if (argp->opcnt > ARRAY_SIZE(argp->iops)) {
argp->ops = kmalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL);
...

So, perhaps we got a compound with more than 8 operations, with the LOCK
operation in the 9th or later position?

But the Linux NFS client doesn't do that, so I don't understand how
Toralf hit this.

Am I missing anything here?

Toralf, is that crash reproduceable? If so, does replacing the above
kmalloc by a kcalloc also fix it?

(And thanks for your fuzz testing, it seems to catch interesting bugs.)

--b.


>
> thanks,
> Kinglong Mee
>
> On 7/12/2014 18:32, Toralf Förster wrote:
> > A fuzz-tested a x86 32 bit user user mode linux guest using guest kernel v3.16-rc4-142-g47ea8dd
> > with trinity using NFSv4 victim files (both NFS server and NFS client run within the same UML).
> >
> > The guest crashed :
> >
> > Kernel panic - not syncing: BUG!
> > CPU: 0 PID: 1395 Comm: nfsd Tainted: G B 3.16.0-rc4-00142-g47ea8dd-dirty #68
> > Stack:
> > 0859f770 0859f770 00000003 086c8547 00000000 858742a0 0cacd700 85447e1c
> > 084e3dd2 00000000 85447df0 85447e44 084e0408 085ab1c4 08700980 0859c46a
> > 85447e50 00000000 00000000 858742a0 0cacd700 85447e74 080ffc70 0859c46a
> > Call Trace:
> > [<084e3dd2>] dump_stack+0x26/0x28
> > [<084e0408>] panic+0x7a/0x194
> > [<080ffc70>] kfree+0x80/0x150
> > [<0822ba2e>] ? nfsd4_encode_stateid+0x3e/0x60
> > [<0822c0ab>] nfsd4_encode_lock+0x4b/0x60
> > [<08233aa3>] nfsd4_encode_operation+0xd3/0x1d0
> > [<0822afcf>] nfsd4_proc_compound+0x4bf/0x670
> > [<0809b999>] ? groups_alloc+0x39/0xe0
> > [<08219aba>] nfsd_dispatch+0xda/0x200
> > [<0846868f>] svc_process_common+0x31f/0x610
> > [<08469b4b>] svc_process+0x10b/0x140
> > [<0809d270>] ? default_wake_function+0x0/0x20
> > [<08219360>] ? nfsd+0x0/0x140
> > [<08219442>] nfsd+0xe2/0x140
> > [<08095806>] kthread+0xd6/0xe0
> > [<0809c60d>] ? finish_task_switch.isra.56+0x1d/0x70
> > [<0805f64b>] new_thread_handler+0x6b/0x90
> >
> >
> > FWIW "dirty" cames from 1 patch from Richard Weinberger for arch/um/kernel/tlb.c and arch/um/os-Linux/skas/process.c
> >
> >
> > The back trace of the core file gives:
> >
> >
> > Thread 1 (LWP 12687):
> > #0 0xb7797aec in __kernel_vsyscall ()
> > #1 0x08484e15 in kill () at ../sysdeps/unix/syscall-template.S:81
> > #2 0x0807253d in uml_abort () at arch/um/os-Linux/util.c:93
> > #3 0x08072885 in os_dump_core () at arch/um/os-Linux/util.c:148
> > #4 0x0806241d in panic_exit (self=0x86c95c0 <panic_exit_notifier>, unused1=0, unused2=0x8700980 <buf.19753>) at arch/um/kernel/um_arch.c:240
> > #5 0x08099706 in notifier_call_chain (nl=0x0, val=2235858240, v=0x6, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
> > #6 0x08099863 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:183
> > #7 atomic_notifier_call_chain (nh=0x8700944 <panic_notifier_list>, val=0, v=0x8700980 <buf.19753>) at kernel/notifier.c:193
> > #8 0x084e0424 in panic (fmt=0x0) at kernel/panic.c:133
> > #9 0x080ffc70 in kfree (x=0x194) at mm/slub.c:3380
> > #10 0x0822c0ab in nfsd4_encode_lock (resp=0x85871030, nfserr=0, lock=0x858742a0) at fs/nfsd/nfs4xdr.c:2910
> > #11 0x08233aa3 in nfsd4_encode_operation (resp=0x85871030, op=0x85874280) at fs/nfsd/nfs4xdr.c:3889
> > #12 0x0822afcf in nfsd4_proc_compound (rqstp=0x858750f0, args=0x85447d40, resp=0x85871030) at fs/nfsd/nfs4proc.c:1386
> > #13 0x08219aba in nfsd_dispatch (rqstp=0x858750f0, statp=0x832c1018) at fs/nfsd/nfssvc.c:686
> > #14 0x0846868f in svc_process_common (rqstp=0x858750f0, argv=0x85447d40, resv=0x8587526c) at net/sunrpc/svc.c:1206
> > #15 0x08469b4b in svc_process (rqstp=0x858750f0) at net/sunrpc/svc.c:1331
> > #16 0x08219442 in nfsd (vrqstp=0x858750f0) at fs/nfsd/nfssvc.c:609
> > #17 0x08095806 in kthread (_create=0x85377eb0) at kernel/kthread.c:207
> > #18 0x0805f64b in new_thread_handler () at arch/um/kernel/process.c:129
> > #19 0x00000000 in ?? ()
> >
> >
> >

2014-07-21 15:55:45

by J. Bruce Fields

[permalink] [raw]
Subject: Re: fuzz tested user mode linux crashed in NFS code path

On Sat, Jul 19, 2014 at 11:23:59AM +0800, Kinglong Mee wrote:
> On Sat, Jul 19, 2014 at 12:50 AM, Toralf Förster <[email protected]> wrote:
> > On 07/18/2014 06:22 PM, Toralf Förster wrote:
> >> I can now try with kzalloc, but due to the nature of this issue I think, that the absence of this crash - even after 2-3 hours - doesn't mean by 100%, that kzalloc fixed it, or ?
> >
> > Well, next crash (with kzalloc patch) happened after 20 minutes ...
>
> Maybe I have found the problem.
> The stateid and denied are defined as an union as,
> fs/nfsd/xdr4.h
> 145 struct nfsd4_lock_denied {
> 146 clientid_t ld_clientid;
> 147 struct xdr_netobj ld_owner;
> 148 u64 ld_start;
> 149 u64 ld_length;
> 150 u32 ld_type;
> 151 };
> 152
> 153 struct nfsd4_lock {
> ... ...
> 174 /* response */
> 175 union {
> 176 struct {
> 177 stateid_t stateid;
> 178 } ok;
> 179 struct nfsd4_lock_denied denied;
> 180 } u;
>
> 30 struct xdr_netobj {
> 31 unsigned int len;
> 32 u8 * data;
> 33 };
>
> sizeof(stateid_t) = 16, sizeof(clientid_t) = 8,
> sizeof(struct xdr_netobj) = 16, (on x86_x64 platform),
> sizeof(struct xdr_netobj) = 8, (on i686 platform)
>
> Lock file success, nfsd will copy stateid to the union, but the value
> also influence denied.
> If on x86_64 platform, only influence the len in xdr_netobj,
> but on i686 platform, will influence the len and the data in xdr_netobj.
> So, the problem only appears on i686 platform.

Oh, great catch, thanks. Sounds like that would explain all of Toralf's
results.

I'll include this explanation with your original patch and submit it for
3.16.

--b.