2013-08-07 20:16:06

by Toralf Förster

Subject: bisecting an NFS issue prevented by fs/nfsd/nfs4recover.c:1414

I'm trying to bisect an NFS + EXT3/4 + loop device issue that was probably introduced between v3.8 and v3.10. Unfortunately my test systems (2 user mode linux images) often fail to reproduce the original issue because of a crash in another place.

Now I'm wondering whether somebody can point me to a few patches that I could just apply during the bisect to avoid running into the "false" bug.

A typical backtrace of the NFS server image is:

foerste@n22 ~ $ cat /mnt/ramdisk/bt.v3.9-rc2-6-g2116bda
[New LWP 31064]
Core was generated by `/home/tfoerste/devel/linux/linux earlyprintk ubda=/home/tfoerste/virtual/uml/tr'.
Program terminated with signal 6, Aborted.
#0 0xb77d0424 in __kernel_vsyscall ()
#0 0xb77d0424 in __kernel_vsyscall ()
#1 0x0838f175 in kill ()
#2 0x0807160d in uml_abort () at arch/um/os-Linux/util.c:93
#3 0x080718f5 in os_dump_core () at arch/um/os-Linux/util.c:138
#4 0x080611e7 in panic_exit (self=0x8589518 <panic_exit_notifier>, unused1=0, unused2=0x85bdd60 <buf.12682>) at arch/um/kernel/um_arch.c:240
#5 0x0809bbe8 in notifier_call_chain (nl=0x0, val=0, v=0x85bdd60 <buf.12682>, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
#6 0x0809bd33 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:182
#7 atomic_notifier_call_chain (nh=0x85bdd44 <panic_notifier_list>, val=0, v=0x85bdd60 <buf.12682>) at kernel/notifier.c:191
#8 0x083ec46c in panic (fmt=0x0) at kernel/panic.c:128
#9 0x08060bae in segv (fi=<incomplete type>, ip=136507185, is_user=0, regs=0x858785c <cpu0_irqstack+30812>) at arch/um/kernel/trap.c:209
#10 0x08060e63 in segv_handler (sig=11, unused_si=0x8587b0c <cpu0_irqstack+31500>, regs=0x858785c <cpu0_irqstack+30812>) at arch/um/kernel/trap.c:185
#11 0x08070758 in sig_handler_common (sig=11, si=0x8587b0c <cpu0_irqstack+31500>, mc=0x8587ba0 <cpu0_irqstack+31648>) at arch/um/os-Linux/signal.c:44
#12 0x0807089d in sig_handler (sig=0, si=0x8587b0c <cpu0_irqstack+31500>, mc=0x8587ba0 <cpu0_irqstack+31648>) at arch/um/os-Linux/signal.c:231
#13 0x080703eb in hard_handler (sig=6, si=0x8587b0c <cpu0_irqstack+31500>, p=0x8587ba0 <cpu0_irqstack+31648>) at arch/um/os-Linux/signal.c:165
#14 <signal handler called>
#15 nfsd4_client_tracking_exit (net=0x0) at fs/nfsd/nfs4recover.c:1414
#16 0x0822eff6 in legacy_recdir_name_error (error=-2) at fs/nfsd/nfs4recover.c:164
#17 0x0822f3e4 in nfsd4_create_clid_dir (clp=0x48d9c860) at fs/nfsd/nfs4recover.c:187
#18 0x0822f5b0 in nfsd4_client_record_create (clp=0x0) at fs/nfsd/nfs4recover.c:1331
#19 0x082278fe in nfsd4_open_confirm (rqstp=0x0, cstate=0x48f930b0, oc=0x48f94280) at fs/nfsd/nfs4state.c:3706
#20 0x08216829 in nfsd4_proc_compound (rqstp=0x1, args=0x48f940c0, resp=0x48f93090) at fs/nfsd/nfs4proc.c:1288
#21 0x08206bf3 in nfsd_dispatch (rqstp=0x48f92060, statp=0x48c63018) at fs/nfsd/nfssvc.c:671
#22 0x083758a0 in svc_process_common (rqstp=0x48f92060, argv=0x48f921b0, resv=0x48f921d8) at net/sunrpc/svc.c:1199
#23 0x08376cbb in svc_process (rqstp=0x48f92060) at net/sunrpc/svc.c:1324
#24 0x082065da in nfsd (vrqstp=0x48f92060) at fs/nfsd/nfssvc.c:594
#25 0x08096b92 in kthread (_create=0x48edfd38) at kernel/kthread.c:168
#26 0x0805e4da in new_thread_handler () at arch/um/kernel/process.c:140
#27 0x00000000 in ?? ()
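
When a bisect step ends in this crash rather than in the issue under investigation, one option is to mark the revision as untestable instead of good or bad. The sketch below is only an illustration in Python: it pulls function names out of a gdb backtrace and checks for the two frames seen above (`nfsd4_client_tracking_exit` and `legacy_recdir_name_error`). The helper names `frame_functions`, `is_false_bug` and `bisect_verdict` are made up for this example. The only external convention relied on is that `git bisect run` treats exit code 125 as "skip this revision".

```python
import re
from typing import List, Optional

# Function names taken from frames #15 and #16 of the backtrace above.
FALSE_BUG_FRAMES = ("nfsd4_client_tracking_exit", "legacy_recdir_name_error")

# Matches the function name in gdb frame lines, with or without an address:
#   "#16 0x0822eff6 in legacy_recdir_name_error (error=-2) at ..."
#   "#15 nfsd4_client_tracking_exit (net=0x0) at ..."
FRAME_RE = re.compile(
    r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(",
    re.MULTILINE,
)


def frame_functions(backtrace: str) -> List[str]:
    """Return the function names of all frames, innermost first."""
    return FRAME_RE.findall(backtrace)


def is_false_bug(backtrace: str) -> bool:
    """True if the backtrace contains both frames of the unrelated crash."""
    funcs = set(frame_functions(backtrace))
    return all(name in funcs for name in FALSE_BUG_FRAMES)


def bisect_verdict(backtrace: Optional[str], issue_reproduced: bool) -> int:
    """Exit code for `git bisect run`: 0 = good, 1 = bad, 125 = skip."""
    if backtrace is not None and is_false_bug(backtrace):
        return 125  # crashed in the unrelated place, revision is untestable
    return 1 if issue_reproduced else 0
```

Skipping only helps if enough revisions remain testable. Applying a fix for the unrelated crash during the bisect, as asked for above, avoids that limitation.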


--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3


2013-08-08 16:07:16

by Toralf Förster

Subject: Re: bisecting an NFS issue prevented by fs/nfsd/nfs4recover.c:1414

On 08/07/2013 10:26 PM, Jim Rees wrote:
> Toralf Förster wrote:
>
> > I'm trying to bisect an NFS + EXT3/4 + loop device issue that was probably
> > introduced between v3.8 and v3.10. Unfortunately my test systems (2 user
> > mode linux images) often fail to reproduce the original issue because of
> > a crash in another place.
> >
> > Now I'm wondering whether somebody can point me to a few patches that I
> > could just apply during the bisect to avoid running into the "false" bug.
>
> It could be this:
>
> 7255e71 nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
>
ofc - thx

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2013-08-07 20:26:21

by Jim Rees

Subject: Re: bisecting an NFS issue prevented by fs/nfsd/nfs4recover.c:1414

Toralf Förster wrote:

> I'm trying to bisect an NFS + EXT3/4 + loop device issue that was probably
> introduced between v3.8 and v3.10. Unfortunately my test systems (2 user
> mode linux images) often fail to reproduce the original issue because of
> a crash in another place.
>
> Now I'm wondering whether somebody can point me to a few patches that I
> could just apply during the bisect to avoid running into the "false" bug.

It could be this:

7255e71 nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
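
A fix like this can be applied temporarily at every bisect step, so that the unrelated oops no longer hides the issue being bisected. The following is a minimal sketch of that idea, not a tested recipe for this particular bug. `bisect_step`, `run_cmd` and the `run` parameter are names invented for the example, and the build and test commands are supplied by the caller. It relies on `git merge-base --is-ancestor`, `git cherry-pick --no-commit` and `git reset --hard`, and on the `git bisect run` exit codes (0 good, 1 bad, 125 skip).

```python
import subprocess
from typing import Callable, Sequence

# Abbreviated hash as quoted in the thread; a longer hash may be needed
# if the abbreviation is ambiguous in a given repository.
FIX_COMMIT = "7255e71"

Runner = Callable[[Sequence[str]], int]


def run_cmd(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def bisect_step(
    build_cmd: Sequence[str],
    test_cmd: Sequence[str],
    fix_commit: str = FIX_COMMIT,
    run: Runner = run_cmd,
) -> int:
    """Test the current revision with the fix applied on top of it."""
    try:
        # Exit status 0 means the fix is already part of this revision.
        already_fixed = run(
            ["git", "merge-base", "--is-ancestor", fix_commit, "HEAD"]
        ) == 0
        if not already_fixed:
            # Apply the fix to the working tree only; nothing is committed.
            if run(["git", "cherry-pick", "--no-commit", fix_commit]) != 0:
                # Assumption: a revision where the fix does not apply is
                # skipped rather than tested unpatched.
                return 125
        if run(build_cmd) != 0:
            return 125  # cannot build, so cannot judge this revision
        return 0 if run(test_cmd) == 0 else 1
    finally:
        # Always restore a clean tree so bisect can check out the next step.
        run(["git", "reset", "--hard"])
```

To use it, the script would end with something like `sys.exit(bisect_step(["make", "ARCH=um"], ["./check-nfs.sh"]))` and be passed to `git bisect run`. Here `./check-nfs.sh` is a hypothetical script that exits non-zero when the original issue shows up.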