2010-07-28 18:14:53

by yo mama

Subject: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020

Hello,

I am running kernel version 2.6.35-rc5-pnfs and see a kernel crash
every time I run iozone on the mounted directory. Has a patch already
been developed for this bug that I can apply? Please see the error
messages from the message log below.




Jul 28 11:06:28 localhost kernel: BUG: unable to handle kernel NULL
pointer dereference at 0000000000000020
Jul 28 11:06:28 localhost kernel: IP: [<ffffffffa025d117>]
encode_stateid+0x3e/0x8f [nfs]
Jul 28 11:06:28 localhost kernel: PGD 6d288067 PUD 70e43067 PMD 0
Jul 28 11:06:28 localhost kernel: Oops: 0000 [#1] SMP
Jul 28 11:06:28 localhost kernel: last sysfs file:
/sys/devices/pci0000:00/0000:00:1f.3/local_cpus
Jul 28 11:06:28 localhost kernel: CPU 0
Jul 28 11:06:28 localhost kernel: Modules linked in: nfslayoutdriver
nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 p4_clockmod
freq_table speedstep_lib dm_multipath uinput e1000e iTCO_wdt serio_raw
iTCO_vendor_support i2c_i801 pcspkr usb_storage i915 drm_kms_helper
drm i2c_algo_bit i2c_core video output [last unloaded: mperf]
Jul 28 11:06:28 localhost kernel:
Jul 28 11:06:28 localhost kernel: Pid: 1317, comm: iozone Not tainted
2.6.35-rc5-pnfs #1 To be filled by O.E.M./To Be Filled By O.E.M.
Jul 28 11:06:28 localhost kernel: RIP: 0010:[<ffffffffa025d117>]
[<ffffffffa025d117>] encode_stateid+0x3e/0x8f [nfs]
Jul 28 11:06:28 localhost kernel: RSP: 0018:ffff880079223a08 EFLAGS: 00010286
Jul 28 11:06:28 localhost kernel: RAX: ffff8800744b10b8 RBX:
ffff8800744b10b8 RCX: 0000000000000000
Jul 28 11:06:28 localhost kernel: RDX: ffff88007b03c008 RSI:
ffff880070a0d9c0 RDI: ffff880079223ab8
Jul 28 11:06:28 localhost kernel: RBP: ffff880079223a48 R08:
ffff8800744b10b4 R09: 0000000000000004
Jul 28 11:06:28 localhost kernel: R10: ffff8800702d72b8 R11:
0000000000000007 R12: ffff88006d148500
Jul 28 11:06:28 localhost kernel: R13: 0000000000000000 R14:
ffff880079223ab8 R15: ffffffffa025d2a1
Jul 28 11:06:28 localhost kernel: FS: 00007f34ff276700(0000)
GS:ffff880002000000(0000) knlGS:0000000000000000
Jul 28 11:06:28 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jul 28 11:06:28 localhost kernel: CR2: 0000000000000020 CR3:
000000006d2a7000 CR4: 00000000000006f0
Jul 28 11:06:28 localhost kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jul 28 11:06:28 localhost kernel: DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Jul 28 11:06:28 localhost kernel: Process iozone (pid: 1317,
threadinfo ffff880079222000, task ffff88006ed4dcc0)
Jul 28 11:06:28 localhost kernel: Stack:
Jul 28 11:06:28 localhost kernel: ffff880079223a18 ffffffffa025ae90
ffff880079223a78 00000000a4433f68
Jul 28 11:06:28 localhost kernel: <0> 000a12d000000010
ffff880079223a88 ffff880079223ab8 ffff8800702d71f0
Jul 28 11:06:28 localhost kernel: <0> ffff880079223a78
ffffffffa025d1a3 ffff88007b03c000 ffff8800702d71f0
Jul 28 11:06:28 localhost kernel: Call Trace:
Jul 28 11:06:28 localhost kernel: [<ffffffffa025ae90>] ?
reserve_space+0xe/0x19 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffffa025d1a3>]
encode_write+0x3b/0x89 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffffa025d33a>]
nfs4_xdr_enc_write+0x99/0xc5 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01b782a>]
rpcauth_wrap_req+0x7f/0x8e [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01afe97>]
call_transmit+0x1e9/0x26d [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01b6d4a>]
__rpc_execute+0x86/0x213 [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01b6efa>]
rpc_execute+0x23/0x27 [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01b09bf>]
rpc_run_task+0x29/0x2f [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa0244812>]
nfs_direct_write_execute+0x6a/0xc8 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffffa02452b3>]
nfs_file_direct_write+0x503/0x66f [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffff8106506e>] ? wake_up_bit+0x25/0x2a
Jul 28 11:06:28 localhost kernel: [<ffffffffa023d0c4>]
nfs_file_write+0x62/0x184 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffff8110410e>] do_sync_write+0xcb/0x108
Jul 28 11:06:28 localhost kernel: [<ffffffffa0249471>] ?
nfs_wb_all+0x42/0x44 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffff811c5caf>] ?
security_file_permission+0x16/0x18
Jul 28 11:06:28 localhost kernel: [<ffffffff81104691>] vfs_write+0xae/0x10b
Jul 28 11:06:28 localhost kernel: [<ffffffff811047ae>] sys_write+0x4a/0x6e
Jul 28 11:06:28 localhost kernel: [<ffffffff81009c32>]
system_call_fastpath+0x16/0x1b
Jul 28 11:06:28 localhost kernel: Code: 89 f4 be 10 00 00 00 49 89 d5
65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 e8 78 dd ff ff 49 8b 74
24 48 48 89 c3 48 85 f6 74 1e <49> 8b 55 20 4c 8d 65 c0 41 8b 4d 28 4c
89 e7 e8 c5 4d 00 00 ba
Jul 28 11:06:28 localhost kernel: RIP [<ffffffffa025d117>]
encode_stateid+0x3e/0x8f [nfs]
Jul 28 11:06:28 localhost kernel: RSP <ffff880079223a08>
Jul 28 11:06:28 localhost kernel: CR2: 0000000000000020
Jul 28 11:06:28 localhost kernel: ---[ end trace 8dabbf8bb362aa6e ]---
Jul 28 11:06:36 localhost abrtd: Hmm, stray update_client: 'Creating
kernel oops crash reports...'
Jul 28 11:06:36 localhost abrt: Kerneloops: Reported 1 kernel oopses to Abrt
Jul 28 11:06:36 localhost abrtd: Directory 'kerneloops-1280340396-1'
creation detected
Jul 28 11:06:36 localhost abrtd: Getting local universal unique identification
Jul 28 11:06:36 localhost abrtd: New crash, saving...


2010-07-29 04:40:29

by Bian Naimeng

Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020

> Hello,
>
> I am running Kernel version 2.6.35-rc5-pnfs and notice there is kernel
> crash everytime i run iozone on the mounted directory. Is there a
> patch already developed for this bug which i can apply. Please see the
> error message from message log.
>

I'm not sure this is correct, but please try it.

----

nfs_writeargs.lock_context is always NULL in the direct write path,
which causes a kernel panic when the stateid is encoded.

Signed-off-by: Bian Naimeng <[email protected]>

---
fs/nfs/direct.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 3ef9b0c..cb2e1fd 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -801,6 +801,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 		data->cred = msg.rpc_cred;
 		data->args.fh = NFS_FH(inode);
 		data->args.context = ctx;
+		data->args.lock_context = nfs_get_lock_context(ctx);
 		data->args.offset = pos;
 		data->args.pgbase = pgbase;
 		data->args.pages = data->pagevec;
--
1.6.5.2

--
Regards
Bian Naimeng
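
Editorial note: the faulting address in the first oops is consistent with the explanation in the patch description. The marked instruction bytes in the Code: line (`<49> 8b 55 20`) decode to a 64-bit load from R13+0x20, and the register dump shows R13 as 0, so the CPU touched address 0x20. The sketch below only illustrates that arithmetic in user space. The struct layout is hypothetical (it is not the real struct nfs_lock_context, whose layout does not appear in this thread), and `member_address` is a helper invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real struct nfs_lock_context: padded so that
 * two members sit at offsets 0x20 and 0x28, the displacements seen in the
 * Code: bytes of the oops. */
struct demo_lock_context {
	unsigned char pad[0x20];   /* stand-in for whatever precedes the members */
	uint64_t owner;            /* hypothetical member at offset 0x20 */
	uint32_t pid;              /* hypothetical member at offset 0x28 */
};

/* Address the CPU would touch when accessing a member through `base`. */
static uintptr_t member_address(uintptr_t base, size_t offset)
{
	return base + offset;
}

/* With a NULL base pointer, the faulting address equals the member offset. */
static void check_fault_address(void)
{
	assert(member_address(0, offsetof(struct demo_lock_context, owner)) == 0x20);
	assert(member_address(0, offsetof(struct demo_lock_context, pid)) == 0x28);
}
```

A small faulting address such as 0x20 therefore usually points at a member access through a NULL struct pointer rather than at a wild pointer.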


2010-07-30 18:29:06

by Trond Myklebust

Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020

On Thu, 2010-07-29 at 12:39 +0800, Bian Naimeng wrote:
> > Hello,
> >
> > I am running Kernel version 2.6.35-rc5-pnfs and notice there is kernel
> > crash everytime i run iozone on the mounted directory. Is there a
> > patch already developed for this bug which i can apply. Please see the
> > error message from message log.
> >
>
> I'm not sure it's ok, please try it.
>
> ----
>
> nfs_writeargs.lock_context always NULL at direct write procedure,
> it will cause kernel panic when encode stateid.
>
> Signed-off-by: Bian Naimeng <[email protected]>
>
> ---
> fs/nfs/direct.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 3ef9b0c..cb2e1fd 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -801,6 +801,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
> data->cred = msg.rpc_cred;
> data->args.fh = NFS_FH(inode);
> data->args.context = ctx;
> + data->args.lock_context = nfs_get_lock_context(ctx);
> data->args.offset = pos;
> data->args.pgbase = pgbase;
> data->args.pages = data->pagevec;
> --
> 1.6.5.2
>

Well caught. There is a similar issue with NFS reads too. I'll fix up
the lock state tracking patch...

Thanks
Trond


2010-07-31 04:56:55

by yo mama

Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020

Just an update: the kernel also crashes when running iozone in
automatic mode. Please see the updated /var/log/messages:


Jul 30 21:26:21 localhost kernel: FS-Cache: Loaded
Jul 30 21:26:22 localhost kernel: FS-Cache: Netfs 'nfs' registered for caching
Jul 30 21:29:22 localhost kernel: nfs: server 192.168.100.12 not responding, timed out
Jul 30 21:29:22 localhost kernel: Error: state manager failed on NFSv4 server 192.168.100.12 with error 5
Jul 30 21:37:38 localhost kernel: nfs4filelayout_init: NFSv4 File Layout Driver Registering...
Jul 30 21:39:29 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
Jul 30 21:39:29 localhost kernel: IP: [<ffffffff814411d4>] _raw_spin_lock+0xe/0x25
Jul 30 21:39:29 localhost kernel: PGD 0
Jul 30 21:39:29 localhost kernel: Oops: 0002 [#1] SMP
Jul 30 21:39:29 localhost kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
Jul 30 21:39:29 localhost kernel: CPU 3
Jul 30 21:39:29 localhost kernel: Modules linked in: nfslayoutdriver nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 p4_clockmod freq_table speedstep_lib dm_multipath uinput e1000e iTCO_wdt iTCO_vendor_support i2c_i801 pcspkr serio_raw usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: mperf]
Jul 30 21:39:29 localhost kernel:
Jul 30 21:39:29 localhost kernel: Pid: 1629, comm: 192.168.100.12- Not tainted 2.6.35-rc5-pnfs #1 To be filled by O.E.M./To Be Filled By O.E.M.
Jul 30 21:39:29 localhost kernel: RIP: 0010:[<ffffffff814411d4>] [<ffffffff814411d4>] _raw_spin_lock+0xe/0x25
Jul 30 21:39:29 localhost kernel: RSP: 0018:ffff8800700b5cb0 EFLAGS: 00010282
Jul 30 21:39:29 localhost kernel: RAX: 0000000000010000 RBX: 0000000000000000 RCX: 000000000020001d
Jul 30 21:39:29 localhost kernel: RDX: ffff88007bfc9900 RSI: ffffffffa0225fe0 RDI: 0000000000000030
Jul 30 21:39:29 localhost kernel: RBP: ffff8800700b5cb0 R08: ffff88007c7c1800 R09: ffff880000000001
Jul 30 21:39:29 localhost kernel: R10: ffff88007bfc9c00 R11: ffff880070b12a00 R12: ffffffffa0225fe0
Jul 30 21:39:29 localhost kernel: R13: ffff880070bdaf64 R14: ffff88007979f800 R15: ffff8800700b5d60
Jul 30 21:39:29 localhost kernel: FS: 0000000000000000(0000) GS:ffff880002180000(0000) knlGS:0000000000000000
Jul 30 21:39:29 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 30 21:39:29 localhost kernel: CR2: 0000000000000030 CR3: 0000000001a42000 CR4: 00000000000006e0
Jul 30 21:39:29 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 30 21:39:29 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 30 21:39:29 localhost kernel: Process 192.168.100.12- (pid: 1629, threadinfo ffff8800700b4000, task ffff8800796e1730)
Jul 30 21:39:29 localhost kernel: Stack:
Jul 30 21:39:29 localhost kernel: ffff8800700b5cc0 ffffffffa0212c6e ffff8800700b5ce0 ffffffffa0214168
Jul 30 21:39:29 localhost kernel: <0> ffff880070bdaf00 ffff880070bdaf54 ffff8800700b5d00 ffffffffa0201ab1
Jul 30 21:39:29 localhost kernel: <0> ffff8800700b5d00 ffff880070bdaf00 ffff8800700b5d40 ffffffffa0201b92
Jul 30 21:39:29 localhost kernel: Call Trace:
Jul 30 21:39:29 localhost kernel: [<ffffffffa0212c6e>] spin_lock+0xe/0x10 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa0214168>] pnfs_set_layout_stateid+0x1b/0x3b [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa0201ab1>] pnfs4_layout_reclaim+0x35/0x39 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa0201b92>] nfs4_open_recover+0xdd/0xf1 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa0201d04>] nfs4_open_delegation_recall+0x80/0x13f [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa020e7e0>] __nfs_inode_return_delegation+0xc7/0x1f5 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffff810caf61>] ? do_writepages+0x21/0x2a
Jul 30 21:39:29 localhost kernel: [<ffffffffa020e9ff>] nfs_client_return_marked_delegations+0x85/0xc6 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa020d7da>] nfs4_run_state_manager+0x368/0x494 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa020d472>] ? nfs4_run_state_manager+0x0/0x494 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffff81064bd9>] kthread+0x7f/0x87
Jul 30 21:39:29 localhost kernel: [<ffffffff8100aa24>] kernel_thread_helper+0x4/0x10
Jul 30 21:39:29 localhost kernel: [<ffffffff81064b5a>] ? kthread+0x0/0x87
Jul 30 21:39:29 localhost kernel: [<ffffffff8100aa20>] ? kernel_thread_helper+0x0/0x10
Jul 30 21:39:29 localhost kernel: Code: c2 8d 90 00 00 01 00 75 04 f0 0f b1 17 0f 94 c2 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5
Jul 30 21:39:29 localhost kernel: RIP [<ffffffff814411d4>] _raw_spin_lock+0xe/0x25
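
Editorial note: this second oops differs from the first one. It is reported as "Oops: 0002" instead of "Oops: 0000", it faults at 0x30 inside _raw_spin_lock, and it runs in the state manager thread rather than in iozone. The number after "Oops:" is the x86 page-fault error code. The sketch below decodes its three lowest bits. `struct pf_error` and `decode_pf_error` are names invented for this illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Lowest three bits of the x86 page-fault error code printed after "Oops:". */
struct pf_error {
	bool protection;   /* bit 0: 0 = page not present, 1 = protection violation */
	bool write;        /* bit 1: 0 = read access, 1 = write access */
	bool user;         /* bit 2: 0 = fault in kernel mode, 1 = in user mode */
};

static struct pf_error decode_pf_error(unsigned long code)
{
	struct pf_error e = {
		.protection = (code & 0x1) != 0,
		.write      = (code & 0x2) != 0,
		.user       = (code & 0x4) != 0,
	};
	return e;
}

/* The two codes seen in this thread. */
static void check_oops_codes(void)
{
	/* First report, "Oops: 0000": kernel-mode read of a not-present page. */
	assert(!decode_pf_error(0x0000).write);
	/* Second report, "Oops: 0002": kernel-mode write to a not-present page. */
	assert(decode_pf_error(0x0002).write);
	assert(!decode_pf_error(0x0002).user);
	assert(!decode_pf_error(0x0002).protection);
}
```

A write fault at 0x30 with RDI equal to 0x30 would fit a spinlock embedded at offset 0x30 of a structure whose pointer was NULL when pnfs_set_layout_stateid took the lock, but that reading is an inference from the register dump and is not confirmed anywhere in the thread.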


On Fri, Jul 30, 2010 at 11:58 PM, Trond Myklebust
<[email protected]> wrote:
> On Thu, 2010-07-29 at 12:39 +0800, Bian Naimeng wrote:
>> > Hello,
>> >
>> > I am running Kernel version 2.6.35-rc5-pnfs and notice there is kernel
>> > crash everytime i run iozone on the mounted directory. Is there a
>> > patch already developed for this bug which i can apply. Please see the
>> > error message from message log.
>> >
>>
>> I'm not sure it's ok, please try it.
>>
>> ----
>>
>> nfs_writeargs.lock_context always NULL at direct write procedure,
>> it will cause kernel panic when encode stateid.
>>
>> Signed-off-by: Bian Naimeng <[email protected]>
>>
>> ---
>>  fs/nfs/direct.c |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>> index 3ef9b0c..cb2e1fd 100644
>> --- a/fs/nfs/direct.c
>> +++ b/fs/nfs/direct.c
>> @@ -801,6 +801,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
>>                data->cred = msg.rpc_cred;
>>                data->args.fh = NFS_FH(inode);
>>                data->args.context = ctx;
>> +              data->args.lock_context = nfs_get_lock_context(ctx);
>>                data->args.offset = pos;
>>                data->args.pgbase = pgbase;
>>                data->args.pages = data->pagevec;
>> --
>> 1.6.5.2
>>
>
> Well caught. There is a similar issue with NFS reads too. I'll fix up
> the lock state tracking patch...
>
> Thanks
> Trond