Hello,
I am running kernel version 2.6.35-rc5-pnfs and notice that the kernel
crashes every time I run iozone on the mounted directory. Is there
already a patch for this bug that I can apply? Please see the error
message from the message log below.
Jul 28 11:06:28 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Jul 28 11:06:28 localhost kernel: IP: [<ffffffffa025d117>] encode_stateid+0x3e/0x8f [nfs]
Jul 28 11:06:28 localhost kernel: PGD 6d288067 PUD 70e43067 PMD 0
Jul 28 11:06:28 localhost kernel: Oops: 0000 [#1] SMP
Jul 28 11:06:28 localhost kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/local_cpus
Jul 28 11:06:28 localhost kernel: CPU 0
Jul 28 11:06:28 localhost kernel: Modules linked in: nfslayoutdriver nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 p4_clockmod freq_table speedstep_lib dm_multipath uinput e1000e iTCO_wdt serio_raw iTCO_vendor_support i2c_i801 pcspkr usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: mperf]
Jul 28 11:06:28 localhost kernel:
Jul 28 11:06:28 localhost kernel: Pid: 1317, comm: iozone Not tainted 2.6.35-rc5-pnfs #1 To be filled by O.E.M./To Be Filled By O.E.M.
Jul 28 11:06:28 localhost kernel: RIP: 0010:[<ffffffffa025d117>] [<ffffffffa025d117>] encode_stateid+0x3e/0x8f [nfs]
Jul 28 11:06:28 localhost kernel: RSP: 0018:ffff880079223a08 EFLAGS: 00010286
Jul 28 11:06:28 localhost kernel: RAX: ffff8800744b10b8 RBX: ffff8800744b10b8 RCX: 0000000000000000
Jul 28 11:06:28 localhost kernel: RDX: ffff88007b03c008 RSI: ffff880070a0d9c0 RDI: ffff880079223ab8
Jul 28 11:06:28 localhost kernel: RBP: ffff880079223a48 R08: ffff8800744b10b4 R09: 0000000000000004
Jul 28 11:06:28 localhost kernel: R10: ffff8800702d72b8 R11: 0000000000000007 R12: ffff88006d148500
Jul 28 11:06:28 localhost kernel: R13: 0000000000000000 R14: ffff880079223ab8 R15: ffffffffa025d2a1
Jul 28 11:06:28 localhost kernel: FS: 00007f34ff276700(0000) GS:ffff880002000000(0000) knlGS:0000000000000000
Jul 28 11:06:28 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 28 11:06:28 localhost kernel: CR2: 0000000000000020 CR3: 000000006d2a7000 CR4: 00000000000006f0
Jul 28 11:06:28 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 28 11:06:28 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 28 11:06:28 localhost kernel: Process iozone (pid: 1317, threadinfo ffff880079222000, task ffff88006ed4dcc0)
Jul 28 11:06:28 localhost kernel: Stack:
Jul 28 11:06:28 localhost kernel: ffff880079223a18 ffffffffa025ae90 ffff880079223a78 00000000a4433f68
Jul 28 11:06:28 localhost kernel: <0> 000a12d000000010 ffff880079223a88 ffff880079223ab8 ffff8800702d71f0
Jul 28 11:06:28 localhost kernel: <0> ffff880079223a78 ffffffffa025d1a3 ffff88007b03c000 ffff8800702d71f0
Jul 28 11:06:28 localhost kernel: Call Trace:
Jul 28 11:06:28 localhost kernel: [<ffffffffa025ae90>] ? reserve_space+0xe/0x19 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffffa025d1a3>] encode_write+0x3b/0x89 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffffa025d33a>] nfs4_xdr_enc_write+0x99/0xc5 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01b782a>] rpcauth_wrap_req+0x7f/0x8e [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01afe97>] call_transmit+0x1e9/0x26d [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01b6d4a>] __rpc_execute+0x86/0x213 [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01b6efa>] rpc_execute+0x23/0x27 [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa01b09bf>] rpc_run_task+0x29/0x2f [sunrpc]
Jul 28 11:06:28 localhost kernel: [<ffffffffa0244812>] nfs_direct_write_execute+0x6a/0xc8 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffffa02452b3>] nfs_file_direct_write+0x503/0x66f [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffff8106506e>] ? wake_up_bit+0x25/0x2a
Jul 28 11:06:28 localhost kernel: [<ffffffffa023d0c4>] nfs_file_write+0x62/0x184 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffff8110410e>] do_sync_write+0xcb/0x108
Jul 28 11:06:28 localhost kernel: [<ffffffffa0249471>] ? nfs_wb_all+0x42/0x44 [nfs]
Jul 28 11:06:28 localhost kernel: [<ffffffff811c5caf>] ? security_file_permission+0x16/0x18
Jul 28 11:06:28 localhost kernel: [<ffffffff81104691>] vfs_write+0xae/0x10b
Jul 28 11:06:28 localhost kernel: [<ffffffff811047ae>] sys_write+0x4a/0x6e
Jul 28 11:06:28 localhost kernel: [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
Jul 28 11:06:28 localhost kernel: Code: 89 f4 be 10 00 00 00 49 89 d5 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 e8 78 dd ff ff 49 8b 74 24 48 48 89 c3 48 85 f6 74 1e <49> 8b 55 20 4c 8d 65 c0 41 8b 4d 28 4c 89 e7 e8 c5 4d 00 00 ba
Jul 28 11:06:28 localhost kernel: RIP [<ffffffffa025d117>] encode_stateid+0x3e/0x8f [nfs]
Jul 28 11:06:28 localhost kernel: RSP <ffff880079223a08>
Jul 28 11:06:28 localhost kernel: CR2: 0000000000000020
Jul 28 11:06:28 localhost kernel: ---[ end trace 8dabbf8bb362aa6e ]---
Jul 28 11:06:36 localhost abrtd: Hmm, stray update_client: 'Creating kernel oops crash reports...'
Jul 28 11:06:36 localhost abrt: Kerneloops: Reported 1 kernel oopses to Abrt
Jul 28 11:06:36 localhost abrtd: Directory 'kerneloops-1280340396-1' creation detected
Jul 28 11:06:36 localhost abrtd: Getting local universal unique identification
Jul 28 11:06:36 localhost abrtd: New crash, saving...
I'm not sure it is correct, but please try it.
----
nfs_writeargs.lock_context is always NULL in the direct write path,
which causes a kernel panic when the stateid is encoded.
Signed-off-by: Bian Naimeng <[email protected]>
---
fs/nfs/direct.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 3ef9b0c..cb2e1fd 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -801,6 +801,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
data->cred = msg.rpc_cred;
data->args.fh = NFS_FH(inode);
data->args.context = ctx;
+ data->args.lock_context = nfs_get_lock_context(ctx);
data->args.offset = pos;
data->args.pgbase = pgbase;
data->args.pages = data->pagevec;
--
1.6.5.2
--
Regards
Bian Naimeng
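To make the failure mode described in the patch concrete, here is a minimal userspace model in C. It rests on stated assumptions: the struct names, their fields, and the helpers model_get_lock_context(), model_encode_stateid() and model_setup_direct_write() are hypothetical simplifications and are not the kernel's nfs_writeargs, nfs_get_lock_context() or encode_stateid(). The model only shows the shape of the bug. The encoder trusts lock_context, and the direct-write setup leaves it NULL unless the one-line fix fills it in. Where the kernel would dereference NULL and oops, the model returns -1 so that it can be exercised safely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins; NOT the kernel definitions. */
struct model_lock_context {
    char stateid[16];                 /* stateid associated with this lock owner */
};

struct model_open_context {
    struct model_lock_context lock_ctx;  /* one lock context per open, for simplicity */
};

struct model_writeargs {
    struct model_open_context *context;
    struct model_lock_context *lock_context;  /* left NULL on the unfixed direct-write path */
};

/* Hypothetical stand-in for nfs_get_lock_context(): returns the embedded context. */
static struct model_lock_context *
model_get_lock_context(struct model_open_context *ctx)
{
    return &ctx->lock_ctx;
}

/* Models an encoder that assumes lock_context is valid. The kernel code has
 * no such check; -1 marks the point where it would dereference NULL. */
static int model_encode_stateid(const struct model_writeargs *args, char out[16])
{
    if (args->lock_context == NULL)
        return -1;
    memcpy(out, args->lock_context->stateid, sizeof(args->lock_context->stateid));
    return 0;
}

/* Models the argument setup for one direct-write segment, with or without
 * the one-line fix (populating lock_context from the open context). */
static void model_setup_direct_write(struct model_writeargs *args,
                                     struct model_open_context *ctx,
                                     int apply_fix)
{
    args->context = ctx;
    args->lock_context = apply_fix ? model_get_lock_context(ctx) : NULL;
}
```

With apply_fix set to 0, model_encode_stateid() reports the missing lock context. With it set to 1, it copies the lock context's stateid into the output buffer.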
On Thu, 2010-07-29 at 12:39 +0800, Bian Naimeng wrote:
Well caught. There is a similar issue with NFS reads too. I'll fix up
the lock state tracking patch...
Thanks
Trond
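Trond notes that NFS reads have a similar issue. One way to keep the read and write paths from diverging is to fill the fields they share in a single helper that both paths call. The C sketch below models that idea. Everything in it (model_io_common, model_init_io_common(), model_release_io_common(), the refcount field) is hypothetical and is not taken from fs/nfs/direct.c. Treating the getter as taking a reference that must later be dropped is a modelling assumption, not a statement about how nfs_get_lock_context() behaves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified models; NOT the kernel's nfs_readargs/nfs_writeargs. */
struct model_lock_context { int refcount; };
struct model_open_context { struct model_lock_context lock_ctx; };

/* Fields that both read and write arguments need. */
struct model_io_common {
    struct model_open_context *context;
    struct model_lock_context *lock_context;
    unsigned long long offset;
};

struct model_readargs  { struct model_io_common io; };
struct model_writeargs { struct model_io_common io; int stable; /* write-only field, unused here */ };

/* Assumed reference-counted getter and matching release. */
static struct model_lock_context *
model_get_lock_context(struct model_open_context *ctx)
{
    ctx->lock_ctx.refcount++;
    return &ctx->lock_ctx;
}

static void model_put_lock_context(struct model_lock_context *l_ctx)
{
    l_ctx->refcount--;
}

/* Single place that fills the shared fields, so a read path cannot forget
 * lock_context while the write path remembers it. The NULL check stands in
 * for a getter that could fail (for example, on allocation). */
static int model_init_io_common(struct model_io_common *io,
                                struct model_open_context *ctx,
                                unsigned long long pos)
{
    io->context = ctx;
    io->offset = pos;
    io->lock_context = model_get_lock_context(ctx);
    return io->lock_context != NULL ? 0 : -1;
}

/* Drops the reference taken by model_init_io_common(). */
static void model_release_io_common(struct model_io_common *io)
{
    if (io->lock_context != NULL) {
        model_put_lock_context(io->lock_context);
        io->lock_context = NULL;
    }
}
```

In use, a caller initializes the read or write arguments with model_init_io_common() and calls model_release_io_common() when the request completes, so the reference count returns to zero.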
Just an update: the kernel also crashes when running iozone in
auto mode. Please see the updated /var/log/messages:
Jul 30 21:26:21 localhost kernel: FS-Cache: Loaded
Jul 30 21:26:22 localhost kernel: FS-Cache: Netfs 'nfs' registered for caching
Jul 30 21:29:22 localhost kernel: nfs: server 192.168.100.12 not responding, timed out
Jul 30 21:29:22 localhost kernel: Error: state manager failed on NFSv4 server 192.168.100.12 with error 5
Jul 30 21:37:38 localhost kernel: nfs4filelayout_init: NFSv4 File Layout Driver Registering...
Jul 30 21:39:29 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
Jul 30 21:39:29 localhost kernel: IP: [<ffffffff814411d4>] _raw_spin_lock+0xe/0x25
Jul 30 21:39:29 localhost kernel: PGD 0
Jul 30 21:39:29 localhost kernel: Oops: 0002 [#1] SMP
Jul 30 21:39:29 localhost kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
Jul 30 21:39:29 localhost kernel: CPU 3
Jul 30 21:39:29 localhost kernel: Modules linked in: nfslayoutdriver nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 p4_clockmod freq_table speedstep_lib dm_multipath uinput e1000e iTCO_wdt iTCO_vendor_support i2c_i801 pcspkr serio_raw usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: mperf]
Jul 30 21:39:29 localhost kernel:
Jul 30 21:39:29 localhost kernel: Pid: 1629, comm: 192.168.100.12- Not tainted 2.6.35-rc5-pnfs #1 To be filled by O.E.M./To Be Filled By O.E.M.
Jul 30 21:39:29 localhost kernel: RIP: 0010:[<ffffffff814411d4>] [<ffffffff814411d4>] _raw_spin_lock+0xe/0x25
Jul 30 21:39:29 localhost kernel: RSP: 0018:ffff8800700b5cb0 EFLAGS: 00010282
Jul 30 21:39:29 localhost kernel: RAX: 0000000000010000 RBX: 0000000000000000 RCX: 000000000020001d
Jul 30 21:39:29 localhost kernel: RDX: ffff88007bfc9900 RSI: ffffffffa0225fe0 RDI: 0000000000000030
Jul 30 21:39:29 localhost kernel: RBP: ffff8800700b5cb0 R08: ffff88007c7c1800 R09: ffff880000000001
Jul 30 21:39:29 localhost kernel: R10: ffff88007bfc9c00 R11: ffff880070b12a00 R12: ffffffffa0225fe0
Jul 30 21:39:29 localhost kernel: R13: ffff880070bdaf64 R14: ffff88007979f800 R15: ffff8800700b5d60
Jul 30 21:39:29 localhost kernel: FS: 0000000000000000(0000) GS:ffff880002180000(0000) knlGS:0000000000000000
Jul 30 21:39:29 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 30 21:39:29 localhost kernel: CR2: 0000000000000030 CR3: 0000000001a42000 CR4: 00000000000006e0
Jul 30 21:39:29 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 30 21:39:29 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 30 21:39:29 localhost kernel: Process 192.168.100.12- (pid: 1629, threadinfo ffff8800700b4000, task ffff8800796e1730)
Jul 30 21:39:29 localhost kernel: Stack:
Jul 30 21:39:29 localhost kernel: ffff8800700b5cc0 ffffffffa0212c6e ffff8800700b5ce0 ffffffffa0214168
Jul 30 21:39:29 localhost kernel: <0> ffff880070bdaf00 ffff880070bdaf54 ffff8800700b5d00 ffffffffa0201ab1
Jul 30 21:39:29 localhost kernel: <0> ffff8800700b5d00 ffff880070bdaf00 ffff8800700b5d40 ffffffffa0201b92
Jul 30 21:39:29 localhost kernel: Call Trace:
Jul 30 21:39:29 localhost kernel: [<ffffffffa0212c6e>] spin_lock+0xe/0x10 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa0214168>] pnfs_set_layout_stateid+0x1b/0x3b [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa0201ab1>] pnfs4_layout_reclaim+0x35/0x39 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa0201b92>] nfs4_open_recover+0xdd/0xf1 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa0201d04>] nfs4_open_delegation_recall+0x80/0x13f [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa020e7e0>] __nfs_inode_return_delegation+0xc7/0x1f5 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffff810caf61>] ? do_writepages+0x21/0x2a
Jul 30 21:39:29 localhost kernel: [<ffffffffa020e9ff>] nfs_client_return_marked_delegations+0x85/0xc6 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa020d7da>] nfs4_run_state_manager+0x368/0x494 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffffa020d472>] ? nfs4_run_state_manager+0x0/0x494 [nfs]
Jul 30 21:39:29 localhost kernel: [<ffffffff81064bd9>] kthread+0x7f/0x87
Jul 30 21:39:29 localhost kernel: [<ffffffff8100aa24>] kernel_thread_helper+0x4/0x10
Jul 30 21:39:29 localhost kernel: [<ffffffff81064b5a>] ? kthread+0x0/0x87
Jul 30 21:39:29 localhost kernel: [<ffffffff8100aa20>] ? kernel_thread_helper+0x0/0x10
Jul 30 21:39:29 localhost kernel: Code: c2 8d 90 00 00 01 00 75 04 f0 0f b1 17 0f 94 c2 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5
Jul 30 21:39:29 localhost kernel: RIP [<ffffffff814411d4>] _raw_spin_lock+0xe/0x25
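Both oopses fault at a small address: CR2 = 0x20 in the first and 0x30 in the second. This is the usual signature of accessing a member through a NULL struct pointer. In the first trace, the faulting bytes <49> 8b 55 20 decode to mov 0x20(%r13),%rdx, and R13 is 0. In the second trace, <f0> 0f c1 07 is lock xadd %eax,(%rdi) with RDI = 0x30. That means a spinlock word sits 0x30 bytes into a structure that was reached through a NULL pointer. The C sketch below illustrates the address arithmetic. Both struct layouts are invented only so that the members land at those offsets on an LP64 target (8-byte pointers). They are not the kernel's definitions, and model_fault_address() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented layout: four pointer-sized members (4 * 8 = 0x20 bytes on LP64)
 * put the next member at offset 0x20, like the load via R13 = 0. */
struct model_lock_ctx_layout {
    void *a, *b, *c, *d;
    void *lockowner;
};

/* Invented layout: six pointer-sized members (6 * 8 = 0x30 bytes on LP64)
 * put the lock word at offset 0x30, like the lock xadd on RDI = 0x30. */
struct model_layout_hdr {
    void *a, *b, *c, *d, *e, *f;
    unsigned int lock;
};

/* Address touched when a member at member_offset is accessed through a
 * pointer whose value is base. With base == 0, it equals the offset, which
 * is what the oops reports in CR2. */
static uintptr_t model_fault_address(uintptr_t base, size_t member_offset)
{
    return base + (uintptr_t)member_offset;
}
```

Read this way, each small CR2 value points at which member of a NULL structure was being accessed. It does not identify which structure it was. That still has to come from the call trace and source.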
On Fri, Jul 30, 2010 at 11:58 PM, Trond Myklebust
<[email protected]> wrote:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>