2004-05-21 23:00:15

by Lever, Charles

Subject: FW: RHEL 3.0 update 2 doesn't have fix for posix locking bug

saw this originally in SLES 8 SP 3 a while back; now we've verified it
in RHEL 3.0 update 2 as well. when running with multiple threads, sio
appears to hit a bug in the POSIX locking code in the 2.4 kernel.

-----Original Message-----
From: Lever, Charles
Sent: Friday, May 21, 2004 3:53 PM
To: Steve Dickson
Cc: 'Trond Myklebust'; Suggs, Darrell
Subject: RHEL 3.0 update 2 doesn't have fix for posix locking bug


got this oops running sio (5.46) on my RHEL 3.0 update 2 system:

NMI Watchdog detected LOCKUP on CPU0, eip 02157674, registers:

CPU: 0
EIP: 0060:[<02157674>] Not tainted
EFLAGS: 00000086

EIP is at .text.lock.usercopy [kernel] 0x2f (2.4.21pre1/i686)
eax: 02605c80 ebx: 00000000 ecx: 00000000 edx: 00000000
esi: 021c7b85 edi: 04807c04 ebp: 04807c04 esp: 04807ba4
ds: 0068 es: 0068 ss: 0068
Process sio_linux (pid: 5861, stackpage=04807000)
Stack: 00000010 024706c0 00000010 02605c80 00000000 021c7b85 04807c04 00000002
       0215741b 021c7b85 04807c04 00000002 00000000 00000000 021c7b85 fffffff2
       04807d9c 0210abed 00000002 04807c04 021c7b85 00000006 00003733 00003746
Call Trace:
[<021c7b85>] nlmclnt_proc [kernel] 0x1b5 (0x04807bb8)
[<0215741b>] get_user_size [kernel] 0x4b (0x04807bc4)
[<021c7b85>] nlmclnt_proc [kernel] 0x1b5 (0x04807bc8)
[<021c7b85>] nlmclnt_proc [kernel] 0x1b5 (0x04807bdc)
[<0210abed>] handle_BUG [kernel] 0x5d (0x04807be8)
[<021c7b85>] nlmclnt_proc [kernel] 0x1b5 (0x04807bf4)
[<0211b950>] do_page_fault [kernel] 0x0 (0x04807c10)
[<0210ad10>] die [kernel] 0x40 (0x04807c18)
[<0211bc21>] do_page_fault [kernel] 0x2d1 (0x04807c2c)
[<02324a5b>] rpc_call_sync [kernel] 0xcb (0x04807c74)
[<0211b950>] do_page_fault [kernel] 0x0 (0x04807cd0)
[<021c0068>] nfs_update_request [kernel] 0x158 (0x04807d00)
[<021c7b85>] nlmclnt_proc [kernel] 0x1b5 (0x04807d0c)
[<021b852d>] nfs_lock [kernel] 0x1fd (0x04807f34)
[<021c1a9a>] nfs_sync_file [kernel] 0x7a (0x04807f4c)
[<021712da>] locks_remove_posix [kernel] 0xda (0x04807f74)
[<02158cd7>] filp_close [kernel] 0x87 (0x04807f94)
[<02158d86>] sys_close [kernel] 0x66 (0x04807fb0)
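
For context on the bottom of the trace: sys_close, filp_close,
locks_remove_posix and nfs_lock are the path where close() drops the
calling process's POSIX locks on a file (on NFS that means sending an
unlock to the server, hence nlmclnt_proc). The sketch below only
illustrates that close-time behaviour on a local file. It is not a
reproducer for the lockup and says nothing about the root cause; the
function names are made up for illustration, and it assumes a
Unix-like system where os.fork and fcntl are available.

```python
import fcntl
import os


def _other_process_sees_lock(path):
    """Fork a child and report whether it finds the file locked."""
    pid = os.fork()
    if pid == 0:
        # Child: POSIX locks are not inherited across fork, so this
        # non-blocking attempt conflicts with any lock the parent holds.
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os._exit(1)  # lock is held by someone else (the parent)
        os._exit(0)      # lock was free
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


def locks_visible_around_close(path):
    """Take a POSIX lock via one descriptor, then close a different one.

    Returns (held_before_close, held_after_close) as observed from a
    separate process.
    """
    fd1 = os.open(path, os.O_RDWR)
    fd2 = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd1, fcntl.LOCK_EX)  # fcntl-style (POSIX) write lock
        before = _other_process_sees_lock(path)
        # Closing ANY descriptor for the file releases all of this
        # process's POSIX locks on it; this is the close path in the trace.
        os.close(fd2)
        fd2 = -1
        after = _other_process_sees_lock(path)
        return before, after
    finally:
        if fd2 != -1:
            os.close(fd2)
        os.close(fd1)
```

On a local Linux filesystem, locks_visible_around_close(path) is
expected to return (True, False): the lock taken through fd1 is gone
once fd2 is closed. Every close() of a locked file therefore runs the
lock-removal code, which is why a workload that opens, locks and closes
files from many threads exercises this path heavily.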

trond, don't you have a fix for this for 2.4 kernels?

- Chuck Lever
--
corporate: <cel at netapp dot com>
personal: <chucklever at bigfoot dot com>

