2004-10-01 16:25:48

by Ivan Kalatchev

[permalink] [raw]
Subject: kernel 2.6.8 bug in fs/locks.c


Hi all,

Application I'm developing has a web interface, so user can change
configuration file for this application using IE browser e.g.
I'm using pthreads for each user-application connections. To protect
configuration file from corruption I used file locking mechanism - fcntl
with F_WRLCK/F_RDLCK.
And at one point after posting web page back immediately after browser
received it, I got this in dmesg:

------------[ cut here ]------------
kernel BUG at fs/locks.c:1726!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: hpac cdc_acm ohci_hcd usbcore e100
CPU: 0
EIP: 0060:[<c01555f1>] Not tainted
EFLAGS: 00010246 (2.6.8)
EIP is at locks_remove_flock+0x69/0xbc
eax: c113d781 ebx: c34ecf4c ecx: c34eceb0 edx: c1dbff6c
esi: c126f420 edi: c3fd0a40 ebp: c34eceb0 esp: c35b8f84
ds: 007b es: 007b ss: 0068
Process hdas (pid: 6875, threadinfo=c35b8000 task=c113d710)
Stack: c126f420 c126f420 c0142d13 c126f420 c126f420 00000007 00000000
bf5ff914
c3c9ce60 c0142cdf c01513e9 00000009 00000007 00000009 c35b8000
c0105c4b
00000009 00000007 bf5ff914 00000007 00000009 bf5ff6f4 000000dd
c010007b
Call Trace:
[<c0142d13>] __fput+0x33/0x104
[<c0142cdf>] fput+0x13/0x14
[<c01513e9>] sys_fcntl64+0x69/0x70
[<c0105c4b>] syscall_call+0x7/0xb
Code: 0f 0b be 06 de 4a 23 c0 89 d3 8b 13 85 d2 75 c7 ba 00 f0 ff

Configuration file was destroyed of course, as I opened it for writing with
truncation.
I'm changing code now to use pthread mutexes to control access to the
configuration file,
hopefully it will work better. So this message more like bug info, but I
would like
to be CC-ed all answers/comments posted to the list in response to this
posting.

Regards,

_________________________________________________
Ivan Kalatchev

Senior Software Developer
Engineering Seismology Group Canada Inc.
ISO 9001-2000
1 Hyperion Court, Kingston,
Ontarion, Canada K7K 7G3
phone: (613) 548-8287 ext.247
fax: (613) 548-8917



2004-10-01 19:25:24

by Chris Wright

[permalink] [raw]
Subject: Re: kernel 2.6.8 bug in fs/locks.c

* Ivan Kalatchev ([email protected]) wrote:
> I'm using pthreads for each user-application connections. To protect
> configuration file from corruption I used file locking mechanism - fcntl
> with F_WRLCK/F_RDLCK.

I must be confused. pthreads and fcntl locking...that does't give
proper exclusion? The BUG, however, is no good. Despite the fact
that it appears to come at the result of an application bug, we should
be able to handle this w/out a BUG. AFAICT, one thread has closed the
file descriptor, whilst another is mucking with the locks. So the locker
winds up holding the last ref to the filp. This blows the logic of when
locks_remove_posix gets called. Thanks for the bug report.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-10-02 00:53:51

by Chris Wright

[permalink] [raw]
Subject: [TEST PATCH] Re: kernel 2.6.8 bug in fs/locks.c

* Chris Wright ([email protected]) wrote:
> * Ivan Kalatchev ([email protected]) wrote:
> > I'm using pthreads for each user-application connections. To protect
> > configuration file from corruption I used file locking mechanism - fcntl
> > with F_WRLCK/F_RDLCK.
>
> I must be confused. pthreads and fcntl locking...that does't give
> proper exclusion? The BUG, however, is no good. Despite the fact
> that it appears to come at the result of an application bug, we should
> be able to handle this w/out a BUG. AFAICT, one thread has closed the
> file descriptor, whilst another is mucking with the locks. So the locker
> winds up holding the last ref to the filp. This blows the logic of when
> locks_remove_posix gets called. Thanks for the bug report.

This is a rather ugly fix, but it demonstrates the hole. It's possible
for a posix lock to get applied after the final close of the file. This
means locks_remove_posix misses the yet to be created posix lock, and
on final fput when leaving fcntl, locks_remove_flock will BUG().
I was able to reproduce Ivan's bugreport. I doubt it's the right/best
fix, but it's functional (no more BUG()). Against -bk-curr.

Signed-off-by: Chris Wright <[email protected]>

===== fs/fcntl.c 1.40 vs edited =====
--- 1.40/fs/fcntl.c 2004-09-02 02:48:03 -07:00
+++ edited/fs/fcntl.c 2004-10-01 17:34:15 -07:00
@@ -364,7 +364,7 @@

asmlinkage long sys_fcntl(int fd, unsigned int cmd, unsigned long arg)
{
- struct file *filp;
+ struct file *filp, *filp2;
long err = -EBADF;

filp = fget(fd);
@@ -379,6 +379,11 @@

err = do_fcntl(fd, cmd, arg, filp);

+ filp2 = fget(fd);
+ if (filp2 != filp)
+ locks_remove_posix(filp, current->files);
+ if (filp2)
+ fput(filp2);
fput(filp);
out:
return err;
@@ -387,7 +392,7 @@
#if BITS_PER_LONG == 32
asmlinkage long sys_fcntl64(unsigned int fd, unsigned int cmd, unsigned long arg)
{
- struct file * filp;
+ struct file *filp, *filp2;
long err;

err = -EBADF;
@@ -414,6 +419,12 @@
err = do_fcntl(fd, cmd, arg, filp);
break;
}
+
+ filp2 = fget(fd);
+ if (filp2 != filp)
+ locks_remove_posix(filp, current->files);
+ if (filp2)
+ fput(filp2);
fput(filp);
out:
return err;