Hi,
I'm investigating the reliability of NFS locking.
I noticed a possible NFS-locking-related crash in the following situation:
process A
process B
-- A and B share the task's fd array
(clone()d with CLONE_FILES)
file F
-- The file on NFS
file descriptor p (equivalent to file struct P)
file descriptor q (equivalent to file struct Q)
-- p and q are individual file descriptors for the file F
(not dup()-ed)
file lock L
-- The file lock L has been taken via fcntl() on file descriptor q by
process B (associated with file struct Q)
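The user-visible side of this setup can be reproduced without NFS. The following is a minimal sketch, assuming a local Linux filesystem and a scratch file path supplied by the caller; the function name `lock_type_seen_after_close` is hypothetical. A single process stands in for A and B, since the lock owner in the kernel is the shared fd table. It opens the same file twice (p and q, separate opens, not dup()-ed), locks byte 0 through q, optionally closes p, and then asks a forked child what it sees on byte 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the l_type a separate process sees on byte 0 via F_GETLK:
 * F_WRLCK if our lock is still held, F_UNLCK if it was released,
 * -1 on error.
 */
int lock_type_seen_after_close(const char *path, int close_p)
{
	struct flock lck = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
			     .l_start = 0, .l_len = 1 };
	int p, q, status, result = -1;
	pid_t child;

	assert(path != NULL);
	p = open(path, O_RDWR | O_CREAT, 0600);	/* descriptor p */
	if (p < 0)
		return -1;
	q = open(path, O_RDWR);			/* descriptor q: separate open */
	if (q < 0) {
		close(p);
		return -1;
	}
	if (fcntl(q, F_SETLK, &lck) < 0)	/* "B" locks byte 0 through q */
		goto out;
	if (close_p) {
		close(p);	/* "A" closes p: POSIX drops all our locks on the file */
		p = -1;
	}

	child = fork();
	if (child < 0)
		goto out;
	if (child == 0) {
		/* A different lock owner asks whether byte 0 is write-locked. */
		struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
				       .l_start = 0, .l_len = 1 };
		int fd = open(path, O_RDWR);

		if (fd < 0 || fcntl(fd, F_GETLK, &probe) < 0)
			_exit(2);
		_exit(probe.l_type == F_UNLCK ? 0 : 1);
	}
	if (waitpid(child, &status, 0) == child && WIFEXITED(status) &&
	    WEXITSTATUS(status) != 2)
		result = WEXITSTATUS(status) == 0 ? F_UNLCK : F_WRLCK;
out:
	if (p >= 0)
		close(p);
	close(q);
	return result;
}
```

With `close_p` set to 0 the child should see F_WRLCK; with `close_p` set to 1 it should see F_UNLCK, because closing p releases the lock that was taken through q. That is the behaviour step 1 relies on.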
1. Process A closes file descriptor p.
In filp_close(), when process A closes file struct P, it unlocks all the
file locks on the i-node of file F that are held by the
processes sharing process A's fd array (locks_remove_posix).
2. Process A unlocks file lock L.
First, process A removes file lock L from the list of file locks
on the i-node of file F. Then it calls `nfs_lock'
to perform the filesystem-specific unlock operation.
3. While executing the RPC procedure in `nfs_lock', process A
sleeps for a while.
Meanwhile, on the other side:
4. Process B closes file descriptor q.
Because process A has already removed the entry for file lock L from the list,
process B cannot find it, so it just exits without touching
the list.
The system treats process B's close operation as complete
while process A is still sleeping.
Process B invalidates file struct Q because it is no longer needed.
But process A has not yet finished unlocking
file lock L.
5. When process A wakes up, it attempts to finish the remaining unlocking
work and accesses file struct Q.
Because file struct Q is no longer valid, this is likely to cause a NULL
pointer dereference.
Also, file struct Q might already have been reused for another file; in that
case, the data would become inconsistent.
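The list handling in steps 1 and 2 can be modeled in a few lines of userspace C. This is a sketch, not kernel code: `struct model_lock`, `model_remove_posix()` and `model_on_list()` are hypothetical stand-ins for struct file_lock, locks_remove_posix() and a walk of the inode's i_flock list. Matching POSIX locks are unlinked with the same pointer-to-pointer walk (restarting from the head) that the fs/locks.c loop uses, then parked on a `pending` list to represent the filesystem unlock (`nfs_lock') that has not finished yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct file_lock. */
struct model_lock {
	int is_posix;			/* models (fl_flags & FL_POSIX) */
	const void *owner;		/* models fl_owner: the shared fd table */
	int pid;			/* models fl_pid: ignored by this walk */
	struct model_lock *next;	/* link in the per-inode lock list */
};

/* Returns 1 if `fl` is reachable from `head`, else 0. */
int model_on_list(const struct model_lock *head, const struct model_lock *fl)
{
	for (; head != NULL; head = head->next)
		if (head == fl)
			return 1;
	return 0;
}

/*
 * Unlink every POSIX lock whose owner matches and push it onto
 * *pending, where it waits for the (possibly sleeping) filesystem
 * unlock. Returns the number of locks unlinked.
 */
int model_remove_posix(struct model_lock **head, const void *owner,
		       struct model_lock **pending)
{
	struct model_lock **before;
	struct model_lock *fl;
	int removed = 0;

	assert(head != NULL && pending != NULL);
	before = head;
	while ((fl = *before) != NULL) {
		if (fl->is_posix && fl->owner == owner) {
			*before = fl->next;	/* 1. off the inode list first */
			fl->next = *pending;	/* 2. fs unlock still to come */
			*pending = fl;
			removed++;
			before = head;		/* restart from the head */
			continue;
		}
		before = &fl->next;
	}
	return removed;
}
```

A lock sitting on `pending` is exactly what process B cannot find in step 4: it is no longer on the inode's list, yet process A still has work to do on it and, through it, on file struct Q.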
Does anyone have an idea of how to fix it?
Regards,
--
Akinobu Mita
>>>>> " " == Akinobu Mita <[email protected]> writes:
> Does anyone have an idea of how to fix it?
Yes. I posted a patch about a week or 2 ago. The original patch can be
found on
http://www.fys.uio.no/~trondmy/src/Linux-2.4.x/2.4.23-rc1/linux-2.4.23-01-posix_race.dif
However, I now believe the real problem here is that
locks_remove_posix() should also be checking the pid (as is done in
all the other POSIX locking checks by calling locks_same_owner()).
It is wrong for locks_remove_posix() to be deleting locks that don't
belong to this pid... Note: this bug exists in 2.6.x too, although
there it does not cause an Oops...
Cheers,
Trond
--- linux-2.4.23-rc1/fs/locks.c.orig 2003-11-16 19:30:53.000000000 -0500
+++ linux-2.4.23-rc1/fs/locks.c 2003-11-25 19:34:02.000000000 -0500
@@ -1746,7 +1746,8 @@
lock_kernel();
before = &inode->i_flock;
while ((fl = *before) != NULL) {
- if ((fl->fl_flags & FL_POSIX) && fl->fl_owner == owner) {
+ if ((fl->fl_flags & FL_POSIX) && fl->fl_owner == owner &&
+ fl->fl_pid == current->pid) {
locks_unlock_delete(before);
before = &inode->i_flock;
continue;
Thanks, Trond,
but your patch causes a memory leak.
# gcc leak.c -o leak -lpthread
# find /usr -type f -exec ./leak {} \; &
# while true; do sleep 1; grep file_lock_cache /proc/slabinfo;done
-- leak.c --
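#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>

/* "Process B": read-lock every other byte of the file through the
 * descriptor shared with the main thread. */
static void *process_B(void *arg)
{
	off_t i;
	struct stat st;
	int fd = *(int *)arg;
	struct flock lck;

	if (fstat(fd, &st) < 0) {
		perror("fstat");
		return NULL;
	}
	for (i = 0; i < st.st_size / 2; i++) {
		lck.l_type = F_RDLCK;
		lck.l_whence = SEEK_SET;
		lck.l_start = 2 * i;
		lck.l_len = 1;
		if (fcntl(fd, F_SETLK, &lck) < 0) {
			perror("fcntl");
			return NULL;
		}
	}
	return NULL;
}

/* "Process A": open the file, let the thread take the locks, then
 * close the descriptor, which should release all of them. */
int main(int argc, char **argv)
{
	int p;
	pthread_t tid;

	if (argc < 2) {
		fprintf(stderr, "usage: %s FILE\n", argv[0]);
		exit(1);
	}
	p = open(argv[1], O_RDWR);
	if (p < 0) {
		perror("open");
		exit(1);
	}
	if (pthread_create(&tid, NULL, process_B, &p) != 0) {
		fprintf(stderr, "pthread_create failed\n");
		exit(1);
	}
	pthread_join(tid, NULL);
	if (close(p) < 0)
		perror("close");
	exit(0);
}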
----
It also seems that your other patch does not completely avoid the race.
Cheers,
--
Akinobu Mita
On Wednesday 26 November 2003 09:35, Trond Myklebust wrote:
> >>>>> " " == Akinobu Mita <[email protected]> writes:
> > Does anyone have an idea of how to fix it?
>
> Yes. I posted a patch about a week or 2 ago. The original patch can be
> found on
>
> http://www.fys.uio.no/~trondmy/src/Linux-2.4.x/2.4.23-rc1/linux-2.4.23-01-posix_race.dif
>
> However, I now believe the real problem here is that
> locks_remove_posix() should also be checking the pid (as is done in
> all the other POSIX locking checks by calling locks_same_owner()).
>
> It is wrong for locks_remove_posix() to be deleting locks that don't
> belong to this pid... Note: this bug exists in 2.6.x too, although
> there it does not cause an Oops...
>
> Cheers,
> Trond
>
> --- linux-2.4.23-rc1/fs/locks.c.orig 2003-11-16 19:30:53.000000000 -0500
> +++ linux-2.4.23-rc1/fs/locks.c 2003-11-25 19:34:02.000000000 -0500
> @@ -1746,7 +1746,8 @@
> lock_kernel();
> before = &inode->i_flock;
> while ((fl = *before) != NULL) {
> - if ((fl->fl_flags & FL_POSIX) && fl->fl_owner == owner) {
> + if ((fl->fl_flags & FL_POSIX) && fl->fl_owner == owner &&
> + fl->fl_pid == current->pid) {
> locks_unlock_delete(before);
> before = &inode->i_flock;
> continue;
On Thursday 27 November 2003 06:54, Akinobu Mita wrote:
From somebody trying to learn something here.
Unpatched 2.6.0-test11 here, using the anticipatory scheduler.
What should I expect to see occurring when this is executed?
Here, the numbers grew for a few initial cycles, then stepped smaller and
restarted the rise; eventually (after a minute or so) the numbers started
to rise and never stopped until I killed it.
The first 2 numbers always matched, and a much smaller pair near the
end of the line always matched too. The first pair was somewhat above
30,000 when I stopped it after about 2 1/2 minutes.
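The numbers in such a line can be read off by column. Below is a small sketch of a parser for one /proc/slabinfo line, assuming the header's column order (name, active_objs, num_objs, ...), which the 2.4 and 2.6 formats share; on 2.6 the smaller pair near the end of the line is most likely the `slabdata` active_slabs/num_slabs pair. `struct slab_counts` and `parse_slab_line()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The first two counters on a /proc/slabinfo line. */
struct slab_counts {
	unsigned long active_objs;	/* objects currently in use */
	unsigned long num_objs;		/* objects allocated in the cache */
};

/*
 * Parse the leading "name active_objs num_objs" fields of one line.
 * Returns 0 if the line belongs to `cache` and parses, else -1.
 */
int parse_slab_line(const char *line, const char *cache,
		    struct slab_counts *out)
{
	char name[64];

	assert(line != NULL && cache != NULL && out != NULL);
	if (sscanf(line, "%63s %lu %lu", name,
		   &out->active_objs, &out->num_objs) != 3)
		return -1;
	return strcmp(name, cache) == 0 ? 0 : -1;
}
```

Sampled once a second for `file_lock_cache`, an active_objs count that keeps rising and never falls back after the test processes exit would be consistent with file_lock structures being leaked.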
>Thanks, Trond.
>
>but your patch causes a memory leak.
[snip code]
--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.
>>>>> " " == Akinobu Mita <[email protected]> writes:
> Thanks, Trond, but your patch causes a memory leak.
Yep. Worse: pthreads assumes that we don't use the pid as the lock
owner. That in turn means that the fl_pid test in locks_same_owner() is
incorrect.
In 2.6.x, NPTL further complicates matters by using the tgid as its
equivalent of the POSIX process ID, and by not tying CLONE_THREAD to
CLONE_FILES. AFAICS there's nothing we can do about
that...
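The tgid/pid distinction is easy to observe from userspace. A minimal sketch, assuming a current Linux system with NPTL (older glibc also needs -lpthread): getpid() returns the same value, the tgid, in every thread, while the per-thread id from the gettid system call differs, and that per-thread id is what current->pid holds inside the kernel. A test such as `fl->fl_pid == current->pid` therefore tells apart threads that POSIX treats as one process. `struct ids`, `record_ids()` and `sample_thread_ids()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
	pid_t pid;	/* getpid(): the thread group id under NPTL */
	pid_t tid;	/* gettid: the kernel's per-thread id */
};

static void *record_ids(void *arg)
{
	struct ids *out = arg;

	out->pid = getpid();
	out->tid = (pid_t)syscall(SYS_gettid);
	return NULL;
}

/* Record ids in the calling thread and in a new thread; 0 on success. */
int sample_thread_ids(struct ids *main_ids, struct ids *thread_ids)
{
	pthread_t t;

	assert(main_ids != NULL && thread_ids != NULL);
	record_ids(main_ids);
	if (pthread_create(&t, NULL, record_ids, thread_ids) != 0)
		return -1;
	return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

Under NPTL both `pid` fields should match while the `tid` fields differ. Under the older LinuxThreads library, by contrast, getpid() itself differed between threads.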
So then the correct thing to do is indeed to wrap the call to
locks_unlock_delete() with an fget()/fput() pair, and then to remove
the test for fl_pid in locks_same_owner().
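The fget()/fput() idea can be illustrated with a plain reference count. This is a userspace model under stated assumptions, not the kernel's struct file handling: `struct model_file`, `model_fget()`, `model_fput()` and `replay_race()` are hypothetical, and "freed" is only a flag. It replays steps 3 to 5 of the original report on file struct Q.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file: a refcount and a "freed" flag. */
struct model_file {
	int count;	/* models f_count */
	bool freed;	/* set when the last reference is dropped */
};

static void model_fget(struct model_file *f)
{
	assert(!f->freed);
	f->count++;
}

static void model_fput(struct model_file *f)
{
	assert(f->count > 0);
	if (--f->count == 0)
		f->freed = true;	/* real code would release the struct here */
}

/*
 * Returns true if Q is still valid when process A resumes. With
 * hold_ref, A pins Q across the sleeping unlock, as proposed above.
 */
bool replay_race(bool hold_ref)
{
	struct model_file q = { .count = 1, .freed = false };	/* B's reference */
	bool valid_when_a_resumes;

	if (hold_ref)
		model_fget(&q);		/* A: pin Q before unlinking lock L */
	/* A: unlink L, call the fs unlock, sleep in the RPC ... */
	model_fput(&q);			/* B: close(q) drops its reference */
	valid_when_a_resumes = !q.freed;	/* A wakes up and touches Q */
	if (hold_ref)
		model_fput(&q);		/* A: finished with Q */
	return valid_when_a_resumes;
}
```

Without the extra reference, B's close drops the last reference before A resumes; with A holding its own reference, Q stays valid until A's fput().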
We then need to fix lockd so that it generates correct fl_owners for
its locks...
Let me see if I can get that right.
Cheers,
Trond