2002-10-31 23:01:29

by Dave Olien

Subject: [BUG] open file descriptors remain after threaded exit() in 2.5.44


In Linux 2.5.44, there seems to be a race between process exit and pthread
creation that can leave an open file descriptor that has no task
associated with it.

Two test programs are included at the end of this mail. Run the first program
on an SMP system with at least two processors. The system I'm using has
8 Pentium 4 processors and 16 gigabytes of memory. After the
first program exits, run the second test program to demonstrate that
there is still state remaining from the first program.

To run the tests, you must first create a file that will be used
to place record locks. You can edit the program sources to put that
file wherever you like.

The first test program begins by opening that file and using fcntl(F_SETLK) to
put a write lock on the file. The main thread then creates 8 child threads and
then exits immediately. Each child thread sleeps for 60 seconds.
But when the main thread exits, the child threads are forced
to exit as well.

The second program tries to get a lock on that same
file. Since the first program has exited, its lock on that file should
no longer be present, and the second program should successfully get its lock.
But instead, the second program fails. A ps -eaf shows there are no threads
from the first process still present.
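
As a single-threaded baseline for that expectation, here is a minimal
sketch (not part of the original report) in which a child process takes
the same one-byte write lock and exits while holding it, after which the
parent tries to take the lock itself. The helper names try_write_lock()
and lock_released_on_exit() are hypothetical, and the path is whatever
lock file you created for the tests. On a correctly behaving kernel the
parent's F_SETLK is expected to succeed, because record locks are dropped
when the owning process exits.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take a non-blocking write lock on byte 0 of fd.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int try_write_lock(int fd)
{
	struct flock lock;

	lock.l_type = F_WRLCK;
	lock.l_whence = SEEK_SET;
	lock.l_start = 0;
	lock.l_len = 1;
	return fcntl(fd, F_SETLK, &lock);
}

/* A child locks the file and exits without unlocking; the parent then
 * tries the same lock. Returns 0 if the parent gets the lock after the
 * child has exited, -1 otherwise. (Hypothetical helper.) */
int lock_released_on_exit(const char *path)
{
	pid_t pid;
	int status, fd, rc;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		fd = open(path, O_RDWR);
		if (fd == -1 || try_write_lock(fd) == -1)
			_exit(1);
		_exit(0);	/* exit while still holding the lock */
	}

	/* Wait until the child is really gone before testing the lock. */
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	fd = open(path, O_RDWR);
	if (fd == -1)
		return -1;
	rc = try_write_lock(fd);
	close(fd);
	return rc;
}
```

Calling lock_released_on_exit("/home/dmo/PTH/l") and getting 0 back is
the behavior the second test program expects; the threaded test differs
only in that the lock holder exits while threads are still being created.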

If you modify the first program so that the main thread sleeps 30 seconds
before exiting, then this problem is no longer seen. The main thread exiting
still forces the child threads to exit. But those child threads seem to
now be in a state where the forced exit doesn't expose this apparent race.

Here are the test programs:

-------------threaded test program. Run this first---------------------------
------------ compile with -lpthread -lm flags --------------------------

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <pthread.h>
#include <getopt.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <sys/utsname.h>

/* Each worker only sleeps; it is killed when the main thread exits. */
void *worker_thread(void *arg)
{
	(void)arg;
	sleep(60);
	return NULL;
}


static pthread_attr_t thread_attr;
#define NTHREADS 8

int main(void)
{
	int i;
	int fd;
	int err;
	struct flock lock;

	fd = open("/home/dmo/PTH/l", O_RDWR);
	if (fd == -1) {
		perror("open failed");
		exit(1);
	}

	/* Write-lock the first byte of the file. */
	lock.l_whence = SEEK_SET;
	lock.l_type = F_WRLCK;
	lock.l_start = 0;
	lock.l_len = 1;

	if (fcntl(fd, F_SETLK, &lock) == -1) {
		perror("F_SETLK failed");
		exit(1);
	}

	pthread_attr_init(&thread_attr);
	pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_DETACHED);

	for (i = 0; i < NTHREADS; i++) {
		pthread_t worker_tid;

		/* pthread_create() returns an error number; it does not set errno. */
		err = pthread_create(&worker_tid, &thread_attr, worker_thread, NULL);
		if (err != 0) {
			fprintf(stderr, "thread create failed: %s\n", strerror(err));
			exit(1);
		}
	}
	/* Uncomment to make the problem disappear: */
	/*sleep(30);*/

	/* Returning from main() exits the process while the workers are alive. */
	return 0;
}

---------------------- The second test program ------------------------------
---------------------- This tests for the failure of the first --------------


#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>

int main(void)
{
	int fd;
	struct flock lock;

	fd = open("/home/dmo/PTH/l", O_RDWR);
	if (fd == -1) {
		perror("open failed");
		exit(1);
	}

	/* Request the same one-byte write lock the first program took. */
	lock.l_whence = SEEK_SET;
	lock.l_type = F_WRLCK;
	lock.l_start = 0;
	lock.l_len = 1;

	if (fcntl(fd, F_SETLK, &lock) == -1) {
		perror("F_SETLK failed");
		exit(1);
	}
	printf("lock succeeded\n");
	/*sleep(30);*/
	return 0;
}
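
When the second program fails, it may help to ask the kernel who it
thinks holds the conflicting lock. The following is a minimal sketch
using fcntl(F_GETLK); the function name write_lock_holder() is
hypothetical and is not part of the original test programs. F_GETLK
reports l_type == F_UNLCK when the requested lock could be granted, and
otherwise fills in l_pid for one conflicting lock. Whether a stale lock
left by this race would show the PID of the exited process is an
assumption here, not something that has been verified.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask which process holds a lock that conflicts with a write lock on
 * byte 0 of fd. Returns the holder's PID, 0 if nothing conflicts, or
 * -1 if fcntl() fails. Locks held by the calling process itself never
 * conflict, so they are not reported. (Hypothetical helper.) */
pid_t write_lock_holder(int fd)
{
	struct flock lock;

	lock.l_type = F_WRLCK;
	lock.l_whence = SEEK_SET;
	lock.l_start = 0;
	lock.l_len = 1;
	lock.l_pid = 0;

	if (fcntl(fd, F_GETLK, &lock) == -1)
		return -1;
	if (lock.l_type == F_UNLCK)
		return 0;	/* the write lock could be granted */
	return lock.l_pid;	/* PID recorded for the conflicting lock */
}
```

Calling this on the descriptor opened by the second program, just before
its F_SETLK, and comparing the result against the output of ps -eaf
would show whether the lock is attributed to a process that no longer
exists.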


2002-11-25 20:04:45

by Mark Wong

Subject: Re: [BUG] open file descriptors remain after threaded exit() in 2.5.44

Dave,

I've verified the problem still exists for SAP DB on 2.5.49 and your
test case still fails. Then I installed NGPT 2.0.4 and found that I
still cannot start, stop and start SAP DB, but your test case now
passes.

Does that offer any clues?

Mark

--
Mark Wong
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x 32 (office)
(503)-626-2436 (fax)
http://www.osdl.org/archive/markw/

2002-11-25 21:17:49

by Matthew Wilcox

Subject: Re: [BUG] open file descriptors remain after threaded exit() in 2.5.44


thanks for cc'ing the file locking maintainer or the linux-fsdevel
mailing lists. you know, like it says in MAINTAINERS. after all, if
you'd done that you might've got a reply telling you it's a known bug,
and even a workaround. as it is, i have no idea what Dave Olien's email
address is, so i can't send him mail.

i have a patch, it passes the LTP when run on a local filesystem, but
not over NFS which is why I haven't publicised it yet. Thanks to OSDL
for giving me access to machines to test this kind of thing on.

ftp://ftp.linux.org.uk/pub/linux/willy/patches/flock-2.5.49-2.diff

note, do not use NFS when using this patch. really; i mean it. somehow
i managed to corrupt thread_info.cpu causing _udelay_ to oops.

--
"It's not Hollywood. War is real, war is primarily not about defeat or
victory, it is about death. I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk