Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Mon, 4 Feb 2002 22:28:57 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Mon, 4 Feb 2002 22:28:39 -0500 Received: from e21.nc.us.ibm.com ([32.97.136.227]:424 "EHLO e21.nc.us.ibm.com") by vger.kernel.org with ESMTP id ; Mon, 4 Feb 2002 22:28:24 -0500 Date: Mon, 4 Feb 2002 19:28:17 -0800 (PST) From: Dave Olien Message-Id: <200202050328.g153SH815883@eng2.beaverton.ibm.com> To: hch@ns.caldera.de, linux-kernel@vger.kernel.org Subject: [PATCH] 2.5 pthread support for SEM_UNDO in semop() Cc: marcelo@conectiva.com.br, torvalds@transmeta.com Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org [PATCH] 2.5 pthread support for SEM_UNDO in semop() I've included fixes to the patch I submitted last week, from Christoph Hellwig. In particular: 1. I've removed or replaced discussions of "processes" with references to "thread groups" on the lines of code modified by this patch. 2. I've introduced the new flag CLONE_SYSVIPC into sched.h, and used this instead of the collection of flags my previous patch used to determine whether SEM_UNDO state should be shared at __clone() time. I've noticed that linux 2.5 has already introduced a new flag, CLONE_NEWNS. I've "reserved" space in the 2.4.17 patch for the expected new flag in the 2.5 kernel. I'm not sure this was the right thing to do. The hope is that the future version of glibc will work on both 2.4 and 2.5 kernels. How is this sort of thing supposed to be handled? This latest change as the unfortunate aspect that the glibc linuxthreads library need to be changed to pass this new thread to __clone(). The soonest this changed library can be available is in 6 months, as part of glibc verison 2.3. I'll be submitting patches to the glibc mailing list. I've also included mail a new test program, that doesn't use pthread_create(). Instead, it calls __clone() directly. This means the kernel can be tested without a new glibc. ------------------------ Explanation of Problem------------------------- The Linux semop() System V semaphore SEM_UNDO should perform SEM_UNDO cleanup only when the entire task group exits, not when the pthread that did the semop() exits. Following is a brief explanation of the problem, followed by the test program, followed by a patch for linux 2.4.17. The test program is pretty simple. The application creates several threads using pthread_create(). When one thread performs a semop() with the SEM_UNDO flag set, the change in sempaphore value performed by that semop should be "undone" only when the entire application exits, NOT when that thread exits. This would be consistent with the description on the semop(2) manual page, and also with the semop() implementation on many other UNIX implementations. However, In the current Linux implementation, the semop is "undone" when the thread that performed the semop() exits. This is too soon. This behavior has created difficulty in porting to Linux some threaded applications that use semop(). In the current Linux implementation, each thread currently maintains its own private list of semundo structures, referenced by current->semundo. The bug is that the list of sem_undo structures for a thread group should be shared among all the threads in that thread group. This patch adds a structure (sem_undohd) that is shared by all the threads in the threaded application, and controls access to the shared list of sem_undo structures. current->semundo references are replaced with current->semundohd references. I've tried to minimize the impact on tasks that are not using SEM_UNDO semop() operations, or that are not using pthread_create(). ---------------------- Test Program Output ----------------------------------- Below is a test program output. The test program's output SHOULD look like: Waiter, pid = 11490 Poster, pid = 11490, posting Poster posted Poster exiting Waiter waiting, pid = 11490 Waiter done waiting The Incorrect output on Linux is: Waiter, pid = 712 Poster, pid = 713, posting Poster posted Poster exiting Waiter waiting, pid = 712 ------------------------Test Program using __clone()------------------------- #include #include #include #include #define CLONE_VM 0x00000100 #define CLONE_FS 0x00000200 #define CLONE_FILES 0x00000400 #define CLONE_SIGHAND 0x00000800 #define CLONE_SYSVIPC 0x00040000 #define KEY IPC_PRIVATE #define NUMCLONES 2 int waiter_stack[1024]; int poster_stack[1024]; void *retval[NUMCLONES]; void * waiter(void *); void * poster(void *); struct sembuf Psembuf = {0, -1, SEM_UNDO}; struct sembuf Vsembuf = {0, 1, SEM_UNDO}; int sem_id; char *semgeterr = "semget failed\n"; char *clone_waiter = "clone waiter\n"; char *clone_poster = "clone poster\n"; char *waiter_cloned = "waiter cloned\n"; char *poster_cloned = "poster cloned\n"; main() { int i, pid, rc; /* Create the semaphore set */ sem_id = semget(KEY, 1, 0666 | IPC_CREAT); if (sem_id < 0) { printf ("semget failed, errno = %d\n", errno); exit (1); } /* Create the threads */ for (i=0; isemundohd; + if ((undohd != NULL) && (atomic_read(&undohd->refcnt) != 1)) + spin_lock(&undohd->lock); +} + +/* This code has an interesting interaction with copy_semundo(): + * two tasks could have been sharing the semundohd at the time "first" one + * of those tasks acquires the lock acquired in lock_semundo. If the other + * tasks exits before * "the first one" releases the lock (by calling + * unlock_semundo), then the spin_unlock would NOT be called. This would + * leave the semundohd in a locked state. This would NOT be a problem unless + * the remaining task once again creates a new task that once again shares the + * semundohd. Cleanup up this last case is dealt with in copy_semundo by + * having it reinitialize the spin_lock when it once again creates a second + * task sharing the semundo. + */ +static inline void unlock_semundo(void) +{ + struct sem_undohd *undohd; + + undohd = current->semundohd; + if ((undohd != NULL) && (atomic_read(&undohd->refcnt) != 1)) + spin_unlock(&undohd->lock); +} + + +/* If the task doesn't already have a semundohd, then allocate one + * here. We guarantee there is only one thread using this undo list, + * and current is THE ONE + * + * If this allocation and assignment succeeds, but later + * portions of this code fail, there is no need to free the sem_undohd. + * Just let it stay associated with the task, and it'll be freed later + * at exit time. + * + * This can block, so callers must hold no locks. + */ +static inline int get_undohd(struct sem_undohd **undohdp) +{ + struct sem_undohd *undohd; + int size; + + undohd = current->semundohd; + if (!undohd) { + size = sizeof(struct sem_undohd); + undohd = (struct sem_undohd *) kmalloc(size, GFP_KERNEL); + if (undohd == NULL) + return -ENOMEM; + memset(undohd, 0, size); + /* don't initialize unodhd->lock here. It's done + * in copy_semundo() instead. + */ + atomic_set(&undohd->refcnt, 1); + current->semundohd = undohd; + } + *undohdp = undohd; + return 0; +} + +static struct sem_undo* freeundos(struct sem_undo* un) { struct sem_undo* u; struct sem_undo** up; - for(up = ¤t->semundo;(u=*up);up=&u->proc_next) { + for(up = ¤t->semundohd->proc_list;(u=*up);up=&u->proc_next) { if(un==u) { un=u->proc_next; *up=un; @@ -805,33 +868,87 @@ return un->proc_next; } -/* returns without sem_lock on error! */ +static inline struct sem_undo *find_undo(int semid) +{ + struct sem_undo *un; + + un = NULL; + if (current->semundohd != NULL) { + un = current->semundohd->proc_list; + } + while(un != NULL) { + if(un->semid==semid) + break; + if(un->semid==-1) + un=freeundos(un); + else + un=un->proc_next; + } + return un; +} + +/* returns without sem_lock and semundo list locks on error! */ static int alloc_undo(struct sem_array *sma, struct sem_undo** unp, int semid, int alter) { int size, nsems, error; - struct sem_undo *un; + struct sem_undo *un, *new_un; + struct sem_undohd *unhd; + unsigned long saved_add_count; + nsems = sma->sem_nsems; - size = sizeof(struct sem_undo) + sizeof(short)*nsems; + saved_add_count = 0; + if (current->semundohd != NULL) + saved_add_count = current->semundohd->add_count; sem_unlock(semid); + unlock_semundo(); + error = get_undohd(&unhd); + if (error) + return error; + + size = sizeof(struct sem_undo) + sizeof(short)*nsems; un = (struct sem_undo *) kmalloc(size, GFP_KERNEL); if (!un) return -ENOMEM; memset(un, 0, size); + lock_semundo(); error = sem_revalidate(semid, sma, nsems, alter ? S_IWUGO : S_IRUGO); if(error) { + unlock_semundo(); kfree(un); return error; } - un->semadj = (short *) &un[1]; - un->semid = semid; - un->proc_next = current->semundo; - current->semundo = un; - un->id_next = sma->undo; - sma->undo = un; + + /* alloc_undo has just + * released all locks and reacquired them. + * But, another thread may have + * added the semundo we were looking for + * during that time. + * So, we check for it again. + * only initialize and add the new one + * if we don't discover one. + */ + new_un = NULL; + if (current->semundohd->add_count != saved_add_count) + new_un = find_undo(semid); + + if (new_un != NULL) { + if (sma->undo != new_un) + BUG(); + kfree(un); + un = new_un; + } else { + current->semundohd->add_count++; + un->semadj = (short *) &un[1]; + un->semid = semid; + un->proc_next = unhd->proc_list; + unhd->proc_list = un; + un->id_next = sma->undo; + sma->undo = un; + } *unp = un; return 0; } @@ -846,6 +963,7 @@ int undos = 0, decrease = 0, alter = 0; struct sem_queue queue; + if (nsops < 1 || semid < 0) return -EINVAL; if (nsops > sc_semopm) @@ -859,17 +977,18 @@ error=-EFAULT; goto out_free; } + lock_semundo(); sma = sem_lock(semid); error=-EINVAL; if(sma==NULL) - goto out_free; + goto out_semundo_free; error = -EIDRM; if (sem_checkid(sma,semid)) - goto out_unlock_free; + goto out_unlock_semundo_free; error = -EFBIG; for (sop = sops; sop < sops + nsops; sop++) { if (sop->sem_num >= sma->sem_nsems) - goto out_unlock_free; + goto out_unlock_semundo_free; if (sop->sem_flg & SEM_UNDO) undos++; if (sop->sem_op < 0) @@ -881,24 +1000,18 @@ error = -EACCES; if (ipcperms(&sma->sem_perm, alter ? S_IWUGO : S_IRUGO)) - goto out_unlock_free; + goto out_unlock_semundo_free; if (undos) { /* Make sure we have an undo structure * for this process and this semaphore set. */ - un=current->semundo; - while(un != NULL) { - if(un->semid==semid) - break; - if(un->semid==-1) - un=freeundos(sma,un); - else - un=un->proc_next; - } + + un = find_undo(semid); if (!un) { error = alloc_undo(sma,&un,semid,alter); - if(error) + if (error) goto out_free; + } } else un = NULL; @@ -930,16 +1043,18 @@ queue.sleeper = current; current->state = TASK_INTERRUPTIBLE; sem_unlock(semid); + unlock_semundo(); schedule(); + lock_semundo(); tmp = sem_lock(semid); if(tmp==NULL) { if(queue.prev != NULL) BUG(); current->semsleeping = NULL; error = -EIDRM; - goto out_free; + goto out_semundo_free; } /* * If queue.status == 1 we where woken up and @@ -960,7 +1075,7 @@ break; /* Everything done by update_queue */ current->semsleeping = NULL; - goto out_unlock_free; + goto out_unlock_semundo_free; } } current->semsleeping = NULL; @@ -968,14 +1083,58 @@ update: if (alter) update_queue (sma); -out_unlock_free: +out_unlock_semundo_free: sem_unlock(semid); +out_semundo_free: + unlock_semundo(); out_free: if(sops != fast_sops) kfree(sops); return error; } +/* If CLONE_SYSVIPC is set, establish sharing of SEM_UNDO state between + * parent and child tasks. + * + * See the notes above unlock_semundo() regarding the spin_lock_init() + * in this code. Initialize the undohd->lock here instead of get_undohd() + * because of the reasoning in the note referenced here. + */ + +int copy_semundo(unsigned long clone_flags, struct task_struct *tsk) +{ + struct sem_undohd *undohd; + int error; + + if (clone_flags & CLONE_SYSVIPC) { + error = get_undohd(&undohd); + if (error) + return error; + if (atomic_read(&undohd->refcnt) == 1) + spin_lock_init(&undohd->lock); + atomic_inc(&undohd->refcnt); + tsk->semundohd = undohd; + } else + tsk->semundohd = NULL; + + return 0; +} + +static inline void __exit_semundo(struct task_struct *tsk) +{ + struct sem_undohd *unhd; + + unhd = tsk->semundohd; + if (!atomic_dec_and_test(&unhd->refcnt)) + kfree(unhd); +} + +void exit_semundo(struct task_struct *tsk) +{ + if (tsk->semundohd != NULL) + __exit_semundo(tsk); +} + /* * add semadj values to semaphores, free undo structures. * undo structures are not freed when semaphore arrays are destroyed @@ -993,6 +1152,7 @@ struct sem_queue *q; struct sem_undo *u, *un = NULL, **up, **unp; struct sem_array *sma; + struct sem_undohd *undohd; int nsems, i; /* If the current process was sleeping for a semaphore, @@ -1012,7 +1172,14 @@ sem_unlock(semid); } - for (up = ¤t->semundo; (u = *up); *up = u->proc_next, kfree(u)) { + undohd = current->semundohd; + if ((undohd == NULL) || (atomic_read(&undohd->refcnt) != 1)) + return; + + /* There's no need to hold the semundo list lock, as current + * is the last task exiting for this undo list. + */ + for (up = &undohd->proc_list; (u = *up); *up = u->proc_next, kfree(u)) { int semid = u->semid; if(semid == -1) continue; @@ -1050,7 +1217,7 @@ next_entry: sem_unlock(semid); } - current->semundo = NULL; + __exit_semundo(current); } #ifdef CONFIG_PROC_FS diff -urN --exclude-from=/usr/src/dontdiff linux-2.5.2_original/linux/ipc/util.c linux-2.5.2_semundo/linux/ipc/util.c --- linux-2.5.2_original/linux/ipc/util.c Sun Aug 12 17:37:53 2001 +++ linux-2.5.2_semundo/linux/ipc/util.c Mon Jan 21 19:08:40 2002 @@ -340,6 +340,17 @@ * Dummy functions when SYSV IPC isn't configured */ +int copy_semundo(unsigned long clone_flags, struct task_struct *tsk) +{ + return 0; +} + +void exit_semundo(struct task_struct *tsk) +{ + return; +} + + void sem_exit (void) { return; diff -urN --exclude-from=/usr/src/dontdiff linux-2.5.2_original/linux/kernel/fork.c linux-2.5.2_semundo/linux/kernel/fork.c --- linux-2.5.2_original/linux/kernel/fork.c Wed Jan 9 16:16:28 2002 +++ linux-2.5.2_semundo/linux/kernel/fork.c Mon Jan 21 19:19:10 2002 @@ -27,6 +27,9 @@ #include #include +extern int copy_semundo(unsigned long clone_flags, struct task_struct *tsk); +extern void exit_semundo(struct task_struct *tsk); + /* The idle threads do not count.. */ int nr_threads; @@ -662,8 +665,10 @@ retval = -ENOMEM; /* copy all the process information */ - if (copy_files(clone_flags, p)) + if (copy_semundo(clone_flags, p)) goto bad_fork_cleanup; + if (copy_files(clone_flags, p)) + goto bad_fork_cleanup_semundo; if (copy_fs(clone_flags, p)) goto bad_fork_cleanup_files; if (copy_sighand(clone_flags, p)) @@ -675,7 +680,6 @@ retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs); if (retval) goto bad_fork_cleanup_namespace; - p->semundo = NULL; /* Our parent execution domain becomes current domain These must match for thread signalling to apply */ @@ -773,6 +777,8 @@ exit_fs(p); /* blocking */ bad_fork_cleanup_files: exit_files(p); /* blocking */ +bad_fork_cleanup_semundo: + exit_semundo(p); bad_fork_cleanup: put_exec_domain(p->exec_domain); if (p->binfmt && p->binfmt->module) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/