Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757430AbYA3Vva (ORCPT ); Wed, 30 Jan 2008 16:51:30 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752907AbYA3VvN (ORCPT ); Wed, 30 Jan 2008 16:51:13 -0500 Received: from e5.ny.us.ibm.com ([32.97.182.145]:58153 "EHLO e5.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756128AbYA3VvH (ORCPT ); Wed, 30 Jan 2008 16:51:07 -0500 Date: Wed, 30 Jan 2008 15:26:50 -0600 From: "Serge E. Hallyn" To: pierre.peiffer@bull.net Cc: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org Subject: Re: [PATCH 2.6.24-rc8-mm1 12/15] (RFC) IPC/semaphores: make use of RCU to free the sem_undo_list Message-ID: <20080130212650.GA8945@sergelap.austin.rr.com> References: <20080129160229.612172683@bull.net> <20080129162131.743970100@bull.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080129162131.743970100@bull.net> User-Agent: Mutt/1.5.16 (2007-06-09) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6117 Lines: 192 Quoting pierre.peiffer@bull.net (pierre.peiffer@bull.net): > From: Pierre Peiffer > > Today, the sem_undo_list is freed when the last task using it exits. > There is no mechanism in place, that allows a safe concurrent access to > the sem_undo_list of a target task and protects efficiently against a > task-exit. > > That is okay for now as we don't need this. > > As I would like to provide a /proc interface to access this data, I need > such a safe access, without blocking the target task if possible. > > This patch proposes to introduce the use of RCU to delay the real free of > these sem_undo_list structures. They can then be accessed in a safe manner > by any tasks inside read critical section, this way: > > struct sem_undo_list *undo_list; > int ret; > ... > rcu_read_lock(); > undo_list = rcu_dereference(task->sysvsem.undo_list); > if (undo_list) > ret = atomic_inc_not_zero(&undo_list->refcnt); > rcu_read_unlock(); > ... > if (undo_list && ret) { > /* section where undo_list can be used quietly */ > ... > } > ... And of course then if (atomic_dec_and_test(&undo_list->refcnt)) free_semundo_list(undo_list); by that task. > > Signed-off-by: Pierre Peiffer Looks correct in terms of locking/refcounting. Signed-off-by: Serge Hallyn thanks, -serge > --- > > include/linux/sem.h | 7 +++++-- > ipc/sem.c | 42 ++++++++++++++++++++++++++---------------- > 2 files changed, 31 insertions(+), 18 deletions(-) > > Index: b/include/linux/sem.h > =================================================================== > --- a/include/linux/sem.h > +++ b/include/linux/sem.h > @@ -115,7 +115,8 @@ struct sem_queue { > }; > > /* Each task has a list of undo requests. They are executed automatically > - * when the process exits. > + * when the last refcnt of sem_undo_list is released (ie when the process exits > + * in the general case) > */ > struct sem_undo { > struct sem_undo * proc_next; /* next entry on this process */ > @@ -125,12 +126,14 @@ struct sem_undo { > }; > > /* sem_undo_list controls shared access to the list of sem_undo structures > - * that may be shared among all a CLONE_SYSVSEM task group. > + * that may be shared among all a CLONE_SYSVSEM task group or with an external > + * process which changes the list through procfs. > */ > struct sem_undo_list { > atomic_t refcnt; > spinlock_t lock; > struct sem_undo *proc_list; > + struct ipc_namespace *ns; > }; > > struct sysv_sem { > Index: b/ipc/sem.c > =================================================================== > --- a/ipc/sem.c > +++ b/ipc/sem.c > @@ -1038,6 +1038,7 @@ static inline int get_undo_list(struct s > return -ENOMEM; > spin_lock_init(&undo_list->lock); > atomic_set(&undo_list->refcnt, 1); > + undo_list->ns = get_ipc_ns(current->nsproxy->ipc_ns); > current->sysvsem.undo_list = undo_list; > } > *undo_listp = undo_list; > @@ -1316,7 +1317,8 @@ int copy_semundo(unsigned long clone_fla > } > > /* > - * add semadj values to semaphores, free undo structures. > + * add semadj values to semaphores, free undo structures, if there is no > + * more user. > * undo structures are not freed when semaphore arrays are destroyed > * so some of them may be out of date. > * IMPLEMENTATION NOTE: There is some confusion over whether the > @@ -1326,23 +1328,17 @@ int copy_semundo(unsigned long clone_fla > * The original implementation attempted to do this (queue and wait). > * The current implementation does not do so. The POSIX standard > * and SVID should be consulted to determine what behavior is mandated. > + * > + * Note: > + * A concurrent task is only allowed to access and go through the list > + * of sem_undo if it successfully grabs a refcnt. > */ > -void exit_sem(struct task_struct *tsk) > +static void free_semundo_list(struct sem_undo_list *undo_list) > { > - struct sem_undo_list *undo_list; > struct sem_undo *u, **up; > - struct ipc_namespace *ns; > > - undo_list = tsk->sysvsem.undo_list; > - if (!undo_list) > - return; > - > - if (!atomic_dec_and_test(&undo_list->refcnt)) > - return; > - > - ns = tsk->nsproxy->ipc_ns; > - /* There's no need to hold the semundo list lock, as current > - * is the last task exiting for this undo list. > + /* There's no need to hold the semundo list lock, as there are > + * no more tasks or possible users for this undo list. > */ > for (up = &undo_list->proc_list; (u = *up); *up = u->proc_next, kfree(u)) { > struct sem_array *sma; > @@ -1354,7 +1350,7 @@ void exit_sem(struct task_struct *tsk) > > if(semid == -1) > continue; > - sma = sem_lock(ns, semid); > + sma = sem_lock(undo_list->ns, semid); > if (IS_ERR(sma)) > continue; > > @@ -1368,7 +1364,8 @@ void exit_sem(struct task_struct *tsk) > if (u == un) > goto found; > } > - printk ("exit_sem undo list error id=%d\n", u->semid); > + printk(KERN_ERR "free_semundo_list error id=%d\n", > + u->semid); > goto next_entry; > found: > *unp = un->id_next; > @@ -1404,9 +1401,22 @@ found: > next_entry: > sem_unlock(sma); > } > + put_ipc_ns(undo_list->ns); > kfree(undo_list); > } > > +/* called from do_exit() */ > +void exit_sem(struct task_struct *tsk) > +{ > + struct sem_undo_list *ul = tsk->sysvsem.undo_list; > + if (ul) { > + rcu_assign_pointer(tsk->sysvsem.undo_list, NULL); > + synchronize_rcu(); > + if (atomic_dec_and_test(&ul->refcnt)) > + free_semundo_list(ul); > + } > +} > + > #ifdef CONFIG_PROC_FS > static int sysvipc_sem_proc_show(struct seq_file *s, void *it) > { > > -- > Pierre Peiffer > _______________________________________________ > Containers mailing list > Containers@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/containers -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/