Date: Tue, 30 Jun 2009 12:07:15 +0300
From: "Michael S. Tsirkin"
To: Steven Whitehouse
Cc: Gregory Haskins, linux-kernel@vger.kernel.org, dhowells@redhat.com
Subject: Re: [PATCH v4] slow-work: add (module*)work->ops->owner to fix races with module clients
Message-ID: <20090630090715.GD29725@redhat.com>
References: <20090629191653.14240.44995.stgit@dev.haskins.net> <1246351383.3383.1.camel@localhost.localdomain>
In-Reply-To: <1246351383.3383.1.camel@localhost.localdomain>

On Tue, Jun 30, 2009 at 09:43:03AM +0100, Steven Whitehouse wrote:
> Hi,
>
> I'm happy to ACK this, but the race doesn't exist in GFS2's case because
> we wait for all work related to each GFS2 fs at umount time and the
> module unload cannot happen until all GFS2 fs are umounted.
>
> Steve.

I wonder whether the following holds:

static void gfs2_recover_put_ref(struct slow_work *work)
{
	struct gfs2_jdesc *jd = container_of(work, struct gfs2_jdesc, jd_work);
	clear_bit(JDF_RECOVERY, &jd->jd_flags);
	smp_mb__after_clear_bit();
	wake_up_bit(&jd->jd_flags, JDF_RECOVERY);
	/* <- umount can complete here? */
}

If so, the module's .text could go away between the point marked by <- and
the return from gfs2_recover_put_ref().
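To make the question concrete, here is a minimal single-threaded userspace model; it is not kernel code, and fake_module, fake_rmmod(), model_put_ref(), put_pinned() and the other names are hypothetical stand-ins. The point where gfs2_recover_put_ref() wakes the umount waiter is modelled by attempting an "rmmod" right there. Without a pin the unload succeeds while the callback is still running; if the core holds a module reference from enqueue until ->put_ref() has returned, as the patch does, the unload is refused.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct module: just a reference count. */
struct fake_module {
	int refcnt;
	bool unloaded;
};

/* rmmod analogue: unloading succeeds only if no references are held. */
static bool fake_rmmod(struct fake_module *m)
{
	if (m->refcnt != 0)
		return false;
	m->unloaded = true;
	return true;
}

struct fake_work;

struct fake_work_ops {
	struct fake_module *owner;
	void (*put_ref)(struct fake_work *work);
};

struct fake_work {
	const struct fake_work_ops *ops;
	bool recovery;			/* models JDF_RECOVERY */
	bool unloaded_mid_callback;	/* set if "rmmod" won the race */
};

/*
 * Models gfs2_recover_put_ref(): once the flag is cleared and the waiter
 * woken, umount may complete and a concurrent rmmod may run.  The rmmod
 * is simulated at exactly the point marked "<-" in the mail above.
 */
static void model_put_ref(struct fake_work *work)
{
	work->recovery = false;		/* clear_bit() + wake_up_bit() */
	if (fake_rmmod(work->ops->owner))
		work->unloaded_mid_callback = true;
	/* ...yet this callback's .text is still executing here. */
}

/* Before the patch: the core calls ->put_ref() with no module pin. */
static void put_unpinned(struct fake_work *work)
{
	work->ops->put_ref(work);
}

/* With the patch: enqueue pins the owner... */
static void enqueue_pinned(struct fake_work *work)
{
	work->ops->owner->refcnt++;		/* __module_get() */
}

/* ...and the core drops the pin only after ->put_ref() has returned. */
static void put_pinned(struct fake_work *work)
{
	struct fake_module *owner = work->ops->owner;

	work->ops->put_ref(work);
	owner->refcnt--;			/* module_put() */
}
```

This only models the ordering of the references, not real concurrency between the slow-work thread and rmmod.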
> On Mon, 2009-06-29 at 15:24 -0400, Gregory Haskins wrote:
> > (Applies to Linus' linux-2.6.git/master:626f380d)
> >
> > [ Changelog:
> >
> >    v4:
> >       *) added ".owner = THIS_MODULE" fields to all current slow-work
> >          clients (fscache, gfs2).
> >
> >    v3:
> >       *) moved (module*)owner to slow_work_ops
> >       *) removed useless barrier()
> >       *) updated documentation/comments
> >
> >    v2:
> >       *) cache "owner" value to prevent invalid access after put_ref
> >
> >    v1:
> >       *) initial release
> > ]
> >
> > I've retained Michael's "Reviewed-by:" tag from v3 since v4 is identical
> > in every way except the new hunks added to gfs2/fscache that he asked for.
> >
> > Michael, if you want to rescind your tag, please speak up.
> >
> > Otherwise, please consider for inclusion.
> >
> > Regards,
> > -Greg
> >
> > ---------------------------------
> >
> > slow-work: add (module*)work->ops->owner to fix races with module clients
> >
> > The slow_work facility was designed to use reference counting instead of
> > barriers for synchronization. The reference counting mechanism is
> > implemented as a vtable op (->get_ref, ->put_ref) callback. This is
> > problematic for module use of the slow_work facility because it is
> > impossible to synchronize against the .text installed in the callbacks:
> > There is no way to ensure that the slow-work threads have completely
> > exited the .text in question and rmmod may yank it out from under the
> > slow_work thread.
> >
> > This patch attempts to address this issue by mapping "struct module* owner"
> > to the slow_work_ops item, and maintaining a module reference
> > count coincident with the more externally visible reference count. Since
> > the slow_work facility is resident in kernel, it should be a race-free
> > location to issue a module_put() call. This will ensure that modules
> > can properly clean up before exiting.
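A minimal userspace sketch of the scheme described above, assuming a heap-allocated item whose put_ref() callback frees it when the last reference goes. Because the item may already be freed once put_ref() returns, the owner pointer is read first (the v2 changelog item). All names here (model_module, model_item, model_enqueue(), model_slow_work_put(), etc.) are hypothetical, and the "module" is again just a counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct module. */
struct model_module {
	int refcnt;
};

struct model_item;

struct model_ops {
	struct model_module *owner;
	int (*get_ref)(struct model_item *item);
	void (*put_ref)(struct model_item *item);
};

struct model_item {
	const struct model_ops *ops;
	int refcnt;
};

static int items_freed;

/* Never fails in this model; the failure path is not covered here. */
static int model_get_ref(struct model_item *item)
{
	item->refcnt++;
	return 0;
}

/* Drops an item reference; the item is freed when the count hits zero. */
static void model_put_ref(struct model_item *item)
{
	if (--item->refcnt == 0) {
		free(item);
		items_freed++;
	}
}

/* Enqueue side: take the module reference, then the item reference. */
static int model_enqueue(struct model_item *item)
{
	item->ops->owner->refcnt++;		/* __module_get() */
	return item->ops->get_ref(item);
}

/*
 * Mirrors slow_work_put(): read everything needed from the item *before*
 * calling put_ref(), since "item" may point to freed memory afterwards.
 */
static void model_slow_work_put(struct model_item *item)
{
	struct model_module *owner = item->ops->owner;

	item->ops->put_ref(item);
	owner->refcnt--;			/* module_put() */
}
```

Reading item->ops->owner after put_ref() in model_slow_work_put() would be a use-after-free whenever that call dropped the last item reference, which is exactly what caching owner avoids.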
> >
> > A module_get()/module_put() pair on slow_work_enqueue() and the subsequent
> > dequeue technically adds the overhead of the atomic operations for every
> > work item scheduled. However, slow_work is designed for deferring
> > relatively long-running and/or sleepy tasks to begin with, so this
> > overhead will hopefully be negligible.
> >
> > Signed-off-by: Gregory Haskins
> > Reviewed-by: Michael S. Tsirkin
> > CC: David Howells
> > CC: Steven Whitehouse
> > ---
> >
> >  Documentation/slow-work.txt |    6 +++++-
> >  fs/fscache/object.c         |    1 +
> >  fs/fscache/operation.c      |    1 +
> >  fs/gfs2/recovery.c          |    1 +
> >  include/linux/slow-work.h   |    3 +++
> >  kernel/slow-work.c          |   20 +++++++++++++++++++-
> >  6 files changed, 30 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
> > index ebc50f8..2a38878 100644
> > --- a/Documentation/slow-work.txt
> > +++ b/Documentation/slow-work.txt
> > @@ -80,6 +80,7 @@ Slow work items may then be set up by:
> >   (2) Declaring the operations to be used for this item:
> >
> >  	struct slow_work_ops myitem_ops = {
> > +		.owner = THIS_MODULE,
> >  		.get_ref = myitem_get_ref,
> >  		.put_ref = myitem_put_ref,
> >  		.execute = myitem_execute,
> > @@ -102,7 +103,10 @@ A suitably set up work item can then be enqueued for processing:
> >  	int ret = slow_work_enqueue(&myitem);
> >
> >  This will return a -ve error if the thread pool is unable to gain a reference
> > -on the item, 0 otherwise.
> > +on the item, 0 otherwise. Loadable modules may only enqueue work if at least
> > +one reference to the module is known to be held. The slow-work infrastructure
> > +will acquire a reference to the module and hold it until after the item's
> > +reference is dropped, assuring the stability of the callback.
> >
> >
> >  The items are reference counted, so there ought to be no need for a flush
> > diff --git a/fs/fscache/object.c b/fs/fscache/object.c
> > index 392a41b..d236eb1 100644
> > --- a/fs/fscache/object.c
> > +++ b/fs/fscache/object.c
> > @@ -45,6 +45,7 @@ static void fscache_enqueue_dependents(struct fscache_object *);
> >  static void fscache_dequeue_object(struct fscache_object *);
> >
> >  const struct slow_work_ops fscache_object_slow_work_ops = {
> > +	.owner = THIS_MODULE,
> >  	.get_ref = fscache_object_slow_work_get_ref,
> >  	.put_ref = fscache_object_slow_work_put_ref,
> >  	.execute = fscache_object_slow_work_execute,
> > diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c
> > index e7f8d53..f1a2857 100644
> > --- a/fs/fscache/operation.c
> > +++ b/fs/fscache/operation.c
> > @@ -453,6 +453,7 @@ static void fscache_op_execute(struct slow_work *work)
> >  }
> >
> >  const struct slow_work_ops fscache_op_slow_work_ops = {
> > +	.owner = THIS_MODULE,
> >  	.get_ref = fscache_op_get_ref,
> >  	.put_ref = fscache_op_put_ref,
> >  	.execute = fscache_op_execute,
> > diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> > index 59d2695..0c2a6aa 100644
> > --- a/fs/gfs2/recovery.c
> > +++ b/fs/gfs2/recovery.c
> > @@ -593,6 +593,7 @@ fail:
> >  }
> >
> >  struct slow_work_ops gfs2_recover_ops = {
> > +	.owner = THIS_MODULE,
> >  	.get_ref = gfs2_recover_get_ref,
> >  	.put_ref = gfs2_recover_put_ref,
> >  	.execute = gfs2_recover_work,
> > diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h
> > index b65c888..1382918 100644
> > --- a/include/linux/slow-work.h
> > +++ b/include/linux/slow-work.h
> > @@ -17,6 +17,7 @@
> >  #ifdef CONFIG_SLOW_WORK
> >
> >  #include
> > +#include
> >
> >  struct slow_work;
> >
> > @@ -24,6 +25,8 @@ struct slow_work;
> >   * The operations used to support slow work items
> >   */
> >  struct slow_work_ops {
> > +	struct module *owner;
> > +
> >  	/* get a ref on a work item
> >  	 * - return 0 if successful, -ve if not
> >  	 */
> > diff --git a/kernel/slow-work.c b/kernel/slow-work.c
> > index 09d7519..18dee34 100644
> > --- a/kernel/slow-work.c
> > +++ b/kernel/slow-work.c
> > @@ -145,6 +145,15 @@ static unsigned slow_work_calc_vsmax(void)
> >  	return min(vsmax, slow_work_max_threads - 1);
> >  }
> >
> > +static void slow_work_put(struct slow_work *work)
> > +{
> > +	/* cache values that are needed during/after pointer invalidation */
> > +	struct module *owner = work->ops->owner;
> > +
> > +	work->ops->put_ref(work);
> > +	module_put(owner);
> > +}
> > +
> >  /*
> >   * Attempt to execute stuff queued on a slow thread. Return true if we managed
> >   * it, false if there was nothing to do.
> > @@ -219,7 +228,7 @@ static bool slow_work_execute(void)
> >  		spin_unlock_irq(&slow_work_queue_lock);
> >  	}
> >
> > -	work->ops->put_ref(work);
> > +	slow_work_put(work);
> >  	return true;
> >
> >  auto_requeue:
> > @@ -299,6 +308,14 @@ int slow_work_enqueue(struct slow_work *work)
> >  		if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) {
> >  			set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags);
> >  		} else {
> > +			/*
> > +			 * Callers must ensure that their module has at least
> > +			 * one reference held while the work is enqueued. We
> > +			 * will acquire another reference here and drop it
> > +			 * once we do the last ops->put_ref()
> > +			 */
> > +			__module_get(work->ops->owner);
> > +
> >  			if (work->ops->get_ref(work) < 0)
> >  				goto cant_get_ref;
> >  			if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags))
> > @@ -313,6 +330,7 @@ int slow_work_enqueue(struct slow_work *work)
> >  	return 0;
> >
> >  cant_get_ref:
> > +	module_put(work->ops->owner);
> >  	spin_unlock_irqrestore(&slow_work_queue_lock, flags);
> >  	return -EAGAIN;
> >  }
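A userspace model of the slow_work_enqueue() hunk above: the module reference is taken before ->get_ref() and dropped again on the cant_get_ref path, so a failed enqueue leaves the module count unchanged and returns -EAGAIN. In built-in code THIS_MODULE is NULL, and __module_get()/module_put() do nothing for a NULL module; the helpers below mimic that. fake_module_get(), fake_module_put(), fake_enqueue() and the "dying" flag are hypothetical, and locking plus the SLOW_WORK_* flag handling are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module. */
struct fake_module {
	int refcnt;
};

/* Like __module_get()/module_put(): a NULL owner is a no-op. */
static void fake_module_get(struct fake_module *m)
{
	if (m)
		m->refcnt++;
}

static void fake_module_put(struct fake_module *m)
{
	if (m)
		m->refcnt--;
}

struct fake_work;

struct fake_work_ops {
	struct fake_module *owner;
	int (*get_ref)(struct fake_work *work);
};

struct fake_work {
	const struct fake_work_ops *ops;
	bool dying;		/* hypothetical: item refuses new references */
	int refcnt;
};

/* Returns 0 if successful, -ve if not, as slow_work_ops documents. */
static int fake_get_ref(struct fake_work *work)
{
	if (work->dying)
		return -1;
	work->refcnt++;
	return 0;
}

/* Models the enqueue path of the patch, minus locking and queueing. */
static int fake_enqueue(struct fake_work *work)
{
	fake_module_get(work->ops->owner);
	if (work->ops->get_ref(work) < 0)
		goto cant_get_ref;
	return 0;

cant_get_ref:
	fake_module_put(work->ops->owner);
	return -EAGAIN;
}
```

Taking the module reference before ->get_ref() and undoing it on failure keeps the two counts paired on every path, which is the invariant slow_work_put() relies on when it drops both.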