Date: Wed, 6 Aug 2008 06:07:04 -0600
From: Matthew Wilcox <matthew@wil.cx>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>,
       linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Make kthread_stop() not oops when passed a bad pointer
Message-ID: <20080806120704.GF2055@parisc-linux.org>
References: <20080805135559.GQ26461@parisc-linux.org> <200808061122.59584.rusty@rustcorp.com.au>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200808061122.59584.rusty@rustcorp.com.au>
User-Agent: Mutt/1.5.13 (2006-08-11)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2729
Lines: 62

On Wed, Aug 06, 2008 at 11:22:58AM +1000, Rusty Russell wrote:
> I really do sympathize with your problem; but the quest is to figure out how 
> to identify it before the code is run, not to put a non-orthogonal bandaid 
> here which can hurt other cases.
> 
> How about a more ambitious "we've oopsed so break a mutex every 30 seconds of 
> waiting" patch?

I was considering something more along the lines of "we've oopsed so
find every mutex we own and release it".

> > +?????if (!k || IS_ERR(k))
> > +?????????????return -EINVAL;
> 
> 1) There's no reason that kthread_stop is uniquely difficult to use.  Why pick 
> on that one?

It was the one I hit.

> 2) I know that kfree() handles NULL, but kthread_create/kthread_run never 
> return NULL, unlike kmalloc().

I'd kzalloc'd the memory structure, then rearranged the order of calls
initialising it without rearranging the destructor.

> 3) If we really want to pass a failed kthread_create() through kthread_stop(), 
> we should return PTR_ERR(k) here.  But that should only be done if it made it 
> harder for the callers to screw up, which I don't think it does.

I'm actually really dubious about kthread_stop() returning a value at
all.  To me, returning an error implies that the function failed to do
its job, ie the thread is still running.  But that's not true; if it
returns -EINVAL, it means the thread never ran.  And why should the
caller care?  Only three callers of kthread_stop do anything with the
return value.  Two of them just put the value in a debug message, and
the third one goes to the effort of passing the return value through
three layers of function pointer calls only to have all the callers
discard it.

> 4) After a successful kthread_run(), kthread_stop() will always return the 
> value from the threadfn callback.  ie. kthread_stop() doesn't ever fail.  A 
> simple semantic, which this patch breaks.

Now I'm confused.  kthread_stop isn't failing.  It preserves the
invariant that when it returns, the thread is no longer running.

> 5) Covering up programmer errors is not good policy.  I dislike WARN_ON() 
> because an oops is much harder to miss.  Painful for you, but The System 
> Works.

I don't understand why we wouldn't want to be more robust here.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/