Date: Mon, 4 Mar 2013 15:39:08 +0800
From: Greg KH <gregkh@linuxfoundation.org>
To: Russ Dill <russ.dill@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
        linux-kernel <linux-kernel@vger.kernel.org>,
        Nick Kossifidis <mickflemm@gmail.com>, "Theodore Ts'o" <tytso@mit.edu>
Subject: Re: fasync race in fs/fcntl.c
Message-ID: <20130304073908.GA982@kroah.com>
References: <CA+Bv8XZM7yO-=vrGZg5LFLik8YkQiMC9ppCgQbyi1yuLiKstJQ@mail.gmail.com>
 <20130302194923.GE4503@ZenIV.linux.org.uk>
 <CA+Bv8XY1Qr_zD1guQzeWVPCLw6JP=y5taJMwPUEO1AxPckgsjQ@mail.gmail.com>
 <CA+Bv8XaEBZz3RWh5=y-kLoEMdjxwNFBUBBxUKdq-NuV6iZ=QRg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CA+Bv8XaEBZz3RWh5=y-kLoEMdjxwNFBUBBxUKdq-NuV6iZ=QRg@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org

On Sun, Mar 03, 2013 at 10:16:10PM -0800, Russ Dill wrote:
> On Sat, Mar 2, 2013 at 4:09 PM, Russ Dill <russ.dill@gmail.com> wrote:
> > On Sat, Mar 2, 2013 at 11:49 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> On Sat, Mar 02, 2013 at 03:00:28AM -0800, Russ Dill wrote:
> >>> I'm seeing a race in fs/fcntl.c. I'm not sure exactly how the race is
> >>> occurring, but the following is my best guess. A kernel log is
> >>> attached.
> >>
> >> [snip the analysis - it's a different lock anyway]
> >>
> >> The traces below are essentially sys_execve() getting to get_random_bytes(),
> >> to kill_fasync(), to send_sigio(), which spins on tasklist_lock.
> >>
> >> Could you rebuild it with lockdep enabled and try to reproduce that?
> >> I very much doubt that this execve() is a part of deadlock - it's
> >> getting caught on one, but it shouldn't be holding any locks that
> >> nest inside tasklist_lock at that point, so even if it hadn't been there,
> >> the process holding tasklist_lock probably wouldn't have progressed any
> >> further...
> >
> > OK, I did screw up the analysis quite badly. Luckily, lockdep caught it
> > right away.
> >
> 
> So lockdep gives some clues, but seems a bit confused, so here's what happened.
> 
> mix_pool_bytes /* takes nonblocking_pool.lock */
> add_device_randomness
> posix_cpu_timers_exit
> __exit_signal
> release_task /* takes write lock on tasklist_lock */
> do_exit
> __module_put_and_exit
> cryptomgr_test
> 
> send_sigio /* takes read lock on tasklist_lock */
> kill_fasync_rcu
> kill_fasync
> account /* takes nonblocking_pool.lock */
> extract_entropy
> get_random_bytes
> create_elf_tables
> load_elf_binary
> load_elf_library
> search_binary_handler
> 
> This would mark the culprit as 613370549 'random: Mix cputime from
> each thread that exits to the pool'.  So long as I'm not as crazy on
> this analysis as on the last one, may I suggest a revert of this commit
> for 3.8.3?

I'll revert it, but shouldn't we fix this properly upstream in Linus's
tree as well?  I'd rather take the fix than a revert so that we don't
have a problem that no one remembers to fix until 3.9-final is out.
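
For anyone following the quoted traces, the problem reads as a plain
lock-order inversion: the exit path takes tasklist_lock and then
nonblocking_pool.lock, while the exec path takes nonblocking_pool.lock
and then tasklist_lock.  Below is a minimal userspace sketch of how an
ordering check can flag that.  It assumes nothing about how lockdep is
really implemented; the enum, record_order(), exit_path() and
exec_path() are hypothetical names made up for illustration, not kernel
code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel locks named in the traces. */
enum lock_id { TASKLIST_LOCK, POOL_LOCK, NR_LOCKS };

/* order_seen[a][b] is true once some path took lock b while holding a. */
static bool order_seen[NR_LOCKS][NR_LOCKS];

/*
 * Record "outer is held, inner is being taken".
 * Returns true if the opposite order was recorded earlier, which
 * means two paths can deadlock against each other (AB-BA).
 */
static bool record_order(enum lock_id outer, enum lock_id inner)
{
	order_seen[outer][inner] = true;
	return order_seen[inner][outer];
}

/* Exit path: release_task() holds tasklist_lock, mix_pool_bytes()
 * then takes the pool lock. */
static bool exit_path(void)
{
	return record_order(TASKLIST_LOCK, POOL_LOCK);
}

/* Exec path: account() holds the pool lock, send_sigio() then takes
 * tasklist_lock. */
static bool exec_path(void)
{
	return record_order(POOL_LOCK, TASKLIST_LOCK);
}
```

Whichever of the two paths runs second gets true back from
record_order(), since the other path has already recorded the opposite
order.  No locks are actually taken here, so the sketch only shows the
ordering conflict, not the hang itself.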

thanks,

greg k-h