From: Linus Torvalds
Date: Mon, 24 Oct 2016 17:00:35 -0700
Subject: Re: bio linked list corruption.
To: Andy Lutomirski
Cc: Dave Jones, Chris Mason, Andy Lutomirski, Jens Axboe, Al Viro, Josef Bacik, David Sterba, linux-btrfs, Linux Kernel

On Mon, Oct 24, 2016 at 3:42 PM, Andy Lutomirski wrote:
>
> Here's my theory: I think you're looking at the right code but the
> wrong stack. shmem_fault_wait is fine, but shmem_fault_waitq looks
> really dicey.

Hmm.

> Consider:
>
> fallocate calls wake_up_all(), which calls autoremove_wake_function().
> That wakes up the shmem_fault thread. Suppose that the shmem_fault
> thread gets moving really quickly (presumably because it never went to
> sleep in the first place -- suppose it hadn't made it to schedule(),
> so schedule() turns into a no-op). It calls finish_wait() *before*
> autoremove_wake_function() does the list_del_init(). finish_wait()
> gets to the list_empty_careful() call and it returns true.

All of this happens under inode->i_lock, so the different parts are
serialized.

So if this happens before the wakeup, then finish_wait() will see that
the wait-queue entry is not empty (it points to the wait queue head in
shmem_falloc_waitq). But then it will just remove itself carefully
with list_del_init() under the waitqueue lock, and things are fine.

Yes, it uses the waitqueue lock on the other stack, but that stack is
still ok since (a) we're serialized by inode->i_lock, and (b) this
code ran before the fallocate thread catches up and exits.

In other words, your scenario seems to be dependent on those two
threads being able to race. But they both hold inode->i_lock in the
critical region you are talking about.

> Now the fallocate thread catches up and *exits*. Dave's test makes a
> new thread that reuses the stack (the vmap area or the backing store).
>
> Now the shmem_fault thread continues on its merry way and takes
> q->lock. But oh crap, q->lock is pointing at some random spot on some
> other thread's stack. Kaboom!

Note that q->lock should be entirely immaterial, since inode->i_lock
nests outside of it in all uses.

Now, if there is some code that runs *without* the inode->i_lock, then
that would be a big bug. But I'm not seeing it.

I do agree that some race on some stack data structure could easily be
the cause of these issues. And yes, the vmap code obviously starts
reusing the stack much earlier, and would trigger problems that would
essentially be hidden by the fact that the kernel stack used to stay
around not just until exit(), but until the process was reaped.
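Just to make the serialization argument concrete, here is roughly the
pattern we're talking about. This is a hand-written sketch, not the
actual mm/shmem.c code: falloc_side()/fault_side() are made-up names,
and the shmem_falloc bookkeeping is collapsed into i_private.

/*
 * Simplified sketch of the fallocate/fault wait pattern.
 * NOT the real mm/shmem.c code: function names are invented and the
 * struct shmem_falloc bookkeeping is collapsed into inode->i_private.
 */
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/wait.h>

/* Hole-punch side: the wait queue *head* lives on this thread's stack. */
static void falloc_side(struct inode *inode)
{
    DECLARE_WAIT_QUEUE_HEAD_ONSTACK(falloc_waitq);

    spin_lock(&inode->i_lock);
    inode->i_private = &falloc_waitq;   /* publish it to faulting threads */
    spin_unlock(&inode->i_lock);

    /* ... do the actual hole punch ... */

    spin_lock(&inode->i_lock);
    inode->i_private = NULL;
    wake_up_all(&falloc_waitq);     /* autoremove unlinks each woken waiter */
    spin_unlock(&inode->i_lock);

    /* only after this can the stack frame (and falloc_waitq) go away */
}

/* Fault side: the wait queue *entry* lives on this thread's stack. */
static void fault_side(struct inode *inode)
{
    wait_queue_head_t *waitq;
    DEFINE_WAIT(wait);          /* wakes via autoremove_wake_function() */

    spin_lock(&inode->i_lock);
    waitq = inode->i_private;       /* head on the fallocate stack */
    if (!waitq) {               /* no hole punch in progress */
        spin_unlock(&inode->i_lock);
        return;
    }
    prepare_to_wait(waitq, &wait, TASK_UNINTERRUPTIBLE);
    spin_unlock(&inode->i_lock);

    schedule();

    spin_lock(&inode->i_lock);
    /*
     * If the wakeup already ran, autoremove_wake_function() has done
     * list_del_init() on 'wait', so finish_wait() sees it empty and
     * never dereferences *waitq.  If it hasn't, the fallocate thread
     * is still stuck on i_lock, so its stack is still live.
     */
    finish_wait(waitq, &wait);
    spin_unlock(&inode->i_lock);
}

The point being that both sides only touch the wait queue with
inode->i_lock held, so the fault side either finds its entry still
queued (and then fallocate can't have exited yet), or finds it already
emptied (and then it never touches the other stack at all).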
I just think that in this case i_lock really looks like it should
serialize things correctly. Or are you seeing something I'm not?

            Linus