MIME-Version: 1.0
In-Reply-To: <CA+55aFzXgWSRYeBX-qSUWPv2uhxEQ+80poQbvwvgCbf=RsKXTg@mail.gmail.com>
References: <20161007222059.GS19539@ZenIV.linux.org.uk> <CA+55aFzXgWSRYeBX-qSUWPv2uhxEQ+80poQbvwvgCbf=RsKXTg@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 9 Oct 2016 11:40:27 -0700
Message-ID: <CA+55aFxJsPM0OihozMUmCecg0zdG0izVDr_=z55CXkdXU3qT+w@mail.gmail.com>
Subject: Re: [git pull] vfs pile 1 (splice)
To: Al Viro <viro@zeniv.linux.org.uk>,
        Andrew Morton <akpm@linux-foundation.org>, Jens Axboe <axboe@fb.com>,
        "Ted Ts'o" <tytso@mit.edu>, Christoph Lameter <cl@linux.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        linux-fsdevel <linux-fsdevel@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2942
Lines: 66

On Sat, Oct 8, 2016 at 11:05 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. I've now gotten two oopses today, all at __kmalloc+0xc3/0x1f0,
> which seems to be the
>
>   *(void **)(object + s->offset);
>
> in get_freepointer().

Actually, it's in "get_freepointer_safe()", it's just that without
DEBUG_PAGEALLOC the two end up being the same.

> I guess I'll need to just run with slab debugging on, but I wanted to
> bring this to people's attention in case it rings a bell for somebody.
> I haven't been merging anything today, partly because of this.

Hmm. When I enabled SLUB debugging, I also enabled DEBUG_PAGEALLOC,
because "why not". But it turns out that may have been a mistake,
because it changes the very path that failed so that it no longer does
the failing access directly (or rather, it does it as a
"probe_kernel_read()", which traps and ignores the failure).

So all my "careful" testing seems to have been pointless, because I
enabled too much debugging, making sure that the problem cannot
happen. No wonder I couldn't reproduce this.

I'll continue with *just* SLUB debugging on, but I thought it was
interesting how enabling more memory access debugging actually ends up
changing some really subtle code.

The "get_freepointer_safe()" thing is explicitly doing a read that
could be to freed memory, and it then depends on the
this_cpu_cmpxchg_double() to abort the operation if it's no longer
valid.
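
For illustration only, here is a minimal userspace sketch of that
optimistic pattern. It is not the SLUB code: the names (pool, head,
pool_alloc, pool_free, NIL) are hypothetical, C11 atomics stand in for
this_cpu_cmpxchg_double(), and a 32-bit index packed with a 32-bit
transaction id stands in for the kernel's freelist pointer and tid. The
point it shows is that the "next" value is read before we know whether
the object is still ours to look at, and only the compare-and-swap
decides whether that read gets used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NOBJ 4
#define NIL  UINT32_MAX	/* hypothetical "end of freelist" marker */

/* Hypothetical object: while it sits on the freelist, its first word
 * holds the index of the next free object. */
struct obj {
	_Atomic uint32_t next_free;
	char payload[28];
};

static struct obj pool[NOBJ];

/* Transaction id in the high 32 bits, index of the first free object
 * in the low 32 bits, so both are compared and swapped together. */
static _Atomic uint64_t head;

static uint64_t pack(uint32_t tid, uint32_t idx)
{
	return ((uint64_t)tid << 32) | idx;
}

static void pool_init(void)
{
	for (uint32_t i = 0; i < NOBJ; i++)
		atomic_store(&pool[i].next_free, (i + 1 < NOBJ) ? i + 1 : NIL);
	atomic_store(&head, pack(0, 0));
}

/* Returns the index of the allocated object, or NIL if none is free. */
static uint32_t pool_alloc(void)
{
	uint64_t old = atomic_load(&head);

	for (;;) {
		uint32_t tid = (uint32_t)(old >> 32);
		uint32_t idx = (uint32_t)old;

		if (idx == NIL)
			return NIL;

		/* Optimistic read: by now another thread may have taken
		 * (and reused) object idx, so this value may be stale.
		 * In the kernel the equivalent read is a raw pointer
		 * dereference, which is where a bogus freelist pointer
		 * turns into a fault. */
		uint32_t next = atomic_load_explicit(&pool[idx].next_free,
						     memory_order_relaxed);

		/* Only use 'next' if head (tid and idx) is unchanged.
		 * On failure 'old' is reloaded and we retry. */
		if (atomic_compare_exchange_weak(&head, &old,
						 pack(tid + 1, next)))
			return idx;
	}
}

static void pool_free(uint32_t idx)
{
	uint64_t old = atomic_load(&head);

	assert(idx < NOBJ);
	for (;;) {
		uint32_t tid = (uint32_t)(old >> 32);

		/* Link the object in front of the current first free one. */
		atomic_store_explicit(&pool[idx].next_free, (uint32_t)old,
				      memory_order_relaxed);
		if (atomic_compare_exchange_weak(&head, &old,
						 pack(tid + 1, idx)))
			return;
	}
}
```

After pool_init(), repeated pool_alloc() calls hand out indexes in
freelist order, and pool_free() puts an object back at the front. The
sketch cannot fault because the indexes stay inside a static array, but
a corrupted next_free would end up in head and be dereferenced by the
next allocation, loosely analogous to a bad freelist pointer.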

I'm adding Christoph to the cc, not because the slub code has changed
lately (this optimistic access logic is 5+ years old), but because
maybe Christoph remembers what tends to trigger these kinds of issues.

Christoph, the problem is that something is triggering an oops or page
fault (depending on how bogus the address is) in __kmalloc() when it
does that get_freepointer_safe() thing without DEBUG_PAGEALLOC. I've
seen two different cases on two different boots, but they both were on
that one instruction that did that

     void *next_object = get_freepointer_safe(s, object);

access. Both were to random kmalloc'ed memory (it *may* be a very
specific size that sees the corruption, but it's hard to tell: the
callchains were different and in both cases depended on some dynamic
length thing - in one case the directory entry name, in the other the
xattr name length).

The subject line is about Al's splice pull, but that's only one of the
pulls I suspect as a potential cause. It could easily be Andrew's
pile (maybe that nice fsnotify locking cleanup causes double frees?),
Ted's ext4 changes (didn't look whether that could have allocation
pattern changes with bugs) or Jens' block layer changes.

Could be elsewhere too. I saw it twice in one day which would *tend*
to mean that it's recent, but maybe I was just lucky the previous days
and didn't hit it. I haven't been able to repro it now, but maybe I
figured out one reason why my reproductions have been failing ;)

                  Linus