References: <20161007222059.GS19539@ZenIV.linux.org.uk>
From: Linus Torvalds
Date: Sun, 9 Oct 2016 11:40:27 -0700
Subject: Re: [git pull] vfs pile 1 (splice)
To: Al Viro, Andrew Morton, Jens Axboe, "Ted Ts'o", Christoph Lameter
Cc: Linux Kernel Mailing List, linux-fsdevel

On Sat, Oct 8, 2016 at 11:05 PM, Linus Torvalds wrote:
>
> Hmm. I've now gotten two oopses today, all at __kmalloc+0xc3/0x1f0,
> which seems to be the
>
>     *(void **)(object + s->offset);
>
> in get_freepointer().

Actually, it's in "get_freepointer_safe()", it's just that without
DEBUG_PAGEALLOC the two end up being the same.

> I guess I'll need to just run with slab debugging on, but I wanted to
> bring this to peoples attention in case it rings a bell for somebody.
> I haven't been merging anything today, partly because of this.

Hmm. When I enabled SLUB debugging, I also enabled DEBUG_PAGEALLOC,
because "why not". But it turns out that that may have been a mistake,
because it changes the very path that failed to no longer do that
failing access (or rather, it does it as a "probe_kernel_read()",
which traps and ignores the failure).

So all my "careful" testing seems to have been pointless, because I
enabled too much debugging, making sure that the problem cannot
happen. No wonder I couldn't reproduce this.

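As a rough illustration of why DEBUG_PAGEALLOC hides the fault, here is a minimal userspace sketch of the two read paths. It is a model, not the kernel code: `struct cache_model`, `arena`, `guarded_read()`, the `*_model` function names and the `debug_pagealloc` flag are all hypothetical, and `guarded_read()` is only an assumed stand-in for what probe_kernel_read() does (report failure instead of faulting).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kmem_cache: only the field used here. */
struct cache_model {
    size_t offset;              /* where a free object stores its "next" pointer */
};

/* Hypothetical pool of known-good memory; slots double as free objects. */
#define ARENA_WORDS 32
static void *arena[ARENA_WORDS];

/* Direct read, the expression quoted above. A bogus object pointer faults here. */
static void *get_freepointer_model(const struct cache_model *s, void *object)
{
    return *(void **)((char *)object + s->offset);
}

/*
 * Assumed stand-in for probe_kernel_read(): copy only when the whole range
 * lies inside the arena, otherwise return false instead of faulting.
 */
static bool guarded_read(void *dst, uintptr_t addr, size_t len)
{
    uintptr_t lo = (uintptr_t)arena;

    if (len > sizeof(arena) || addr < lo || addr - lo > sizeof(arena) - len)
        return false;
    memcpy(dst, (const void *)addr, len);
    return true;
}

/*
 * Model of the "safe" variant: identical to the direct read unless the
 * debug flag is set, in which case a bad address is silently tolerated.
 */
static void *get_freepointer_safe_model(const struct cache_model *s,
                                        void *object, bool debug_pagealloc)
{
    void *p = NULL;             /* result if the guarded read fails */

    if (!debug_pagealloc)
        return get_freepointer_model(s, object);

    guarded_read(&p, (uintptr_t)object + s->offset, sizeof(p));
    return p;
}
```

With the flag off, both functions perform the same dereference, so a corrupted object pointer crashes in either one. With the flag on, the same corrupted pointer just yields a value the caller is expected to discard, which is why the extra debugging made the oops go away.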
I'll continue with *just* SLUB debugging on, but I thought it was
interesting how enabling more memory access debugging actually ends up
changing some really subtle code. The "get_freepointer_safe()" thing
is explicitly doing a read that could be to free'd memory, and it then
depends on doing the this_cpu_cmpxchg_double() to abort the operation
if it's no longer valid.

I'm adding Christoph to the cc, not because the slub code has changed
lately (this optimistic access logic is 5+ years old), but because
maybe Christoph remembers what tends to trigger these kinds of issues.

Christoph, the problem is that something is triggering an oops or page
fault (depending on how bogus the address is) in __kmalloc() when it
does that get_freepointer_safe() thing without DEBUG_PAGEALLOC. I've
seen two different cases on two different boots, but they both were on
that one instruction that did that

        void *next_object = get_freepointer_safe(s, object);

access. Both were to random kmalloc'ed memory (it *may* be a very
specific size that sees the corruption, but it's hard to tell, the
callchains were different and in both cases depended on some dynamic
length thing - once the directory entry name, in another case the
xattr name length).

The subject line is about Al's splice pull, but that's only one of the
ones I suspect are the potential causes. It could easily be Andrew's
pile (maybe that nice fsnotify locking cleanup causes double free's?),
Ted's ext4 changes (didn't look whether that could have allocation
pattern changes with bugs) or Jens' block layer changes. Could be
elsewhere too. I saw it twice in one day which would *tend* to mean
that it's recent, but maybe I was just lucky the previous days and
didn't hit it.

I haven't been able to repro it now, but maybe I figured out one
reason why my reproductions have been failing ;)

              Linus