Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753264AbcJJQPO (ORCPT ); Mon, 10 Oct 2016 12:15:14 -0400 Received: from mail-oi0-f54.google.com ([209.85.218.54]:34877 "EHLO mail-oi0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752793AbcJJQPM (ORCPT ); Mon, 10 Oct 2016 12:15:12 -0400 MIME-Version: 1.0 In-Reply-To: <20161010.042401.637964142015887598.davem@davemloft.net> References: <20161009.235745.860945462339053703.davem@davemloft.net> <20161010.042401.637964142015887598.davem@davemloft.net> From: Linus Torvalds Date: Mon, 10 Oct 2016 09:15:10 -0700 X-Google-Sender-Auth: E7WVvQ_w-K1CcSPUS4MM032VqtQ Message-ID: Subject: Re: slab corruption with current -git To: David Miller Cc: Aaron Conole , Florian Westphal , Al Viro , Andrew Morton , Jens Axboe , "Theodore Ts'o" , Christoph Lameter , Pablo Neira Ayuso , Linux Kernel Mailing List , linux-fsdevel , Network Development , NetFilter Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1546 Lines: 42 On Mon, Oct 10, 2016 at 1:24 AM, David Miller wrote: > > So I've been reviewing this patch and it looks fine, but I also want > to figure out what is actually causing the OOPS and I can't spot it > yet. Yeah, I'm not actually sure the old linked list implementation is buggy - it might just be ugly. I tried to follow the old code, and I couldn't. So the patch I sent out was a combination of "that's not how you should do singly linked lists" and "those special cases make me worry". In particular, the old code really ended up doing odd things in the "can't find entry" case, because it would exit the loop with a non-NULL 'entry' just because the next entry was NULL.. > One possible way to see that oops is to free the head entry of the > chain without unlinking it. The next unregister will dereference a > POISON pointer. > > Actually... > > The POISON value comes not from a hook entry, but from the array of > pointers in the per-netns datastructure. > > This means that the netns is possibly getting freed up before we > unregister the netfilter hooks. That is certainly one possible explanation for it, yes. However, I didn't think that part had changed, had it? The other thing I find a bit odd in that new single-linked list code is this: - nf_hook_slow(): ... RCU_INIT_POINTER(state->hook_entries, entry); which makes me worried.. It copies the head entry of the list, and maybe it will then (later) end up being used stale. I don't know. But it looks a bit iffy. Linus