References: <20131014154627.GA9525@redhat.com>
Date: Wed, 23 Oct 2013 11:08:30 +0200
Subject: Re: epoll oops.
From: Pekka Enberg
To: Linus Torvalds
Cc: Dave Jones, Linux Kernel, Al Viro, Davide Libenzi, Eric Wong,
    Oleg Nesterov, Peter Hurley
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Linus,

On Mon, Oct 14, 2013 at 10:57 PM, Linus Torvalds wrote:
> [ Adding Pekka to verify the SLAB_DESTROY_BY_RCU semantics, and Peter
> Hurley due to the possible tty association ]
>
> On Mon, Oct 14, 2013 at 10:31 AM, Linus Torvalds wrote:
>>
>> Oleg, does this trigger any memory for you? Commit 971316f0503a
>> ("epoll: ep_unregister_pollwait() can use the freed pwq->whead") just
>> makes me go "Hmm, this is *exactly* that that commit is talking
>> about.."
>
> Ok, Oleg, going back to that whole thread, I think that old bug went like this:
>
>  (a) normally all the wait-queues that epoll accesses are associated
> with files, and as such they cannot go away for any normal file
> activity. If the file exists, the waitqueue used for poll() on that
> file must exist.
>
>  (b) signalfd is special, and it does a
>
>         poll_wait(file, &current->sighand->signalfd_wqh);
>
> which means that the wait-queue isn't associated with the file
> lifetime at all. It cleans it up with signalfd_cleanup() if the signal
> handlers are removed.
> Normal (non-epoll) handling is safe, because
> "current->sighand" obviously cannot go away as long as the current
> thread (doing the polling) is in its poll/select handling.
>
>  (c) as a result, epoll and exit() can race, since the normal epoll
> cleanup() is serialized by the file being closed, and we're missing
> that for the case of sighand going away.
>
>  (d) we have this magic POLLFREE protocol to make signal handling
> cleanup inform the epoll logic that "oops, this is going away", and we
> depend on the underlying sighand data not going away thanks to the
> eventual destruction of the slab being delayed by RCU.
>
>  (e) we are also very careful to only ever initialize the signalfd_wqh
> entry in the SLAB *constructor*, because we cannot do it at every
> allocation: it might still be in reused as long as it exists in the
> slab cache: the SLAB_DESTROY_BY_RCU flag does *not* delay individual
> slab entries, it only delays the final free of the underlying memory
> allocation.
>
>  (f) to make things even more exciting, the SLAB_DESTROY_BY_RCU depend
> on the slab implementation: slub and slob seem to delay each
> individual allocation (and do ctor/dtor on every allocation), while
> slab does that "delay only the underlying big page allocator" thing.

So I'm not completely sure what you wanted me to verify, Linus, but yes:
SLAB_DESTROY_BY_RCU only guarantees that the underlying page doesn't go
away before an RCU grace period has elapsed. We're free to reuse the
object itself in the meantime. Anyone who may still touch an object after
it has been passed to kmem_cache_free() with SLAB_DESTROY_BY_RCU must
check that it's in fact the object they're interested in. There's example
code in a SLAB_DESTROY_BY_RCU comment added by PeterZ.

                        Pekka
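
For illustration, the "check that it's in fact the object we're interested
in" rule can be sketched in plain userspace C. This is not the kernel's code
and not the example from the slab comment: the pool, struct obj, obj_alloc(),
obj_put(), obj_try_get() and obj_get_checked() are all made-up names, and the
sketch is single-threaded, so it leaves out rcu_read_lock(), atomic refcounts
and memory ordering entirely. It only shows why a reader holding a stale
pointer has to re-check identity after taking a reference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type-stable object: the memory always holds a struct obj,
 * but the slot can be freed and handed out again under a different key. */
struct obj {
    int refcnt;   /* 0 means the slot is free */
    int key;      /* identity; changes when the slot is reused */
};

#define POOL_SIZE 4
struct obj pool[POOL_SIZE];   /* stands in for the never-unmapped slab page */

/* Hand out the first free slot, (re)initialising its identity. */
struct obj *obj_alloc(int key)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].refcnt == 0) {
            pool[i].key = key;
            pool[i].refcnt = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Drop a reference; reaching 0 makes the slot reusable immediately. */
void obj_put(struct obj *o)
{
    o->refcnt--;
}

/* Take a reference only if the slot is currently live. */
bool obj_try_get(struct obj *o)
{
    if (o->refcnt == 0)
        return false;
    o->refcnt++;
    return true;
}

/* Reader side: 'candidate' may be stale. Dereferencing it is safe because
 * the memory is type-stable, but it may now be a different object, so the
 * key is re-checked after the reference has been taken. */
struct obj *obj_get_checked(struct obj *candidate, int key)
{
    if (candidate == NULL || !obj_try_get(candidate))
        return NULL;              /* slot is free right now */
    if (candidate->key != key) {
        obj_put(candidate);       /* slot was reused for something else */
        return NULL;
    }
    return candidate;             /* same object, reference held */
}
```

A caller that gets NULL back would normally redo its lookup from scratch
rather than treat the miss as final.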