From: Linus Torvalds
Date: Tue, 31 Jul 2018 11:51:59 -0700
Subject: Re: SLAB_TYPESAFE_BY_RCU without constructors (was Re: [PATCH v4 13/17] khwasan: add hooks implementation)
To: Christoph Lameter
Cc: Andrey Ryabinin, "Theodore Ts'o", Jan Kara, linux-ext4@vger.kernel.org,
    Greg Kroah-Hartman, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
    David Miller, NetFilter, coreteam@netfilter.org, Network Development,
    gerrit@erg.abdn.ac.uk, dccp@vger.kernel.org, Jani Nikula, Joonas Lahtinen,
    Rodrigo Vivi, Dave Airlie, intel-gfx, DRI, Eric Dumazet, Alexey Kuznetsov,
    Hideaki YOSHIFUJI, Ursula Braun, linux-s390, Linux Kernel Mailing List,
    Dmitry Vyukov, Andrew Morton, linux-mm, Andrey Konovalov
References: <01000164f169bc6b-c73a8353-d7d9-47ec-a782-90aadcb86bfb-000000@email.amazonses.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 31, 2018 at 10:49 AM Linus Torvalds wrote:
>
> So the re-use might initialize the fields lazily, not necessarily using a ctor.

In particular, the pattern that nf_conntrack uses looks like it is safe.

If you have a well-defined refcount, and use "atomic_inc_not_zero()" to
guard the speculative RCU access section, and use
"atomic_dec_and_test()" in the freeing section, then you should be
safe wrt new allocations.

If you have a completely new allocation that has "random stale
content", you know that it cannot be on the RCU list, so there is no
speculative access that can ever see that random content.
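
As a rough illustration of the refcount guard described above, here is a minimal userspace sketch using C11 atomics instead of the kernel's atomic_t API. The struct and helper names (obj, obj_get_unless_zero, obj_put) are hypothetical stand-ins, not kernel functions, and the sketch ignores RCU and memory-ordering details entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object; only the refcount matters for this sketch. */
struct obj {
    atomic_int use;   /* 0 means "free, or allocated but not yet published" */
};

/*
 * Userspace stand-in for atomic_inc_not_zero(): take a reference only
 * if the count is currently non-zero. A speculative reader that loses
 * this race must treat the object as not found.
 */
static bool obj_get_unless_zero(struct obj *o)
{
    int old = atomic_load(&o->use);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&o->use, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Userspace stand-in for atomic_dec_and_test(): drop a reference and
 * report whether this was the last one, i.e. the caller may now free.
 */
static bool obj_put(struct obj *o)
{
    int old = atomic_fetch_sub(&o->use, 1);

    assert(old > 0);   /* dropping a reference nobody holds is a bug */
    return old == 1;
}
```

A reader that gets false from obj_get_unless_zero() simply skips the entry, so an object whose count is zero is never touched beyond that one atomic check.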

So the only case you need to worry about is a re-use allocation, and
you know that the refcount will start out as zero even if you don't
have a constructor.

So you can think of the refcount itself as always having a zero
constructor, *BUT* you need to be careful with ordering.

In particular, whoever does the allocation needs to then set the
refcount to a non-zero value *after* it has initialized all the other
fields. And in particular, it needs to make sure that it uses the
proper memory ordering to do so.

And in this case, we have

  static struct nf_conn *
  __nf_conntrack_alloc(struct net *net,
  {
  ...
        atomic_set(&ct->ct_general.use, 0);

which is a no-op for the re-use case (whether racing or not, since any
"inc_not_zero" users won't touch it), but initializes it to zero for
the "completely new object" case.

And then, the thing that actually exposes it to the speculative walkers does:

  int
  nf_conntrack_hash_check_insert(struct nf_conn *ct)
  {
  ...
        smp_wmb();
        /* The caller holds a reference to this object */
        atomic_set(&ct->ct_general.use, 2);

which means that it stays as zero until everything is actually set up,
and then the optimistic walker can use the other fields (including
spinlocks etc) to verify that it's actually the right thing. The
smp_wmb() means that the previous initialization really will be
visible before the object is visible.

Side note: on some architectures it might help to make that "smp_wmb
-> atomic_set()" sequence be an "smp_store_release()" instead. Doesn't
matter on x86, but might matter on arm64.

NOTE! One thing to be very worried about is that re-initializing
whatever RCU lists means that now the RCU walker may be walking on the
wrong list so the walker may do the right thing for this particular
entry, but it may miss walking *other* entries. So then you can get
spurious lookup failures, because the RCU walker never walked all the
way to the end of the right list. That ends up being a much more
subtle bug.
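
One way to detect the "walker ended up on the wrong list" case is sketched below as single-threaded userspace C. It assumes an end-of-chain marker that encodes the owning bucket, loosely in the spirit of the kernel's hlist_nulls lists; all names here (node, end_marker, walk_chain) are hypothetical, and the sketch leaves out RCU, refcounts and barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical chain node. 'next' is either a pointer to the next node
 * or, at the end of a chain, an odd value that encodes the bucket the
 * chain belongs to.
 */
struct node {
    uintptr_t next;
    int key;
};

enum walk_result { WALK_FOUND, WALK_MISS, WALK_RESTART };

/* End-of-chain marker for a bucket: low bit set, bucket in the rest. */
static uintptr_t end_marker(unsigned int bucket)
{
    return ((uintptr_t)bucket << 1) | 1;
}

static bool is_end(uintptr_t link)
{
    return (link & 1) != 0;
}

/* Turn a node pointer into a link value. */
static uintptr_t node_link(const struct node *n)
{
    uintptr_t link = (uintptr_t)n;

    assert(!is_end(link));   /* nodes are aligned, so the low bit is free */
    return link;
}

/*
 * Walk a chain starting at 'head', looking for 'key' on behalf of a
 * lookup in 'bucket'. If the walk ends on another bucket's marker, a
 * node was reused and moved mid-walk, so part of the right chain was
 * never visited and the caller has to start over.
 */
static enum walk_result walk_chain(uintptr_t head, unsigned int bucket, int key)
{
    uintptr_t link = head;

    while (!is_end(link)) {
        const struct node *n = (const struct node *)link;

        if (n->key == key)
            return WALK_FOUND;
        link = n->next;
    }
    if ((link >> 1) != bucket)
        return WALK_RESTART;
    return WALK_MISS;
}
```

Without the final bucket check, a walk that drifted onto another chain would report a plain miss, which is the spurious lookup failure described above. A real lookup would loop on WALK_RESTART and re-read the bucket head each time.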

But the nf_conntrack case seems to get that right too, see the restart
in ____nf_conntrack_find().

So I don't see anything wrong in nf_conntrack.

But yes, using SLAB_TYPESAFE_BY_RCU is very very subtle. But most of
the subtleties have nothing to do with having a constructor, they are
about those "make sure memory ordering wrt refcount is right" and
"restart speculative RCU walk" issues that actually happen regardless
of having a constructor or not.

             Linus