From: Dmitry Vyukov
Date: Wed, 1 Aug 2018 14:38:19 +0200
Subject: Re: SLAB_TYPESAFE_BY_RCU without constructors (was Re: [PATCH v4 13/17] khwasan: add hooks implementation)
To: Florian Westphal
Cc: Linus Torvalds, Christoph Lameter, Andrey Ryabinin, "Theodore Ts'o",
 Jan Kara, linux-ext4@vger.kernel.org, Greg Kroah-Hartman,
 Pablo Neira Ayuso, Jozsef Kadlecsik, David Miller, NetFilter,
 coreteam@netfilter.org, Network Development, Gerrit Renker,
 dccp@vger.kernel.org, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
 Dave Airlie, intel-gfx, DRI, Eric Dumazet, Alexey Kuznetsov,
 Hideaki YOSHIFUJI, Ursula Braun, linux-s390, Linux Kernel Mailing List,
 Andrew Morton, linux-mm, Andrey Konovalov
In-Reply-To: <20180801114048.ufkr7zwmir7ps3vl@breakpoint.cc>
References: <01000164f169bc6b-c73a8353-d7d9-47ec-a782-90aadcb86bfb-000000@email.amazonses.com>
 <20180801103537.d36t3snzulyuge7g@breakpoint.cc>
 <20180801114048.ufkr7zwmir7ps3vl@breakpoint.cc>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 1, 2018 at 1:40 PM, Florian Westphal wrote:
> Dmitry Vyukov wrote:
>> On Wed, Aug 1, 2018 at 12:35 PM, Florian Westphal wrote:
>> > Dmitry Vyukov wrote:
>> >> Still can't grasp all details.
>> >> There is state that we read without taking ct->ct_general.use ref
>> >> first, namely ct->state and what's used by nf_ct_key_equal.
>> >> So let's say the entry we want to find is in the list, but
>> >> ____nf_conntrack_find finds a wrong entry earlier because all state it
>> >> looks at is random garbage, so it returns the wrong entry to
>> >> __nf_conntrack_find_get.
>> >
>> > If an entry can be found, it can't be random garbage.
>> > We never link entries into global table until state has been set up.
>>
>> But... we don't hold a reference to the entry. So say it's in the
>> table with valid state, now ____nf_conntrack_find discovers it, now
>> the entry is removed and reused a dozen of times will all associated
>> state reinitialization. And nf_ct_key_equal looks at it concurrently
>> and decides if it's the entry we are looking for or now. I think
>> unless we hold a ref to the entry, it's state needs to be considered
>> random garbage for correctness reasoning.
>
> It checks if it might be the entry we're looking for.
>
> If this was complete random garbage, scheme would not work, as then
> we could have entry that isn't in table, has nonzero refcount, but
> has its confirmed bit set.
>
> I don't see how that would be possible, any reallocation
> makes sure ct->status has CONFIRMED bit clear before setting refcount
> to nonzero value.
>
> I think this is the scenario you hint at is:
> 1. nf_ct_key_equal is true
> 2. the entry is free'd (or was already free'd)
> 3. we return NULL to caller due to atomic_inc_not_zero() failure
>
> but i fail to see how thats wrong?
>
> Sure, we could restart lookup but how would that help?
>
> We'd not find the 'candidate' entry again.
>
> We might find entry that has been inserted at this very instant but
> newly allocated entries are only inserted into global table until the skb that
> created the nf_conn object has made it through the network stack
> (postrouting for fowarded, or input for local delivery).
>
> So, the likelyhood of such restart finding another candidate is close to 0,
> and won't prevent 'insert race' from happening.

The scenario I have in mind is different. It relates to the fact that
____nf_conntrack_find will return the right entry if it's present, but
it can also return an unrelated entry, because the entries change
underneath it while it looks at them.

Let's take any 2 fields compared by nf_ct_key_equal for simplicity,
for example src.u3 and dst.u3. Let's say we are looking for an entry
with src.u3=A and dst.u3=B; let's call it (A,B).
Let's say there is an existing entry 1 (A,B) in the global list, but
there is also entry 2 (A,C) earlier in the list.
Now ____nf_conntrack_find starts checking entry 2 (A,C). It checks
that src.u3==A, so far it looks good.
Now another thread deletes, reuses and reinitializes entry 2 for (C,B).
Now ____nf_conntrack_find checks that dst.u3==B, so it concludes that
it's the right entry, because it observed (A,B).
Entry 2 is returned to __nf_conntrack_find_get.
Now another thread marks entry 2 as dying.
Now __nf_conntrack_find_get sees that it's dying and returns NULL to
the caller, _but_ the matching entry 1 (A,B) was in the list all that
time and we should have discovered it. We didn't, because we were
derailed by the wrong entry 2.

If that scenario is possible, then a fix would be to make
__nf_conntrack_find_get return NULL only if it got NULL from
____nf_conntrack_find (not if any of the checks has failed).