Date: Fri, 10 Jan 2014 00:32:07 +0400
From: Andrew Vagin
To: Florian Westphal
CC: Eric Dumazet, Andrey Vagin, Pablo Neira Ayuso, Patrick McHardy, Jozsef Kadlecsik, "David S. Miller", Cyrill Gorcunov
Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
Message-ID: <20140109203206.GA26348@paralelels.com>
References: <1389090711-15843-1-git-send-email-avagin@openvz.org> <1389107305.26646.20.camel@edumazet-glaptop2.roam.corp.google.com> <20140107152520.GF9894@breakpoint.cc>
In-Reply-To: <20140107152520.GF9894@breakpoint.cc>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 07, 2014 at 04:25:20PM +0100, Florian Westphal wrote:
> Eric Dumazet wrote:
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index 43549eb..7a34bb2 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -387,8 +387,12 @@ begin:
> > >  			     !atomic_inc_not_zero(&ct->ct_general.use)))
> > >  			h = NULL;
> > >  		else {
> > > +			/* A conntrack can be recreated with the equal tuple,
> > > +			 * so we need to check that the conntrack is initialized
> > > +			 */
> > >  			if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple) ||
> > > -				     nf_ct_zone(ct) != zone)) {
> > > +				     nf_ct_zone(ct) != zone) ||
> > > +				     !nf_ct_is_confirmed(ct)) {
> > >  				nf_ct_put(ct);
> > >  				goto begin;
> > >  			}
> >
> > I do not think this is the right way to fix
> > this problem (if said problem is confirmed)
> >
> > Remember the rule about SLAB_DESTROY_BY_RCU :
> >
> > When a struct is freed, then reused, it's important to set its refcnt
> > (from 0 to 1) only when the structure is fully ready for use.
> >
> > If a lookup finds a structure which is not yet set up, the
> > atomic_inc_not_zero() will fail.
>
> Indeed. But, the structure itself might be ready (or rather,
> can be ready since the allocation side will set the refcount to one
> after doing the initial work, such as zapping old ->status flags and
> setting tuple information).
>
> The problem is with the nat extension area stored in the ct->ext area.
> This extension area is preallocated but the snat/dnat action
> information is only set up after the ct (or rather, the skb that grabbed
> a reference to the nf_conn entry) traverses nat pre/postrouting.
>
> This will also set up a null-binding when no matching SNAT/DNAT/MASQUERADE
> rule existed.
>
> The manipulations of the skb->nfct->ext nat area are performed without
> a lock. Concurrent access is supposedly impossible as the conntrack
> should not (yet) be in the hash table.
>
> The confirmed bit is set right before we insert the conntrack into
> the hash table (after we traversed rules, ct is ready to be
> 'published').

Can we allocate a conntrack with ct_general.use set to zero and increment
it for the first time right before inserting the conntrack into the hash
table? When a conntrack is allocated, it is attached exclusively to one
skb. If it has not been confirmed, it must be destroyed together with that
skb, so we don't need a refcnt at this stage.

I found only one place where the reference counter of an unconfirmed
conntrack can be incremented: ctnetlink_dump_table(). We can probably find
a way to fix that.

>
> i.e. when the confirmed bit is NOT set we should not be 'seeing' the nf_conn
> struct when we perform the lookup, as it should still be sitting on the
> 'unconfirmed' list, being invisible to readers.
>
> Does that explanation make sense to you?
>
> Thanks for looking into this.
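
To illustrate the refcount idea discussed in this message (allocate with a zero count, take the first reference only right before the entry is published), here is a rough userspace sketch in C11. It is a model only, not kernel code: struct entry, entry_init(), entry_confirm(), entry_get() and inc_not_zero() are hypothetical names, the integer key stands in for the conntrack tuple, and a single slot replaces the hash table, so the "goto begin" retry from the patch is reduced to returning NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nf_conn; not the kernel layout. */
struct entry {
	atomic_int use;	/* reference count; 0 means "invisible to lookups" */
	int key;	/* stand-in for the conntrack tuple */
};

/* Userspace analogue of the kernel's atomic_inc_not_zero(). */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is updated to the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Allocation side: the entry belongs to one skb, so no reference is taken. */
static void entry_init(struct entry *e, int key)
{
	e->key = key;
	atomic_init(&e->use, 0);
}

/* Confirmation side: take the first reference right before publishing. */
static void entry_confirm(struct entry *e)
{
	assert(atomic_load(&e->use) == 0);
	atomic_store(&e->use, 1);
	/* Insertion into the hash table would follow here. */
}

/* Lookup side: take a reference first, then recheck the key. */
static struct entry *entry_get(struct entry *e, int key)
{
	if (!inc_not_zero(&e->use))
		return NULL;	/* freed, or not yet confirmed */
	if (e->key != key) {
		/* The slot was reused for another key: drop the reference. */
		atomic_fetch_sub(&e->use, 1);
		return NULL;
	}
	return e;
}
```

In this model a lookup that races with setup simply fails in inc_not_zero(), because the count stays at zero until entry_confirm() runs. Any other path that takes a reference on an unconfirmed entry (ctnetlink_dump_table() is the one named above) would break that assumption and would need separate handling.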