Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757930AbaAIVqR (ORCPT ); Thu, 9 Jan 2014 16:46:17 -0500 Received: from mail-qe0-f45.google.com ([209.85.128.45]:51809 "EHLO mail-qe0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756516AbaAIVqL (ORCPT ); Thu, 9 Jan 2014 16:46:11 -0500 MIME-Version: 1.0 In-Reply-To: <1389281025.31367.35.camel@edumazet-glaptop2.roam.corp.google.com> References: <1389090711-15843-1-git-send-email-avagin@openvz.org> <1389107305.26646.20.camel@edumazet-glaptop2.roam.corp.google.com> <20140109052454.GA16743@gmail.com> <1389281025.31367.35.camel@edumazet-glaptop2.roam.corp.google.com> Date: Fri, 10 Jan 2014 01:46:10 +0400 Message-ID: Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get From: Andrey Wagin To: Eric Dumazet Cc: Florian Westphal , netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org, LKML , vvs@openvz.org, Pablo Neira Ayuso , Patrick McHardy , Jozsef Kadlecsik , "David S. Miller" , Cyrill Gorcunov Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 2014/1/9 Eric Dumazet : > On Thu, 2014-01-09 at 09:24 +0400, Andrew Vagin wrote: > >> I have one more question. Looks like I found another problem. >> >> init_conntrack: >> hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode, >> &net->ct.unconfirmed); >> >> __nf_conntrack_hash_insert: >> hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode, >> &net->ct.hash[hash]); >> >> We use one hlist_nulls_node to add a conntrack into two different lists. >> As far as I understand, it's unacceptable in case of >> SLAB_DESTROY_BY_RCU. > > I guess you missed : > > net/netfilter/nf_conntrack_core.c:1598: INIT_HLIST_NULLS_HEAD(&net->ct.unconfirmed, UNCONFIRMED_NULLS_VAL); but we can look up something suitable and return this value, but it will be unconfirmed. Ok, I see. This situation is fixed by this patch too. I don't understand the result of your discussion with Florian. Here are a few states of conntracts: it can be used and it's initialized. The sign of the first state is non-zero refcnt and the sign of the second state is the confirmed bit. In the first state conntrack is attached to skb and inserted in the unconfirmed list. In this state the use count can be incremented in ctnetlink_dump_list() and skb_clone(). In the second state conntrack may be attached to a few skb-s and inserted to net->ct.hash. I have read all emails again and I can't understand when this patch doesn't work. Maybe you could give a sequence of actions? Thanks. > > >> >> Lets imagine that we have two threads. The first one enumerates objects >> from a first list, the second one recreates an object and insert it in a >> second list. If a first thread in this moment stays on the object, it >> can read "next", when it's in the second list, so it will continue >> to enumerate objects from the second list. It is not what we want, isn't >> it? >> >> cpu1 cpu2 >> hlist_nulls_for_each_entry_rcu(ct) >> destroy_conntrack >> kmem_cache_free >> >> init_conntrack >> hlist_nulls_add_head_rcu >> ct = ct->next >> > > This will be fine. > > I think we even have a counter to count number of occurrence of this > rare event. (I personally never read a non null search_restart value) > > NF_CT_STAT_INC(net, search_restart); Thank you for explanation. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/