From: NeilBrown
To: Thomas Graf, Herbert Xu
Date: Tue, 27 Mar 2018 10:33:04 +1100
Subject: [PATCH 5/6] rhashtable: support guaranteed successful insertion.
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Message-ID: <152210718434.11435.6551477417902631683.stgit@noble>
In-Reply-To: <152210688405.11435.13010923693146415942.stgit@noble>
References: <152210688405.11435.13010923693146415942.stgit@noble>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

The current rhashtable will fail an insertion if the hashtable is
"too full", for one of the following reasons:
- the table already has 2^31 elements (-E2BIG)
- a max_size was specified and the table already has that many
  elements (rounded up to a power of 2) (-E2BIG)
- a single chain has more than 16 elements (-EBUSY)
- the table has more elements than the current table size, and
  allocating a new table fails (-ENOMEM)
- a new page needed to be allocated for a nested table, and the
  memory allocation failed (-ENOMEM)

A traditional hash table does not have a concept of "too full":
insertion only fails if the key already exists.  Many users of hash
tables have separate means of limiting the total number of entries,
and are not susceptible to an attack which could cause unusually long
hash chains.  For those users, the need to check for errors when
inserting objects into an rhashtable is an unnecessary burden and
hence a potential source of bugs (as these failures are likely to be
rare).

This patch adds a "never_fail_insert" configuration parameter which
ensures that insertion will only fail if the key already exists.

When this option is in effect:
- nelems is capped at INT_MAX and will never decrease once it reaches
  that value
- max_size is largely ignored
- elements will be added to a table that is nominally "full", though
  a rehash will be scheduled
- a new table will never be allocated directly by the insert
  function; that is always left to the worker.
For this to trigger a rehash when long chains are detected (possibly
still useful), an extra field in the table records whether a long
chain has been seen.  This field shares a word with the 'nest' value.
As 'nest' is never changed once the table is created, updating the
new ->long_chain without locking cannot cause any corruption.

Signed-off-by: NeilBrown
---
 include/linux/rhashtable.h |   18 +++++++++++++++---
 lib/rhashtable.c           |   27 +++++++++++++++++++--------
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 4ffd96949d4f..abdeb1f3f378 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -77,6 +77,7 @@ struct rhlist_head {
  * struct bucket_table - Table of hash buckets
  * @size: Number of hash buckets
  * @nest: Number of bits of first-level nested table.
+ * @long_chain: %true when a chain longer than RHT_ELASTICITY seen.
  * @rehash: Current bucket being rehashed
  * @hash_rnd: Random seed to fold into hash
  * @locks_mask: Mask to apply before accessing locks[]
@@ -89,7 +90,8 @@ struct rhlist_head {
  */
 struct bucket_table {
 	unsigned int		size;
-	unsigned int		nest;
+	unsigned short		nest;
+	bool			long_chain;
 	unsigned int		rehash;
 	u32			hash_rnd;
 	unsigned int		locks_mask;
@@ -129,6 +131,9 @@ struct rhashtable;
 * @min_size: Minimum size while shrinking
 * @locks_mul: Number of bucket locks to allocate per cpu (default: 32)
 * @automatic_shrinking: Enable automatic shrinking of tables
+ * @never_fail_insert: Insert will always succeed, even if table will become
+ *	unbalanced.  Without this, -E2BIG, -EBUSY, and -ENOMEM are possible
+ *	errors from rhashtable_*insert*()
 * @nulls_base: Base value to generate nulls marker
 * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
 * @obj_hashfn: Function to hash object
@@ -142,6 +147,7 @@ struct rhashtable_params {
 	unsigned int		max_size;
 	u16			min_size;
 	bool			automatic_shrinking;
+	bool			never_fail_insert;
 	u8			locks_mul;
 	u32			nulls_base;
 	rht_hashfn_t		hashfn;
@@ -832,7 +838,10 @@ static inline void *__rhashtable_insert_fast(
 
 	rcu_assign_pointer(*pprev, obj);
 
-	atomic_inc(&ht->nelems);
+	if (params.never_fail_insert)
+		atomic_add_unless(&ht->nelems, 1, INT_MAX);
+	else
+		atomic_inc(&ht->nelems);
 	if (rht_grow_above_75(ht, tbl))
 		schedule_work(&ht->run_work);
@@ -1104,7 +1113,10 @@ static inline int __rhashtable_remove_fast_one(
 	spin_unlock_bh(lock);
 
 	if (err > 0) {
-		atomic_dec(&ht->nelems);
+		if (params.never_fail_insert)
+			atomic_add_unless(&ht->nelems, -1, INT_MAX);
+		else
+			atomic_dec(&ht->nelems);
 		if (unlikely(ht->p.automatic_shrinking &&
 			     rht_shrink_below_30(ht, tbl)))
 			schedule_work(&ht->run_work);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index fd6f320b9704..427836aace60 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -424,7 +424,7 @@ static void rht_deferred_worker(struct work_struct *work)
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size * 2);
 	else if (ht->p.automatic_shrinking && rht_shrink_below_30(ht, tbl))
 		err = rhashtable_shrink(ht);
-	else if (tbl->nest)
+	else if (tbl->nest || tbl->long_chain)
 		err = rhashtable_rehash_alloc(ht, tbl, tbl->size);
 
 	if (!err)
@@ -549,14 +549,22 @@ static struct bucket_table *rhashtable_insert_one(struct rhashtable *ht,
 	if (new_tbl)
 		return new_tbl;
 
-	if (PTR_ERR(data) != -ENOENT)
-		return ERR_CAST(data);
+	if (ht->p.never_fail_insert) {
+		if (PTR_ERR(data) == -EAGAIN &&
+		    atomic_read(&ht->nelems) != INT_MAX) {
+			tbl->long_chain = true;
+			schedule_work(&ht->run_work);
+		}
+	} else {
+		if (PTR_ERR(data) != -ENOENT)
+			return ERR_CAST(data);
 
-	if (unlikely(rht_grow_above_max(ht, tbl)))
-		return ERR_PTR(-E2BIG);
+		if (unlikely(rht_grow_above_max(ht, tbl)))
+			return ERR_PTR(-E2BIG);
 
-	if (unlikely(rht_grow_above_100(ht, tbl)))
-		return ERR_PTR(-EAGAIN);
+		if (unlikely(rht_grow_above_100(ht, tbl)))
+			return ERR_PTR(-EAGAIN);
+	}
 
 	pprev = rht_bucket_insert(ht, tbl, hash);
 	if (!pprev)
@@ -574,7 +582,10 @@ static struct bucket_table *rhashtable_insert_one(struct rhashtable *ht,
 
 	rcu_assign_pointer(*pprev, obj);
 
-	atomic_inc(&ht->nelems);
+	if (ht->p.never_fail_insert)
+		atomic_add_unless(&ht->nelems, 1, INT_MAX);
+	else
+		atomic_inc(&ht->nelems);
 	if (rht_grow_above_75(ht, tbl))
 		schedule_work(&ht->run_work);