Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp2941925imm; Thu, 24 May 2018 19:33:06 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpVQPHpBi1BVG4Pa4TSskD+1J96VzpJoGicJt/1XHMjLrz4aKbwi088l/q39zkyMIhY8rM8 X-Received: by 2002:aa7:81cc:: with SMTP id c12-v6mr547074pfn.169.1527215586184; Thu, 24 May 2018 19:33:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527215586; cv=none; d=google.com; s=arc-20160816; b=r+Nr5sMwmYXp1QLP9OlgU5XDA22YKoRzGuqGxQGiab72hxX7aI2H9eJuBpdbLLH4BZ j8VopXx4E5RMzBDyZWBdGT+MpeSDb/KTQnrGeM8HH9sRqCh8vKQeVj53nh/T62hS4h7f I2uUkX5lSMZcOEBdZJ/65UFJU+/TilykEvCHMbTq49rqZ6w2/+XNBC103enRWwJpAQkW AMvZRX6YApaSUk30Brlec+ZmcFlFo24SPkNCV9dzAh5Z6QxCiPx9icIZzvTAyNyQng5N sgB17kO7khhsK8FSNWREpEnqSaMM20gYYbTayv90yol0xSHRddRXrBuc5gRLg3oUpGoL S7Dg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=PMzIZFo9VpqVefNl8CV+Qr+D3++W4nhbsSXrq4+iFyE=; b=efz4KsgsR0DaFA2yWFH5EtYAX9DjmReuLvOSnT5Z4tEKIbSHO/ccxm+EUFkDnzGqDS QohjPX3yr/lqDFtu2NgeMcnbFpS0ZJcye53aqeSGV71IBkYhSjm49IUM5zdrRA8UDUSV y5VzTHPCWyq54Flh7P4VAFwNCJgN/cmxvfpn/wNyVEn0ZmopI+UKBRnlEp2hdr4clFtr A8iCawAmhKdnj0fyATHt2LmPpSxJmbfmXAxan+caDRpuEsGCD8ayyj2F4+Z+bxTt/bAw BaTeTcxq6vLzDA7KUzz8PRl3uKUPIfiIFf0lVxFdihP9qnwJeISrhs5Vvip6ERRmLgi6 7qjw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f39-v6si22425053plb.572.2018.05.24.19.32.51; Thu, 24 May 2018 19:33:06 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1034169AbeEXRXe (ORCPT + 99 others); Thu, 24 May 2018 13:23:34 -0400 Received: from mx2.suse.de ([195.135.220.15]:53896 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1031084AbeEXRXa (ORCPT ); Thu, 24 May 2018 13:23:30 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext-too.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 8EF80AF03; Thu, 24 May 2018 17:23:28 +0000 (UTC) Date: Thu, 24 May 2018 10:07:00 -0700 From: Davidlohr Bueso To: Linus Torvalds Cc: Thomas Graf , Herbert Xu , Andrew Morton , Manfred Spraul , guillaume.knispel@supersonicimagine.com, Linux API , Linux Kernel Mailing List Subject: Re: semantics of rhashtable and sysvipc Message-ID: <20180524170700.wblnybinjzx5rwky@linux-n805> References: <20180523172500.anfvmjtumww65ief@linux-n805> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170421 (1.8.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 23 May 2018, Linus Torvalds wrote: >One option is to make rhashtable_alloc() shrink the allocation and try >again if it fails, and then you *can* do __GFP_NOFAIL eventually. The below attempts to implements this, along with converting the EINVAL cases to WARN_ON(). I've refactored bucket_table_alloc() to add a 'retry' param which always ends up doing GFP_KERNEL|__GFP_NOFAIL for both the tbl as well as alloc_bucket_spinlocks(). I've arbitrarily shrunk the size in half upon initial allocation failure. So we default from 64 to 32 buckets, and if the user is hinting at larger nelems, we simply disregard that and retry with 32. Similarly, smaller values are also cut in half. In addition, I think we can also get rid of explicitly passing the gfp flags to bucket_table_alloc() (we only do GFP_KERNEL or GFP_ATOMIC), and leave it up to __bucket_table_alloc() by adding a new bucket_table_alloc_noblock() or something for GFP_ATOMIC. > >In fact, it can validly be argued that rhashtable_init() is just buggy >as-is. The whole *point* olf that function is to size things appropriately, >and returning -ENOMEM obviously means that it didn't do its job. diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 9427b5766134..e86a396aebcf 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -166,16 +166,21 @@ static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht, return tbl; } -static struct bucket_table *bucket_table_alloc(struct rhashtable *ht, - size_t nbuckets, - gfp_t gfp) +static struct bucket_table *__bucket_table_alloc(struct rhashtable *ht, + size_t nbuckets, + gfp_t gfp, bool retry) { struct bucket_table *tbl = NULL; size_t size, max_locks; int i; size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]); - if (gfp != GFP_KERNEL) + if (retry) { + gfp |= __GFP_NOFAIL; + tbl = kzalloc(size, gfp); + } + + else if (gfp != GFP_KERNEL) tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY); else tbl = kvzalloc(size, gfp); @@ -211,6 +216,20 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht, return tbl; } +static struct bucket_table *bucket_table_alloc(struct rhashtable *ht, + size_t nbuckets, + gfp_t gfp) +{ + return __bucket_table_alloc(ht, nbuckets, gfp, false); +} + +static struct bucket_table *bucket_table_alloc_retry(struct rhashtable *ht, + size_t nbuckets, + gfp_t gfp) +{ + return __bucket_table_alloc(ht, nbuckets, gfp, true); +} + static struct bucket_table *rhashtable_last_table(struct rhashtable *ht, struct bucket_table *tbl) { @@ -1024,12 +1043,11 @@ int rhashtable_init(struct rhashtable *ht, size = HASH_DEFAULT_SIZE; - if ((!params->key_len && !params->obj_hashfn) || - (params->obj_hashfn && !params->obj_cmpfn)) - return -EINVAL; + WARN_ON((!params->key_len && !params->obj_hashfn) || + (params->obj_hashfn && !params->obj_cmpfn)); - if (params->nulls_base && params->nulls_base < (1U << RHT_BASE_SHIFT)) - return -EINVAL; + WARN_ON(params->nulls_base && + params->nulls_base < (1U << RHT_BASE_SHIFT)); memset(ht, 0, sizeof(*ht)); mutex_init(&ht->mutex); @@ -1068,9 +1086,23 @@ int rhashtable_init(struct rhashtable *ht, } } + /* + * This is api initialization. We need to guarantee the initial + * rhashtable allocation. Upon failure, retry with a smaller size, + * otherwise we exhaust our options with __GFP_NOFAIL. + * + * The size of the table is shrunk to at least half the original + * value. Users that use large nelem_hint values are lowered to 32 + * buckets. + */ tbl = bucket_table_alloc(ht, size, GFP_KERNEL); - if (tbl == NULL) - return -ENOMEM; + if (unlikely(tbl == NULL)) { + size = min(size, HASH_DEFAULT_SIZE) / 2; + + tbl = bucket_table_alloc(ht, size, GFP_KERNEL); + if (tbl == NULL) + tbl = bucket_table_alloc_retry(ht, size, GFP_KERNEL); + } atomic_set(&ht->nelems, 0);