Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp5875174imm; Wed, 12 Sep 2018 12:29:37 -0700 (PDT) X-Google-Smtp-Source: ANB0VdYMyOEqcfXYQhm6uc1T+mt3MnzDCWOBk+QCJVlaPxPKP72R3FBUoeEp1gIf3R8a6tKR/vFI X-Received: by 2002:a63:3cc:: with SMTP id 195-v6mr3767457pgd.229.1536780577001; Wed, 12 Sep 2018 12:29:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1536780576; cv=none; d=google.com; s=arc-20160816; b=NApxb8u27YWctfI5BwneXRAoXQcNLQJdNlbRllXegICBXn8r4bZCTwy5BUXU1nfF2F fgy2MIm7oeSbAxJOXcDVuFyV+aSzGJgqw/MBjjy4qm7wURofmPGDBfnLt+0HgW360UZ3 lCyUhuf56imCnmJdh3oH2m/gbtchPBKEskcxYYlDCzJ1EgG22NC7iKqDWbyT6QxiuFH3 0oenVWXCdn6hbGW6QEcRAoqIxMN9AAyq0nKyv8BAoWnYg6zRGH+Od+4oaqmgGyQqJOlN DWEufvmeeo5FL86TRbjRL/7Bfcfw7AoUoUQmuYm1woofs30mw64yRg7ndpD/VrBbiLFr xKCw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from; bh=MTI+fBR0XGf6TwDx6KirZFD7ZQXKWuwFfigwx6jMI8E=; b=EJthcS0EdXz/nab8u4pRGXEX5l0ndTAh3CgJYVPhm+Yh4tnWc7iG7Nac1RWBS2rMTu 2IOXKSfEniox4Krbc97Be2+ibjS2KNbyp9DTU0kQrigWNHcyQo2B+McIrTtl/pemZtzE zpJsyw5c20tW7iu/fP5qrBkRnh4OQnWOO1iObmDWLrbkPJcAU1KJfQg8wms+KJoHgIEy EjScNsJbckyRhrSWgCPiwF3hhiCeyUuASORDU+cLH9aVuK03TW2PIJEgNy6VN+w15f+3 3QqkDx/O6++wpMH0co7IkmDGu0WB0ZeDGygRG916myFwE03VK6UIaWnf1bbJzR7Nf0ES 9B9Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h1-v6si1744766pld.152.2018.09.12.12.29.22; Wed, 12 Sep 2018 12:29:36 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728371AbeIMAfK (ORCPT + 99 others); Wed, 12 Sep 2018 20:35:10 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:38386 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728264AbeIMAfH (ORCPT ); Wed, 12 Sep 2018 20:35:07 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id EBBBF40216FC; Wed, 12 Sep 2018 19:29:07 +0000 (UTC) Received: from llong.com (dhcp-17-55.bos.redhat.com [10.18.17.55]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8F88310DCF4A; Wed, 12 Sep 2018 19:29:07 +0000 (UTC) From: Waiman Long To: Alexander Viro , Jan Kara , Jeff Layton , "J. Bruce Fields" , Tejun Heo , Christoph Lameter Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Ingo Molnar , Peter Zijlstra , Andi Kleen , Dave Chinner , Boqun Feng , Davidlohr Bueso , Waiman Long Subject: [PATCH v9 4/5] lib/dlock-list: Make sibling CPUs share the same linked list Date: Wed, 12 Sep 2018 15:28:51 -0400 Message-Id: <1536780532-4092-5-git-send-email-longman@redhat.com> In-Reply-To: <1536780532-4092-1-git-send-email-longman@redhat.com> References: <1536780532-4092-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 19:29:08 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 19:29:08 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'longman@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The dlock list needs one list for each of the CPUs available. However, for sibling CPUs, they are sharing the L2 and probably L1 caches too. As a result, there is not much to gain in term of avoiding cacheline contention while increasing the cacheline footprint of the L1/L2 caches as separate lists may need to be in the cache. This patch makes all the sibling CPUs share the same list, thus reducing the number of lists that need to be maintained in each dlock list without having any noticeable impact on performance. It also improves dlock list iteration performance as fewer lists need to be iterated. Signed-off-by: Waiman Long Reviewed-by: Jan Kara --- lib/dlock-list.c | 74 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 59 insertions(+), 15 deletions(-) diff --git a/lib/dlock-list.c b/lib/dlock-list.c index f64ea4c..e286094 100644 --- a/lib/dlock-list.c +++ b/lib/dlock-list.c @@ -25,31 +25,65 @@ * The distributed and locked list is a distributed set of lists each of * which is protected by its own spinlock, but acts like a single * consolidated list to the callers. For scaling purpose, the number of - * lists used is equal to the number of possible CPUs in the system to - * minimize contention. + * lists used is equal to the number of possible cores in the system to + * minimize contention. All threads of the same CPU core will share the + * same list. * - * However, it is possible that individual CPU numbers may be equal to - * or greater than the number of possible CPUs when there are holes in - * the CPU number list. As a result, we need to map the CPU number to a - * list index. + * We need to map each CPU number to a list index. */ static DEFINE_PER_CPU_READ_MOSTLY(int, cpu2idx); +static int nr_dlock_lists __read_mostly; /* - * Initialize cpu2idx mapping table + * Initialize cpu2idx mapping table & nr_dlock_lists. * * It is possible that a dlock-list can be allocated before the cpu2idx is * initialized. In this case, all the cpus are mapped to the first entry * before initialization. * + * All the sibling CPUs of a sibling group will map to the same dlock list so + * as to reduce the number of dlock lists to be maintained while minimizing + * cacheline contention. + * + * As the sibling masks are set up in the core initcall phase, this function + * has to be done in the postcore phase to get the right data. */ static int __init cpu2idx_init(void) { int idx, cpu; + struct cpumask *sibling_mask; + static struct cpumask mask __initdata; + cpumask_clear(&mask); idx = 0; - for_each_possible_cpu(cpu) - per_cpu(cpu2idx, cpu) = idx++; + for_each_possible_cpu(cpu) { + int scpu; + + if (cpumask_test_cpu(cpu, &mask)) + continue; + per_cpu(cpu2idx, cpu) = idx; + cpumask_set_cpu(cpu, &mask); + + sibling_mask = topology_sibling_cpumask(cpu); + if (sibling_mask) { + for_each_cpu(scpu, sibling_mask) { + per_cpu(cpu2idx, scpu) = idx; + cpumask_set_cpu(scpu, &mask); + } + } + idx++; + } + + /* + * nr_dlock_lists can only be set after cpu2idx is properly + * initialized. + */ + smp_mb(); + nr_dlock_lists = idx; + WARN_ON(nr_dlock_lists > nr_cpu_ids); + + pr_info("dlock-list: %d head entries per dlock list.\n", + nr_dlock_lists); return 0; } postcore_initcall(cpu2idx_init); @@ -67,19 +101,23 @@ static int __init cpu2idx_init(void) * * Dynamically allocated locks need to have their own special lock class * to avoid lockdep warning. + * + * Since nr_dlock_lists will always be <= nr_cpu_ids, having more lists + * than necessary allocated is not a problem other than some wasted memory. + * The extra lists will not be ever used as all the cpu2idx entries will be + * 0 before initialization. */ int __alloc_dlock_list_heads(struct dlock_list_heads *dlist, struct lock_class_key *key) { - int idx; + int idx, cnt = nr_dlock_lists ? nr_dlock_lists : nr_cpu_ids; - dlist->heads = kcalloc(nr_cpu_ids, sizeof(struct dlock_list_head), - GFP_KERNEL); + dlist->heads = kcalloc(cnt, sizeof(struct dlock_list_head), GFP_KERNEL); if (!dlist->heads) return -ENOMEM; - for (idx = 0; idx < nr_cpu_ids; idx++) { + for (idx = 0; idx < cnt; idx++) { struct dlock_list_head *head = &dlist->heads[idx]; INIT_LIST_HEAD(&head->list); @@ -117,7 +155,10 @@ bool dlock_lists_empty(struct dlock_list_heads *dlist) { int idx; - for (idx = 0; idx < nr_cpu_ids; idx++) + /* Shouldn't be called before nr_dlock_lists is initialized */ + WARN_ON_ONCE(!nr_dlock_lists); + + for (idx = 0; idx < nr_dlock_lists; idx++) if (!list_empty(&dlist->heads[idx].list)) return false; return true; @@ -199,6 +240,9 @@ struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter) struct dlock_list_node *next; struct dlock_list_head *head; + /* Shouldn't be called before nr_dlock_lists is initialized */ + WARN_ON_ONCE(!nr_dlock_lists); + restart: if (iter->entry) { spin_unlock(&iter->entry->lock); @@ -209,7 +253,7 @@ struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter) /* * Try next list */ - if (++iter->index >= nr_cpu_ids) + if (++iter->index >= nr_dlock_lists) return NULL; /* All the entries iterated */ if (list_empty(&iter->head[iter->index].list)) -- 1.8.3.1