From: James Simmons
To: Greg Kroah-Hartman, devel@driverdev.osuosl.org, Andreas Dilger,
	Oleg Drokin, NeilBrown
Cc: Linux Kernel Mailing List, Lustre Development List,
	Dmitry Eremin, James Simmons
Subject: [PATCH 20/25] staging: lustre: libcfs: make tolerant to offline CPUs and empty NUMA nodes
Date: Mon, 16 Apr 2018 00:10:02 -0400
Message-Id: <1523851807-16573-21-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1523851807-16573-1-git-send-email-jsimmons@infradead.org>
References: <1523851807-16573-1-git-send-email-jsimmons@infradead.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Dmitry Eremin

Rework the CPU partition code to make it more tolerant of offline CPUs
and empty NUMA nodes.

Signed-off-by: Dmitry Eremin
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
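Note for reviewers (illustration only, not part of the applied diff): with the
"CPU count must be a multiple of cpu_npartitions" restriction dropped,
cfs_cpt_table_create() now spreads N online CPUs over ncpt partitions as
num = N / ncpt CPUs per partition, with the first N % ncpt partitions taking
one extra CPU. A minimal standalone sketch of that bookkeeping, using
hypothetical CPU and partition counts:

#include <stdio.h>

int main(void)
{
	int ncpus = 10;	/* hypothetical number of online CPUs */
	int ncpt = 4;	/* hypothetical number of CPU partitions */
	int num = ncpus / ncpt;
	int rem = ncpus % ncpt;
	int cpt;

	for (cpt = 0; cpt < ncpt; cpt++) {
		/* same rule as the table-create hunk below: num + 1 while rem > 0 */
		int target = num + !!(rem > 0);

		if (rem > 0)
			rem--;
		printf("partition %d gets %d CPUs\n", cpt, target);
	}
	return 0;
}

For 10 CPUs in 4 partitions this prints 3, 3, 2, 2, so every partition gets at
least num CPUs and the total still matches num_online_cpus().
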
 .../lustre/include/linux/libcfs/linux/linux-cpu.h  |   2 +
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 132 +++++++++------------
 drivers/staging/lustre/lnet/lnet/lib-msg.c         |   2 +
 3 files changed, 60 insertions(+), 76 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
index b3bc4e7..ed4351b 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
@@ -56,6 +56,8 @@ struct cfs_cpu_partition {
 	unsigned int *cpt_distance;
 	/* spread rotor for NUMA allocator */
 	int cpt_spread_rotor;
+	/* NUMA node if cpt_nodemask is empty */
+	int cpt_node;
 };
 
 /** descriptor for CPU partitions */
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index 32ebd0f..80db008 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -427,8 +427,16 @@ int cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
 		return 0;
 	}
 
-	LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_cpumask));
-	LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
+	if (cpumask_test_cpu(cpu, cptab->ctb_cpumask)) {
+		CDEBUG(D_INFO, "CPU %d is already in cpumask\n", cpu);
+		return 0;
+	}
+
+	if (cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask)) {
+		CDEBUG(D_INFO, "CPU %d is already in partition %d cpumask\n",
+		       cpu, cptab->ctb_cpu2cpt[cpu]);
+		return 0;
+	}
 
 	cfs_cpt_add_cpu(cptab, cpt, cpu);
 	cfs_cpt_add_node(cptab, cpt, cpu_to_node(cpu));
@@ -497,8 +505,10 @@ void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
 {
 	int cpu;
 
-	for_each_cpu(cpu, mask)
-		cfs_cpt_unset_cpu(cptab, cpt, cpu);
+	for_each_cpu(cpu, mask) {
+		cfs_cpt_del_cpu(cptab, cpt, cpu);
+		cfs_cpt_del_node(cptab, cpt, cpu_to_node(cpu));
+	}
 }
 EXPORT_SYMBOL(cfs_cpt_unset_cpumask);
 
@@ -549,10 +559,8 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt,
 {
 	int node;
 
-	for_each_node_mask(node, *mask) {
-		if (!cfs_cpt_set_node(cptab, cpt, node))
-			return 0;
-	}
+	for_each_node_mask(node, *mask)
+		cfs_cpt_set_node(cptab, cpt, node);
 
 	return 1;
 }
@@ -573,7 +581,7 @@ int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
 	nodemask_t *mask;
 	int weight;
 	int rotor;
-	int node;
+	int node = 0;
 
 	/* convert CPU partition ID to HW node id */
 
@@ -583,20 +591,20 @@ int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
 	} else {
 		mask = cptab->ctb_parts[cpt].cpt_nodemask;
 		rotor = cptab->ctb_parts[cpt].cpt_spread_rotor++;
+		node = cptab->ctb_parts[cpt].cpt_node;
 	}
 
 	weight = nodes_weight(*mask);
-	LASSERT(weight > 0);
-
-	rotor %= weight;
+	if (weight > 0) {
+		rotor %= weight;
 
-	for_each_node_mask(node, *mask) {
-		if (!rotor--)
-			return node;
+		for_each_node_mask(node, *mask) {
+			if (!rotor--)
+				return node;
+		}
 	}
 
-	LBUG();
-	return 0;
+	return node;
 }
 EXPORT_SYMBOL(cfs_cpt_spread_node);
 
@@ -689,17 +697,21 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
 	cpumask_var_t core_mask;
 	int rc = 0;
 	int cpu;
+	int i;
 
 	LASSERT(number > 0);
 
 	if (number >= cpumask_weight(node_mask)) {
 		while (!cpumask_empty(node_mask)) {
 			cpu = cpumask_first(node_mask);
+			cpumask_clear_cpu(cpu, node_mask);
+
+			if (!cpu_online(cpu))
+				continue;
 
 			rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
 			if (!rc)
 				return -EINVAL;
-			cpumask_clear_cpu(cpu, node_mask);
 		}
 		return 0;
 	}
@@ -720,24 +732,19 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
 		cpu = cpumask_first(node_mask);
 
 		/* get cpumask for cores in the same socket */
-		cpumask_copy(socket_mask, topology_core_cpumask(cpu));
-		cpumask_and(socket_mask, socket_mask, node_mask);
-
-		LASSERT(!cpumask_empty(socket_mask));
-
+		cpumask_and(socket_mask, topology_core_cpumask(cpu), node_mask);
 		while (!cpumask_empty(socket_mask)) {
-			int i;
-
 			/* get cpumask for hts in the same core */
-			cpumask_copy(core_mask, topology_sibling_cpumask(cpu));
-			cpumask_and(core_mask, core_mask, node_mask);
-
-			LASSERT(!cpumask_empty(core_mask));
+			cpumask_and(core_mask, topology_sibling_cpumask(cpu),
+				    node_mask);
 
 			for_each_cpu(i, core_mask) {
 				cpumask_clear_cpu(i, socket_mask);
 				cpumask_clear_cpu(i, node_mask);
 
+				if (!cpu_online(i))
+					continue;
+
 				rc = cfs_cpt_set_cpu(cptab, cpt, i);
 				if (!rc) {
 					rc = -EINVAL;
@@ -806,23 +813,18 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
 	struct cfs_cpt_table *cptab = NULL;
 	cpumask_var_t node_mask;
 	int cpt = 0;
+	int node;
 	int num;
-	int rc;
-	int i;
+	int rem;
+	int rc = 0;
 
-	rc = cfs_cpt_num_estimate();
+	num = cfs_cpt_num_estimate();
 	if (ncpt <= 0)
-		ncpt = rc;
+		ncpt = num;
 
-	if (ncpt > num_online_cpus() || ncpt > 4 * rc) {
+	if (ncpt > num_online_cpus() || ncpt > 4 * num) {
 		CWARN("CPU partition number %d is larger than suggested value (%d), your system may have performance issue or run out of memory while under pressure\n",
-		      ncpt, rc);
-	}
-
-	if (num_online_cpus() % ncpt) {
-		CERROR("CPU number %d is not multiple of cpu_npartition %d, please try different cpu_npartitions value or set pattern string by cpu_pattern=STRING\n",
-		       (int)num_online_cpus(), ncpt);
-		goto failed;
+		      ncpt, num);
 	}
 
 	cptab = cfs_cpt_table_alloc(ncpt);
@@ -831,55 +833,33 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
 		goto failed;
 	}
 
-	num = num_online_cpus() / ncpt;
-	if (!num) {
-		CERROR("CPU changed while setting CPU partition\n");
-		goto failed;
-	}
-
 	if (!zalloc_cpumask_var(&node_mask, GFP_NOFS)) {
 		CERROR("Failed to allocate scratch cpumask\n");
 		goto failed;
 	}
 
-	for_each_online_node(i) {
-		cpumask_copy(node_mask, cpumask_of_node(i));
-
-		while (!cpumask_empty(node_mask)) {
-			struct cfs_cpu_partition *part;
-			int n;
-
-			/*
-			 * Each emulated NUMA node has all allowed CPUs in
-			 * the mask.
-			 * End loop when all partitions have assigned CPUs.
-			 */
-			if (cpt == ncpt)
-				break;
-
-			part = &cptab->ctb_parts[cpt];
+	num = num_online_cpus() / ncpt;
+	rem = num_online_cpus() % ncpt;
+	for_each_online_node(node) {
+		cpumask_copy(node_mask, cpumask_of_node(node));
 
-			n = num - cpumask_weight(part->cpt_cpumask);
-			LASSERT(n > 0);
+		while (cpt < ncpt && !cpumask_empty(node_mask)) {
+			struct cfs_cpu_partition *part = &cptab->ctb_parts[cpt];
+			int ncpu = cpumask_weight(part->cpt_cpumask);
 
-			rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask, n);
+			rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask,
						  num - ncpu);
 			if (rc < 0)
 				goto failed_mask;
 
-			LASSERT(num >= cpumask_weight(part->cpt_cpumask));
-			if (num == cpumask_weight(part->cpt_cpumask))
+			ncpu = cpumask_weight(part->cpt_cpumask);
+			if (ncpu == num + !!(rem > 0)) {
 				cpt++;
+				rem--;
+			}
 		}
 	}
 
-	if (cpt != ncpt ||
-	    num != cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask)) {
-		CERROR("Expect %d(%d) CPU partitions but got %d(%d), CPU hotplug/unplug while setting?\n",
-		       cptab->ctb_nparts, num, cpt,
-		       cpumask_weight(cptab->ctb_parts[ncpt - 1].cpt_cpumask));
-		goto failed_mask;
-	}
-
 	free_cpumask_var(node_mask);
 	return cptab;
 
diff --git a/drivers/staging/lustre/lnet/lnet/lib-msg.c b/drivers/staging/lustre/lnet/lnet/lib-msg.c
index 0091273..27bdefa 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-msg.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-msg.c
@@ -568,6 +568,8 @@
 
 	/* number of CPUs */
 	container->msc_nfinalizers = cfs_cpt_weight(lnet_cpt_table(), cpt);
+	if (container->msc_nfinalizers == 0)
+		container->msc_nfinalizers = 1;
 
 	container->msc_finalizers = kvzalloc_cpt(container->msc_nfinalizers *
						 sizeof(*container->msc_finalizers),
-- 
1.8.3.1