From: James Simmons
To: Greg Kroah-Hartman, devel@driverdev.osuosl.org, Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Linux Kernel Mailing List, Lustre Development List, Dmitry Eremin, James Simmons
Subject: [PATCH v2 24/25] staging: lustre: libcfs: change CPT estimate algorithm
Date: Tue, 29 May 2018 10:22:04 -0400
Message-Id: <1527603725-30560-25-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1527603725-30560-1-git-send-email-jsimmons@infradead.org>
References: <1527603725-30560-1-git-send-email-jsimmons@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Dmitry Eremin

The main idea to have more CPU partitions is based on KNL experience.
When a thread submits IO for network communication, one of the threads
from the current CPT is used for the network stack. With high
parallelization many threads become involved in network submission, but
with fewer CPU partitions they have to wait until a single thread
processes their requests from the network queue. So with a small number
of CPU partitions the bottleneck just moves into the network layer. My
experiments showed that performance was best when each IO thread has
one network thread. This condition can be met by having 2 real HW cores
(not counting hyperthreads) per CPT. This is exactly what is
implemented in this patch.

Change the CPT estimate algorithm from 2 * (N - 1)^2 < NCPUS <= 2 * N^2
to 2 HW cores per CPT. This is critical for machines whose number of
cores is not of the form 2^N.

The current algorithm splits CPTs on KNL as follows:

LNet: HW CPU cores: 272, npartitions: 16
cpu_partition_table=
    0  : 0-4,68-71,136-139,204-207
    1  : 5-9,73-76,141-144,209-212
    2  : 10-14,78-81,146-149,214-217
    3  : 15-17,72,77,83-85,140,145,151-153,208,219-221
    4  : 18-21,82,86-88,150,154-156,213,218,222-224
    5  : 22-26,90-93,158-161,226-229
    6  : 27-31,95-98,163-166,231-234
    7  : 32-35,89,100-103,168-171,236-239
    8  : 36-38,94,99,104-105,157,162,167,172-173,225,230,235,240-241
    9  : 39-43,107-110,175-178,243-246
    10 : 44-48,112-115,180-183,248-251
    11 : 49-51,106,111,117-119,174,179,185-187,242,253-255
    12 : 52-55,116,120-122,184,188-190,247,252,256-258
    13 : 56-60,124-127,192-195,260-263
    14 : 61-65,129-132,197-200,265-268
    15 : 66-67,123,128,133-135,191,196,201-203,259,264,269-271

The new algorithm will split CPTs on KNL as follows:

LNet: HW CPU cores: 272, npartitions: 34
cpu_partition_table=
    0  : 0-1,68-69,136-137,204-205
    1  : 2-3,70-71,138-139,206-207
    2  : 4-5,72-73,140-141,208-209
    3  : 6-7,74-75,142-143,210-211
    4  : 8-9,76-77,144-145,212-213
    5  : 10-11,78-79,146-147,214-215
    6  : 12-13,80-81,148-149,216-217
    7  : 14-15,82-83,150-151,218-219
    8  : 16-17,84-85,152-153,220-221
    9  : 18-19,86-87,154-155,222-223
    10 : 20-21,88-89,156-157,224-225
    11 : 22-23,90-91,158-159,226-227
    12 : 24-25,92-93,160-161,228-229
    13 : 26-27,94-95,162-163,230-231
    14 : 28-29,96-97,164-165,232-233
    15 : 30-31,98-99,166-167,234-235
    16 : 32-33,100-101,168-169,236-237
    17 : 34-35,102-103,170-171,238-239
    18 : 36-37,104-105,172-173,240-241
    19 : 38-39,106-107,174-175,242-243
    20 : 40-41,108-109,176-177,244-245
    21 : 42-43,110-111,178-179,246-247
    22 : 44-45,112-113,180-181,248-249
    23 : 46-47,114-115,182-183,250-251
    24 : 48-49,116-117,184-185,252-253
    25 : 50-51,118-119,186-187,254-255
    26 : 52-53,120-121,188-189,256-257
    27 : 54-55,122-123,190-191,258-259
    28 : 56-57,124-125,192-193,260-261
    29 : 58-59,126-127,194-195,262-263
    30 : 60-61,128-129,196-197,264-265
    31 : 62-63,130-131,198-199,266-267
    32 : 64-65,132-133,200-201,268-269
    33 : 66-67,134-135,202-203,270-271

The 'N' pattern does not always work well on KNL. In flat mode it gives
one CPT containing all CPUs. In SNC-4 mode:

cpu_partition_table=
    0  : 0-17,68-85,136-153,204-221
    1  : 18-35,86-103,154-171,222-239
    2  : 36-51,104-119,172-187,240-255
    3  : 52-67,120-135,188-203,256-271

Signed-off-by: Dmitry Eremin
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/24304
Reviewed-by: James Simmons
Reviewed-by: Andreas Dilger
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
Changelog:

v1) Initial patch
v2) Rebased patch.
    No changes in code from earlier patch

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 30 +++++-------------------------
 1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index aed48de..ff752d5 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -798,34 +798,14 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
 
 static int cfs_cpt_num_estimate(void)
 {
-	int nnode = num_online_nodes();
+	int nthr = cpumask_weight(topology_sibling_cpumask(smp_processor_id()));
 	int ncpu = num_online_cpus();
-	int ncpt;
+	int ncpt = 1;
 
-	if (ncpu <= CPT_WEIGHT_MIN) {
-		ncpt = 1;
-		goto out;
-	}
-
-	/* generate reasonable number of CPU partitions based on total number
-	 * of CPUs, Preferred N should be power2 and match this condition:
-	 * 2 * (N - 1)^2 < NCPUS <= 2 * N^2
-	 */
-	for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
-		;
-
-	if (ncpt <= nnode) { /* fat numa system */
-		while (nnode > ncpt)
-			nnode >>= 1;
+	if (ncpu > CPT_WEIGHT_MIN)
+		for (ncpt = 2; ncpu > 2 * nthr * ncpt; ncpt++)
+			; /* nothing */
 
-	} else { /* ncpt > nnode */
-		while ((nnode << 1) <= ncpt)
-			nnode <<= 1;
-	}
-
-	ncpt = nnode;
-
-out:
 #if (BITS_PER_LONG == 32)
 	/* config many CPU partitions on 32-bit system could consume
 	 * too much memory
-- 
1.8.3.1
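
For anyone who wants to check the partition counts quoted in the commit
message without booting a KNL, here is a minimal userspace sketch of the
two estimates. It is not part of the patch. The names cpt_estimate_old()
and cpt_estimate_new() are made up for illustration. The CPU, node and
sibling-thread counts are passed in as plain integers instead of coming
from num_online_cpus(), num_online_nodes() and topology_sibling_cpumask().
CPT_WEIGHT_MIN is assumed to be 4, since the diff only shows the macro
name. The 32-bit clamp and whatever follows it in cfs_cpt_num_estimate()
are left out because the hunk does not show them in full.

```c
#include <assert.h>

/* Assumed value: the diff references CPT_WEIGHT_MIN but does not define it. */
#define CPT_WEIGHT_MIN 4

/*
 * Estimate removed by the patch (hypothetical userspace copy).
 * Picks the smallest power-of-two N >= 2 with ncpu <= 2 * N^2,
 * then fits the NUMA node count to it by halving or doubling.
 */
int cpt_estimate_old(int ncpu, int nnode)
{
	int ncpt;

	assert(ncpu > 0 && nnode > 0);

	if (ncpu <= CPT_WEIGHT_MIN)
		return 1;

	for (ncpt = 2; ncpu > 2 * ncpt * ncpt; ncpt <<= 1)
		;

	if (ncpt <= nnode) {	/* fat NUMA system */
		while (nnode > ncpt)
			nnode >>= 1;
	} else {		/* ncpt > nnode */
		while ((nnode << 1) <= ncpt)
			nnode <<= 1;
	}

	return nnode;
}

/*
 * Estimate added by the patch (hypothetical userspace copy).
 * nthr is the number of hardware threads per core, so 2 * nthr
 * logical CPUs correspond to 2 real cores per partition.
 */
int cpt_estimate_new(int ncpu, int nthr)
{
	int ncpt = 1;

	/* nthr == 0 would make the loop below spin forever */
	assert(ncpu > 0 && nthr > 0);

	if (ncpu > CPT_WEIGHT_MIN)
		for (ncpt = 2; ncpu > 2 * nthr * ncpt; ncpt++)
			; /* nothing */

	return ncpt;
}
```

Assuming 4 hardware threads per core on the 272-CPU KNL above (the
sibling lists such as 0-1,68-69,136-137,204-205 suggest this) and a
single NUMA node, cpt_estimate_old(272, 1) gives 16 and
cpt_estimate_new(272, 4) gives 34, which agrees with the two
npartitions values in the commit message.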