From: "Michael Kelley (EOSG)"
To: Long Li, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
    "James E . J . Bottomley", "Martin K . Petersen",
    "devel@linuxdriverproject.org", "linux-scsi@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "netdev@vger.kernel.org"
Subject: RE: [Resend Patch 3/3] Storvsc: Select channel based on available percentage of ring buffer to write
Date: Fri, 13 Apr 2018 23:28:02 +0000
References: <20180328004840.22787-1-longli@linuxonhyperv.com> <20180328004840.22787-3-longli@linuxonhyperv.com>
In-Reply-To: <20180328004840.22787-3-longli@linuxonhyperv.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org On Behalf Of Long Li
> Sent: Tuesday, March 27, 2018 5:49 PM
> To: KY Srinivasan; Haiyang Zhang; Stephen Hemminger;
> James E . J . Bottomley; Martin K . Petersen;
> devel@linuxdriverproject.org; linux-scsi@vger.kernel.org;
> linux-kernel@vger.kernel.org; netdev@vger.kernel.org
> Cc: Long Li
> Subject: [Resend Patch 3/3] Storvsc: Select channel based on available percentage of ring
> buffer to write
>
> From: Long Li
>
> This is a best effort for estimating on how busy the ring buffer is for
> that channel, based on available buffer to write in percentage. It is still
> possible that at the time of actual ring buffer write, the space may not be
> available due to other processes may be writing at the time.
>
> Selecting a channel based on how full it is can reduce the possibility that
> a ring buffer write will fail, and avoid the situation a channel is over
> busy.
>
> Now it's possible that storvsc can use a smaller ring buffer size
> (e.g. 40k bytes) to take advantage of cache locality.
>
> Signed-off-by: Long Li
> ---
>  drivers/scsi/storvsc_drv.c | 62 +++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 50 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index a2ec0bc9e9fa..b1a87072b3ab 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -395,6 +395,12 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
>
>  module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
>  MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to subchannels");
> +
> +static int ring_avail_percent_lowater = 10;

Reserving 10% of each ring buffer by default seems like more than is needed
in the storvsc driver. That would be about 4Kbytes for the 40K ring buffer
you suggest, and even more for a ring buffer of 128K. Each outgoing record
is only about 344 bytes (I'd have to check exactly). With the new channel
selection algorithm below, the only time we use a channel that is already
below the low water mark is when no channel could be found that is above
the low water mark. There could be a case of two or more threads deciding
that a channel is above the low water mark at the same time and both
choosing it, but that's likely to be rare. So it seems like we could set
the default low water mark to 5 percent or even 3 percent, which will let
more of the ring buffer be used, and let a channel be assigned according to
the algorithm, rather than falling through to the default because all
channels appear to be "full".

> +module_param(ring_avail_percent_lowater, int, S_IRUGO);
> +MODULE_PARM_DESC(ring_avail_percent_lowater,
> +		"Select a channel if available ring size > this in percent");
> +
>  /*
>   * Timeout in seconds for all devices managed by this driver.
>   */
> @@ -1285,9 +1291,9 @@ static int storvsc_do_io(struct hv_device *device,
>  {
>  	struct storvsc_device *stor_device;
>  	struct vstor_packet *vstor_packet;
> -	struct vmbus_channel *outgoing_channel;
> +	struct vmbus_channel *outgoing_channel, *channel;
>  	int ret = 0;
> -	struct cpumask alloced_mask;
> +	struct cpumask alloced_mask, other_numa_mask;
>  	int tgt_cpu;
>
>  	vstor_packet = &request->vstor_packet;
> @@ -1301,22 +1307,53 @@ static int storvsc_do_io(struct hv_device *device,
>  	/*
>  	 * Select an an appropriate channel to send the request out.
>  	 */
> -
>  	if (stor_device->stor_chns[q_num] != NULL) {
>  		outgoing_channel = stor_device->stor_chns[q_num];
> -		if (outgoing_channel->target_cpu == smp_processor_id()) {
> +		if (outgoing_channel->target_cpu == q_num) {
>  			/*
>  			 * Ideally, we want to pick a different channel if
>  			 * available on the same NUMA node.
>  			 */
>  			cpumask_and(&alloced_mask, &stor_device->alloced_cpus,
>  				    cpumask_of_node(cpu_to_node(q_num)));
> -			for_each_cpu_wrap(tgt_cpu, &alloced_mask,
> -					outgoing_channel->target_cpu + 1) {
> -				if (tgt_cpu != outgoing_channel->target_cpu) {
> -					outgoing_channel =
> -					stor_device->stor_chns[tgt_cpu];
> -					break;
> +
> +			for_each_cpu_wrap(tgt_cpu, &alloced_mask, q_num + 1) {
> +				if (tgt_cpu == q_num)
> +					continue;
> +				channel = stor_device->stor_chns[tgt_cpu];
> +				if (hv_get_avail_to_write_percent(
> +							&channel->outbound)
> +						> ring_avail_percent_lowater) {
> +					outgoing_channel = channel;
> +					goto found_channel;
> +				}
> +			}
> +
> +			/*
> +			 * All the other channels on the same NUMA node are
> +			 * busy. Try to use the channel on the current CPU
> +			 */
> +			if (hv_get_avail_to_write_percent(
> +						&outgoing_channel->outbound)
> +					> ring_avail_percent_lowater)
> +				goto found_channel;
> +
> +			/*
> +			 * If we reach here, all the channels on the current
> +			 * NUMA node are busy. Try to find a channel in
> +			 * other NUMA nodes
> +			 */
> +			cpumask_andnot(&other_numa_mask,
> +					&stor_device->alloced_cpus,
> +					cpumask_of_node(cpu_to_node(q_num)));
> +
> +			for_each_cpu(tgt_cpu, &other_numa_mask) {
> +				channel = stor_device->stor_chns[tgt_cpu];
> +				if (hv_get_avail_to_write_percent(
> +							&channel->outbound)
> +						> ring_avail_percent_lowater) {
> +					outgoing_channel = channel;
> +					goto found_channel;
>  				}
>  			}
>  		}
> @@ -1324,7 +1361,7 @@ static int storvsc_do_io(struct hv_device *device,
>  		outgoing_channel = get_og_chn(stor_device, q_num);
>  	}
>
> -
> +found_channel:
>  	vstor_packet->flags |= REQUEST_COMPLETION_FLAG;
>
>  	vstor_packet->vm_srb.length = (sizeof(struct vmscsi_request) -
> @@ -1733,7 +1770,8 @@ static int storvsc_probe(struct hv_device *device,
>  	}
>
>  	scsi_driver.can_queue = (max_outstanding_req_per_channel *
> -				 (max_sub_channels + 1));
> +				 (max_sub_channels + 1)) *
> +				 (100 - ring_avail_percent_lowater) / 100;

A minor nit, but the use of parentheses here is inconsistent. There's a
set of parens around the first two expressions to explicitly code the
associativity, but not a set to encompass the third term, which must be
processed before the fourth one is. C does multiplication and division
with left to right associativity, so the result is as intended. But if
we're depending on C's default associativity, then that set of parens
around the first two expressions really isn't needed, and one wonders
why they are there.

Michael

>
>  	host = scsi_host_alloc(&scsi_driver,
>  			       sizeof(struct hv_host_device));
> --
> 2.14.1
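
For reference, the arithmetic behind the two comments above can be checked
with a small userspace sketch. It is not driver code: all helper names are
hypothetical, the 344-byte record size is the approximate figure quoted
above and has not been verified, and "40K" is assumed to mean 40 * 1024
bytes. The first two helpers show how many outgoing records fit in the space
held back by a given low water mark; the remaining three evaluate the
can_queue expression as written in the patch, with the redundant parens
dropped, and with the percentage term grouped first.

```c
#include <assert.h>

/* Hypothetical helper: bytes held back by a low water mark of `percent`. */
static inline unsigned int reserved_bytes(unsigned int ring_bytes,
					  unsigned int percent)
{
	assert(percent <= 100);
	return ring_bytes * percent / 100;
}

/* Hypothetical helper: whole records of `record_bytes` that fit in the reserve. */
static inline unsigned int records_in_reserve(unsigned int ring_bytes,
					      unsigned int percent,
					      unsigned int record_bytes)
{
	assert(record_bytes > 0);
	return reserved_bytes(ring_bytes, percent) / record_bytes;
}

/* The can_queue expression with the parenthesization used in the patch. */
static inline int can_queue_as_written(int per_channel, int sub_channels,
				       int lowater)
{
	return (per_channel * (sub_channels + 1)) * (100 - lowater) / 100;
}

/* Same expression relying only on left-to-right associativity of * and /. */
static inline int can_queue_no_parens(int per_channel, int sub_channels,
				      int lowater)
{
	return per_channel * (sub_channels + 1) * (100 - lowater) / 100;
}

/*
 * Grouping the percentage first changes the result: (100 - lowater) / 100
 * is an integer division and truncates to 0 for any lowater from 1 to 100.
 */
static inline int can_queue_percent_first(int per_channel, int sub_channels,
					  int lowater)
{
	return per_channel * (sub_channels + 1) * ((100 - lowater) / 100);
}
```

Under those assumptions, a 10 percent mark on a 40K ring holds back 4096
bytes, or 11 records, while 5 percent holds back 5 records and 3 percent
holds back 3. For can_queue, the first two forms agree for any inputs that do
not overflow, so the extra parens are purely cosmetic; only the third
grouping would actually change the value.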