From: Long Li
To: "Michael Kelley (EOSG)", KY Srinivasan, Haiyang Zhang,
    Stephen Hemminger, "James E . J . Bottomley", "Martin K . Petersen",
    "devel@linuxdriverproject.org", "linux-scsi@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "netdev@vger.kernel.org"
Subject: RE: [Resend Patch 3/3] Storvsc: Select channel based on available percentage of ring buffer to write
Date: Sat, 14 Apr 2018 00:25:38 +0000
References: <20180328004840.22787-1-longli@linuxonhyperv.com> <20180328004840.22787-3-longli@linuxonhyperv.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> Subject: RE: [Resend Patch 3/3] Storvsc: Select channel based on available
> percentage of ring buffer to write
>
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org
> > On Behalf Of Long Li
> > Sent: Tuesday, March 27, 2018 5:49 PM
> > To: KY Srinivasan; Haiyang Zhang; Stephen Hemminger;
> > James E . J . Bottomley; Martin K . Petersen;
> > devel@linuxdriverproject.org; linux-scsi@vger.kernel.org;
> > linux-kernel@vger.kernel.org; netdev@vger.kernel.org
> > Cc: Long Li
> > Subject: [Resend Patch 3/3] Storvsc: Select channel based on available
> > percentage of ring buffer to write
> >
> > From: Long Li
> >
> > This is a best effort for estimating on how busy the ring buffer is
> > for that channel, based on available buffer to write in percentage. It
> > is still possible that at the time of actual ring buffer write, the
> > space may not be available due to other processes may be writing at the
> > time.
> >
> > Selecting a channel based on how full it is can reduce the possibility
> > that a ring buffer write will fail, and avoid the situation a channel
> > is over busy.
> >
> > Now it's possible that storvsc can use a smaller ring buffer size
> > (e.g. 40k bytes) to take advantage of cache locality.
> >
> > Signed-off-by: Long Li
> > ---
> >  drivers/scsi/storvsc_drv.c | 62 +++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 50 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index a2ec0bc9e9fa..b1a87072b3ab 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -395,6 +395,12 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
> >
> >  module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
> >  MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to subchannels");
> > +
> > +static int ring_avail_percent_lowater = 10;
>
> Reserving 10% of each ring buffer by default seems like more than is needed
> in the storvsc driver.  That would be about 4Kbytes for the 40K ring buffer
> you suggest, and even more for a ring buffer of 128K.  Each outgoing record is
> only about 344 bytes (I'd have to check exactly).  With the new channel
> selection algorithm below, the only time we use a channel that is already
> below the low water mark is when no channel could be found that is above
> the low water mark.   There could be a case of two or more threads deciding
> that a channel is above the low water mark at the same time and both
> choosing it, but that's likely to be rare.  So it seems like we could set the

It's not rare for two processes to check the same channel at the same time when running a multi-process I/O workload. The CPU-to-channel mapping is not 1:1.

> default low water mark to 5 percent or even 3 percent, which will let more of
> the ring buffer be used, and let a channel be assigned according to the
> algorithm, rather than falling through to the default because all channels
> appear to be "full".

It seems it's not about how big the ring buffer is. For example, even with a ring buffer of infinite size, performance won't improve if that buffer is queued up all the time while other ring buffers are nearly empty. It's more about whether multiple ring buffers are utilized in a reasonable and balanced way. Testing shows 10 is a good choice, while 3 is prone to returning BUSY and triggering a block layer retry.

>
> > +module_param(ring_avail_percent_lowater, int, S_IRUGO);
> > +MODULE_PARM_DESC(ring_avail_percent_lowater,
> > +		"Select a channel if available ring size > this in percent");
> > +
> >  /*
> >   * Timeout in seconds for all devices managed by this driver.
> >   */
> > @@ -1285,9 +1291,9 @@ static int storvsc_do_io(struct hv_device *device,
> >  {
> >  	struct storvsc_device *stor_device;
> >  	struct vstor_packet *vstor_packet;
> > -	struct vmbus_channel *outgoing_channel;
> > +	struct vmbus_channel *outgoing_channel, *channel;
> >  	int ret = 0;
> > -	struct cpumask alloced_mask;
> > +	struct cpumask alloced_mask, other_numa_mask;
> >  	int tgt_cpu;
> >
> >  	vstor_packet = &request->vstor_packet;
> > @@ -1301,22 +1307,53 @@ static int storvsc_do_io(struct hv_device *device,
> >  	/*
> >  	 * Select an an appropriate channel to send the request out.
> >  	 */
> > -
> >  	if (stor_device->stor_chns[q_num] != NULL) {
> >  		outgoing_channel = stor_device->stor_chns[q_num];
> > -		if (outgoing_channel->target_cpu == smp_processor_id()) {
> > +		if (outgoing_channel->target_cpu == q_num) {
> >  			/*
> >  			 * Ideally, we want to pick a different channel if
> >  			 * available on the same NUMA node.
> >  			 */
> >  			cpumask_and(&alloced_mask, &stor_device->alloced_cpus,
> >  				    cpumask_of_node(cpu_to_node(q_num)));
> > -			for_each_cpu_wrap(tgt_cpu, &alloced_mask,
> > -					outgoing_channel->target_cpu + 1) {
> > -				if (tgt_cpu != outgoing_channel->target_cpu) {
> > -					outgoing_channel =
> > -					stor_device->stor_chns[tgt_cpu];
> > -					break;
> > +
> > +			for_each_cpu_wrap(tgt_cpu, &alloced_mask, q_num + 1) {
> > +				if (tgt_cpu == q_num)
> > +					continue;
> > +				channel = stor_device->stor_chns[tgt_cpu];
> > +				if (hv_get_avail_to_write_percent(
> > +							&channel->outbound)
> > +						> ring_avail_percent_lowater) {
> > +					outgoing_channel = channel;
> > +					goto found_channel;
> > +				}
> > +			}
> > +
> > +			/*
> > +			 * All the other channels on the same NUMA node are
> > +			 * busy. Try to use the channel on the current CPU
> > +			 */
> > +			if (hv_get_avail_to_write_percent(
> > +						&outgoing_channel->outbound)
> > +					> ring_avail_percent_lowater)
> > +				goto found_channel;
> > +
> > +			/*
> > +			 * If we reach here, all the channels on the current
> > +			 * NUMA node are busy. Try to find a channel in
> > +			 * other NUMA nodes
> > +			 */
> > +			cpumask_andnot(&other_numa_mask,
> > +					&stor_device->alloced_cpus,
> > +					cpumask_of_node(cpu_to_node(q_num)));
> > +
> > +			for_each_cpu(tgt_cpu, &other_numa_mask) {
> > +				channel = stor_device->stor_chns[tgt_cpu];
> > +				if (hv_get_avail_to_write_percent(
> > +							&channel->outbound)
> > +						> ring_avail_percent_lowater) {
> > +					outgoing_channel = channel;
> > +					goto found_channel;
> >  				}
> >  			}
> >  		}
> > @@ -1324,7 +1361,7 @@ static int storvsc_do_io(struct hv_device *device,
> >  		outgoing_channel = get_og_chn(stor_device, q_num);
> >  	}
> >
> > -
> > +found_channel:
> >  	vstor_packet->flags |= REQUEST_COMPLETION_FLAG;
> >
> >  	vstor_packet->vm_srb.length = (sizeof(struct vmscsi_request) -
> > @@ -1733,7 +1770,8 @@ static int storvsc_probe(struct hv_device *device,
> >  	}
> >
> >  	scsi_driver.can_queue = (max_outstanding_req_per_channel *
> > -				 (max_sub_channels + 1));
> > +				 (max_sub_channels + 1)) *
> > +				 (100 - ring_avail_percent_lowater) / 100;
>
> A minor nit, but the use of parentheses here is inconsistent.  There's a set of
> parens around the first two expressions to explicitly code the associativity,
> but not a set to encompass the third term, which must be processed before
> the fourth one is.  C does multiplication and division with left to right
> associativity, so the result is as intended.
> But if we're depending on C's default associativity, then that set of parens
> around the first two expression really isn't needed, and one wonders why
> they are there.
>
> Michael
>
> >
> >  	host = scsi_host_alloc(&scsi_driver,
> >  			       sizeof(struct hv_host_device));
> > --
> > 2.14.1
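
For readers following the thread, here is a rough userspace sketch of the three-pass selection order the quoted hunk describes, plus the can_queue scaling discussed at the end. It is only an illustration under stated assumptions: struct chan, pick_channel() and scaled_can_queue() are hypothetical names, each array index stands in for one CPU with its own channel, and avail_percent stands in for whatever hv_get_avail_to_write_percent() would report. It is not the driver code and ignores the outer target_cpu == q_num check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMBus channel; not the kernel's struct. */
struct chan {
	int numa_node;          /* NUMA node of the channel's CPU */
	unsigned avail_percent; /* free outbound ring space, 0..100 */
};

/*
 * Return the index of the channel to use for queue q_num.
 * A channel qualifies only if its free space is strictly above lowater.
 */
size_t pick_channel(const struct chan *ch, size_t n, size_t q_num,
		    unsigned lowater)
{
	size_t i, idx;

	assert(n > 0 && q_num < n);

	/* Pass 1: other channels on the same node, wrapping from q_num + 1. */
	for (i = 1; i < n; i++) {
		idx = (q_num + i) % n;
		if (ch[idx].numa_node == ch[q_num].numa_node &&
		    ch[idx].avail_percent > lowater)
			return idx;
	}

	/* Pass 2: the channel already mapped to this queue. */
	if (ch[q_num].avail_percent > lowater)
		return q_num;

	/* Pass 3: channels on other nodes, in index order. */
	for (idx = 0; idx < n; idx++) {
		if (ch[idx].numa_node != ch[q_num].numa_node &&
		    ch[idx].avail_percent > lowater)
			return idx;
	}

	/* Every channel looks full: fall back to the mapped channel. */
	return q_num;
}

/*
 * Same evaluation order as the patched can_queue expression:
 * ((per_channel * (sub_channels + 1)) * (100 - lowater)) / 100.
 */
unsigned scaled_can_queue(unsigned per_channel, unsigned sub_channels,
			  unsigned lowater)
{
	return per_channel * (sub_channels + 1) * (100 - lowater) / 100;
}
```

With a low water mark of 10, a queue whose same-node neighbours are all at or below 10 percent free stays on its own channel if that one is above 10, and only then looks at other nodes. The made-up inputs 100 requests per channel and 3 subchannels give 100 * 4 * 90 / 100 = 360 for the scaled queue depth; since * and / associate left to right, the extra parentheses around the first product do not change the result.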