From: "Stephen Bates"
To: Jens Axboe, "linux-block@vger.kernel.org", "linux-kernel@vger.kernel.org"
CC: "osandov@fb.com", "damien.lemoal@wdc.com"
Subject: Re: [PATCH] blk-mq: Improvements to the hybrid polling sleep time calculation
Date: Tue, 29 Aug 2017 15:33:15 +0000
References: <1503326134-3862-1-git-send-email-sbates@raithlin.com> <54dad77e-18d7-eb64-35fb-670fecc83ce7@kernel.dk>
In-Reply-To: <54dad77e-18d7-eb64-35fb-670fecc83ce7@kernel.dk>

>> From: Stephen Bates
>>
>> Hybrid polling currently uses half the average completion time as an
>> estimate of how long to poll for. We can improve upon this by noting
>> that polling before the minimum completion time makes no sense. Add a
>> sysfs entry to use this fact to improve CPU utilization in certain
>> cases.
>>
>> At the same time the minimum is a bit too long to sleep for since we
>> must factor in OS wake time for the thread. For now allow the user to
>> set this via a second sysfs entry (in nanoseconds).
>>
>> Testing this patch on Intel Optane SSDs showed that using the minimum
>> rather than half reduced CPU utilization from 59% to 38%. Tuning
>> this via the wake time adjustment allowed us to trade CPU load for
>> latency. For example
>>
>> io_poll  delay  hyb_use_min  adjust  latency  CPU load
>> 1        -1     N/A          N/A     8.4      100%
>> 1        0      0            N/A     8.4      57%
>> 1        0      1            0       10.3     34%
>> 1        9      1            1000    9.9      37%
>> 1        0      1            2000    8.4      47%
>> 1        0      1            10000   8.4      100%
>>
>> Ideally we will extend this to auto-calculate the wake time rather
>> than have it set by the user.
>
> I don't like this, it's another weird knob that will exist but that
> no one will know how to use. For most of the testing I've done
> recently, hybrid is a win over busy polling - hence I think we should
> make that the default. 60% of mean has also, in testing, been shown
> to be a win. So that's an easy fix/change we can consider.

I do agree that this is a hard knob to tune. I am however not happy
that the current hybrid default may mean we are polling well before the
minimum completion time. That just seems like a waste of CPU resources
to me. I do agree that turning on hybrid as the default and perhaps
bumping up the default is a good idea.

> To go beyond that, I'd much rather see us tracking the time waste.
> If we consider the total completion time of an IO to be A+B+C, where:
>
> A	Time needed to go to sleep
> B	Sleep time
> C	Time needed to wake up
>
> then we could feasibly track A+C. We already know how long the IO
> will take to complete, as we track that. At that point we'd have
> a full picture of how long we should sleep.

Yes, this is where I was thinking of taking this functionality in the
long term. It seems like tracking C is something other parts of the
kernel might need. Does anyone know of any existing code in this space?

> Bonus points for informing the lower level scheduler of this as
> well. If the CPU is going idle, we'll enter some sort of power
> state in the processor. If we were able to pass in how long we
> expect to sleep, we could be making better decisions here.

Yup. Again, this seems like something more general than just the
block layer. I will do some digging and see if anything is available
to leverage here.

Cheers

Stephen
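
For anyone following the thread, the sleep-time options discussed above can be compared in a rough userspace sketch. This is not the kernel code: the function names, parameter names, and the numbers in the usage note are all hypothetical, and the exact arithmetic of the patch is an assumption based on the description (sleep for the minimum completion time less a user-set wake-time adjustment). The second function follows the A+B+C breakdown, solving for B once A and C are tracked.

```python
def hybrid_sleep_ns(mean_ns, min_ns, use_min=False, adjust_ns=0):
    """Return how long to sleep (ns) before starting to poll.

    All names here are illustrative, not the kernel's.
    """
    if use_min:
        # Assumed form of the proposal: sleep up to the fastest observed
        # completion, less an allowance for OS wake-up time. Clamp at zero
        # so an oversized adjustment degrades to pure busy polling.
        return max(min_ns - adjust_ns, 0)
    # Behaviour described in the thread: half the mean completion time.
    return mean_ns // 2


def tracked_sleep_ns(expected_ns, go_to_sleep_ns, wake_up_ns):
    """Sleep time B when completion time is modelled as A + B + C.

    expected_ns is the tracked completion time, go_to_sleep_ns is A and
    wake_up_ns is C, so B = expected - (A + C), clamped at zero.
    """
    return max(expected_ns - (go_to_sleep_ns + wake_up_ns), 0)
```

With made-up figures of an 8000 ns mean and a 6000 ns minimum, the half-mean rule sleeps 4000 ns, while the minimum with a 1000 ns adjustment sleeps 5000 ns. An adjustment larger than the minimum yields 0, which matches the table row where a 10000 ns adjust returns to 100% CPU load.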