Subject: Re: [PATCH 3/4] blk-mq: implement hybrid poll mode for sync O_DIRECT
From: Jens Axboe
To: Bart Van Assche
Date: Thu, 3 Nov 2016 08:15:46 -0600
Message-ID: <1d9faee0-4de8-f10d-34fa-9022d24f7008@fb.com>
In-Reply-To: <1be37430-c75a-308b-04b7-9779a9fe3f56@sandisk.com>
References: <1478034325-28232-1-git-send-email-axboe@fb.com> <1478034325-28232-4-git-send-email-axboe@fb.com> <1be37430-c75a-308b-04b7-9779a9fe3f56@sandisk.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/03/2016 08:01 AM, Bart Van Assche wrote:
> On 11/01/2016 03:05 PM, Jens Axboe wrote:
>> +static void blk_mq_poll_hybrid_sleep(struct request_queue *q,
>> +				     struct request *rq)
>> +{
>> +	struct hrtimer_sleeper hs;
>> +	ktime_t kt;
>> +
>> +	if (!q->poll_nsec || test_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags))
>> +		return;
>> +
>> +	set_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
>> +
>> +	/*
>> +	 * This will be replaced with the stats tracking code, using
>> +	 * 'avg_completion_time / 2' as the pre-sleep target.
>> +	 */
>> +	kt = ktime_set(0, q->poll_nsec);
>> +
>> +	hrtimer_init_on_stack(&hs.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>> +	hrtimer_set_expires(&hs.timer, kt);
>> +
>> +	hrtimer_init_sleeper(&hs, current);
>> +	do {
>> +		if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
>> +			break;
>> +		set_current_state(TASK_INTERRUPTIBLE);
>> +		hrtimer_start_expires(&hs.timer, HRTIMER_MODE_REL);
>> +		if (hs.task)
>> +			io_schedule();
>> +		hrtimer_cancel(&hs.timer);
>> +	} while (hs.task && !signal_pending(current));
>> +
>> +	__set_current_state(TASK_RUNNING);
>> +	destroy_hrtimer_on_stack(&hs.timer);
>> +}
>
> Hello Jens,
>
> Will avg_completion_time/2 be a good choice for a polling interval if an
> application submits requests of varying sizes, e.g. 4 KB and 64 KB?

Not necessarily. This is a first implementation to demonstrate what is
possible; we can definitely make it more clever. As mentioned in the
previous email, I believe most cases will be small(ish) IO of roughly
the same size, and it'll work fine for that.

One possible improvement could be to factor in the minimum times as
well. Since we're assuming some level of predictability here,
incorporating the 'min' time could help too. This is one of the devices
I used for testing:

# cat stats
read : samples=15199, mean=5185, min=5068, max=25616
write: samples=0, mean=0, min=-1, max=0

and when it looks like that, then using mean/2 works well. If I do a
512b/64k 50/50 split, it looks like this:

# cat stats
read : samples=4896, mean=20996, min=5250, max=49721
write: samples=0, mean=0, min=-1, max=0

Using the classic polling with this workload, I get 39900 IOPS at 100%
CPU load. With the hybrid poll enabled, I get 37700 IOPS at 68% CPU. For
the hybrid polling, with the default settings, we end up being a bit
slower than pure poll (not unexpected), but at a higher level of
efficiency. Even for this 512b/64k split case, that's still the case.

For comparison, without polling, we run at 25800 IOPS (at 39% CPU).
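
To make the mean/2 target and the 'min' idea above concrete, here is a
minimal userspace sketch, not kernel code and not part of the patch. The
struct poll_stat layout and both helper names are hypothetical; the
min-aware variant (capping the sleep at half the fastest observed
completion) is only one assumed way of factoring in 'min'. Values are in
whatever unit the stats file reports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of one line of the 'stats' output quoted above. */
struct poll_stat {
	uint64_t samples;
	uint64_t mean;
	uint64_t min;
	uint64_t max;
};

/* Pre-sleep target from the patch comment: half the mean completion time. */
static inline uint64_t sleep_target_mean(const struct poll_stat *s)
{
	if (s->samples == 0)
		return 0;	/* no data yet: do not sleep, just poll */
	return s->mean / 2;
}

/*
 * ASSUMED variant, not from the patch: never sleep longer than half the
 * fastest completion seen so far, so small requests are not overslept.
 */
static inline uint64_t sleep_target_min_aware(const struct poll_stat *s)
{
	uint64_t by_mean, by_min;

	if (s->samples == 0)
		return 0;
	assert(s->min <= s->mean);	/* a minimum cannot exceed the mean */
	by_mean = s->mean / 2;
	by_min = s->min / 2;
	return by_min < by_mean ? by_min : by_mean;
}
```

With the uniform workload (mean=5185, min=5068) the two helpers give 2592
and 2534, so they barely differ. With the 512b/64k split (mean=20996,
min=5250) mean/2 is 10498, which is already longer than the fastest
observed completion, while the min-aware variant gives 2625.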
> Can avg_completion_time be smaller than the context switch time? If so,
> do you think any other thread will be able to do any useful work after
> the timer has been started and before the timer has expired?

There are definitely cases where we want to just keep busy polling. I
didn't add any minimum yet, but yes, you can imagine cases where we can
blow our budget by doing the sleep up front. We just want to do the busy
poll for that case.

-- 
Jens Axboe
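
The missing minimum mentioned in the reply above could look roughly like
the following userspace sketch. It is an illustration under stated
assumptions, not the patch's code: presleep_ns() and its overhead_ns
parameter (an estimate of the cost of sleeping and waking up again) are
hypothetical names. A return value of 0 means "skip the sleep and busy
poll", matching how the quoted patch treats a zero q->poll_nsec.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the pre-sleep time for hybrid polling.
 *
 * expected_ns: expected completion time, e.g. a tracked mean
 * overhead_ns: ASSUMED cost of the sleep/wakeup round trip
 *
 * Returns 0 when sleeping first would eat the whole budget, in which
 * case the caller should only busy poll.
 */
static inline uint64_t presleep_ns(uint64_t expected_ns, uint64_t overhead_ns)
{
	uint64_t target = expected_ns / 2;

	if (target <= overhead_ns)
		return 0;
	return target;
}
```

For example, with an expected completion of 5185 and an assumed overhead
of 10000 the helper returns 0 (busy poll only), while an expected
completion of 20996 with an assumed overhead of 3000 yields 10498.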