From: Roman Penyaev
Date: Mon, 23 Oct 2017 17:12:52 +0200
Subject: Re: [PATCH 1/1] [RFC] blk-mq: fix queue stalling on shared hctx restart
To: Bart Van Assche
Cc: hch@lst.de, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, hare@suse.com, axboe@fb.com

On Fri, Oct 20, 2017 at 10:05 PM, Bart Van Assche wrote:
> On Fri, 2017-10-20 at 11:39 +0200, Roman Penyaev wrote:
>> But what bothers me is these looong loops inside blk_mq_sched_restart(),
>> and since you are the author of the original 6d8c6c0f97ad ("blk-mq: Restart
>> a single queue if tag sets are shared") I want to ask what was the original
>> problem which you attempted to fix?  Likely I am missing some test scenario
>> which would be great to know about.
>
> Long loops?  How many queues share the same tag set on your setup?  How many
> hardware queues does your block driver create per request queue?

Yeah, ok, my mistake.  I should have split the two issues instead of
describing everything in one go in the first email.  So, take a look.

For my tests I create 128 queues (devices) with 64 hctxs each; all
queues share the same tag set, and then I start 128 fio jobs (one job
per queue).

The following is the fio and ftrace output for the v4.14-rc4 kernel
(without any changes):

   READ: io=5630.3MB, aggrb=573208KB/s, minb=573208KB/s, maxb=573208KB/s,
         mint=10058msec, maxt=10058msec
  WRITE: io=5650.9MB, aggrb=575312KB/s, minb=575312KB/s, maxb=575312KB/s,
         mint=10058msec, maxt=10058msec

root@pserver16:~/roman# cat /sys/kernel/debug/tracing/trace_stat/* | grep blk_mq
  Function               Hit      Time          Avg           s^2
  --------               ---      ----          ---           ---
  blk_mq_sched_restart   16347    9540759 us    583.639 us    8804801 us
  blk_mq_sched_restart   7884     6073471 us    770.354 us    8780054 us
  blk_mq_sched_restart   14176    7586794 us    535.185 us    2822731 us
  blk_mq_sched_restart   7843     6205435 us    791.206 us    12424960 us
  blk_mq_sched_restart   1490     4786107 us    3212.153 us   1949753 us   <<< !!! 3 ms on average !!!
  blk_mq_sched_restart   7892     6039311 us    765.244 us    2994627 us
  blk_mq_sched_restart   15382    7511126 us    488.306 us    3090912 us
  [cut]

And here are the results with the following two patches reverted:

  8e8320c9315c ("blk-mq: fix performance regression with shared tags")
  6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")

   READ: io=12884MB, aggrb=1284.3MB/s, minb=1284.3MB/s, maxb=1284.3MB/s,
         mint=10032msec, maxt=10032msec
  WRITE: io=12987MB, aggrb=1294.6MB/s, minb=1294.6MB/s, maxb=1294.6MB/s,
         mint=10032msec, maxt=10032msec

root@pserver16:~/roman# cat /sys/kernel/debug/tracing/trace_stat/* | grep blk_mq
  Function               Hit      Time          Avg         s^2
  --------               ---      ----          ---         ---
  blk_mq_sched_restart   50699    8802.349 us   0.173 us    121.771 us
  blk_mq_sched_restart   50362    8740.470 us   0.173 us    161.494 us
  blk_mq_sched_restart   50402    9066.337 us   0.179 us    113.009 us
  blk_mq_sched_restart   50104    9366.197 us   0.186 us    188.645 us
  blk_mq_sched_restart   50375    9317.727 us   0.184 us    54.218 us
  blk_mq_sched_restart   50136    9311.657 us   0.185 us    446.790 us
  blk_mq_sched_restart   50103    9179.625 us   0.183 us    114.472 us
  [cut]

The difference is significant: 570 MB/s vs 1280 MB/s.  For example, one
CPU spent 3 ms on average per call just iterating over all queues and
hctxs to find the hctx to restart; in total the CPUs spent *seconds* in
that loop.  That seems incredibly long.
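For reference, here is the shape of that walk as I understand it from
reading the v4.14-rc4 source.  This is a simplified sketch, not the
verbatim kernel code: the RCU protection of the tag list and the
shared-restart bookkeeping are elided, and sched_restart_shared() is my
own name for it:

  #include <linux/blkdev.h>
  #include <linux/blk-mq.h>

  /*
   * Simplified sketch (not verbatim v4.14-rc4 code): starting from the
   * given queue, visit every request queue sharing the tag set and
   * every hctx of each queue, until one hctx with the RESTART bit set
   * has been kicked.
   */
  static void sched_restart_shared(struct blk_mq_tag_set *set,
                                   struct request_queue *start)
  {
          struct request_queue *q = start;

          do {
                  struct blk_mq_hw_ctx *hctx;
                  unsigned int i;

                  /* O(nr_hw_queues) scan nested in an O(nr_queues) loop */
                  queue_for_each_hw_ctx(q, hctx, i) {
                          if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART,
                                                 &hctx->state)) {
                                  blk_mq_run_hw_queue(hctx, true);
                                  return; /* one hctx restarted per call */
                          }
                  }
                  /* advance the round-robin cursor through set->tag_list */
                  if (list_is_last(&q->tag_set_list, &set->tag_list))
                          q = list_first_entry(&set->tag_list,
                                               struct request_queue,
                                               tag_set_list);
                  else
                          q = list_next_entry(q, tag_set_list);
          } while (q != start);
  }

With 128 shared queues of 64 hctxs each, a single unlucky call can
inspect up to 128 * 64 = 8192 hctxs, which fits the 3 ms average shown
above.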
> Commit 6d8c6c0f97ad is something I came up with to fix queue lockups in the
> SCSI and dm-mq drivers.

You mean fairness (some hctxs getting fewer chances to be restarted)?
That's why you need to restart them in round-robin fashion, right?

In IBNBD I also restart hctxs in round-robin fashion, but for that I
put each hctx that needs a restart on a separate per-CPU list.
Probably it makes sense to do the same here?
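Something along the lines of the following, purely hypothetical sketch
(this is not the actual IBNBD code: restart_entry is an invented member
of struct blk_mq_hw_ctx, the per-CPU lists are assumed to be initialized
with INIT_LIST_HEAD() elsewhere, and locking against concurrent list
access is omitted):

  #include <linux/blk-mq.h>
  #include <linux/list.h>
  #include <linux/percpu.h>

  /* One restart list per CPU: only hctxs that actually asked for a
   * restart are ever walked, instead of every queue in the tag set. */
  static DEFINE_PER_CPU(struct list_head, restart_list);

  static void hctx_mark_for_restart(struct blk_mq_hw_ctx *hctx)
  {
          if (!test_and_set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
                  /* restart_entry is a hypothetical list_head member */
                  list_add_tail(&hctx->restart_entry,
                                this_cpu_ptr(&restart_list));
  }

  static void hctx_restart_one(void)
  {
          struct list_head *list = this_cpu_ptr(&restart_list);
          struct blk_mq_hw_ctx *hctx;

          if (list_empty(list))
                  return;

          /* oldest entry first, which gives round-robin fairness */
          hctx = list_first_entry(list, struct blk_mq_hw_ctx,
                                  restart_entry);
          list_del_init(&hctx->restart_entry);
          clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
          blk_mq_run_hw_queue(hctx, true);
  }

The point is that a restart then costs O(1) and touches only hctxs that
are actually waiting, while round-robin fairness falls out of the FIFO
ordering of the list.

--
Roman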