In-Reply-To: <20171020133943.GA31275@ming.t460p>
References: <20171018102206.26020-1-roman.penyaev@profitbricks.com> <20171020133943.GA31275@ming.t460p>
From: Roman Penyaev
Date: Mon, 23 Oct 2017 18:12:29 +0200
Subject: Re: [PATCH 1/1] [RFC] blk-mq: fix queue stalling on shared hctx restart
To: Ming Lei
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Bart Van Assche, Christoph Hellwig, Hannes Reinecke, Jens Axboe
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Ming,

On Fri, Oct 20, 2017 at 3:39 PM, Ming Lei wrote:
> On Wed, Oct 18, 2017 at 12:22:06PM +0200, Roman Pen wrote:
>> Hi all,
>>
>> the patch below fixes queue stalling when a shared hctx is marked for
>> restart (the BLK_MQ_S_SCHED_RESTART bit) but q->shared_hctx_restart
>> stays zero. The root cause is that hctxs are shared between queues,
>> but 'shared_hctx_restart' belongs to a particular queue, which in fact
>> may not need to be restarted. Thus we return from
>> blk_mq_sched_restart() and leave the shared hctx of another queue
>> never restarted.
>>
>> The fix is to make the shared_hctx_restart counter belong not to the
>> queue, but to the tags, so that the counter reflects the real number
>> of shared hctxs which need to be restarted.
>>
>> During tests 1 hctx (set->nr_hw_queues) was used and all stalled
>> requests were noticed in dd->fifo_list of the mq-deadline scheduler.
>>
>> A possible sequence of events:
>>
>> 1. Request A of queue A is inserted into dd->fifo_list of the
>> scheduler.
>>
>> 2. Request B of queue A bypasses the scheduler and goes directly to
>> hctx->dispatch.
>>
>> 3. Request C of queue B is inserted.
>>
>> 4. blk_mq_sched_dispatch_requests() is invoked; since hctx->dispatch
>> is not empty (request B is in the list), the hctx is only marked for
>> the next restart and request A is left in the list (see the comment
>> "So it's best to leave them there for as long as we can. Mark the hw
>> queue as needing a restart in that case." in blk-mq-sched.c).
>>
>> 5. Eventually request B is completed/freed and blk_mq_sched_restart()
>> is called, but by chance the hctx from queue B is chosen for restart
>> and request C gets a chance to be dispatched.
>>
>> 6. Eventually request C is completed/freed and blk_mq_sched_restart()
>> is called, but shared_hctx_restart for queue B is zero, so we return
>> without attempting to restart the hctx from queue A, and request A is
>> stuck forever.
>>
>> But the stalled queue is not the only problem with
>> blk_mq_sched_restart(). My tests show that those loops through all
>> queues and hctxs can be very costly, even with the
>> shared_hctx_restart counter, which aims to fix the performance issue.
>> For my tests I create 128 devices with 64 hctxs each, which share the
>> same tag set.
>
> Hi Roman,
>
> I also find the performance issue with RESTART for TAG_SHARED.
>
> But from my analysis, RESTART isn't needed for TAG_SHARED, because
> SCSI-MQ already handles the RESTART by itself, so could you test the
> patch in the following link, posted days ago, to see if it fixes your
> issue?
I can say without any testing: it fixes all the issues :)

You've reverted

8e8320c9315c ("blk-mq: fix performance regression with shared tags")
6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")

with one major difference: you do not handle shared tags in a special
way, and you restart only the requested hctx instead of iterating over
all hctxs in a queue.

Firstly, I have to say that the queue stalling issue (#1) and the
performance issue (#2) were observed on our in-house RDMA driver IBNBD:

https://lwn.net/Articles/718181/

and I've never tried to reproduce them on SCSI-MQ.

Secondly, your patch breaks the round-robin (RR) restarts introduced by
commit 6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are
shared"). The basic idea of that patch seems nice, but because of the
possibly large number of queues and hctxs it needs to be reimplemented:
eventually you should get a fast hctx restart, but in RR fashion. As
far as I understand, that does not contradict your patch.

--
Roman