In-Reply-To: <1508435243.2429.42.camel@wdc.com>
References: <20171018102206.26020-1-roman.penyaev@profitbricks.com> <1508435243.2429.42.camel@wdc.com>
From: Roman Penyaev
Date: Fri, 20 Oct 2017 11:39:24 +0200
Subject: Re: [PATCH 1/1] [RFC] blk-mq: fix queue stalling on shared hctx restart
To: Bart Van Assche
Cc: "linux-kernel@vger.kernel.org", "linux-block@vger.kernel.org", "hare@suse.com", "axboe@fb.com", "hch@lst.de"
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Bart,

On Thu, Oct 19, 2017 at 7:47 PM, Bart Van Assche wrote:
> On Wed, 2017-10-18 at 12:22 +0200, Roman Pen wrote:
>> the patch below fixes queue stalling when shared hctx marked for restart
>> (BLK_MQ_S_SCHED_RESTART bit) but q->shared_hctx_restart stays zero. The
>> root cause is that hctxs are shared between queues, but 'shared_hctx_restart'
>> belongs to the particular queue, which in fact may not need to be restarted,
>> thus we return from blk_mq_sched_restart() and leave shared hctx of another
>> queue never restarted.
>>
>> The fix is to make shared_hctx_restart counter belong not to the queue, but
>> to tags, thereby counter will reflect real number of shared hctx needed to
>> be restarted.
>
> Hello Roman,
>
> The patch you posted looks fine to me but seeing this patch and the patch
> description makes me wonder why this had not been noticed before.

This is a good question, which I could not answer. I tried to simulate
the same behaviour (completion timings, completion pinning, number of
submission queues, shared tags, etc) on null block, but what I see is
that *_sched_restart() never observes a non-zero 'shared_hctx_restart',
literally never (I added a counter that is incremented when we take the
path that starts looking for a hctx to restart, and the counter stays 0).

That made me nervous, and then I gave up. After some time I want to
return to that and try to reproduce the problem on something else, say
nvme.

> Are you perhaps using a block driver that returns BLK_STS_RESOURCE more
> often than other block drivers? Did you perhaps run into this with the
> Infiniband network block device (IBNBD) driver?

Yep, this is IBNBD, but in these tests I tested with an mq scheduler,
shared tags and 1 hctx for each queue (blk device), thus I never run out
of internal tags and never return BLK_STS_RESOURCE.

Indeed, unmodified IBNBD does its own internal tag management. This was
needed because each queue (block device) was created with the number of
hctxs (nr_hw_queues) equal to the number of cpus on the system, but the
blk-mq tag set is shared only between hctxs, not globally, which led to
the need to return BLK_STS_RESOURCE and to restart queues. But with an
mq scheduler the situation changed: 1 hctx with shared tags can be
specified for all hundreds of devices without any performance impact.

Testing this configuration (1 hctx, shared tags, mq-deadline) immediately
shows these two problems: request stalling and slow loops inside
blk_mq_sched_restart().

> No matter what driver triggered this, I think this bug should be fixed.

Yes, queue stalling can be easily fixed. I can resend the current patch
with a shorter description which targets only this particular bug, if no
one else has objections/comments etc.

But what bothers me is these looong loops inside blk_mq_sched_restart(),
and since you are the author of the original 6d8c6c0f97ad ("blk-mq:
Restart a single queue if tag sets are shared") I want to ask: what was
the original problem which you attempted to fix? Likely I am missing
some test scenario which would be great to know about.

--
Roman

From 1581709172289637044@xxx Thu Oct 19 17:50:13 +0000 2017
X-GM-THRID: 1581602602754363002
X-Gmail-Labels: Inbox,Category Forums