Subject: Re: bug in tag handling in blk-mq?
To: Paolo Valente, Mike Galbraith, Christoph Hellwig
Cc: linux-block, Ulf Hansson, LKML, Linus Walleij, Oleksandr Natalenko
References: <999DF2B3-4EE8-4BDF-89C5-EB0C2D8BF69E@linaro.org>
From: Jens Axboe
Message-ID: <7760d23b-7a4c-a645-1c7a-da7569bb44dc@kernel.dk>
Date: Mon, 7 May 2018 10:39:12 -0600
In-Reply-To: <999DF2B3-4EE8-4BDF-89C5-EB0C2D8BF69E@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/7/18 8:03 AM, Paolo Valente wrote:
> Hi Jens, Christoph, all,
> Mike Galbraith has been experiencing hangs, on blk_mq_get_tag, only
> with bfq [1]. Symptoms seem to clearly point to a problem in I/O-tag
> handling, triggered by bfq because it limits the number of tags for
> async and sync write requests (in bfq_limit_depth).
>
> Fortunately, I just happened to find a way to apparently confirm it.
> With the following one-liner for block/bfq-iosched.c:
>
> @@ -554,8 +554,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
>  	if (unlikely(bfqd->sb_shift != bt->sb.shift))
>  		bfq_update_depths(bfqd, bt);
>  
> -	data->shallow_depth =
> -		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
> +	data->shallow_depth = 1;
>  
>  	bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
>  			__func__, bfqd->wr_busy_queues, op_is_sync(op),
>
> Mike's machine now crashes soon and systematically, while nothing bad
> happens on my machines, even with heavy workloads (apart from an
> expected throughput drop).
>
> This change simply reduces to 1 the maximum possible value for the sum
> of the number of async requests and of sync write requests.
>
> This email is basically a request for help to knowledgeable people. To
> start, here are my first doubts/questions:
> 1) Just to be certain, I guess it is not normal that blk-mq hangs if
> async requests and sync write requests can be at most one, right?
> 2) Do you have any hint to where I could look for, to chase this bug?
> Of course, the bug may be in bfq, i.e, it may be a somehow unrelated
> bfq bug that causes this hang in blk-mq, indirectly. But it is hard
> for me to understand how.

CC Omar, since he implemented the shallow part. But we'll need some
traces to show where we are hung, probably also the value of the
/sys/debug/kernel/block// directory. For the crash mentioned, a
trace as well. Otherwise we'll be wasting a lot of time on this.

Is there a reproducer?

-- 
Jens Axboe
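
For readers following the thread, the effect described for
data->shallow_depth = 1 can be illustrated with a toy userspace model.
This is only a sketch under stated assumptions, not the kernel's sbitmap
code: struct tag_pool, pool_get(), pool_put() and POOL_DEPTH are
hypothetical names, the pool is assumed to be a single bitmap word, and
a failed allocation returns -1 where the real code would presumably
sleep until a tag is released.

```c
#include <assert.h>

#define POOL_DEPTH 8u	/* hypothetical pool size: one bitmap word */

struct tag_pool {
	unsigned int busy;	/* bit i set => tag i is in flight */
};

/*
 * Take the lowest free tag whose index is below 'limit'.
 * Passing POOL_DEPTH models an unrestricted allocation; passing a
 * smaller value models a shallow-depth allocation.
 * Returns the tag, or -1 if no tag inside the window is free.
 */
int pool_get(struct tag_pool *p, unsigned int limit)
{
	unsigned int i;

	if (limit > POOL_DEPTH)
		limit = POOL_DEPTH;

	for (i = 0; i < limit; i++) {
		if (!(p->busy & (1u << i))) {
			p->busy |= 1u << i;
			return (int)i;
		}
	}
	return -1;
}

/* Release a tag previously returned by pool_get(). */
void pool_put(struct tag_pool *p, int tag)
{
	assert(tag >= 0 && (unsigned int)tag < POOL_DEPTH);
	p->busy &= ~(1u << tag);
}
```

In this model, with a limit of 1, a second limited allocation fails
while the first is still in flight, an unrestricted allocation still
succeeds from the remaining tags, and the limited class can proceed
again once tag 0 is released. The model says nothing about where the
reported hang comes from; it only shows the intended throttling.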