From: Paolo Valente
Subject: bug in tag handling in blk-mq?
Date: Mon, 7 May 2018 16:03:34 +0200
To: Mike Galbraith, Jens Axboe, Christoph Hellwig
Cc: linux-block, Ulf Hansson, LKML, Linus Walleij, Oleksandr Natalenko
Message-Id: <999DF2B3-4EE8-4BDF-89C5-EB0C2D8BF69E@linaro.org>

Hi Jens, Christoph, all,

Mike Galbraith has been experiencing hangs in blk_mq_get_tag, only
with bfq [1]. The symptoms seem to point clearly to a problem in
I/O-tag handling, triggered by bfq because it limits the number of
tags for async and sync write requests (in bfq_limit_depth).

Fortunately, I just happened to find a way to apparently confirm it.
With the following one-liner for block/bfq-iosched.c:

@@ -554,8 +554,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
 	if (unlikely(bfqd->sb_shift != bt->sb.shift))
 		bfq_update_depths(bfqd, bt);
 
-	data->shallow_depth =
-		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+	data->shallow_depth = 1;
 
 	bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
 			__func__, bfqd->wr_busy_queues, op_is_sync(op),

Mike's machine now crashes quickly and systematically, while nothing
bad happens on my machines, even with heavy workloads (apart from an
expected throughput drop).

This change simply reduces to 1 the maximum possible value for the sum
of the number of async requests and of sync write requests.

This email is basically a request for help from knowledgeable people.
To start, here are my first doubts/questions:

1) Just to be certain, I guess it is not normal for blk-mq to hang if
   there can be at most one async or sync write request at a time,
   right?

2) Do you have any hint as to where I could look to chase this bug?
   Of course, the bug may be in bfq, i.e., it may be a somehow
   unrelated bfq bug that indirectly causes this hang in blk-mq. But
   it is hard for me to understand how.

Looking forward to some help.

Thanks,
Paolo

[1] https://www.spinics.net/lists/stable/msg215036.html