Subject: Re: bug in tag handling in blk-mq?
From: Paolo Valente
Date: Mon, 7 May 2018 20:02:03 +0200
To: Jens Axboe
Cc: Mike Galbraith, Christoph Hellwig, linux-block, Ulf Hansson, LKML, Linus Walleij, Oleksandr Natalenko
Message-Id: <84145CD7-B917-4B32-8A5C-310C1910DB71@linaro.org>
In-Reply-To: <7760d23b-7a4c-a645-1c7a-da7569bb44dc@kernel.dk>
References: <999DF2B3-4EE8-4BDF-89C5-EB0C2D8BF69E@linaro.org> <7760d23b-7a4c-a645-1c7a-da7569bb44dc@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

> On 7 May 2018, at 18:39, Jens Axboe wrote:
>
> On 5/7/18 8:03 AM, Paolo Valente wrote:
>> Hi Jens, Christoph, all,
>> Mike Galbraith has been experiencing hangs, on blk_mq_get_tag, only
>> with bfq [1]. Symptoms seem to clearly point to a problem in I/O-tag
>> handling, triggered by bfq because it limits the number of tags for
>> async and sync write requests (in bfq_limit_depth).
>>
>> Fortunately, I just happened to find a way to apparently confirm it.
>> With the following one-liner for block/bfq-iosched.c:
>>
>> @@ -554,8 +554,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
>>         if (unlikely(bfqd->sb_shift != bt->sb.shift))
>>                 bfq_update_depths(bfqd, bt);
>>
>> -       data->shallow_depth =
>> -               bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
>> +       data->shallow_depth = 1;
>>
>>         bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
>>                         __func__, bfqd->wr_busy_queues, op_is_sync(op),
>>
>> Mike's machine now crashes soon and systematically, while nothing bad
>> happens on my machines, even with heavy workloads (apart from an
>> expected throughput drop).
>>
>> This change simply reduces to 1 the maximum possible value for the sum
>> of the number of async requests and of sync write requests.
>>
>> This email is basically a request for help to knowledgeable people. To
>> start, here are my first doubts/questions:
>> 1) Just to be certain, I guess it is not normal that blk-mq hangs if
>> async requests and sync write requests can be at most one, right?
>> 2) Do you have any hint to where I could look for, to chase this bug?
>> Of course, the bug may be in bfq, i.e, it may be a somehow unrelated
>> bfq bug that causes this hang in blk-mq, indirectly. But it is hard
>> for me to understand how.
>
> CC Omar, since he implemented the shallow part. But we'll need some
> traces to show where we are hung, probably also the value of the
> /sys/debug/kernel/block// directory. For the crash mentioned, a
> trace as well. Otherwise we'll be wasting a lot of time on this.
>
> Is there a reproducer?
>

Ok Mike, I guess it's your turn now, for at least a stack trace.

Thanks,
Paolo

> --
> Jens Axboe
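
A toy model may help picture what the one-liner above is expected to do. The sketch below is not the kernel's sbitmap or blk-mq code: `toy_tags`, `toy_get_tag`, `toy_put_tag`, `toy_demo` and `TOY_WORD_BITS` are hypothetical names made up for illustration, and the model assumes a single small word of tags. It only shows the intended effect of a shallow depth of 1: depth-limited requests compete for one tag, and a second one gets nothing until the first is released, while an unrestricted request can still use the rest of the word.

```c
#include <assert.h>

/* Hypothetical size of the single tag word in this toy model. */
#define TOY_WORD_BITS 8u

/* Toy tag pool: one bit per tag, a set bit means "tag in use". */
struct toy_tags {
	unsigned int word;
};

/*
 * Try to take a free tag, looking only at the first shallow_depth
 * bits of the word. Returns the tag number, or -1 if all of those
 * bits are busy. A real allocator would typically wait at this
 * point; the toy model just reports the failure.
 */
int toy_get_tag(struct toy_tags *t, unsigned int shallow_depth)
{
	unsigned int limit = shallow_depth < TOY_WORD_BITS ?
			     shallow_depth : TOY_WORD_BITS;
	unsigned int bit;

	for (bit = 0; bit < limit; bit++) {
		if (!(t->word & (1u << bit))) {
			t->word |= 1u << bit;
			return (int)bit;
		}
	}
	return -1;
}

/* Release a tag previously returned by toy_get_tag(). */
void toy_put_tag(struct toy_tags *t, int tag)
{
	if (tag >= 0 && (unsigned int)tag < TOY_WORD_BITS)
		t->word &= ~(1u << tag);
}

/* Walk through the depth-1 case: one limited request at a time. */
void toy_demo(void)
{
	struct toy_tags tags = { 0 };
	int first = toy_get_tag(&tags, 1);

	assert(first == 0);                             /* first limited request gets tag 0 */
	assert(toy_get_tag(&tags, 1) == -1);            /* second limited request finds nothing */
	assert(toy_get_tag(&tags, TOY_WORD_BITS) == 1); /* unrestricted request still succeeds */

	toy_put_tag(&tags, first);
	assert(toy_get_tag(&tags, 1) == 0);             /* tag 0 is usable again after the free */
}
```

In this model nothing can hang as long as every holder of tag 0 eventually calls `toy_put_tag`, which is why a permanent stall with depth 1 would suggest a lost wakeup or a tag that is never released rather than the limit itself. That reading is an assumption drawn from the toy model, not a diagnosis of the reported hang.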