Received: by 2002:a05:6358:16cc:b0:ea:6187:17c9 with SMTP id r12csp89481rwl; Sat, 24 Dec 2022 14:54:47 -0800 (PST) X-Google-Smtp-Source: AMrXdXt6VD7ogsT1EZfq+MTWzes67wH1YHxgp9mcUFrFmuUPi7wBzJHuX5o6QEgyyykZryoDZADi X-Received: by 2002:a17:903:3284:b0:188:82fc:e277 with SMTP id jh4-20020a170903328400b0018882fce277mr16022255plb.12.1671922487383; Sat, 24 Dec 2022 14:54:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1671922487; cv=none; d=google.com; s=arc-20160816; b=caPBcFkGW+ZDVdRbxsHajjm0EGU1AicrPK0VqJM4X2Ydv8b+VaoReJTX5WX7ysb3y4 8Daihld8fLnCM7kbrWI3Q2AzV+qR0NwsTNMpcK/Bwo/D+01aJcQP6npeZ3AiX1KEsyDS eNY+yWob/3XAGwDzBwjILKjhYYvHBdBPgkaWIQYK7Kj5Y0OLJEAKgGEHusD1sAgv5IT4 5WFzkh7Hxfofx1JosvJ8lBBSZ8FmEefRmawNFhSXVExq4O5csK+AvPCfaC9a2Oxk0o2f jgkxr4s2zeJhZm1L1vw+ueWSHYxzDzGUDgWN4rdJrk6+0WtGypm3oZN2s6ZfA/f08hFy v6VA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=4UGgvDYwkEzkWlAoJg7VVnlO9d0ObHmSrmzZPksKgQg=; b=kbjq0mlgBqfX56Pj79uwip4xK7oKwnnINCJCuTpv24twYWuUSK1T3JLHHRWSOd2FM0 hgNtWddteR7kiz98zP+B/zlQU0wl4Yt4dDR6MUWN9/vRSOf7/MyZwl6n6n1fXJBSrmzD UdRQb7lnS9InogemsIH5ZETeDSn7gMjK3/q9TwVvUuVT2UaV/sqD25VFOO0uTgsqVuXd t+hDGPUQIXnXmT1JAQyPnNaWSju75KU4wAZLDmZ6IwM3SvnfMQAANjf4Wk3DT6SnK2Am VJdB2CxFfzIDB+HkgO/HSuxNbPFcYLOAl8Tii36Kv2ONLofBIFrgObJQ+/slF87FEPyf RkIA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=azEswh39; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id iz17-20020a170902ef9100b00188e045470csi6574541plb.542.2022.12.24.14.54.38; Sat, 24 Dec 2022 14:54:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=azEswh39; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230200AbiLXWGs (ORCPT + 66 others); Sat, 24 Dec 2022 17:06:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229499AbiLXWGq (ORCPT ); Sat, 24 Dec 2022 17:06:46 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 224A765C3; Sat, 24 Dec 2022 14:06:45 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 96D3BB80184; Sat, 24 Dec 2022 22:06:43 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7FE9CC433D2; Sat, 24 Dec 2022 22:06:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1671919602; bh=FDRW/65+iOIG+xRl7DZC4z2D8oXawEPLvY0Logj1h7I=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=azEswh39v0IPU+yFWhK/l7jZnBdIuQyRXDASQsXm26TOOIzxu45HD40/Jn/oOhOxB l8kSRtNTPKTH4KrhYFqQD8IwI59cyqoCffkqxPjuPgu6GZu2fP7ihtHA/8EfiUCPdm igs7kG3lysEgaOhZICnOxgh12LqqdfXwzz1RE/+HUO4/xEW7UY9PjwGB+BR8JCvbrQ wpne5B4xhd0oFoRhz/ea7mYjO8NAKeh8CBQMgP0PRr9cscfG1cfmom0YJ+S8eXIZcx KQ4RoXlKeKdjA6ifcFhSS5mJIcfUZAczlIElcgldw4z4ZTmIjjs0tI+5HTtvgddYpV jj/HBtGAatVFQ== Date: Sat, 24 Dec 2022 15:06:38 -0700 From: Keith Busch To: Hugh Dickins Cc: Christoph Hellwig , Jens Axboe , Sagi Grimberg , Chaitanya Kulkarni , Linus Torvalds , Thorsten Leemhuis , linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: 6.2 nvme-pci: something wrong Message-ID: References: <572cfcc0-197a-9ead-9cb-3c5bf5e735@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <572cfcc0-197a-9ead-9cb-3c5bf5e735@google.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote: > Hi Christoph, > > There's something wrong with the nvme-pci heading for 6.2-rc1: > no problem booting here on this Lenovo ThinkPad X1 Carbon 5th, > but under load... > > nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting > nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting > nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting > nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting > nvme nvme0: Abort status: 0x0 > nvme nvme0: Abort status: 0x0 > nvme nvme0: Abort status: 0x0 > nvme nvme0: Abort status: 0x0 > nvme nvme0: I/O 0 QID 2 timeout, reset controller > > ...and more, until I just have to poweroff and reboot. > > Bisection points to your > 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers") > And that does revert cleanly, giving a kernel which shows no problem. > > I've spent a while comparing old nvme_pci_alloc_tag_set() and new > nvme_alloc_io_tag_set(), I do not know my way around there at all > and may be talking nonsense, but it did look as if there might now > be a difference in the queue_depth, sqsize, q_depth conversions. > > I'm running load successfully with the patch below, but I strongly > suspect that the right patch will be somewhere else: over to you! > > Hugh > > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ct > > memset(set, 0, sizeof(*set)); > set->ops = ops; > - set->queue_depth = ctrl->sqsize + 1; > + set->queue_depth = ctrl->sqsize; Your observation is a queue-wrap condition that makes it impossible for the controller know there are new commands. Your patch does look like the correct thing to do. The "zero means one" thing is a confusing distraction, I think. It makes more sense if you consider sqsize as the maximum number of tags we can have outstanding at one time and it looks like all the drivers set it that way. We're supposed to leave one slot empty for a full NVMe queue, so adding one here to report the total number slots isn't right since that would allow us to fill all slots. Fabrics drivers have been using this method for a while, though, so interesting they haven't had a simiar problem.