Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp1043096pxb; Thu, 4 Mar 2021 01:30:57 -0800 (PST) X-Google-Smtp-Source: ABdhPJyuRUUjzMRB6xOy1qaLzbBzeMlnmhqSXQlyJB0sfl4eUBTyxJRlSGhMZ1P0HqoA0F3xXo5T X-Received: by 2002:a17:906:5d12:: with SMTP id g18mr3191597ejt.246.1614850257689; Thu, 04 Mar 2021 01:30:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614850257; cv=none; d=google.com; s=arc-20160816; b=VcheyfmxxZKFTyvvROIcvvZYAFusNWO6xK3iAncOS2hAevTQBaqipWsCxnG7O1c1mp +Crffh/yATMo3Gd51KBbn4r6pf50Lv/4L+B0pHWQ+mn0duuX/QllBei7VIbUDpfMdbH6 F6jXqDJyJ3UukE+JDfwsjR3PeEBvrfGpAQusBGCr8fPMcQKfLckONPl6HrVSedBHw1PE b3jYzf8T6TJPtq+rZZgd4zH5eZ6LUUkzHangFG1IKZf4o4bHV61C1ir6Fjk4+sv0xJ6g OEnXfjlWGFfIvDsoFLVRUd6GWUzISVu3KMT0aT0vXuBtN96XPN2VAiaz27cgxHuqd0tQ 3DIQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=zdJ5hx+mYbfNreiVWKd99KRYazRIfUe4brAtRfe/xF4=; b=hxnAKAWNm8I0PjowgsI/mY6UuAhNGgH3RfafUgI2O89IFNj/8qxLmNKh6emprjncDj y1coyV7yaiR0jyeoCyplVBoeaWiSPkjY/CeinfQg1s7WZ82ld5lUMYuHx8D3OAEQr/vo Uahvj9mUmceLbDqP1AP9cHcagIiI+2vBeiDxXCCvApj1cIgkQGgtgeW55qmFBNgp+f2e 1VWGDwEpAUIwSsAmCOyEs/bhSZh+5m2a/XZq5up0fh9JPiR4xxMVvsyCSwRfnJ3XF+qn rwbMjkwxkOScVIpzfHO3DPg8n5u6iqqSLeXQ1DFhjXvD+okVWv3AByevFfzY37ELrBJa 3jpA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=K+6wg9BU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a18si8464637ejr.378.2021.03.04.01.30.34; Thu, 04 Mar 2021 01:30:57 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=K+6wg9BU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1446145AbhCCPB5 (ORCPT + 99 others); Wed, 3 Mar 2021 10:01:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1843060AbhCCKZJ (ORCPT ); Wed, 3 Mar 2021 05:25:09 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1AA81C0610CE; Wed, 3 Mar 2021 00:55:44 -0800 (PST) Received: by mail-wm1-x32b.google.com with SMTP id i9so4480787wml.0; Wed, 03 Mar 2021 00:55:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zdJ5hx+mYbfNreiVWKd99KRYazRIfUe4brAtRfe/xF4=; b=K+6wg9BUznm5a/+9b9QJ7i9CChtLVD4n5KpT38ZA7QuZ71i/zpE6c5PwKFC5vlJVh6 DhhowwzJ76bXSS0tQp4FXcUh1skR513TO5ktNOi0c03sUdRBG73n1d5THMiMZDsmN339 wRqGWi4Qi7zNv+UkUWrn5CskAi6Jl9sFYGf3Ygjt7vc9S4QLLZD+0NIPewtfRslu3qdJ j74C1vM98bGMvHT1RfoU5NMfcNVz/NNzY4vZ8Rr5r9IQ/Yo6sOLTs60QygBWVb19pc1S j7KcwVVdncAxdiPLaegbiS4FUQjOedYDq3IzUNFcUw7udfrRHkOQxBeJgtHp53hdwE4c xGxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zdJ5hx+mYbfNreiVWKd99KRYazRIfUe4brAtRfe/xF4=; b=HtumlKycCgRUz413V+rEfk7u1Nu0iW/jLB9dok6x4ET+cpihUOAVdzYggzQMUailXs 3JW3Hwy6R0WJu3CCxJbUlGeqTju95Ej7VpCGvN/LntodZjqYNF78wnS7yoWy7ca3E/ge G2qzcImkwwKQRsWVvYRLnxkwvf0Mi8wdtOvQ1Uqi4j2rpVrqCM+iFfnTBjM26sYlwM8/ LxHigQamagS1iBpw8PQYEZfqAR8JRLgQs621I+RSpefyKE/LffHn0TCVAHRj69DmwT2O LEPy4dXJesdBUmtw5ObalvP34mIPhGfXW7jQGQrWx1Sbl6OidbH7oNNgXXh8wC0Dlg2V 0EbQ== X-Gm-Message-State: AOAM533B3u0z1iL2p70QrPgGBRrqqkdb5pCwybEXAWcSESqJQySRkxwC jUf6GC/CVYHe7YHJgsXzUortyFWhiGXWlxSs X-Received: by 2002:a1c:8041:: with SMTP id b62mr8441707wmd.0.1614761742701; Wed, 03 Mar 2021 00:55:42 -0800 (PST) Received: from sf (tunnel547699-pt.tunnel.tserv1.lon2.ipv6.he.net. [2001:470:1f1c:3e6::2]) by smtp.gmail.com with ESMTPSA id n6sm11688949wrt.1.2021.03.03.00.55.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Mar 2021 00:55:42 -0800 (PST) Date: Wed, 3 Mar 2021 08:55:33 +0000 From: Sergei Trofimovich To: John Paul Adrian Glaubitz , Don Brace , storagedev@microchip.com, linux-scsi@vger.kernel.org Cc: linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, Joe Szczypek , Scott Benesh , Scott Teel , Tomas Henzl , "Martin K. Petersen" Subject: Re: [bisected] 5.12-rc1 hpsa regression: "scsi: hpsa: Correct dev cmds outstanding for retried cmds" breaks hpsa P600 Message-ID: <20210303085533.505b1590@sf> In-Reply-To: <20210303002236.2f4ec01f@sf> References: <20210222230519.73f3e239@sf> <8decdd2e-a380-9951-3ebb-2bc3e48aa1c3@physik.fu-berlin.de> <20210223083507.43b5a6dd@sf> <51cbf584-07ef-1e62-7a3b-81494a04faa6@physik.fu-berlin.de> <9441757f-d4bc-a5b5-5fb0-967c9aaca693@physik.fu-berlin.de> <20210223192743.0198d4a9@sf> <20210302222630.5056f243@sf> <25dfced0-88b2-b5b3-f1b6-8b8a9931bf90@physik.fu-berlin.de> <20210303002236.2f4ec01f@sf> X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 3 Mar 2021 00:22:36 +0000 Sergei Trofimovich wrote: > On Tue, 2 Mar 2021 23:31:32 +0100 > John Paul Adrian Glaubitz wrote: > > > Hi Sergei! > > > > On 3/2/21 11:26 PM, Sergei Trofimovich wrote: > > > Gave v5.12-rc1 a try today and got a similar boot failure around > > > hpsa queue initialization, but my failure is later: > > > https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1 > > > Maybe I get different error because I flipped on most debugging > > > kernel options :) > > > > > > Looks like 'ERROR: Invalid distance value range' while being > > > very scary are harmless. It's just a new spammy way for kernel > > > to report lack of NUMA config on the machine (no SRAT and SLIT > > > ACPI tables). > > > > > > At least I get hpsa detected on PCI bus. But I guess it's discovered > > > configuration is very wrong as I get unaligned accesses: > > > [ 19.811570] kernel unaligned access to 0xe000000105dd8295, ip=0xa000000100b874d1 > > > > > > Bisecting now. > > > > Sounds good. I guess we should get Jens' fix for the signal regression > > merged as well as your two fixes for strace. > > "bisected" (cheated halfway through) and verified that reverting > f749d8b7a9896bc6e5ffe104cc64345037e0b152 makes rx3600 boot again. > > CCing authors who might be able to help us here. > > commit f749d8b7a9896bc6e5ffe104cc64345037e0b152 > Author: Don Brace > Date: Mon Feb 15 16:26:57 2021 -0600 > > scsi: hpsa: Correct dev cmds outstanding for retried cmds > > Prevent incrementing device->commands_outstanding for ioaccel command > retries that are driver initiated. If the command goes through the retry > path, the device->commands_outstanding counter has already accounted for > the number of commands outstanding to the device. Only commands going > through function hpsa_cmd_resolve_events decrement this counter. > > - ioaccel commands go to either HBA disks or to logical volumes comprised > of SSDs. > > The extra increment is causing device resets to hang. > > - Resets wait for all device outstanding commands to complete before > returning. > > Replace unused field abort_pending with retry_pending. This is a > maintenance driver so these changes have the least impact/risk. > > Link: https://lore.kernel.org/r/161342801747.29388.13045495968308188518.stgit@brunhilda > Tested-by: Joe Szczypek > Reviewed-by: Scott Benesh > Reviewed-by: Scott Teel > Reviewed-by: Tomas Henzl > Signed-off-by: Don Brace > Signed-off-by: Martin K. Petersen > > Don, do you happen to know why this patch caused some controller init failure > for device > 14:01.0 RAID bus controller: Hewlett-Packard Company Smart Array P600 > ? > > Boot failure: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1 > Boot success: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1-good > > The difference between the two boots is > f749d8b7a9896bc6e5ffe104cc64345037e0b152 reverted on top of 5.12-rc1 > in -good case. > > Looks like hpsa controller fails to initialize in bad case (could be a race?). Also CCing hpsa maintainer mailing lists. Looking more into the suspect commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f749d8b7a9896bc6e5ffe104cc64345037e0b152 it roughly does the: @@ -448,7 +448,7 @@ struct CommandList { */ struct hpsa_scsi_dev_t *phys_disk; - int abort_pending; + bool retry_pending; struct hpsa_scsi_dev_t *device; atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */ } __aligned(COMMANDLIST_ALIGNMENT); ... @@ -1151,7 +1151,10 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h, { dial_down_lockup_detection_during_fw_flash(h, c); atomic_inc(&h->commands_outstanding); - if (c->device) + /* + * Check to see if the command is being retried. + */ + if (c->device && !c->retry_pending) atomic_inc(&c->device->commands_outstanding); But I don't immediately see anything wrong with it. -- Sergei