From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Ofir Bitton, Oded Gabbay, Sasha Levin, ttayar@habana.ai, stanislaw.gruszka@linux.intel.com, dliberman@habana.ai, osharabi@habana.ai, dhirschfeld@habana.ai, mhaimovski@habana.ai, kelbaz@habana.ai, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.4 31/58] accel/habanalabs: add pci health check during heartbeat
Date: Sun, 23 Jul 2023 21:12:59 -0400
Message-Id: <20230724011338.2298062-31-sashal@kernel.org>
In-Reply-To: <20230724011338.2298062-1-sashal@kernel.org>
References: <20230724011338.2298062-1-sashal@kernel.org>
X-stable: review
X-stable-base: Linux 6.4.5

From: Ofir Bitton

[ Upstream commit d8b9cea584661b30305cf341bf9f675dc0a25471 ]

Currently, upon a heartbeat failure, we don't know whether the failure is
due to a firmware hang or to a bad PCI link. Hence, we read a PCI config
space register with a known value (the vendor ID) so we can tell which of
the two possibilities caused the heartbeat failure.
Signed-off-by: Ofir Bitton
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
Signed-off-by: Sasha Levin
---
 drivers/accel/habanalabs/common/device.c         | 15 ++++++++++++++-
 drivers/accel/habanalabs/common/habanalabs.h     |  2 ++
 drivers/accel/habanalabs/common/habanalabs_drv.c |  2 --
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index fabfc501ef543..a39dd346a1678 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -981,6 +981,18 @@ static void device_early_fini(struct hl_device *hdev)
 		hdev->asic_funcs->early_fini(hdev);
 }
 
+static bool is_pci_link_healthy(struct hl_device *hdev)
+{
+	u16 vendor_id;
+
+	if (!hdev->pdev)
+		return false;
+
+	pci_read_config_word(hdev->pdev, PCI_VENDOR_ID, &vendor_id);
+
+	return (vendor_id == PCI_VENDOR_ID_HABANALABS);
+}
+
 static void hl_device_heartbeat(struct work_struct *work)
 {
 	struct hl_device *hdev = container_of(work, struct hl_device,
@@ -995,7 +1007,8 @@ static void hl_device_heartbeat(struct work_struct *work)
 		goto reschedule;
 
 	if (hl_device_operational(hdev, NULL))
-		dev_err(hdev->dev, "Device heartbeat failed!\n");
+		dev_err(hdev->dev, "Device heartbeat failed! PCI link is %s\n",
+			is_pci_link_healthy(hdev) ? "healthy" : "broken");
 
 	info.err_type = HL_INFO_FW_HEARTBEAT_ERR;
 	info.event_mask = &event_mask;
diff --git a/drivers/accel/habanalabs/common/habanalabs.h b/drivers/accel/habanalabs/common/habanalabs.h
index eaae69a9f8178..7f5d1b6e3fb08 100644
--- a/drivers/accel/habanalabs/common/habanalabs.h
+++ b/drivers/accel/habanalabs/common/habanalabs.h
@@ -36,6 +36,8 @@
 struct hl_device;
 struct hl_fpriv;
 
+#define PCI_VENDOR_ID_HABANALABS	0x1da3
+
 /* Use upper bits of mmap offset to store habana driver specific information.
  * bits[63:59] - Encode mmap type
  * bits[45:0] - mmap offset value
diff --git a/drivers/accel/habanalabs/common/habanalabs_drv.c b/drivers/accel/habanalabs/common/habanalabs_drv.c
index d9df64e75f33a..1ec97da3dddb8 100644
--- a/drivers/accel/habanalabs/common/habanalabs_drv.c
+++ b/drivers/accel/habanalabs/common/habanalabs_drv.c
@@ -54,8 +54,6 @@ module_param(boot_error_status_mask, ulong, 0444);
 MODULE_PARM_DESC(boot_error_status_mask,
 	"Mask of the error status during device CPU boot (If bitX is cleared then error X is masked. Default all 1's)");
 
-#define PCI_VENDOR_ID_HABANALABS	0x1da3
-
 #define PCI_IDS_GOYA			0x0001
 #define PCI_IDS_GAUDI			0x1000
 #define PCI_IDS_GAUDI_SEC		0x1010
-- 
2.39.2