Received: by 2002:a05:6a11:4021:0:0:0:0 with SMTP id ky33csp2088546pxb; Mon, 20 Sep 2021 11:58:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxOSlBUZuNbAqXlsa28bTdEb+mYlynfNHeq2THdY/jkS5byBEybvI6Gt5tY7yQrNtt0GKRH X-Received: by 2002:a50:e0c9:: with SMTP id j9mr17168029edl.336.1632164309395; Mon, 20 Sep 2021 11:58:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1632164309; cv=none; d=google.com; s=arc-20160816; b=XjVM3TN0RuQ9OkRX96nOdZ0mQ2WL0WEJSY+ua9QoLOjeDlzbdvaYIzmgCjCANTsKz4 qZaTk7RdE1e4MVkvpo8DzWqSEjvujLdd7YpdsUzJN0zJGurQOyiPfZbani2YBvK7KGOS 3WelR8eLCfItI8Yx5tk19SQ1Ga/jhgQ55mZ9dqnqBlb3kI74yQmTsjCazEQuZ60pz72e Q9OfxPL5x9//U27TXk3CdEjP6HpaLl4bKMFcQRSIvs7rBziKIv0IdjFHdmldQ0TwRIS9 iEcrmvtWspZoZaLDC1dsAlXM6Z3i4XxHFV2t3JW/lkPgd4UxMq6AXMcyctQKEhSNvgon xu9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=bnM7JWkY+q+fxSkD7aYopyJyjjLm/LBn8P6EVwzexKk=; b=Bxsg9qHEPkBz9nVvA6ZJ8QgKf1MQiCg1U6lEeqxWrTz5NDaLSGQhC7boTcPnpY/WKp 6vJk2IuNmzeBH1/lotSaG6NgVKi9NXCySyGkm43UgDDJG0OidCdwUNctUpKU11FaGjFG 59Bp8Aye5GjPt55YeSN+TrGMTkx8n+dp8fr5jXJQaisveeuK5t0RqMbS3QV3e7ln8EBJ 9I3LuQ42/tZnpP+4sYD/bG4Hoko7cU5T/X+gBrE7S3rtXWF40btxtUnNQV56HORIA65H A25jKZF1AlfknwXMd4Dyn3Y6Eoj4BTPaSIUmSX2pTwIQFF2chj4elNo7sz/oSwP4Ewua YfJA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b="G8/JU6d/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q21si12812590edc.349.2021.09.20.11.57.57; Mon, 20 Sep 2021 11:58:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b="G8/JU6d/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349724AbhITS5n (ORCPT + 99 others); Mon, 20 Sep 2021 14:57:43 -0400 Received: from mail.kernel.org ([198.145.29.99]:38108 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1385807AbhITSw2 (ORCPT ); Mon, 20 Sep 2021 14:52:28 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 90F4163383; Mon, 20 Sep 2021 17:35:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1632159318; bh=AV3XuGOzlaAqvqRuTjT43u2zLAVdU4oHwauY6QUT+jk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=G8/JU6d/nbi5xHvdg3ivGKpjnqMnTPCreC1qpAV6lpLTqTqhImlyyIYAHjwXlfvlp RZij61Tk5yyeTRI4ozl4tA8hOuj4riKekEhwFvTyZ2YRnPNtK7LFA97u44sYxmUouE p9GvzB50G/Ah36NR+OiQQiV8XgIgtyJHIcm1GLCs= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Edwin Peer , Michael Chan , "David S. Miller" , Sasha Levin Subject: [PATCH 5.14 164/168] bnxt_en: Fix possible unintended driver initiated error recovery Date: Mon, 20 Sep 2021 18:45:02 +0200 Message-Id: <20210920163927.070105317@linuxfoundation.org> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210920163921.633181900@linuxfoundation.org> References: <20210920163921.633181900@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Michael Chan [ Upstream commit 1b2b91831983aeac3adcbb469aa8b0dc71453f89 ] If error recovery is already enabled, bnxt_timer() will periodically check the heartbeat register and the reset counter. If we get an error recovery async. notification from the firmware (e.g. change in primary/secondary role), we will immediately read and update the heartbeat register and the reset counter. If the timer for the next health check expires soon after this, we may read the heartbeat register again in quick succession and find that it hasn't changed. This will trigger error recovery unintentionally. The likelihood is small because we also reset fw_health->tmr_counter which will reset the interval for the next health check. But the update is not protected and bnxt_timer() can miss the update and perform the health check without waiting for the full interval. Fix it by only reading the heartbeat register and reset counter in bnxt_async_event_process() if error recovery is trasitioning to the enabled state. Also add proper memory barriers so that when enabling for the first time, bnxt_timer() will see the tmr_counter interval and perform the health check after the full interval has elapsed. Fixes: 7e914027f757 ("bnxt_en: Enable health monitoring.") Reviewed-by: Edwin Peer Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 25 ++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 1acf8633399f..2660dfc6875a 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -2172,25 +2172,34 @@ static int bnxt_async_event_process(struct bnxt *bp, if (!fw_health) goto async_event_process_exit; - fw_health->enabled = EVENT_DATA1_RECOVERY_ENABLED(data1); - fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1); - if (!fw_health->enabled) { + if (!EVENT_DATA1_RECOVERY_ENABLED(data1)) { + fw_health->enabled = false; netif_info(bp, drv, bp->dev, "Error recovery info: error recovery[0]\n"); break; } + fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1); fw_health->tmr_multiplier = DIV_ROUND_UP(fw_health->polling_dsecs * HZ, bp->current_interval * 10); fw_health->tmr_counter = fw_health->tmr_multiplier; - fw_health->last_fw_heartbeat = - bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG); - fw_health->last_fw_reset_cnt = - bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG); + if (!fw_health->enabled) { + fw_health->last_fw_heartbeat = + bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG); + fw_health->last_fw_reset_cnt = + bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG); + } netif_info(bp, drv, bp->dev, "Error recovery info: error recovery[1], master[%d], reset count[%u], health status: 0x%x\n", fw_health->master, fw_health->last_fw_reset_cnt, bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG)); + if (!fw_health->enabled) { + /* Make sure tmr_counter is set and visible to + * bnxt_health_check() before setting enabled to true. + */ + smp_wmb(); + fw_health->enabled = true; + } goto async_event_process_exit; } case ASYNC_EVENT_CMPL_EVENT_ID_DEBUG_NOTIFICATION: @@ -11250,6 +11259,8 @@ static void bnxt_fw_health_check(struct bnxt *bp) if (!fw_health->enabled || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) return; + /* Make sure it is enabled before checking the tmr_counter. */ + smp_rmb(); if (fw_health->tmr_counter) { fw_health->tmr_counter--; return; -- 2.30.2