From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Aya Levin,
    Moshe Shemesh, Saeed Mahameed, Sasha Levin
Subject: [PATCH 5.13 083/104] net/mlx5: Unload device upon firmware fatal error
Date: Mon, 2 Aug 2021 15:45:20 +0200
Message-Id: <20210802134346.743470990@linuxfoundation.org>
In-Reply-To: <20210802134344.028226640@linuxfoundation.org>
References: <20210802134344.028226640@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Aya Levin

[ Upstream commit 7f331bf0f060c2727e36d64f9b098b4ee5f3dfad ]

When fw_fatal reporter reports an error, the firmware is not responding.
Unload the device to ensure that the driver closes all its resources,
even if recovery is not due (user disabled auto-recovery or reporter is
in grace period). On successful recovery the device is loaded back up.

Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues")
Signed-off-by: Aya Levin
Reviewed-by: Moshe Shemesh
Signed-off-by: Saeed Mahameed
Signed-off-by: Sasha Levin
---
 drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 9ff163c5bcde..9abeb80ffa31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -626,8 +626,16 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
 	}
 	fw_reporter_ctx.err_synd = health->synd;
 	fw_reporter_ctx.miss_counter = health->miss_counter;
-	devlink_health_report(health->fw_fatal_reporter,
-			      "FW fatal error reported", &fw_reporter_ctx);
+	if (devlink_health_report(health->fw_fatal_reporter,
+				  "FW fatal error reported", &fw_reporter_ctx) == -ECANCELED) {
+		/* If recovery wasn't performed, due to grace period,
+		 * unload the driver. This ensures that the driver
+		 * closes all its resources and it is not subjected to
+		 * requests from the kernel.
+		 */
+		mlx5_core_err(dev, "Driver is in error state. Unloading\n");
+		mlx5_unload_one(dev);
+	}
 }
 
 static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
-- 
2.30.2
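
For readers who want the control flow without the kernel context, here is a
minimal userspace sketch of what the hunk above does. It is not kernel code:
struct model_dev and every model_* function are hypothetical stand-ins
invented for illustration, and the sketch assumes, as the comment in the
patch states, that -ECANCELED means recovery was skipped because the reporter
is in its grace period. Other return values of devlink_health_report() are
not modeled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not a kernel structure. */
struct model_dev {
	bool in_grace_period;	/* reporter will not attempt recovery now */
	bool loaded;		/* driver currently holds its resources */
};

/*
 * Hypothetical stand-in for the health report call. It models only the
 * one outcome the patch tests for: -ECANCELED when recovery is skipped.
 */
static int model_health_report(struct model_dev *dev)
{
	if (dev->in_grace_period)
		return -ECANCELED;	/* recovery was not performed */

	/* Recovery ran and succeeded: the device is loaded back up. */
	dev->loaded = true;
	return 0;
}

/* Hypothetical stand-in for unloading the device. */
static void model_unload_one(struct model_dev *dev)
{
	dev->loaded = false;
}

/* Same shape as the patched error-work handler. */
static void model_fw_fatal_err_work(struct model_dev *dev)
{
	if (model_health_report(dev) == -ECANCELED)
		model_unload_one(dev);
}
```

Before the patch the return value was ignored, so in the grace-period case
the driver stayed loaded against firmware that was not responding. Checking
the return value is what lets the handler release resources in that case.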