From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Eran Ben Elisha, Saeed Mahameed, Sasha Levin
Subject: [PATCH 4.16 108/272] net/mlx5e: Move all TX timeout logic to be under state lock
Date: Mon, 28 May 2018 12:02:21 +0200
Message-Id: <20180528100250.187792719@linuxfoundation.org>
In-Reply-To: <20180528100240.256525891@linuxfoundation.org>
References: <20180528100240.256525891@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

4.16-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eran Ben Elisha

[ Upstream commit bfc647d52e67dc756c605e9a50d45b71054c2533 ]

The driver callback for handling a TX timeout has to access some
internal resources (SQ, CQ) in order to decide whether the TX timeout
work should be scheduled. These resources might be unavailable if the
channels are closed in parallel (ifdown, for example). The state lock
is the mechanism that protects against such races. Move all TX timeout
logic into the work, under the state lock.

In addition, move the work from the global WQ to the mlx5e WQ to make
sure this work is flushed when the device is detached.

Also, move the mlx5e_tx_timeout_work code to be next to the TX timeout
NDO for better code locality.

Fixes: 3947ca185999 ("net/mlx5e: Implement ndo_tx_timeout callback")
Signed-off-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   61 ++++++++++++----------
 1 file changed, 34 insertions(+), 27 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -153,26 +153,6 @@ static void mlx5e_update_carrier_work(st
 	mutex_unlock(&priv->state_lock);
 }
 
-static void mlx5e_tx_timeout_work(struct work_struct *work)
-{
-	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
-					       tx_timeout_work);
-	int err;
-
-	rtnl_lock();
-	mutex_lock(&priv->state_lock);
-	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
-		goto unlock;
-	mlx5e_close_locked(priv->netdev);
-	err = mlx5e_open_locked(priv->netdev);
-	if (err)
-		netdev_err(priv->netdev, "mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
-			   err);
-unlock:
-	mutex_unlock(&priv->state_lock);
-	rtnl_unlock();
-}
-
 void mlx5e_update_stats(struct mlx5e_priv *priv)
 {
 	int i;
@@ -3632,13 +3612,19 @@ static bool mlx5e_tx_timeout_eq_recover(
 	return true;
 }
 
-static void mlx5e_tx_timeout(struct net_device *dev)
+static void mlx5e_tx_timeout_work(struct work_struct *work)
 {
-	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
+					       tx_timeout_work);
+	struct net_device *dev = priv->netdev;
 	bool reopen_channels = false;
-	int i;
+	int i, err;
 
-	netdev_err(dev, "TX timeout detected\n");
+	rtnl_lock();
+	mutex_lock(&priv->state_lock);
+
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
+		goto unlock;
 
 	for (i = 0; i < priv->channels.num * priv->channels.params.num_tc; i++) {
 		struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
@@ -3646,7 +3632,9 @@ static void mlx5e_tx_timeout(struct net_
 
 		if (!netif_xmit_stopped(dev_queue))
 			continue;
-		netdev_err(dev, "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
+
+		netdev_err(dev,
+			   "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
 			   i, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
 			   jiffies_to_usecs(jiffies - dev_queue->trans_start));
 
@@ -3659,8 +3647,27 @@ static void mlx5e_tx_timeout(struct net_
 		}
 	}
 
-	if (reopen_channels && test_bit(MLX5E_STATE_OPENED, &priv->state))
-		schedule_work(&priv->tx_timeout_work);
+	if (!reopen_channels)
+		goto unlock;
+
+	mlx5e_close_locked(dev);
+	err = mlx5e_open_locked(dev);
+	if (err)
+		netdev_err(priv->netdev,
+			   "mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
+			   err);
+
+unlock:
+	mutex_unlock(&priv->state_lock);
+	rtnl_unlock();
+}
+
+static void mlx5e_tx_timeout(struct net_device *dev)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+
+	netdev_err(dev, "TX timeout detected\n");
+	queue_work(priv->wq, &priv->tx_timeout_work);
 }
 
 static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
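
For readers who want the locking pattern without the driver context, here
is a minimal userspace sketch of the same idea. It is not mlx5e code:
struct fake_priv, fake_tx_timeout(), fake_tx_timeout_work() and
fake_close() are hypothetical names, a pthread mutex stands in for
priv->state_lock, a plain bool stands in for MLX5E_STATE_OPENED, and a
flag stands in for the queued work item. The sketch is single-threaded, so
the pending flag is left unprotected; a real implementation would rely on
the workqueue for that.

```c
/* Userspace sketch only. All names here are hypothetical, not driver code. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_QUEUES 4

struct fake_priv {
	pthread_mutex_t state_lock; /* stands in for priv->state_lock */
	bool opened;                /* stands in for MLX5E_STATE_OPENED */
	bool *queue_stopped;        /* per-queue state, NULL once closed */
	bool work_pending;          /* stands in for the queued work item */
	int reopen_count;           /* number of recoveries performed */
};

/* Timeout callback: touches no channel state, it only queues the work. */
static void fake_tx_timeout(struct fake_priv *priv)
{
	priv->work_pending = true;
}

/* Work handler: every access to channel state happens under the lock. */
static void fake_tx_timeout_work(struct fake_priv *priv)
{
	bool reopen = false;
	size_t i;

	if (!priv->work_pending)
		return;
	priv->work_pending = false;

	pthread_mutex_lock(&priv->state_lock);

	/* Channels may have been closed since the timeout fired. */
	if (!priv->opened)
		goto unlock;

	for (i = 0; i < NUM_QUEUES; i++)
		if (priv->queue_stopped[i])
			reopen = true;

	if (!reopen)
		goto unlock;

	/* Close and reopen, modelled here as clearing the stalled flags. */
	for (i = 0; i < NUM_QUEUES; i++)
		priv->queue_stopped[i] = false;
	priv->reopen_count++;

unlock:
	pthread_mutex_unlock(&priv->state_lock);
}

/* ifdown analogue: tears the channels down under the same lock. */
static void fake_close(struct fake_priv *priv)
{
	pthread_mutex_lock(&priv->state_lock);
	priv->opened = false;
	priv->queue_stopped = NULL;
	pthread_mutex_unlock(&priv->state_lock);
}
```

If fake_close() runs between fake_tx_timeout() and fake_tx_timeout_work(),
the handler sees opened == false and returns without dereferencing
queue_stopped, which is the race the patch description is concerned with.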