From: Grygorii Strashko
To: "David S. Miller" ,
CC: Sekhar Nori , , , Grygorii Strashko
Subject: [PATCH] net: ethernet: ti: cpsw: fix net watchdog timeout
Date: Tue, 6 Feb 2018 19:17:06 -0600
Message-ID: <20180207011706.13393-1-grygorii.strashko@ti.com>
X-Mailing-List: linux-kernel@vger.kernel.org

It was discovered that a simple program which indefinitely sends 200-byte
UDP packets, running on a TI AM574x SoC (SMP) under an RT kernel, triggers
a network watchdog timeout in the TI CPSW driver in less than 6 hours.

The network watchdog timeout is caused by a race between
cpsw_ndo_start_xmit() and cpsw_tx_handler() [NAPI]:

cpsw_ndo_start_xmit()
	if (unlikely(!cpdma_check_free_tx_desc(txch))) {
		txq = netdev_get_tx_queue(ndev, q_idx);
		netif_tx_stop_queue(txq);

^^ as per [1], a barrier has to be used after set_bit(); otherwise the new
value might not be visible to other CPUs
	}

cpsw_tx_handler()
	if (unlikely(netif_tx_queue_stopped(txq)))
		netif_tx_wake_queue(txq);

If cpsw_tx_handler() frees descriptors after cpsw_ndo_start_xmit() has
found none free, but before the stopped bit is visible to the handler, the
handler skips the wake-up. When this happens, the ndev TX queue stays
disabled forever while the driver's HW TX queue is empty.

Fix this by adding smp_mb__after_atomic() after the netif_tx_stop_queue()
calls and double-checking for free TX descriptors after stopping the ndev
TX queue: if there are free TX descriptors, wake up the ndev TX queue.
[1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html

Signed-off-by: Grygorii Strashko
---
 drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 10d7cbe..3805b13 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1638,6 +1638,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 		q_idx = q_idx % cpsw->tx_ch_num;
 
 	txch = cpsw->txv[q_idx].ch;
+	txq = netdev_get_tx_queue(ndev, q_idx);
 	ret = cpsw_tx_packet_submit(priv, skb, txch);
 	if (unlikely(ret != 0)) {
 		cpsw_err(priv, tx_err, "desc submit failed\n");
@@ -1648,15 +1649,26 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 	 * tell the kernel to stop sending us tx frames.
 	 */
 	if (unlikely(!cpdma_check_free_tx_desc(txch))) {
-		txq = netdev_get_tx_queue(ndev, q_idx);
 		netif_tx_stop_queue(txq);
+
+		/* Barrier, so that stop_queue visible to other cpus */
+		smp_mb__after_atomic();
+
+		if (cpdma_check_free_tx_desc(txch))
+			netif_tx_wake_queue(txq);
 	}
 
 	return NETDEV_TX_OK;
 fail:
 	ndev->stats.tx_dropped++;
-	txq = netdev_get_tx_queue(ndev, skb_get_queue_mapping(skb));
 	netif_tx_stop_queue(txq);
+
+	/* Barrier, so that stop_queue visible to other cpus */
+	smp_mb__after_atomic();
+
+	if (cpdma_check_free_tx_desc(txch))
+		netif_tx_wake_queue(txq);
+
 	return NETDEV_TX_BUSY;
 }
 
-- 
2.10.5