Date: Wed, 07 Feb 2018 21:57:35 -0500 (EST)
Message-Id: <20180207.215735.1518454397358783732.davem@davemloft.net>
To: grygorii.strashko@ti.com
Cc: netdev@vger.kernel.org, nsekhar@ti.com, linux-kernel@vger.kernel.org,
    linux-omap@vger.kernel.org
Subject: Re: [PATCH] net: ethernet: ti: cpsw: fix net watchdog timeout
From: David Miller
In-Reply-To: <20180207011706.13393-1-grygorii.strashko@ti.com>
References: <20180207011706.13393-1-grygorii.strashko@ti.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Grygorii Strashko
Date: Tue, 6 Feb 2018 19:17:06 -0600

> It was discovered that a simple program which indefinitely sends 200b UDP
> packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network
> watchdog timeout in TI CPSW driver (<6 hours run). The network watchdog
> timeout is triggered due to a race between cpsw_ndo_start_xmit() and
> cpsw_tx_handler() [NAPI]
>
> cpsw_ndo_start_xmit()
>         if (unlikely(!cpdma_check_free_tx_desc(txch))) {
>                 txq = netdev_get_tx_queue(ndev, q_idx);
>                 netif_tx_stop_queue(txq);
>
> ^^ as per [1] a barrier has to be used after set_bit(), otherwise the new
> value might not be visible to other CPUs
>         }
>
> cpsw_tx_handler()
>         if (unlikely(netif_tx_queue_stopped(txq)))
>                 netif_tx_wake_queue(txq);
>
> and when it happens the ndev TX queue becomes disabled forever while the
> driver's HW TX queue is empty.
>
> Fix this by adding smp_mb__after_atomic() after netif_tx_stop_queue()
> calls and double checking for free TX descriptors after stopping the ndev
> TX queue - if there are free TX descriptors, wake up the ndev TX queue.
>
> [1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html
> Signed-off-by: Grygorii Strashko

Applied, thanks.
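
For readers following the thread, the stop / barrier / re-check sequence described in the quoted patch can be modelled in userspace. This is only a sketch under stated assumptions, not the cpsw driver code: struct txq_model, tx_submit(), tx_complete() and the hook and recheck parameters are hypothetical names introduced here, and the C11 atomic_thread_fence(memory_order_seq_cst) call merely stands in for smp_mb__after_atomic(). The model is single-threaded, so it replays the problematic interleaving deterministically but does not demonstrate memory-ordering effects on real SMP hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for one TX queue (not a cpsw structure). */
struct txq_model {
    atomic_int  free_desc; /* number of free TX descriptors */
    atomic_bool stopped;   /* plays the role of the queue's "stopped" bit */
};

/* Hook that lets a caller run the completion side inside the race window. */
typedef void (*race_hook_t)(struct txq_model *q);

/* Completion side, modelled on the quoted cpsw_tx_handler() snippet:
 * release one descriptor, then wake the queue only if it looks stopped. */
void tx_complete(struct txq_model *q)
{
    atomic_fetch_add(&q->free_desc, 1);
    if (atomic_load(&q->stopped))
        atomic_store(&q->stopped, false);
}

/* Transmit side, modelled on the quoted cpsw_ndo_start_xmit() snippet.
 * With recheck == false it behaves like the code before the fix. */
void tx_submit(struct txq_model *q, race_hook_t hook, bool recheck)
{
    assert(atomic_load(&q->free_desc) > 0);
    atomic_fetch_sub(&q->free_desc, 1);

    if (atomic_load(&q->free_desc) == 0) {
        /* Race window: the completion side may free a descriptor here and
         * still observe stopped == false, so it does not wake the queue. */
        if (hook)
            hook(q);

        atomic_store_explicit(&q->stopped, true, memory_order_relaxed);

        /* Assumed analogue of smp_mb__after_atomic(): order the store to
         * "stopped" before the re-read of free_desc below. */
        atomic_thread_fence(memory_order_seq_cst);

        /* Double check: if a descriptor was freed meanwhile, wake up. */
        if (recheck && atomic_load(&q->free_desc) > 0)
            atomic_store(&q->stopped, false);
    }
}
```

Calling tx_submit(&q, tx_complete, false) on a queue holding one free descriptor leaves the model stopped even though a descriptor is free again, which is the stuck state the patch describes. The same call with recheck set to true leaves the queue running.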