Date: Wed, 7 Feb 2018 15:31:06 +0200
From: Ivan Khoronzhuk
To: Grygorii Strashko, "David S. Miller", netdev@vger.kernel.org, Sekhar Nori, linux-kernel@vger.kernel.org, linux-omap@vger.kernel.org
Subject: Re: [PATCH] net: ethernet: ti: cpsw: fix net watchdog timeout
Message-ID: <20180207133105.GB7883@khorivan>
References: <20180207011706.13393-1-grygorii.strashko@ti.com> <20180207030316.GA7883@khorivan>
In-Reply-To: <20180207030316.GA7883@khorivan>

On Wed, Feb 07, 2018 at 05:03:19AM +0200, Ivan Khoronzhuk wrote:
> On Tue, Feb 06, 2018 at 07:17:06PM -0600, Grygorii Strashko wrote:
> > It was discovered that simple program which indefinitely sends 200b UDP
> > packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network
> > watchdog timeout in TI CPSW driver (<6 hours run).
> > The network watchdog timeout is triggered due to race between
> > cpsw_ndo_start_xmit() and cpsw_tx_handler() [NAPI]
> >
> > cpsw_ndo_start_xmit()
> > 	if (unlikely(!cpdma_check_free_tx_desc(txch))) {
> > 		txq = netdev_get_tx_queue(ndev, q_idx);
> > 		netif_tx_stop_queue(txq);
> >
> > ^^ as per [1] barrier has to be used after set_bit() otherwise new value
> > might not be visible to other cpus
> > 	}
> >
> > cpsw_tx_handler()
> > 	if (unlikely(netif_tx_queue_stopped(txq)))
> > 		netif_tx_wake_queue(txq);
> >
> > and when it happens ndev TX queue became disabled forever while driver's HW
> > TX queue is empty.
>
> I'm sure it fixes the test case somehow, but there is some strangeness
> (I thought about this some X months ago):
> 1. If there is no free desc, then there is a bunch of descs on the queue
>    ready to be sent.
> 2. If the wake-up is missed for one of these descs, then the next one will
>    wake the queue, because there is a bunch of them in flight. So, if the
>    desc at the head of the sent queue failed to enable the queue, the next
>    one will most likely enable it anyway. Then how could it happen? The
>    described race is possible only on the last descriptor. Yes, packets are
>    small and the speed is high; the probability is very small.
>
> But then the following situation is also possible:
> - packets are sent fast
> - all packets have been sent, but no descriptors have been freed yet by the
>   sw interrupt (NAPI)
> - when the interrupt started NAPI, the queue was enabled; all subsequent
>   interrupts are throttled while NAPI has not finished its work yet
> - when a new packet is submitted, no free descs are present yet (NAPI has
>   not freed any yet), but all packets are sent, so no one can wake the tx
>   queue: no interrupt will arrive, because interrupts are disabled while
>   NAPI is freeing the first descriptor and the h/w queue to be sent is empty
> - how can it happen, given that submitting a packet and handling a packet
>   are both done under the channel lock?
> Not exactly: the period between handling the descriptor and freeing it
> to the pool is not under the channel lock, here:
>
> 	spin_unlock_irqrestore(&chan->lock, flags);
> 	if (unlikely(status & CPDMA_DESC_TD_COMPLETE))
> 		cb_status = -ENOSYS;
> 	else
> 		cb_status = status;
>
> 	__cpdma_chan_free(chan, desc, outlen, cb_status);
> 	return status;
>
> unlock_ret:
> 	spin_unlock_irqrestore(&chan->lock, flags);
> 	return status;
>
> And:
> 	__cpdma_chan_free(chan, desc, outlen, cb_status);
> 	-> cpdma_desc_free(pool, desc, 1);
>
> As a result, the queue deadlocks as you've described.
> Just a thought, not checked, but theoretically possible.
> What do you think?

Better explanation, for the rare race:

Start conditions:
- all descs are submitted, except the last one, and the queue is not stopped
- no desc has been returned to the pool yet (but packets may already be sent)

 time
  ||
  \/
   submit process                          NAPI poll process
--------------------------------------------------------------------------------
new packet is scheduled for submission
found no free descs (with locks held)
lock is released
                                           returned all descs to the pool
                                           and queue is enabled
                                           interrupt enabled, poll exit
queue is disabled
submit exit

Result:
- all descs are returned to the pool, submission to the queue is disabled
- NAPI cannot wake the queue, as all descs were handled already

With a packet size of 200B:
Data size, bits: 200B * 63desc * 10 = 128000bit roughly
Time until all of them are sent: 128000 / 1Gb = 128us

That's enough for the CPU to be occupied by another process, even under RT.

>
> >
> > Fix this, by adding smp_mb__after_atomic() after netif_tx_stop_queue()
> > calls and double check for free TX descriptors after stopping ndev TX queue
> > - if there are free TX descriptors wake up ndev TX queue.
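
The sequence in the table above, and the effect of the re-check named in the
quoted patch description, can be replayed in a small single-threaded C model.
This is only a sketch under assumptions: struct txq_model, POOL_SIZE,
model_check_free(), model_napi_poll() and replay_race() are hypothetical names
for illustration, not driver code. Because the steps run in a fixed order, no
real barrier is needed here; in the driver, smp_mb__after_atomic() sits between
the stop and the re-check.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 64	/* hypothetical descriptor pool size */

/* Hypothetical stand-in for one ndev TX queue plus its descriptor pool. */
struct txq_model {
	bool stopped;	/* models the stopped state of the ndev TX queue */
	int free_descs;	/* models the number of free descriptors in the pool */
};

/* Models cpdma_check_free_tx_desc(): true if a descriptor is available. */
static bool model_check_free(const struct txq_model *q)
{
	return q->free_descs > 0;
}

/* Models the NAPI poll: return descriptors, wake the queue only if stopped. */
static void model_napi_poll(struct txq_model *q, int completed)
{
	assert(completed >= 0);
	q->free_descs += completed;
	if (q->stopped)
		q->stopped = false;
}

/*
 * Replays the interleaving in a fixed order and returns the final
 * "stopped" state of the queue.
 */
static bool replay_race(bool recheck)
{
	struct txq_model q = { .stopped = false, .free_descs = 0 };

	/* submit: finds no free descriptors, then releases the lock */
	bool no_free = !model_check_free(&q);

	/* NAPI: returns all descriptors; queue is not stopped, so no wake */
	model_napi_poll(&q, POOL_SIZE);

	if (no_free) {
		/* submit: stops the queue after NAPI has already finished */
		q.stopped = true;
		/* in the driver, smp_mb__after_atomic() would go here */
		if (recheck && model_check_free(&q))
			q.stopped = false;
	}
	return q.stopped;
}
```

In this model replay_race(false) leaves the queue stopped although every
descriptor is free, while replay_race(true) wakes it again.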
> >
> > [1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html
> > Signed-off-by: Grygorii Strashko
> > ---
> >  drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> > index 10d7cbe..3805b13 100644
> > --- a/drivers/net/ethernet/ti/cpsw.c
> > +++ b/drivers/net/ethernet/ti/cpsw.c
> > @@ -1638,6 +1638,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
> >  	q_idx = q_idx % cpsw->tx_ch_num;
> >
> >  	txch = cpsw->txv[q_idx].ch;
> > +	txq = netdev_get_tx_queue(ndev, q_idx);
> >  	ret = cpsw_tx_packet_submit(priv, skb, txch);
> >  	if (unlikely(ret != 0)) {
> >  		cpsw_err(priv, tx_err, "desc submit failed\n");
> > @@ -1648,15 +1649,26 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
> >  	 * tell the kernel to stop sending us tx frames.
> >  	 */
> >  	if (unlikely(!cpdma_check_free_tx_desc(txch))) {
> > -		txq = netdev_get_tx_queue(ndev, q_idx);
> >  		netif_tx_stop_queue(txq);
> > +
> > +		/* Barrier, so that stop_queue visible to other cpus */
> > +		smp_mb__after_atomic();
> > +
> > +		if (cpdma_check_free_tx_desc(txch))
> > +			netif_tx_wake_queue(txq);
> >  	}
> >
> >  	return NETDEV_TX_OK;
> >  fail:
> >  	ndev->stats.tx_dropped++;
> > -	txq = netdev_get_tx_queue(ndev, skb_get_queue_mapping(skb));
> >  	netif_tx_stop_queue(txq);
> > +
> > +	/* Barrier, so that stop_queue visible to other cpus */
> > +	smp_mb__after_atomic();
> > +
> > +	if (cpdma_check_free_tx_desc(txch))
> > +		netif_tx_wake_queue(txq);
> > +
> >  	return NETDEV_TX_BUSY;
> >  }
> >
> > --
> > 2.10.5
>
> --
> Regards,
> Ivan Khoronzhuk

--
Regards,
Ivan Khoronzhuk
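
For reference, the window estimate in the message above (63 descriptors of
200B on a 1Gb link) can be worked out with a small helper. This is a sketch:
drain_time_us() is a hypothetical name, and reading the factor 10 as bits on
the wire per payload byte is an assumption, since the message only gives the
factor itself. The exact product is 126000 bits, which gives 126us and agrees
with the rough 128us figure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in microseconds needed to put `descs` packets of `bytes` each on a
 * link running at `link_bps` bits per second, counting `bits_per_byte`
 * bits for every payload byte (integer division, rounds down).
 */
static uint64_t drain_time_us(uint64_t bytes, uint64_t descs,
			      uint64_t bits_per_byte, uint64_t link_bps)
{
	uint64_t bits;

	assert(link_bps != 0);
	bits = bytes * descs * bits_per_byte;
	return bits * 1000000ULL / link_bps;
}
```

With the numbers from the message, drain_time_us(200, 63, 10, 1000000000ULL)
evaluates to 126.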