Received: by 10.223.176.5 with SMTP id f5csp966949wra; Wed, 7 Feb 2018 10:20:20 -0800 (PST) X-Google-Smtp-Source: AH8x224cnmnrUxtT9W67DZ8CLXXkIfxLNElWNbghBeeIC+XjOyk6EnKnN94cwuMzxBDJBkiQPGC6 X-Received: by 2002:a17:902:8b8a:: with SMTP id ay10-v6mr6783877plb.156.1518027620413; Wed, 07 Feb 2018 10:20:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518027620; cv=none; d=google.com; s=arc-20160816; b=n7rM8KhUpUpQKO/xilwXERCC3H7hFGyEw3rN9zN5Klj9plTmtGz5+BcAHo/zMQ7p/v BmKxoG4yp5dQjy6mik14V84iOvBkKE4N1k0ivI5iwo6z80KTVgFnhiaxHAmnIF/JGsYA v1XNQPEPfAqUNMH/8EkX3e/EBFvgo5Yyjuwt3h5gG9nQptuchraHMir8V/4EiUEaomSX wBeWNIplCmiNwFex5VPQ18CIj2NctmpKovuAiRhYz+cT8uEkwrzmk3hwByWG2Xc1czno T6G984hUHw0ZhZBQ66EuX3/3IbMGX94dg5zfw3//jOIHCsWxdnV+Invjl3caEYXeeETI 9SYg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:cc:from:references:to:subject:dkim-signature :arc-authentication-results; bh=1vckb2g2+OU5y1f5OWyGZFqrS3RYcKxsBNX36S8jRqo=; b=DBMp/MYxlkSEad/8XozOVFK5DDAys1hRBWzw55XFwdKB78zESHVCfEK+T2lKzsGtRd KAILFkK2Rx47kzlXj24vf9RLMcTE1lnLEgeQaTA1cfHCqzq92H+dWfa2e8MFJUup/OTo Z2VO2B4fF/QYVjAaGRjS42obRcdEmvmMNLEa9dDnnbKvVD9MXDmE1wJ/1ecCZay35oqz 3PcZon1oMhfzqCX4IpFRzV6VoeeLdG1pAMoFTrpNGBlHawLLom1rpVUlud6FwECti43j 2F9QVDyQirLLEhok36iRnJOi8CRiYy/m3hBe2idyqgv4F1lcPE5/ebt7QGTHZBWocJSK /WVw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ti.com header.s=ti-com-17Q1 header.b=pImX/ZNh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=ti.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k131si1250248pgc.101.2018.02.07.10.20.03; Wed, 07 Feb 2018 10:20:20 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@ti.com header.s=ti-com-17Q1 header.b=pImX/ZNh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=ti.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754165AbeBGSTP (ORCPT + 99 others); Wed, 7 Feb 2018 13:19:15 -0500 Received: from fllnx209.ext.ti.com ([198.47.19.16]:31742 "EHLO fllnx209.ext.ti.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753390AbeBGSTN (ORCPT ); Wed, 7 Feb 2018 13:19:13 -0500 Received: from dflxv15.itg.ti.com ([128.247.5.124]) by fllnx209.ext.ti.com (8.15.1/8.15.1) with ESMTP id w17IJCoo016837; Wed, 7 Feb 2018 12:19:12 -0600 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ti.com; s=ti-com-17Q1; t=1518027552; bh=4tQR9ImF0CmRjSWF8J/Ceqd9pw+q7+Sp8Zjs4g4T1C4=; h=Subject:To:References:From:CC:Date:In-Reply-To; b=pImX/ZNh+vs1t88e9vKJTUINY3EERAl728onZMK2f5uX5o9mphSl3uTgpV2XPDZWJ /VPQE+dQqqS7owtLSAdVANoFUXdp9ahhbLLemf7yqWAStk8SvrwyA+ie7xAGrOkkHz A8FinFvyTDqEHyDG0AVxYRrIhW2F+gfoyh8bTpgA= Received: from DLEE70.ent.ti.com (dlee70.ent.ti.com [157.170.170.113]) by dflxv15.itg.ti.com (8.14.3/8.13.8) with ESMTP id w17IJCi7032108; Wed, 7 Feb 2018 12:19:12 -0600 Received: from [128.247.59.147] (128.247.59.147) by DLEE70.ent.ti.com (157.170.170.113) with Microsoft SMTP Server id 14.3.294.0; Wed, 7 Feb 2018 12:19:11 -0600 Subject: Re: [PATCH] net: ethernet: ti: cpsw: fix net watchdog timeout To: Ivan Khoronzhuk References: <20180207011706.13393-1-grygorii.strashko@ti.com> <20180207030316.GA7883@khorivan> <20180207133105.GB7883@khorivan> From: Grygorii Strashko CC: "David S. Miller" , , Sekhar Nori , , Message-ID: Date: Wed, 7 Feb 2018 12:19:11 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <20180207133105.GB7883@khorivan> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [128.247.59.147] Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/07/2018 07:31 AM, Ivan Khoronzhuk wrote: > On Wed, Feb 07, 2018 at 05:03:19AM +0200, Ivan Khoronzhuk wrote: >> On Tue, Feb 06, 2018 at 07:17:06PM -0600, Grygorii Strashko wrote: >>> It was discovered that simple program which indefinitely sends 200b UDP >>> packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network >>> watchdog timeout in TI CPSW driver (<6 hours run). The network watchdog >>> timeout is triggered due to race between cpsw_ndo_start_xmit() and >>> cpsw_tx_handler() [NAPI] >>> >>> cpsw_ndo_start_xmit() >>> if (unlikely(!cpdma_check_free_tx_desc(txch))) { >>> txq = netdev_get_tx_queue(ndev, q_idx); >>> netif_tx_stop_queue(txq); >>> >>> ^^ as per [1] barrier has to be used after set_bit() otherwise new value >>> might not be visible to other cpus >>> } >>> >>> cpsw_tx_handler() >>> if (unlikely(netif_tx_queue_stopped(txq))) >>> netif_tx_wake_queue(txq); >>> >>> and when it happens ndev TX queue became disabled forever while driver's HW >>> TX queue is empty. >> I'm sure it fixes test case somehow but there is some strangeness. >> (I've thought about this some X months ago): >> 1. If no free desc, then there is bunch of descs on the queue ready to be sent >> 2. If one of this desc while this process was missed then next will wake queue, >> because there is bunch of them on the fly. So, if desc on top of the sent queue >> missed to enable the queue, then next one more likely will enable it anyway.. >> then how it could happen? The described race is possible only on last >> descriptor, yes, packets are small the speed is hight, possibility is very small >> .....but then next situation is also possible: >> - packets are sent fast >> - all packets were sent, but no any descriptors are freed now by sw interrupt (NAPI) >> - when interrupt had started NAPI, the queue was enabled, all other next >> interrupts are throttled once NAPI not finished it's work yet. >> - when new packet submitted, no free descs are present yet (NAPI has not freed >> any yet), but all packets are sent, so no one can awake tx queue, as interrupt >> will not arise when NAPI is started to free first descriptor interrupts are >> disabled.....because h/w queue to be sent is empty... >> - how it can happen as submitting packet and handling packet operations is under >> channel lock? Not exactly, a period between handling and freeing the descriptor >> to the pool is not under channel lock, here: >> >> spin_unlock_irqrestore(&chan->lock, flags); >> if (unlikely(status & CPDMA_DESC_TD_COMPLETE)) >> cb_status = -ENOSYS; >> else >> cb_status = status; >> >> __cpdma_chan_free(chan, desc, outlen, cb_status); >> return status; >> >> unlock_ret: >> spin_unlock_irqrestore(&chan->lock, flags); >> return status; >> >> And: >> __cpdma_chan_free(chan, desc, outlen, cb_status); >> -> cpdma_desc_free(pool, desc, 1); >> >> As result, queue deadlock as you've described. >> Just thought, not checked, but theoretically possible. >> What do you think? Yep. And if this happens desc_alloc_fail should be >0 while i usually see it 0 when watchdog triggers. I was able to reproduce this situation once, but with bunch of debug code added which changes timings: Prerequisite: only one free desc available. cpsw_ndo_start_xmit1 cpsw_tx_poll prepare and send packet ->Free desc queue is empty at this moment if (unlikely(!cpdma_check_free_tx_desc(txch))) netif_tx_stop_queue(txq); --- tx queue stopped cpdma_chan_process() spin_lock_irqsave(&chan->lock, flags); chan->count-- spin_unlock_irqrestore(&chan->lock, flags) cpsw_tx_handler() if (unlikely(netif_tx_queue_stopped(txq))) netif_tx_wake_queue(txq); --- tx queue is woken up cpsw_ndo_start_xmit2 prepare packet ret = cpsw_tx_packet_submit(priv, skb, txch); //fail as desc not returned to the pool yet if (unlikely(ret != 0)) { cpsw_err(priv, tx_err, "desc submit failed\n"); goto fail; } cpdma_desc_free() fail: ndev->stats.tx_dropped++; netif_tx_stop_queue(txq); // oops. That's why I added double check and barrier in fail path also > > Better explanation, for rare race: > > start conditions: > - all descs are submitted, except last one, and queue is not stopped > - any desc was returned to the pool yet (but packets can be sent) > > time > || > \/ > > submit process NAPI poll process > -------------------------------------------------------------------------------- > new packet is scheduled for submission > stated that no free descs (with locks) > lock is freed > returned all descs to the pool > and queue is enabled > interrupt enabled, poll exit > queue is disabled > submit exit > > Result: > - all descs are returned to the pool, submission to the queue disabled > - NAPI cannot wake queue, as all desc were handled already > > According to packet size in 200B > Data size, bits: 200B * 63desc * 10 = 128000bit roughly > Time all of them are sent: 128000 / 1Gb = 128us > > That's enough the CPU to be occupied by other process in RT even. > First, in case of RT, it's not a "rare race" as all processing is done in threads and both xmit and cpsw_tx_poll can be kicked out any time - in case of xmit right before netif_tx_stop_queue(). It also better reproducible with low speed as xmit drains Free desc pool faster. Next, on SMP it not only execution race as netif_tx_stop_queue() is just set_bit(), so new value might not be visible to other cpus and due to netif_tx_queue_stopped(txq) check cpsw_tx_handler() might fail to wake up the queue. By the way, I'm not inventing smth new here - I've checked bunch of net drivers and found that a lot of them have exactly the same netif_tx_stop_queue/netif_tx_wake_queue handling. Do you want me to update description or there are other comments? -- regards, -grygorii