From: Andy Duan
To: Chris Lesiak
CC: "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org", "Jaccon Bastiaansen"
Subject: RE: [PATCH] net: fec: Detect and recover receive queue hangs
Date: Sun, 20 Nov 2016 06:18:13 +0000
References: <1479417282-15540-1-git-send-email-chris.lesiak@licor.com>

From: Chris Lesiak Sent: Friday, November 18, 2016 10:37 PM
>To: Andy Duan
>Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jaccon Bastiaansen
>Subject: Re: [PATCH] net: fec: Detect and recover receive queue hangs
>
>On 11/18/2016 12:44 AM, Andy Duan wrote:
>> From: Chris Lesiak Sent: Friday, November 18, 2016 5:15 AM
>> >To: Andy Duan
>> >Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jaccon Bastiaansen; chris.lesiak@licor.com
>> >Subject: [PATCH] net: fec: Detect and recover receive queue hangs
>> >
>> >This corrects a problem that appears to be similar to ERR006358.
>> >But while ERR006358 is a race when the tx queue transitions from empty to
>> >not empty, this problem is a race when the rx queue transitions from full
>> >to not full.
>> >
>> >The symptom is a receive queue that is stuck. The ENET_RDAR register will
>> >read 0, indicating that there are no empty receive descriptors in the
>> >receive ring. Since no additional frames can be queued, no RXF interrupts
>> >occur.
>> >
>> >This problem can be triggered with a 1 Gb link and about 400 Mbps of
>> >traffic.
>
>I can cause the error by running the following on an imx6q:
>iperf -s -u
>And sending packets from the other end of a 1 Gbps link:
>iperf -c $IPADDR -u -b40000pps
>
>A few others have seen this problem.
>See: https://community.nxp.com/thread/322882
>
>> >
>> >This patch detects this condition, sets the work_rx bit, and reschedules
>> >the poll method.
>> >
>> >Signed-off-by: Chris Lesiak
>> >---
>> > drivers/net/ethernet/freescale/fec_main.c | 31 +++++++++++++++++++++++++++++++
>> > 1 file changed, 31 insertions(+)
>> >
>> Firstly, how do you reproduce the issue? Pls list the reproduction steps.
>> Thanks.
>> Secondly, pls check the comments below.
>>
>> >diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
>> >index fea0f33..8a87037 100644
>> >--- a/drivers/net/ethernet/freescale/fec_main.c
>> >+++ b/drivers/net/ethernet/freescale/fec_main.c
>> >@@ -1588,6 +1588,34 @@ fec_enet_interrupt(int irq, void *dev_id)
>> > 	return ret;
>> > }
>> >
>> >+static inline bool
>> >+fec_enet_recover_rxq(struct fec_enet_private *fep, u16 queue_id)
>> >+{
>> >+	int work_bit = (queue_id == 0) ? 2 : ((queue_id == 1) ? 0 : 1);
>> >+
>> >+	if (readl(fep->rx_queue[queue_id]->bd.reg_desc_active))
>> If the rx ring is really empty in low-throughput cases, rdar is always
>> cleared, so there will always be a napi reschedule.
>
>I think that you are concerned that if rdar is zero due to this hardware
>problem, but the rx ring is actually empty, then fec_enet_rx_queue will
>never do a write to rdar so that it can be non-zero. That will cause napi to
>always be rescheduled.
>
>I suppose that might be the case with zero rx traffic, and I was concerned
>that it might be true even when there was rx traffic. I suspected that the
>hardware, seeing that rdar is zero, would never queue another packet, even
>if there were in fact empty descriptors. But it doesn't seem to be the case.
>It does reschedule multiple times, but eventually sees some packets in the rx
>ring and recovers.
>
>I admit that I do not completely understand how that can happen. I did
>confirm that fec_enet_active_rxring is not being called.
>
>Maybe someone with a deeper understanding of the fec than I can provide
>an explanation.
>
The patch needs to be put on hold for some time (days); I will reserve time to investigate the issue. Thanks.

>>
>> >+		return false;
>> >+
>> >+	dev_notice_once(&fep->pdev->dev, "Recovered rx queue\n");
>> >+
>> >+	fep->work_rx |= 1 << work_bit;
>> >+
>> >+	return true;
>> >+}
>> >+
>> >+static inline bool fec_enet_recover_rxqs(struct fec_enet_private *fep)
>> >+{
>> >+	unsigned int q;
>> >+	bool ret = false;
>> >+
>> >+	for (q = 0; q < fep->num_rx_queues; q++) {
>> >+		if (fec_enet_recover_rxq(fep, q))
>> >+			ret = true;
>> >+	}
>> >+
>> >+	return ret;
>> >+}
>> >+
>> > static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
>> > {
>> > 	struct net_device *ndev = napi->dev;
>> >@@ -1601,6 +1629,9 @@ static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
>> > 	if (pkts < budget) {
>> > 		napi_complete(napi);
>> > 		writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
>> >+
>> >+		if (fec_enet_recover_rxqs(fep) && napi_reschedule(napi))
>> >+			writel(FEC_NAPI_IMASK, fep->hwp + FEC_IMASK);
>> > 	}
>> > 	return pkts;
>> > }
>> >--
>> >2.5.5
>> >
>
>--
>Chris Lesiak
>Principal Design Engineer, Software
>LI-COR Biosciences
>chris.lesiak@licor.com
>
>Any opinions expressed are those of the author and do not necessarily
>represent those of his employer.
>
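
For readers following the discussion, the recovery check in the quoted patch can be modeled in plain userspace C. This is only a sketch under stated assumptions, not driver code: `struct fake_fec`, its `rdar` array (standing in for the per-queue ENET_RDAR reads), `NUM_RX_QUEUES`, and the helper names are all hypothetical. Only the queue-to-bit mapping and the "RDAR reads 0, so flag the queue" rule are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: three rx queues, which is what the patch's work_bit mapping suggests. */
#define NUM_RX_QUEUES 3

/* Hypothetical stand-in for the driver state; not the real struct fec_enet_private. */
struct fake_fec {
	uint32_t rdar[NUM_RX_QUEUES];	/* simulated ENET_RDAR value per queue */
	unsigned int num_rx_queues;
	uint32_t work_rx;		/* bitmask of queues the poll routine should service */
};

/* Same mapping as the patch: queue 0 -> bit 2, queue 1 -> bit 0, otherwise bit 1. */
static int rxq_work_bit(unsigned int queue_id)
{
	return (queue_id == 0) ? 2 : ((queue_id == 1) ? 0 : 1);
}

/* Flag one queue for polling when its simulated RDAR reads 0. */
static bool recover_rxq(struct fake_fec *fep, unsigned int queue_id)
{
	if (fep->rdar[queue_id])
		return false;	/* descriptors still active, nothing to recover */

	fep->work_rx |= 1u << rxq_work_bit(queue_id);
	return true;
}

/* Check every queue; report whether any of them was flagged. */
static bool recover_rxqs(struct fake_fec *fep)
{
	bool ret = false;
	unsigned int q;

	for (q = 0; q < fep->num_rx_queues; q++) {
		if (recover_rxq(fep, q))
			ret = true;
	}
	return ret;
}
```

In this model nothing ever writes `rdar` back to a non-zero value, so a queue that reads 0 is flagged again on every call to `recover_rxqs()`. That is the repeated-reschedule behavior raised in the review comment; whether real hardware behaves this way is exactly the open question in the thread.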