Date: Wed, 24 Jan 2018 17:35:39 +0900
From: Benjamin Poirier
To: Alexander Duyck
Cc: Shrikrishna Khare, Jeff Kirsher, Netdev, intel-wired-lan, linux-kernel@vger.kernel.org
Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.
Message-ID: <20180124083539.nwwmmt7g2pxrcsej@f1.synalogic.ca>

On 2018/01/22 10:01, Alexander Duyck wrote:
[...]
> >
> > If the patch that I submitted for the current vmware issue is merged,
> > the significant commits that are left are:
> >
> > 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
> > Fixes a problem in the irq disabling of the napi implementation.
>
> This one I fully agree is still needed.
>
> > 19110cfbb34d e1000e: Separate signaling for link check/link up
> > (v4.15-rc1)
> > Fixes link flapping caused by a race condition in link
> > detection. It was found because the Other interrupt was being
> > triggered sort of spuriously by rxo.
>
> This one is somewhat iffy. I am not sure if the patch description
> really matches what it is doing. It doesn't appear to do what it says
> it is trying to do since clearing get_link_status will still trigger a
> link up, it just takes an extra 2 seconds. I think there may be issues
> if you aren't using autoneg, as I don't see how you are getting the
> link to report up other than the fact that mac->get_link_status has
> been cleared but we are reporting a pseudo-error. In addition it is
> only really needed after the RXO problem was introduced which really
> didn't exist until after we stopped checking for LSC. One interesting
> test we may want to look at is to see if there is an additional delay
> in a link coming up for a non-autoneg setup. If we find an additional
> 2 second delay then I would be even more confident that this patch has
> a bug.

It seems like you're right, but I haven't looked into this part of the
problem in detail yet. I'll get back to it.

>
> > 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
> > Fixes Other interrupt bursts during sustained rxo conditions.
>
> So the RXO problem probably didn't exist until we stopped checking for
> the OTHER and LSC bits in the "other" interrupt handler. Yes there
> would be more "other" cause interrupts, but they shouldn't have been
> causing much in the way of issues since the get_link_status value
> never changed.
> Personally I would lean more toward the option of

I agree.

I tested rxo behavior on commit 4d432f67ff00 ("e1000e: Remove
unreachable code", v4.5-rc1), which predates any significant change in
that area. (I force rxo by adding mdelay(10) to e1000_clean_rx_irq and
sending a netperf UDP_STREAM from another host.) In case of a sustained
rxo condition, we get repeated Other interrupts. Handling these irqs is
useless work that could be avoided when the system is already
overloaded, but it doesn't lead to misbehavior like the race condition
described in the log of commit 19110cfbb34d ("e1000e: Separate signaling
for link check/link up", v4.15-rc1).

However, I noticed something unexpected. It seems like reading ICR
doesn't clear every bit that's set in IAM, most notably not RXO. In a
different test, I did a single write of RXO | OTHER to ICS; two
subsequent reads of ICR both returned 0x01000041. OTOH, writing a bit
to ICR reliably clears it. So if you want to remove RXO interrupt
mitigation, you should at least add a write of RXO to ICR to clear it.
On my system, doing so reduced Other interrupts from ~17000/s to
~1700/s when using the mdelay testing approach.

> reverting this patch and instead just focus on testing OTHER and LSC
> as we originally were so that we don't risk messing up NAPI by messing
> with ring state from a non-ring interrupt.
>
> I will try to get to these later this week if you would like.
> Unfortunately I don't have any of these devices in any of my
> development systems so I have to go chase one down. Otherwise you are
> free to take these on and tell me if I have made another flawed
> assumption somewhere, but I am thinking the RXO issue goes away if we
> get the original "other" interrupt routine back to where it was.
>
> So the last bit in all this ends up being that because of 0a8047ac68e5
> e1000e: Fix msi-x interrupt automask (v4.5-rc1) we don't seem to
> auto-clear interrupt causes anymore on ICR read. I am not certain what
> the impact of this is.
> I would be interested in finding out if a cause
> left set will trigger an interrupt storm or if it just goes quiet when
> we just leave the value high. If it goes quiet then that in itself
> might solve the RXO interrupt burst problem if we don't clear it.
> Otherwise we need to make certain to clear all of the causes that can
> trigger the "other" interrupt to fire regardless of if we service the
> events or not.

In MSI-X mode, as long as Other is not set in ICR, nothing happens even
if the bits related to Other (LSC, RXO, MDAC, SRPD, ACK, MNG) are set.
However, that doesn't solve the rxo interrupt burst, because an rxo
condition in hardware sets both RXO and Other, so it triggers an
interrupt.
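To make the register behavior described above concrete, here is a small user-space model in C. It is a sketch under stated assumptions, not driver code: struct icr_model and every helper below are hypothetical, the bit values are assumed to match the E1000_ICR_LSC, E1000_ICR_RXO and E1000_ICR_OTHER defines in e1000e, and the semantics only encode what was observed in these tests (ICS writes set causes, ICR reads clear nothing, writing 1s to ICR clears those bits, the Other vector only fires while OTHER is set). The handler follows the suggestion of testing only OTHER and LSC, plus the explicit RXO write to ICR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the E1000_ICR_* values in the e1000e driver. */
#define ICR_LSC   0x00000004u /* link status change */
#define ICR_RXO   0x00000040u /* receiver overrun */
#define ICR_OTHER 0x01000000u /* summary bit for the "other" causes */

/* Hypothetical model of the interrupt cause register. Simplification:
 * reading ICR clears nothing, as in the RXO | OTHER test above. */
struct icr_model {
	uint32_t icr;
};

/* Writing to ICS sets the written cause bits. */
static void ics_write(struct icr_model *hw, uint32_t bits)
{
	hw->icr |= bits;
}

/* Reading ICR returns the pending causes without clearing them. */
static uint32_t icr_read(const struct icr_model *hw)
{
	return hw->icr;
}

/* Writing 1s to ICR clears exactly those bits. */
static void icr_write(struct icr_model *hw, uint32_t bits)
{
	hw->icr &= ~bits;
}

/* A hardware rx overrun raises both RXO and OTHER. */
static void hw_rx_overrun(struct icr_model *hw)
{
	hw->icr |= ICR_RXO | ICR_OTHER;
}

/* In MSI-X mode the Other vector is only asserted while OTHER is set;
 * LSC or RXO on their own stay quiet. */
static bool other_vector_asserted(const struct icr_model *hw)
{
	return (hw->icr & ICR_OTHER) != 0;
}

/* Hypothetical "other" handler: test only OTHER and LSC, then write the
 * handled causes, including RXO, back to ICR because the read did not
 * clear them. Returns true if the interrupt was ours. */
static bool other_handler(struct icr_model *hw, bool *get_link_status)
{
	uint32_t icr;

	assert(hw != NULL && get_link_status != NULL);
	icr = icr_read(hw);
	if (!(icr & ICR_OTHER))
		return false;
	if (icr & ICR_LSC)
		*get_link_status = true; /* let the link check run later */
	icr_write(hw, icr & (ICR_OTHER | ICR_LSC | ICR_RXO));
	return true;
}
```

In this model a second ICR read still returns RXO | OTHER, while the explicit ICR write leaves nothing pending. It deliberately does not model whether real hardware re-asserts the vector for a cause left set, which is the open question raised above.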