Received: by 10.223.176.46 with SMTP id f43csp2574590wra; Sun, 21 Jan 2018 23:13:02 -0800 (PST) X-Google-Smtp-Source: AH8x227sIgOSJJHKaEaBe7mq9t+KUIMG/SYGz3gqyXQ01WlIFZAT2vXObvKn2bOAC4LvPcTtPhNA X-Received: by 10.99.55.66 with SMTP id g2mr6599222pgn.61.1516605181927; Sun, 21 Jan 2018 23:13:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1516605181; cv=none; d=google.com; s=arc-20160816; b=Ir6kAEcRQfaLYuW1i2pzbmMOySYUD3hiDn892W6wEdHVUGz1UK5+t3r5M4w5SQRH4q LIhlol0LNQzZQEA7mapE22gIzs2Jm+cBt7ttW/nfElVFtQoCqVLd/k+WYslu28KwrKiR asXLyKt9E7gWB5bqrwjXkoM+V8GGB5EQszozOHCSWD/zC70Zzph2cKoa9Cvg3pnLbLCl a8oleaZyXzyLgLNqPb/19jP0W/hbIp9+rXjwbpsE+NzoTVpC2TTU1494UHhLFFdHrDQd kg90HbxNCEzZBxSC/59h2DjeKaBKsM2vUjXQwlR9w1oTOeTJvSz2l5YN/8l2Ow13mE7G sHOQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=ALzgbAKrkmsVtCvK65pmoqloD5EIVTdFw8+7QDhHqUA=; b=uarglRMYd0LD05tKLqzEOqoSPc5e0QBwPvo3lyJEyCs363Csjku1jiFNJUxVaL1OAy rYH/IhL39OngoC3NxqDg/CJDkVbqoXhbPz+fhRTbk4JgI6aPmX7VUIOYmuuDFVW1dqPL 9cJBNw4to1NOWfuY/Ty/xGAQsNut507hPGYr2CeIx2BgZywZCzK921a8JlZSZopEYNMS illu2apyTnPWKIgPfKPVGBmzZnKoVyjW3lgWcLRY77aCTBHAh0FI3x6SiBlLQ1n0BCgQ 0rCbhQFFFyUJyyKOuqRT6S7XRRmz1h+THk0vdoKkuu8Ijd12oCZApS5lV/U4pFb2dN2V S61Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=YJQqwcQs; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a80si14933168pfg.372.2018.01.21.23.12.47; Sun, 21 Jan 2018 23:13:01 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=YJQqwcQs; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751139AbeAVHMW (ORCPT + 99 others); Mon, 22 Jan 2018 02:12:22 -0500 Received: from mail-pg0-f47.google.com ([74.125.83.47]:45677 "EHLO mail-pg0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751003AbeAVHMU (ORCPT ); Mon, 22 Jan 2018 02:12:20 -0500 Received: by mail-pg0-f47.google.com with SMTP id c194so6409805pga.12; Sun, 21 Jan 2018 23:12:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=ALzgbAKrkmsVtCvK65pmoqloD5EIVTdFw8+7QDhHqUA=; b=YJQqwcQsDUnwSsU0VD8/OpH2C1Rj2ctR7Bdb/sYO//snyBAfFNFJZ4kByhJlJ+TxGj UO5QGYK08ec4M7IbZT3vHRWUGZRV/ysfdMBip08z5H7PW/pP++Tjaonk34zV3eAQYn1t 66R88xQ7os1LNzoGIA4eZBtBq4kMMgI9oLAbvY7ho6nfsdy7Bup066bWDXEy7JDaLfDX 5vsthgyhMf5aHNKlS2FKroJOZ6MZOYRk/sgyERUuzblalcQ0q4t2+bJqKtVVne1vNDIC 4bnFS3Cb9Fcsaa4Uo3VrqJMjMvT9QlnNS7Sn1LLzq8QFTmdOBWTfE5touS0aP+KIGFes U0gg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=ALzgbAKrkmsVtCvK65pmoqloD5EIVTdFw8+7QDhHqUA=; b=fJprWEUKdEOXHemSUZG69Wkx/xBB4+6H4iApvz+2AexG4cUrSNpS1HWcyYsixixg9M VmPfIvnMC9LWBULMZ4OL4WSCQ2JD3t7Z2xBs4SP2ubdOJP2WRIHHYVqN8Jb0dszBVOi6 e0dYXHTjijrH/OpIE+Rdjs+oDfc721a86VePHcqiCvYB4HAsXFkzbshr4k4w4fdKf+M5 0zaLLM303ya8ipcJU+WYZ1SHqkpQ1GQCHRDDgBDhDPadZvwJnJ82Wrotxl4yPA8GIjm/ w8tVUEGLCr57aaMpbjaycv/yEqPqqmKXreNuGiPK2B7JH4c8VpgyIHo0Sjkgyw4EX5st JDvg== X-Gm-Message-State: AKwxytf5UC5oaXovcBj99FcBoJxVVMnfSPzWgaqjZVfhq7WpUaLFmZ/p CwqKVE/h96b30jVQ9B3QqNA= X-Received: by 10.101.86.201 with SMTP id w9mr6474146pgs.434.1516605139234; Sun, 21 Jan 2018 23:12:19 -0800 (PST) Received: from f1.synalogic.ca (113x35x119x249.ap113.ftth.ucom.ne.jp. [113.35.119.249]) by smtp.gmail.com with ESMTPSA id e7sm29151205pfj.44.2018.01.21.23.12.15 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 21 Jan 2018 23:12:17 -0800 (PST) Date: Mon, 22 Jan 2018 16:12:14 +0900 From: Benjamin Poirier To: Alexander Duyck Cc: Shrikrishna Khare , Jeff Kirsher , Netdev , intel-wired-lan , linux-kernel@vger.kernel.org Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC. Message-ID: <20180122071214.cv773dufu6n4lvnw@f1.synalogic.ca> References: <20180118065054.29844-1-bpoirier@suse.com> <20180119085952.u63kius4ud34lleq@f1.synalogic.ca> <20180119133648.s5nbm4gvby6c33av@f1.synalogic.ca> <20180119224517.klugizz5n5zznryx@f1.synalogic.ca> <20180119225500.lq2vpnjh5isxiovf@f1.synalogic.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170912 (1.9.0) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2018/01/20 09:21, Alexander Duyck wrote: > On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier > wrote: > > On 2018/01/20 07:45, Benjamin Poirier wrote: > > [...] > >> > > >> > I'm of the mind that we need to cut down on the code thrash. This > >> > driver is supposed to have been in a "maintenance" mode for the last > >> > year or so as there aren't being any new parts added is my > >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: > >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or > >> > accepted in the first place. I don't see any notes about it fixing any > >> > bug or addressing any issue and it seems like that is the start of all > >> > the issues we have been having recently with RXO triggering more > >> > interrupts, various link issues, and this most recent VMware issue. > >> > >> I'm sorry to say but you're the one who suggested that change: > >> > >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html > >> > >> On 2015/10/28 23:08, Alexander Duyck wrote: > >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote: > >> [...] > >> > > >> > I would argue your first patch probably didn't go far enough to remove dead > >> > code. Specifically you should only ever get into this function if LSC is > >> > set. There are no other causes that should trigger this. As such you could > >> > probably remove the ICR read, and instead replace it with an ICR write of > >> > the LSC bit since OTHER is already cleared via EIAC. > >> > > > > > ... The assumption that "There are no other causes that should trigger > > this." turned out to be wrong and that caused the RXO problems. Clearing > > OTHER via EIAC is causing the problems with vmware now. I don't think > > you foresaw those problems back in 2015 and neither did I. > > Well that explains why I felt like I was explaining things to a > younger version of myself. I was a bit more relaxed in terms of being > willing to make arbitrary changes a few years ago. I tend to be a bit > more conservative now, at least as far as having justifications as to > why I want to do things. With any change you end up taking on risk, > and so usually a patch has a justification as to why you are making > the change. > > Unfortunately at the time I didn't have all the information and based > my suggestion on a bad assumption. I would guess at the time I was > thinking of doing general code cleanup. Other drivers such as igb work > this way, but it led us down the path we are on now where we are > having to make one fix after another. It is leading in the opposite > direction of maintainability as this is all becoming more complex. > Suggesting this was a bad decision on my part at the time. I'm only > human, I make mistakes.. :-) Thanks for the introspection. > > With further review of the code I am seeing various other issues that > could still pop up as I am not certain we should even have the "other" > interrupt messing with the NAPI polling or packet accounting logic at > all. The question I would have at this point is if we revert > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1) > and all the related fixes for it, what do we end up with? The patch I submitted for the current vmware issue actually finishes reverting commit 16ecba59bc33. I believe the relevant commits to consider are: 16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1) a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1) 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) 19110cfbb34d e1000e: Separate signaling for link check/link up (v4.15-rc1) 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) 4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return value. (v4.15-rc8) (submitted) e1000e: Remove Other from EIAC. commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of 16ecba59bc33 (ICR read). The submitted patch reverts the rest of 16ecba59bc33 (EIAC clearing of Other). > It seems > like the code is slowly heading back in the direction of where it was > originally anyway since there have been a number of partial reverts. > I'm wondering what would happen if we were to just short-cut that and > audit the patches involved to see what we really need and don't. > > Your patch as proposed is essentially another step in that direction. > I'm thinking we may want to drop my currently proposed fix for now and > instead look at going through and figure out what changes after that > first one are still really needed. If the patch that I submitted for the current vmware issue is merged, the significant commits that are left are: 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) Fixes a problem in the irq disabling of the napi implementation. 19110cfbb34d e1000e: Separate signaling for link check/link up (v4.15-rc1) Fixes link flapping caused by a race condition in link detection. It was found because the Other interrupt was being triggered sort of spuriously by rxo. 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) Fixes Other interrupt bursts during sustained rxo conditions. I think all of those are still needed, or if they're removed, they need to be reimplemented differently. > It doesn't look like my fix will > make it for 4.15 anyway so we might as well focus on making certain to > have things as solid as possible by the time 4.16-rc1 rolls around. > > Thanks. > > - Alex