Date: Fri, 3 Apr 2015 08:55:57 +0200
From: Ingo Molnar
To: Jesse Brandeburg
Cc: torvalds@linux-foundation.org, Thomas Gleixner, linux-kernel@vger.kernel.org, John
Subject: Re: [PATCH] irq: revert non-working patch to affinity defaults
Message-ID: <20150403065557.GA12815@gmail.com>
References: <20150403005022.3143.73693.stgit@jbrandeb-cp2.jf.intel.com>
In-Reply-To: <20150403005022.3143.73693.stgit@jbrandeb-cp2.jf.intel.com>

* Jesse Brandeburg wrote:

> I've seen a couple of reports of issues since commit e2e64a932556 ("genirq:
> Set initial affinity in irq_set_affinity_hint()") where the
> affinity for the interrupt when programmed via
> /proc/irq//smp_affinity will not be able to stick.  It changes back
> to some previous value at the next interrupt on that IRQ.
>
> The original intent was to fix the broken default behavior of all IRQs
> for a device starting up on CPU0.  With a network card with 64 or more
> queues, all 64 queues' interrupt vectors end up on CPU0 which can have
> bad side effects, and has to be fixed by the irqbalance daemon, or by
> the user at every boot with some kind of affinity script.
>
> The symptom is that after a driver calls irq_set_affinity_hint, the
> affinity will be set for that interrupt (and readable via /proc/...),
> but on the first irq for that vector, the affinity for CPU0 or CPU1
> resets to the default.  The rest of the irq affinities seem to work and
> everything is fine.
>
> Impact if we don't fix this for 4.0.0:
>         Some users won't be able to set irq affinity as expected, on
>         some cpus.
>
> I've spent a chunk of time trying to debug this with no luck and suggest
> that we revert the change if no-one else can help me debug what is going
> wrong, we can pick up the change later.
>
> This commit would also revert commit 4fe7ffb7e17ca ("genirq: Fix null pointer
> reference in irq_set_affinity_hint()") which was a bug fix to the original
> patch.

So the original commit also has the problem that it unnecessarily
drops/retakes the descriptor lock:

>  	irq_put_desc_unlock(desc, flags);
> -	/* set the initial affinity to prevent every interrupt being on CPU0 */
> -	if (m)
> -		__irq_set_affinity(irq, m, false);

i.e. why not just call into irq_set_affinity_locked() while we still
have the descriptor locked?

Now this is just a small annoyance that should not really matter - it
would be nice to figure out the real reason for why the irqs move back
to CPU#0.

In theory the same could happen to 'irqbalanced' as well, if it calls
shortly after an irq was registered - so this is not a bug we want to
ignore.

Also, worst case we are back to where v3.19 was, right? So could we
try to analyze this a bit more?

Thanks,

	Ingo
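
To illustrate the drop/retake point above, here is a minimal userspace sketch. It is not kernel code and not the patch under discussion: the struct, the mock_* helpers and the two hint_* functions are all hypothetical names invented for this example, and the "lock" is just a flag plus a counter. It only contrasts unlocking and then calling a function that takes the lock again with calling a "locked" variant inside the existing critical section.

```c
/* Userspace mock only; every name below is hypothetical, not a kernel API. */
#include <stdbool.h>

struct mock_desc {
	bool locked;              /* stands in for the descriptor lock */
	unsigned int lock_count;  /* number of times the lock was taken */
	unsigned long hint;       /* stands in for the affinity hint */
	unsigned long affinity;   /* stands in for the programmed affinity */
};

void mock_lock(struct mock_desc *d)
{
	d->locked = true;
	d->lock_count++;
}

void mock_unlock(struct mock_desc *d)
{
	d->locked = false;
}

/* Caller must already hold the lock; returns -1 if it does not. */
int mock_set_affinity_locked(struct mock_desc *d, unsigned long mask)
{
	if (!d->locked)
		return -1;
	d->affinity = mask;
	return 0;
}

/* Shape of the quoted hunk: unlock, then a call that locks again. */
int hint_drop_and_retake(struct mock_desc *d, unsigned long mask)
{
	int ret;

	mock_lock(d);
	d->hint = mask;
	mock_unlock(d);

	mock_lock(d);             /* second acquisition */
	ret = mock_set_affinity_locked(d, mask);
	mock_unlock(d);
	return ret;
}

/* Shape of the suggestion: do both updates in one critical section. */
int hint_single_lock(struct mock_desc *d, unsigned long mask)
{
	int ret;

	mock_lock(d);
	d->hint = mask;
	ret = mock_set_affinity_locked(d, mask);
	mock_unlock(d);
	return ret;
}
```

Calling each function on a zero-initialised struct mock_desc and comparing lock_count shows two acquisitions for the first shape and one for the second. In this mock the two shapes produce the same final state; it says nothing about why the affinity moves back to CPU#0.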