Date: Wed, 17 Jan 2018 10:24:06 +0100 (CET)
From: Thomas Gleixner
To: Keith Busch
Cc: LKML
Subject: Re: [BUG 4.15-rc7] IRQ matrix management errors
In-Reply-To: <20180117075500.GB7562@localhost.localdomain>

On Wed, 17 Jan 2018, Keith Busch wrote:
> On Wed, Jan 17, 2018 at 08:34:22AM +0100, Thomas Gleixner wrote:
> > Can you trace the matrix allocations from the very beginning or tell me how
> > to reproduce? I'd like to figure out why this is happening.
>
> Sure, I'll get the irq_matrix events.
>
> I reproduce this on a machine with 112 CPUs and 3 NVMe controllers. The
> first two NVMe controllers want 112 MSI-X vectors each, and the last only
> 31 vectors. The test runs 'modprobe nvme' and 'modprobe -r nvme' in a loop
> with a 10-second delay between each step. The problem reproduces within a
> few iterations, and sometimes it is already broken after the initial boot.

That doesn't sound right. The vectors should be spread evenly across the
CPUs, so the allocation should never fail with ENOSPC.

Can you please take snapshots of /sys/kernel/debug/irq/ between the
modprobe and modprobe -r steps?

Thanks,

	tglx
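A rough arithmetic check of the "spread evenly" argument, as a sketch only: it assumes a toy model in which each controller's vectors are handed out round-robin starting at CPU 0. This is not the kernel's irq_matrix allocator; the function name `spread` is hypothetical. With 112 + 112 + 31 = 255 vectors over 112 CPUs, no CPU ends up with more than three device vectors, a tiny fraction of the 256-entry x86 vector space, which is why running out of vectors points to a bookkeeping problem rather than real exhaustion.

```python
from collections import Counter

NR_CPUS = 112  # CPU count from the report


def spread(vector_counts, nr_cpus=NR_CPUS):
    """Toy model (assumption): assign each device's vectors round-robin,
    starting at CPU 0. Returns a Counter mapping CPU -> vectors assigned.
    This is NOT the kernel's irq_matrix allocator."""
    load = Counter()
    for count in vector_counts:
        for v in range(count):
            load[v % nr_cpus] += 1
    return load


# Two NVMe controllers with 112 MSI-X vectors each, one with 31.
load = spread([112, 112, 31])
print(sum(load.values()), max(load.values()))
```

Under this model, CPUs 0 to 30 carry three vectors and the remaining CPUs carry two.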