Date: Tue, 21 Jul 2015 17:21:19 +0300
From: Matan Barak
To: Alex Thorlton
Cc: Or Gerlitz, Or Gerlitz, andrew banman, Linux Kernel, Doug Ledford,
    Sean Hefty, Hal Rosenstock, "David S. Miller", Roland Dreier,
    Matan Barak, Moni Shoua, Jack Morgenstein, Yishai Hadas,
    Eran Ben Elisha, Ira Weiny, "linux-rdma@vger.kernel.org"
Subject: Re: [BUG] mellanox IB driver fails to load on large config
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 21, 2015 at 5:56 AM, Alex Thorlton wrote:
> On Mon, Jul 20, 2015 at 11:28:03AM -0500, Alex Thorlton wrote:
>> I've got some time on the large machine later today. I'll give this a
>> try then.
>
> I ran a boot with this patch applied:
>
> diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
> index 83e80ab..c84aea0 100644
> --- a/include/linux/mlx4/device.h
> +++ b/include/linux/mlx4/device.h
> @@ -45,7 +45,7 @@
>  #include
>
>  #define MAX_MSIX_P_PORT		17
> -#define MAX_MSIX		64
> +#define MAX_MSIX		8192
>  #define MSIX_LEGACY_SZ		4
>  #define MIN_MSIX_P_PORT		5
>
>
> I went for a max of 8192, since I was actually booting the machine with
> 6144 cores (not 4096) for this run. It doesn't look like this fixed the
> problem. I still saw the same errors during boot.
>
> FWIW, the module does appear to still successfully load:
>
> 8<---
> # lsmod | grep mlx
> mlx4_ib               151552  0
> ib_sa                  32768  1 mlx4_ib
> ib_mad                 49152  2 ib_sa,mlx4_ib
> ib_core               102400  3 ib_sa,mlx4_ib,ib_mad
> mlx4_core             278528  1 mlx4_ib
> --->8
>
> If the module loading is good enough, and we should just ignore the
> errors, then I'm fine with that. I just want to make sure that
> everything is behaving correctly.

It shouldn't be a problem, as all unused/erroneous EQs get "-1".
We'll try to reproduce the problem here; it might take a while, though.

Thanks for checking this,
Matan

>
> - Alex