Message-ID: <55A6450D.80300@mellanox.com>
Date: Wed, 15 Jul 2015 14:33:33 +0300
From: Matan Barak
To: Alex Thorlton, Or Gerlitz
CC: andrew banman, Linux Kernel, Doug Ledford, "Sean Hefty", Hal Rosenstock, "Or Gerlitz", "David S. Miller", Roland Dreier, Moni Shoua, "Jack Morgenstein", Yishai Hadas, Eran Ben Elisha, Ira Weiny, "linux-rdma@vger.kernel.org"
Subject: Re: [BUG] mellanox IB driver fails to load on large config
References: <20150710191506.GA52396@asylum.americas.sgi.com> <20150714182234.GD17920@asylum.americas.sgi.com> <20150714184820.GB58053@asylum.americas.sgi.com> <20150714202848.GD58053@asylum.americas.sgi.com>
In-Reply-To: <20150714202848.GD58053@asylum.americas.sgi.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/14/2015 11:28 PM, Alex Thorlton wrote:
> On Tue, Jul 14, 2015 at 11:06:26PM +0300, Or Gerlitz wrote:
>> On Tue, Jul 14, 2015 at 9:48 PM, Alex Thorlton wrote:
>>> On Tue, Jul 14, 2015 at 01:22:34PM -0500, andrew banman wrote:
>>>> On Sat, Jul 11, 2015 at 11:20:19PM +0300, Or Gerlitz wrote:
>>>>> On Fri, Jul 10, 2015 at 10:15 PM, andrew banman wrote:
>>>>>> I'm seeing a large number of allocation errors originating from the Mellanox IB
>>>>>> driver when booting the 4.2-rc1 kernel on a 4096cpu 32TB memory system:
>>>>>
>>>>> Just to make sure, mlx4 works fine on this small (...) system with 4.1
>>>>> and 4.2-rc1 breaks, or 4.2-rc1 is the 1st time you're trying that
>>>>> config?
>>>>
>>>> I'll let Alex comment on that, he did some testing on that.
>>>
>>> I started seeing this on a 4.1-rc8 kernel, so it's been around for a
>>> little while. It may have been around before 4.1-rc8, but I haven't run
>>> any kernels older than that on the big machine for some time.
>>
>> To make sure I am correctly following, on 4.1-rc8 you also see
>> something, right?
>
> Yes, that's correct.
>
>> are these the same messages or different ones? if the latter send to us.
>
> We see the same exact messages on 4.1-rc8.

Hi,

We don't recall getting those errors on 32-CPU machines, but we'll
try to reproduce this issue.

Matan

>
> Thanks for looking into this!
>
> - Alex
>