From: Bjorn Helgaas
Date: Mon, 10 Aug 2015 12:42:19 -0500
Subject: Re: X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32
To: Duc Dang
Cc: Tanmay Inamdar, "linux-pci@vger.kernel.org", linux-arm, "linux-kernel@vger.kernel.org"

On Mon, Aug 10, 2015 at 12:16 PM, Duc Dang wrote:
> On Monday, August 10, 2015, Bjorn Helgaas wrote:
>>
>> On Fri, Jul 31, 2015 at 12:00 PM, Duc Dang wrote:
>> > On Wed, Jul 29, 2015 at 8:55 AM, Bjorn Helgaas wrote:
>> >> On Tue, Jul 28, 2015 at 08:22:55PM -0500, Bjorn Helgaas wrote:
>> >>> On Tue, Jul 28, 2015 at 02:50:39PM -0700, Duc Dang wrote:
>> >>
>> >>> > Do you have another PCIe card to try on the same reboot test on this
>> >>> > board?
>> >>>
>> >>> I've seen this on at least two Mellanox cards.  I'm running similar
>> >>> tests on a different type of card now.
>> >>
>> >> FWIW, reboot tests on two machines with Mellanox cards failed, while
>> >> the same test on a machine with a different proprietary card succeeded.
>> >
>> > Thanks, Bjorn.
>> >
>> > I don't have the same Mellanox card as yours, but I will also run
>> > similar reboot test to see if I hit the same issue with my card.
>>
>> Any more hints on this?
>> Nothing has changed on my end, so of course
>> I'm still seeing this, always on machines with Mellanox, and never on
>> other machines.  Could this be a hardware issue like a signal
>> integrity or margin issue?  I don't know where to go from here because
>> I'm not a hardware person, and I don't know anything to do in
>> software.
>
>
> Hi Bjorn,
>
> I tried to run similar reboot tests on 2 different Mellanox cards (Connect-X
> family, one card has 2 10G interfaces, the other one has 1 port that
> supports InfiniBand) with U-Boot 1.15.12 and Linux 4.2-rc5 and I did not see
> the crash that you encountered.
>
> Did you check if your Mellanox cards have the latest firmware? I did see some
> link issues on my Mellanox cards with their old firmware before.

Good idea; I'll check that, too.  Also, I just learned that these
cards are installed with an extender card because of some space issues,
so we're going to test again without the extender.