From: Chris Samuel
To: linux-btrfs@vger.kernel.org
Cc: David Woodhouse, Andi Kleen, Andrew Morton, Ingo Molnar, Linus Torvalds, Harvey Harrison, "H. Peter Anvin", Chris Mason, Peter Zijlstra, Steven Rostedt, paulmck@linux.vnet.ibm.com, Gregory Haskins, Matthew Wilcox, Linux Kernel Mailing List, "linux-fsdevel", Thomas Gleixner, Nick Piggin, Peter Morreale, Sven Dietrich
Subject: Hard to debug kernel issues (was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning)
Date: Mon, 12 Jan 2009 18:59:16 +1100
Message-Id: <200901121859.20893.chris@csamuel.org>

On Sun, 11 Jan 2009 11:26:41 pm David Woodhouse wrote:
> Sometimes you weren't going to get a backtrace if something goes wrong
> _anyway_.

Case in point - we've been struggling with some of our SuperMicro-based
systems with AMD Barcelona B3 k10h CPUs *turning themselves off* when
running various HPC applications.

Nothing in the kernel logs, nothing in the IPMI controller logs. It's
just like someone has wandered in and held the power button down (and
no, it's not that). It's been driving us up the wall.

We'd assumed it was a hardware issue as it was happening with all sorts
of kernels, but today we tried 2.6.29-rc1 "just in case" and I've not
been able to reproduce the crash (yet) on a node I can crash in about
30 seconds, and rebooting back into 2.6.28 makes it crash again.

If the test boxes are still alive tomorrow I might see if we can attempt
some form of a reverse bisect to track down what commit fixed it (git
doesn't seem to support that, so we're going to have to invert the
good/bad commands).

cheers,
Chris

-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP