Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1753057AbcDUQv3 (ORCPT ); Thu, 21 Apr 2016 12:51:29 -0400
Received: from mail.skyhub.de ([78.46.96.112]:34490 "EHLO mail.skyhub.de"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1752036AbcDUQv2 (ORCPT ); Thu, 21 Apr 2016 12:51:28 -0400
Date: Thu, 21 Apr 2016 18:51:06 +0200
From: Borislav Petkov
To: Marc Haber
Cc: Paolo Bonzini , linux-kernel@vger.kernel.org, kvm ML
Subject: Re: Major KVM issues with kernel 4.5 on the host
Message-ID: <20160421165106.GK28821@pd.tnic>
References: <20160317181128.GA30324@pd.tnic>
 <56EBD20A.1020608@redhat.com>
 <20160413183701.GC7600@torres.zugschlus.de>
 <570EADD2.8030300@redhat.com>
 <20160413222942.GD7600@torres.zugschlus.de>
 <570EEF6D.40307@redhat.com>
 <20160414052220.GE7600@torres.zugschlus.de>
 <20160421083948.GF21755@torres.zugschlus.de>
 <20160421123711.GD28821@pd.tnic>
 <20160421145005.GI21755@torres.zugschlus.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20160421145005.GI21755@torres.zugschlus.de>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2764
Lines: 84

On Thu, Apr 21, 2016 at 04:50:05PM +0200, Marc Haber wrote:
> What bothers me is that since I ended up with a "suspect" commit that
> actually results in a "good" kernel (running for 22 hours now), I must
> have said "bad" to an actually "good" kernel, which means that I had
> an unrelated crash or corruption. Is that reasoning correct?

Hmm, did that "unrelated crash or corruption" have the same symptoms as
the original one?

> That one qualified as "good" six days ago. I'll retry, maybe I just
> didn't wait long enough.

So if the trigger time is varying so much, I'd try to double that to
make sure I'm fairly certain about each commit I'm testing.

Also, this is a single box we're talking about, right?
And you're sure it hasn't had any corruption issues so far? I see you
have amd64_edac loading, so it must have ECC DIMMs. Have you had any
reports in the past of ECC errors in dmesg? Or other MCEs, lockups,
etc?

Can you grep your logs for stuff like "hardware error", "mce", "edac"
etc? Do a case-insensitive search.

> "Trying" means make oldconfig, make deb-pkg in my case right? Does it
> matter what I answer to the numerous config questions that keep coming
> up during the oldconfig step?

What I do is:

$ git bisect <good|bad>

to mark the current commit after having tested it.

Then I do

$ yes "" | make oldconfig

to set the new config options.

Then

$ make -j7
$ make modules_install install

and reboot into the new kernel. Kernel name will possibly change each
time so I write down on paper which kernel I'm testing.

You can verify when booting it by doing:

$ dmesg | head
[    0.000000] Linux version 4.6.0-rc2+ (boris@pd) (gcc version 5.3.1 20160101 (Debian 5.3.1-5) ) #1 SMP PREEMPT Wed Apr 6 20:22:51 CEST 2016
...

that date at the end of the line and number "#1" should be current.
Number is also in .version and gets issued when you finish building:

Kernel: arch/x86/boot/bzImage is ready  (#1)

> Would it help to explicitly mark
> 0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge
> gained during the last week is not completely lost?

I'd do the whole thing again, just to be sure. I know, bisection is very
time-consuming :-\ And it is particularly annoying if it is done on the
box I'm normally using daily.

> So I need to git log | grep 46896c73c1a4 and apply the patch again
> each time the commit is found?

I think you can let git do that for ya:

$ git branch --contains 46896c73c1a4
* (HEAD detached at 46896c73c1a4)

that lists that the current checked out HEAD contains that commit.

If you do

$ git checkout 46896c73c1a4~1

then that "(HEAD detached..." line is not in the list of branches
containing it.

HTH.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
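
The case-insensitive log search asked for in the mail could look roughly
like the sketch below. The function name scan_hw_errors is made up for
illustration, and which logs to feed it is an assumption that depends on
the distribution. The three patterns are the ones named in the mail; a
bare "mce" is a broad net and may also hit unrelated words.

```shell
#!/bin/sh
# scan_hw_errors: hypothetical helper, not part of any existing tool.
# Prints every line that mentions "hardware error", "mce" or "edac",
# ignoring case. Reads the files given as arguments, or stdin when no
# file is given. Exit status is grep's: 0 if anything matched, 1 if not.
scan_hw_errors() {
    grep -i -E 'hardware error|mce|edac' "$@"
}
```

Typical use would be "dmesg | scan_hw_errors", or passing whatever
kernel log files the box keeps as arguments.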
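
For deciding at each bisect step whether the patch has to be applied
again, an alternative to reading the "git branch --contains" output is
"git merge-base --is-ancestor", which answers through its exit status
and is therefore easy to script. This is a different command from the
one used in the mail, shown only as a sketch; the function name
has_commit is made up.

```shell
#!/bin/sh
# has_commit: hypothetical helper.
# Succeeds (exit 0) when commit $1 is HEAD itself or an ancestor of HEAD,
# i.e. when the currently checked-out tree already contains that commit.
# Fails (non-zero exit) otherwise. Must be run inside the git tree.
has_commit() {
    git merge-base --is-ancestor "$1" HEAD
}

# Possible use after each "git bisect <good|bad>" step:
#   if has_commit 46896c73c1a4; then
#       echo "commit is in this tree, apply the patch before building"
#   fi
```

The exit status saves parsing the branch list, which changes shape
depending on whether HEAD is detached.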