Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752493AbcDWX6Z (ORCPT );
	Sat, 23 Apr 2016 19:58:25 -0400
Received: from mail.skyhub.de ([78.46.96.112]:56116 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751706AbcDWX6Y (ORCPT );
	Sat, 23 Apr 2016 19:58:24 -0400
Date: Sun, 24 Apr 2016 01:57:59 +0200
From: Borislav Petkov
To: Marc Haber
Cc: Paolo Bonzini , linux-kernel@vger.kernel.org, kvm ML
Subject: Re: Major KVM issues with kernel 4.5 on the host
Message-ID: <20160423235759.GA25381@pd.tnic>
References: <20160413222942.GD7600@torres.zugschlus.de>
	<570EEF6D.40307@redhat.com>
	<20160414052220.GE7600@torres.zugschlus.de>
	<20160421083948.GF21755@torres.zugschlus.de>
	<20160421123711.GD28821@pd.tnic>
	<20160421145005.GI21755@torres.zugschlus.de>
	<20160421165106.GK28821@pd.tnic>
	<20160421200433.GL21755@torres.zugschlus.de>
	<20160423160429.GL8531@pd.tnic>
	<20160423184341.GA21755@torres.zugschlus.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20160423184341.GA21755@torres.zugschlus.de>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1066
Lines: 34

On Sat, Apr 23, 2016 at 08:43:41PM +0200, Marc Haber wrote:
> Uncorrectable errors would still be identified by the ECC hardware,

Not if the hardware decides to syncflood so that we don't even get to
run the #MC handler...

> and the box wouldn't be perfectly fine with an "old" kernel.

Maybe the "old" kernel is not causing all the required ingredients to
come together for the uncorrectable error to happen. But yeah, I agree,
the fact that 4.4 is fine kinda doesn't fit with the uncorrectable error
theory.

> Yes, that would be in the logs.

Presumably. And see above.

> But we still postulate that the issue does only show on older AMD
> CPUs. Otherwise, I wouldn't be the only one making this experience.

It actually shows only on this one system. At least I'm not aware of any
other report of the same issue. My system with a F10h, rev E is just
fine.

> Do you want me to memtest for 24 hours?

Yeah, that memtest crap never triggers any ECCs. But if you're bored,
why not...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.