Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755882AbdIGTzl (ORCPT ); Thu, 7 Sep 2017 15:55:41 -0400
Received: from mx2.suse.de ([195.135.220.15]:42753 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1755798AbdIGTzj (ORCPT ); Thu, 7 Sep 2017 15:55:39 -0400
Date: Thu, 7 Sep 2017 21:55:38 +0200 (CEST)
From: Jiri Kosina
X-X-Sender: jkosina@wotan.suse.de
To: Ingo Molnar
cc: Andy Lutomirski , X86 ML , Borislav Petkov , "linux-kernel@vger.kernel.org" , Linus Torvalds
Subject: Re: [PATCH 1/2] x86/mm: Reinitialize TLB state on hotplug and resume
In-Reply-To: <20170907074834.tmwo6vsvody2qrlg@gmail.com>
Message-ID:
References: <20170907074834.tmwo6vsvody2qrlg@gmail.com>
User-Agent: Alpine 2.21 (LSU 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1796
Lines: 38

On Thu, 7 Sep 2017, Ingo Molnar wrote:

> > > When Linux brings a CPU down and back up, it switches to init_mm and then
> > > loads swapper_pg_dir into CR3. With PCID enabled, this has the side effect
> > > of masking off the ASID bits in CR3.
> > >
> > > This can result in some confusion in the TLB handling code. If we
> > > bring a CPU down and back up with any ASID other than 0, we end up
> > > with the wrong ASID active on the CPU after resume. This could
> > > cause our internal state to become corrupt, although major
> > > corruption is unlikely because init_mm doesn't have any user pages.
> > > More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
> > > in the next context switch. The result of *that* is a failure to
> > > resume from suspend with probability 1 - 1/6^(cpus-1).
> > >
> > > Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.
> > >
> > > Reported-by: Linus Torvalds
> > > Reported-by: Jiri Kosina
> > > Fixes: 10af6235e0d3 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
> > > Signed-off-by: Andy Lutomirski
> >
> > Tested-by: Jiri Kosina
>
> The fix should be upstream already, as of 1c9fe4409ce3 and later.

Hm, so I've just experienced two instances in a row of a reboot just
after reading the hibernation image (i.e. exactly the same symptom as
before) even with a 3b9f8ed kernel (which contains the fix).

Seems like the fix is either incomplete (just the probability of it
happening is lower), or I'm seeing something different with the same
symptom.

I'll try to figure out whether it's the same VM_BUG_ON() triggering, but
I will probably be able to do so only tomorrow.

--
Jiri Kosina
SUSE Labs
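
For reference, the failure probability given in the quoted changelog, 1 - 1/6^(cpus-1), can be evaluated for a few CPU counts. This is a minimal sketch of that arithmetic only, not kernel code. Reading the 6 as the number of ASID slots each CPU cycles through is an assumption, and the function and constant names are hypothetical.

```python
# Illustrative arithmetic only; nothing here is taken from the kernel source.

# Assumption: the "6" in the quoted formula is the number of ASID slots per CPU.
NR_ASID_SLOTS = 6


def resume_failure_probability(cpus: int, nr_asids: int = NR_ASID_SLOTS) -> float:
    """Evaluate 1 - 1/nr_asids**(cpus - 1), the expression quoted in the changelog."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return 1.0 - 1.0 / nr_asids ** (cpus - 1)


if __name__ == "__main__":
    # Print the probability for a few hypothetical machine sizes.
    for cpus in (1, 2, 4, 8):
        print(f"{cpus} CPUs: {resume_failure_probability(cpus):.6f}")
```

Under that formula a single-CPU machine never fails, and the probability approaches 1 quickly as CPUs are added, which matches the pre-fix symptom being hit almost every time on a multi-core machine with CONFIG_DEBUG_VM=y.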