From: Andy Lutomirski
Subject: Re: [PATCH 1/2] x86/mm: Reinitialize TLB state on hotplug and resume
Date: Thu, 7 Sep 2017 18:23:27 -0700
To: Jiri Kosina
Cc: Ingo Molnar, Andy Lutomirski, X86 ML, Borislav Petkov, linux-kernel@vger.kernel.org, Linus Torvalds
References: <20170907074834.tmwo6vsvody2qrlg@gmail.com>

> On Sep 7, 2017, at 12:55 PM, Jiri Kosina wrote:
>
> On Thu, 7 Sep 2017, Ingo Molnar wrote:
>
>>>> When Linux brings a CPU down and back up, it switches to init_mm and then
>>>> loads swapper_pg_dir into CR3. With PCID enabled, this has the side effect
>>>> of masking off the ASID bits in CR3.
>>>>
>>>> This can result in some confusion in the TLB handling code. If we
>>>> bring a CPU down and back up with any ASID other than 0, we end up
>>>> with the wrong ASID active on the CPU after resume. This could
>>>> cause our internal state to become corrupt, although major
>>>> corruption is unlikely because init_mm doesn't have any user pages.
>>>> More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
>>>> in the next context switch. The result of *that* is a failure to
>>>> resume from suspend with probability 1 - 1/6^(cpus-1).
>>>>
>>>> Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.
>>>>
>>>> Reported-by: Linus Torvalds
>>>> Reported-by: Jiri Kosina
>>>> Fixes: 10af6235e0d3 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
>>>> Signed-off-by: Andy Lutomirski
>>>
>>> Tested-by: Jiri Kosina
>>
>> The fix should be upstream already, as of 1c9fe4409ce3 and later.
>
> Hm, so I've just experienced two instances in a row of reboot just after
> reading hibernation image (i.e. exactly the same symptom as before) even
> with 3b9f8ed kernel (which contains the fix). Seems like the fix is either
> incomplete (just the probability of it happening is lower), or I'm seeing
> something different with the same symptom.
>
> I'll try to figure out whether it's the same VM_BUG_ON() triggering, but
> probably will be able to do so only tomorrow.
>

Nah, don't waste your time. I think I see the bug, and it's a different bug.
It's an easy one-line fix, but I have to figure out how to test it.

> --
> Jiri Kosina
> SUSE Labs
>
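
For a sense of scale, the failure probability quoted in the commit message,
1 - 1/6^(cpus-1), can be evaluated for a few CPU counts. The sketch below is
only illustrative arithmetic. It assumes the 6 in the formula is the number of
ASID slots each CPU cycles through, and that resume succeeds only when every
non-boot CPU happens to be on ASID 0; the thread gives the formula but does not
spell out either point. The names NR_ASIDS and resume_failure_probability are
hypothetical and do not come from the kernel.

```python
from fractions import Fraction

# Assumption: 6 ASID slots per CPU, inferred from the 1/6 term in the
# quoted formula. Hypothetical name, not a kernel identifier.
NR_ASIDS = 6


def resume_failure_probability(cpus: int, nr_asids: int = NR_ASIDS) -> Fraction:
    """Evaluate 1 - 1/nr_asids^(cpus-1) exactly.

    Assumed reading: each of the (cpus - 1) non-boot CPUs is on ASID 0
    with probability 1/nr_asids, independently of the others.
    """
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return 1 - Fraction(1, nr_asids ** (cpus - 1))


if __name__ == "__main__":
    for cpus in (1, 2, 4, 8):
        p = resume_failure_probability(cpus)
        print(f"{cpus} CPUs: failure probability {float(p):.6f}")
```

Under these assumptions a single-CPU machine never hits the assertion, while
on a 4-CPU machine only 1 resume in 216 would get through.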