From: Mark Langsdorf <mark.langsdorf@amd.com>
To: Ingo Molnar <mingo@elte.hu>
Subject: Re: invalidate caches before going into suspend
Date: Wed, 13 Aug 2008 12:09:57 -0500
User-Agent: KMail/1.9.9
CC: linux-kernel@vger.kernel.org,
       Linus Torvalds <torvalds@linux-foundation.org>,
       "H. Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>
References: <200808131141.18003.mark.langsdorf@amd.com> <20080813164728.GD5720@elte.hu>
In-Reply-To: <20080813164728.GD5720@elte.hu>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-ID: <200808131209.57534.mark.langsdorf@amd.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2270
Lines: 65

On Wednesday 13 August 2008, Ingo Molnar wrote:
> 
> * Mark Langsdorf <mark.langsdorf@amd.com> wrote:
> 
> > When a CPU core is shut down, all of its caches need to be flushed to 
> > prevent stale data from causing errors if the core is resumed. Current 
> > Linux suspend code performs an assignment after the flush, which can 
> > add dirty data back to the cache.  On some AMD platforms, additional 
> > speculative reads have caused crashes on resume because of this dirty 
> > data.
> > 
> > Relocate the cache flush to be the very last thing done before 
> > halting.
> 
> nice catch! Applied to x86/urgent.
> 
> I'm really curious: how did you find this bug? Did you see a CPU come up 
> as !CPU_DEAD?

AMD's diagnostic code for new CPUs was hanging when coming out of suspend,
so I presume it was hitting a bug check on the CPU_DEAD state.  I only got
the debug lab reports second hand.  They traced the root cause to dirty
data being left in the cache and suggested relocating the wbinvd().
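
To illustrate the ordering problem, here is a toy user-space model. It is
only a sketch, not kernel code: the toy_* names are hypothetical, the
"cache" is a single dirty flag, and the assumption is simply that any
store after the flush re-dirties the cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): the whole cache is one dirty flag. */
enum { TOY_CPU_ONLINE = 0, TOY_CPU_DEAD = 1 };

struct toy_cpu {
	bool cache_dirty;	/* true if the cache holds unwritten data */
	int state;		/* stands in for the per-cpu cpu_state */
};

/* Stand-in for wbinvd(): write back and invalidate, cache is clean. */
static void toy_wbinvd(struct toy_cpu *cpu)
{
	cpu->cache_dirty = false;
}

/* Stand-in for the cpu_state assignment: any store dirties the cache. */
static void toy_set_state(struct toy_cpu *cpu, int state)
{
	cpu->state = state;
	cpu->cache_dirty = true;
}

/* Old order: flush first, then store. Returns the dirty flag at halt. */
static bool toy_play_dead_old(void)
{
	struct toy_cpu cpu = { .cache_dirty = true, .state = TOY_CPU_ONLINE };

	toy_wbinvd(&cpu);
	toy_set_state(&cpu, TOY_CPU_DEAD);
	return cpu.cache_dirty;
}

/* New order: store first, flush last. Returns the dirty flag at halt. */
static bool toy_play_dead_new(void)
{
	struct toy_cpu cpu = { .cache_dirty = true, .state = TOY_CPU_ONLINE };

	toy_set_state(&cpu, TOY_CPU_DEAD);
	toy_wbinvd(&cpu);
	return cpu.cache_dirty;
}
```

In this model toy_play_dead_old() halts with the flag still set, while
toy_play_dead_new() halts with a clean cache, which is the point of
moving the wbinvd().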

> please send a patch for the 32-bit side too, it has the same bug.
> 
> also, we might be safer if the wbinvd(), the CLI and the halt was in a 
> single assembly sequence:

> to make sure the compiler doesnt ever insert something into this 
> codepath? [ And note the double cli which would be further 
> robustification - in theory we could get a spurious interrupt straight 
> after the wbinvd. ] Hm?

I don't think that's necessary, but I can submit a delta patch later if
you really want it.

The 32-bit version of the fix follows: move the wbinvd() in play_dead()
to after the CPU_DEAD assignment.


Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>

diff -r 1e74a821dd00 arch/x86/kernel/process_32.c
--- a/arch/x86/kernel/process_32.c	Tue Aug 12 12:04:12 2008 -0500
+++ b/arch/x86/kernel/process_32.c	Wed Aug 13 06:40:00 2008 -0500
@@ -95,11 +95,11 @@ static inline void play_dead(void)
 {
 	/* This must be done before dead CPU ack */
 	cpu_exit_clear();
-	wbinvd();
 	mb();
 	/* Ack it */
 	__get_cpu_var(cpu_state) = CPU_DEAD;
 
+	wbinvd();
 	/*
 	 * With physical CPU hotplug, we should halt the cpu
 	 */

