Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760126AbXFRIWm (ORCPT );
	Mon, 18 Jun 2007 04:22:42 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1756234AbXFRIWf (ORCPT );
	Mon, 18 Jun 2007 04:22:35 -0400
Received: from smtp2.linux-foundation.org ([207.189.120.14]:53371 "EHLO
	smtp2.linux-foundation.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1755161AbXFRIWe (ORCPT );
	Mon, 18 Jun 2007 04:22:34 -0400
Date: Mon, 18 Jun 2007 01:20:55 -0700
From: Andrew Morton
To: Ingo Molnar , Ravikiran G Thirumalai
Cc: Miklos Szeredi , cebbert@redhat.com, chris@atlee.ca,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	torvalds@linux-foundation.org
Subject: Re: [BUG] long freezes on thinkpad t60
Message-Id: <20070618012055.81a7c837.akpm@linux-foundation.org>
In-Reply-To: <20070618081204.GA11153@elte.hu>
References: <20070524144447.GA25068@elte.hu>
	<20070524210153.GB19672@elte.hu>
	<20070616103707.GA28096@elte.hu>
	<20070618064343.GA31113@elte.hu>
	<20070618081204.GA11153@elte.hu>
X-Mailer: Sylpheed 2.4.1 (GTK+ 2.8.17; x86_64-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1601
Lines: 31

On Mon, 18 Jun 2007 10:12:04 +0200 Ingo Molnar wrote:

> ---------------------------------------------------->
> Subject: [patch] x86: fix spin-loop starvation bug
> From: Ingo Molnar
>
> Miklos Szeredi reported very long pauses (several seconds, sometimes
> more) on his T60 (with a Core2Duo) which he managed to track down to
> wait_task_inactive()'s open-coded busy-loop. He observed that an
> interrupt on one core tries to acquire the runqueue-lock but does not
> succeed in doing so for a very long time - while wait_task_inactive() on
> the other core loops waiting for the first core to deschedule a task
> (which it won't do while spinning in an interrupt handler).
>
> The problem is: both the spin_lock() code and the wait_task_inactive()
> loop use cpu_relax()/rep_nop(), so in theory the CPU should have
> guaranteed MESI-fairness to the two cores - but that didn't happen: one
> of the cores was able to monopolize the cacheline that holds the
> runqueue lock, for extended periods of time.
>
> This patch changes the spin-loop to assert an atomic op after every REP
> NOP instance - this will cause the CPU to express its "MESI interest" in
> that cacheline after every REP NOP.

Kiran, if you're still able to reproduce that zone->lru_lock starvation
problem, this would be a good one to try...
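
For readers following along, the idea in the quoted changelog (an atomic op after every REP NOP) could be sketched in userspace C11 roughly as below. This is only an illustration and not the patch itself: cpu_relax_sketch() and spin_until_set() are hypothetical names, and the choice of an atomic add of zero as the "atomic op" is an assumption made for the sketch.

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's cpu_relax()/rep_nop(). */
static inline void cpu_relax_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");	/* REP NOP, i.e. PAUSE */
#else
	atomic_signal_fence(memory_order_seq_cst);	/* compiler barrier only */
#endif
}

/*
 * Spin until *flag becomes non-zero. After every pause, perform an
 * atomic read-modify-write that leaves the value unchanged, so that
 * the spinning CPU keeps requesting the cacheline instead of only
 * reading it. Assumption: adding zero stands in for the atomic op;
 * a compiler is free to lower it differently, and kernel code would
 * use its own primitives.
 *
 * Returns the number of loop iterations spent waiting.
 */
static unsigned long spin_until_set(atomic_int *flag)
{
	unsigned long spins = 0;

	while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
		cpu_relax_sketch();
		atomic_fetch_add_explicit(flag, 0, memory_order_seq_cst);
		spins++;
	}
	return spins;
}
```

A waiter would call spin_until_set(&flag) while another thread eventually stores a non-zero value to flag. Whether this actually improves fairness between cores depends on the hardware, which is what the request for testing above is about.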