Date: Fri, 4 Jun 2010 01:35:18 +1000
From: Nick Piggin
To: Andi Kleen
Cc: Srivatsa Vaddagiri, Avi Kivity, Gleb Natapov, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, hpa@zytor.com, mingo@elte.hu, tglx@linutronix.de, mtosatti@redhat.com
Subject: Re: [PATCH] use unfair spinlock when running on hypervisor.
Message-ID: <20100603153518.GP6822@laptop>
In-Reply-To: <20100603151730.GE4166@basil.fritz.box>

On Thu, Jun 03, 2010 at 05:17:30PM +0200, Andi Kleen wrote:
> On Thu, Jun 03, 2010 at 10:38:32PM +1000, Nick Piggin wrote:
> > And they aren't even using ticket spinlocks!!
>
> I suppose they simply don't have unfair memory. Makes things easier.

That would certainly be a part of it. I'm sure they provide stronger
fairness guarantees at the expense of some performance. We first saw
the spinlock starvation on 8-16 core Opterons, I think, whereas Altix
had been run at over 1024 cores, and POWER7 now at 1024 threads,
apparently without reported problems.

However, I think more is needed than simply "fair" memory at the cache
coherency level, considering that s390, for example, implements its
spinlock simply by retrying cas until it succeeds. The interconnect
could round-robin all cache requests for the lock word perfectly, and
yet one core could still always find that it is granted the cacheline
while the lock is already taken.

So I think actively enforcing fairness at the lock level would be
required. Something like: if a core is detected not making progress in
a tight cas loop, it enters a queue of cores in which the head of the
queue is always granted the cacheline first after the line has been
dirtied. Interrupts would need to be excluded from this logic. This
still doesn't solve the problem of an owner unfairly releasing and
immediately re-grabbing the lock; more detection would be needed to
handle that.

I don't know how far hardware goes. Maybe it is enough to
statistically avoid starvation if memory is pretty fair. But it does
seem a lot easier to enforce fairness in software.
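
For concreteness, here is a minimal userspace sketch (C11 atomics)
contrasting an s390-style cas-retry lock with a ticket lock that
enforces FIFO fairness in software. The names, the struct layouts and
the use of <stdatomic.h> are illustrative assumptions for this sketch
only; the real implementations live in arch-specific code and are not
reproduced here.

/*
 * Sketch only, not kernel code: contrasts a cas-retry spinlock
 * (fairness left entirely to cacheline arbitration) with a ticket
 * spinlock (fairness enforced in software).
 */
#include <stdatomic.h>

/*
 * cas-retry lock: whichever waiter happens to win the cacheline when
 * the lock is free gets it, so even a perfectly "fair" interconnect
 * does not by itself prevent one core from starving.
 */
struct cas_lock { atomic_int locked; };

static void cas_lock_acquire(struct cas_lock *l)
{
	int expected;

	do {
		expected = 0;
		/* retry cas until it succeeds; no ordering among waiters */
	} while (!atomic_compare_exchange_weak(&l->locked, &expected, 1));
}

static void cas_lock_release(struct cas_lock *l)
{
	atomic_store(&l->locked, 0);
}

/*
 * Ticket lock: each waiter atomically takes a ticket and spins until
 * the owner count reaches that ticket, so waiters are served strictly
 * in arrival order regardless of who wins cacheline arbitration.
 */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently allowed in */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int me = atomic_fetch_add(&l->next, 1);

	while (atomic_load(&l->owner) != me)
		;	/* spin; a real lock would use a pause/relax hint */
}

static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add(&l->owner, 1);
}

Both locks start zero-initialized (e.g. struct ticket_lock l = {0};).
The point of the contrast is the one made above: the cas-retry variant
depends entirely on the hardware's arbitration for fairness, while the
ticket variant gets FIFO ordering from the software protocol itself.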