Date: Tue, 05 May 2015 10:21:30 -0700
From: Jeremy Fitzhardinge
To: Juergen Gross, linux-kernel@vger.kernel.org, x86@kernel.org,
 hpa@zytor.com, tglx@linutronix.de, mingo@redhat.com,
 xen-devel@lists.xensource.com, konrad.wilk@oracle.com,
 david.vrabel@citrix.com, boris.ostrovsky@oracle.com,
 chrisw@sous-sol.org, akataria@vmware.com, rusty@rustcorp.com.au,
 virtualization@lists.linux-foundation.org, gleb@kernel.org,
 pbonzini@redhat.com, kvm@vger.kernel.org
Subject: Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead
Message-ID: <5548FC1A.7000806@goop.org>
In-Reply-To: <554709BB.7090400@suse.com>
References: <1430391243-7112-1-git-send-email-jgross@suse.com>
 <55425ADA.4060105@goop.org> <554709BB.7090400@suse.com>

On 05/03/2015 10:55 PM, Juergen Gross wrote:
> I did a small measurement of the pure locking functions on bare metal
> without and with my patches.
>
> spin_lock() for the first time (lock and code not in cache) dropped from
> about 600 to 500 cycles.
>
> spin_unlock() for the first time dropped from 145 to 87 cycles.
>
> spin_lock() in a loop dropped from 48 to 45 cycles.
>
> spin_unlock() in the same loop dropped from 24 to 22 cycles.

Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
main difference will be whether the branch predictor is warmed up rather
than whether the lock itself is in the dcache, but it's much more likely
that the lock code is in the icache if the workload is lock-intensive,
making the cold case moot. But that's pure speculation.

Could you see any differences in workloads beyond microbenchmarks?

Not that it's my call at all, but I think we'd need to see some concrete
improvements in real workloads before adding the complexity of more
pvops.

    J
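
P.S. Purely to illustrate what I mean by isolating the dcache-cold case:
below is a minimal userspace sketch (hypothetical, not from the patch
series, and using a plain test-and-set lock rather than the kernel's
ticket lock) that evicts only the lock's cacheline with clflush, so the
code path and branch predictor stay warm while the lock data is cold.
A single rdtscp-timed sample is noisy, so treat the numbers as
indicative only.

	/* Hypothetical sketch: separate dcache-cold cost from
	 * branch-predictor/icache effects when timing lock acquisition.
	 * Simple test-and-set lock, __rdtscp for cycle counts, clflush
	 * to evict only the lock's cacheline. Not the kernel spinlock.
	 */
	#include <stdio.h>
	#include <stdint.h>
	#include <x86intrin.h>	/* __rdtscp, _mm_clflush, _mm_mfence */

	static volatile int lock;

	static void lock_acquire(volatile int *l)
	{
		while (__sync_lock_test_and_set(l, 1))
			while (*l)
				;	/* spin until free, then retry */
	}

	static void lock_release(volatile int *l)
	{
		__sync_lock_release(l);
	}

	/* Time one uncontended acquire; release outside the window. */
	static uint64_t time_acquire(void)
	{
		unsigned int aux;
		uint64_t t0, t1;

		t0 = __rdtscp(&aux);
		lock_acquire(&lock);
		t1 = __rdtscp(&aux);
		lock_release(&lock);
		return t1 - t0;
	}

	int main(void)
	{
		int i;

		/* Warm everything: code, branch predictor, lock line. */
		for (i = 0; i < 1000; i++)
			time_acquire();
		printf("hot:         %llu cycles\n",
		       (unsigned long long)time_acquire());

		/* Evict only the lock's data line; code stays hot. */
		_mm_clflush((const void *)&lock);
		_mm_mfence();
		printf("dcache-cold: %llu cycles\n",
		       (unsigned long long)time_acquire());
		return 0;
	}

The fully cold case (code *and* data out of cache) is the hard one to
reproduce deliberately; you'd have to evict the icache/BTB as well,
e.g. by running a large amount of unrelated code in between.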