Message-ID: <554A0132.3070802@suse.com>
Date: Wed, 06 May 2015 13:55:30 +0200
From: Juergen Gross
To: Jeremy Fitzhardinge, linux-kernel@vger.kernel.org, x86@kernel.org,
 hpa@zytor.com, tglx@linutronix.de, mingo@redhat.com,
 xen-devel@lists.xensource.com, konrad.wilk@oracle.com,
 david.vrabel@citrix.com, boris.ostrovsky@oracle.com, chrisw@sous-sol.org,
 akataria@vmware.com, rusty@rustcorp.com.au,
 virtualization@lists.linux-foundation.org, gleb@kernel.org,
 pbonzini@redhat.com, kvm@vger.kernel.org
Subject: Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead
References: <1430391243-7112-1-git-send-email-jgross@suse.com>
 <55425ADA.4060105@goop.org> <554709BB.7090400@suse.com>
 <5548FC1A.7000806@goop.org>
In-Reply-To: <5548FC1A.7000806@goop.org>

On 05/05/2015 07:21 PM, Jeremy Fitzhardinge wrote:
> On 05/03/2015 10:55 PM, Juergen Gross wrote:
>> I did a small measurement of the pure locking functions on bare metal
>> without and with my patches.
>>
>> spin_lock() for the first time (lock and code not in cache) dropped from
>> about 600 to 500 cycles.
>>
>> spin_unlock() for the first time dropped from 145 to 87 cycles.
>>
>> spin_lock() in a loop dropped from 48 to 45 cycles.
>>
>> spin_unlock() in the same loop dropped from 24 to 22 cycles.
>
> Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
> main difference will be whether the branch predictor is warmed up rather
> than whether the lock itself is in the dcache, but it's much more likely
> that the lock code is in the icache if the code is lock intensive, making
> the cold case moot. But that's pure speculation.
>
> Could you see any differences in workloads beyond microbenchmarks?
>
> Not that it's my call at all, but I think we'd need to see some concrete
> improvements in real workloads before adding the complexity of more pvops.

I did another test on a larger machine: 25 kernel builds ("time make -j 32")
on a 32-core machine. Before each build "make clean" was called, and the
first result after boot was omitted to avoid disk cache warmup effects.

System time without my patches: 861.5664 +/- 3.3665 s
System time with my patches:    852.2269 +/- 3.6629 s

Juergen
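
For illustration only, below is a minimal user-space sketch of the kind of
hot/cold cycle measurement discussed above; it is not the bare-metal test
from the mail. pthread spinlocks stand in for the kernel's
spin_lock()/spin_unlock(), and the use of __rdtsc(), the iteration count,
and all names in it are assumptions of the sketch.

/* Sketch: time one lock/unlock pair "cold" (first use after init) and the
 * average over a hot loop, using the TSC. Build with: gcc -O2 -pthread
 */
#include <stdio.h>
#include <pthread.h>
#include <x86intrin.h>

int main(void)
{
    pthread_spinlock_t lock;
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

    /* Cold case: lock and code not yet in cache, predictors untrained. */
    unsigned long long t0 = __rdtsc();
    pthread_spin_lock(&lock);
    pthread_spin_unlock(&lock);
    unsigned long long cold = __rdtsc() - t0;

    /* Hot case: average one lock/unlock pair over many iterations. */
    const int iters = 1000000;
    t0 = __rdtsc();
    for (int i = 0; i < iters; i++) {
        pthread_spin_lock(&lock);
        pthread_spin_unlock(&lock);
    }
    unsigned long long hot = (__rdtsc() - t0) / iters;

    printf("cold lock+unlock: %llu cycles, hot: %llu cycles/pair\n",
           cold, hot);
    pthread_spin_destroy(&lock);
    return 0;
}

The absolute numbers from such a sketch will differ from the kernel figures
quoted above, but separating the first (cold) pair from the looped (hot)
average mirrors the distinction the icache/dcache/branch-predictor question
is about.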