Date: Mon, 09 Feb 2015 17:47:14 +0800
From: Daniel Thompson
To: Will Deacon
CC: Thomas Gleixner, John Stultz, linux-kernel@vger.kernel.org,
    patches@linaro.org, linaro-kernel@lists.linaro.org, Sumit Semwal,
    Stephen Boyd, Steven Rostedt, Russell King, Catalin Marinas
Subject: Re: [PATCH v4 2/5] sched_clock: Optimize cache line usage
In-Reply-To: <20150209012801.GA13969@arm.com>

On 09/02/15 09:28, Will Deacon wrote:
> On Sun, Feb 08, 2015 at 12:02:37PM +0000, Daniel Thompson wrote:
>> Currently sched_clock(), a very hot code path, is not optimized to
>> minimise its cache profile. In particular:
>>
>>   1. cd is not ____cacheline_aligned,
>>
>>   2. struct clock_data does not distinguish between hotpath and
>>      coldpath data, reducing locality of reference in the hotpath,
>>
>>   3. Some hotpath data is missing from struct clock_data and is marked
>>      __read_mostly (which more or less guarantees it will not share a
>>      cache line with cd).
>>
>> This patch corrects these problems by extracting all hotpath data
>> into a separate structure and using ____cacheline_aligned to ensure
>> the hotpath uses a single (64 byte) cache line.
>
> Have you got any performance figures for this change, or is this just a
> theoretical optimisation? It would be interesting to see what effect this
> has on systems with 32-byte cachelines and also scenarios where there's
> contention on the sequence counter.

Most of my testing has focused on proving that the NMI safety parts of
the patch work as advertised, so it's mostly theoretical. However, there
are some numbers from simple tight-loop calls to sched_clock() (Stephen
Boyd's results are more interesting than mine because I observe pretty
wild quantization effects that render my results hard to trust):

http://thread.gmane.org/gmane.linux.kernel/1871157/focus=1879265

I'm not sure what figures would be useful for a contended sequence
counter. Firstly, the counter is taken for write at 7/8 of the wrap
time, so even for the fastest timers the interval between updates is
likely to be >3s, and each update is of very short duration.
Additionally, the NMI safety changes make it possible to read the timer
whilst it is being updated, so it is only during the very short
struct-copy/write/struct-copy/write update sequence that a reader will
observe the extra cache line being used. Benchmarks that show the effect
of the update are therefore non-trivial to construct.