Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756925AbbBEKVg (ORCPT ); Thu, 5 Feb 2015 05:21:36 -0500 Received: from mail-wi0-f176.google.com ([209.85.212.176]:39056 "EHLO mail-wi0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754014AbbBEKVe (ORCPT ); Thu, 5 Feb 2015 05:21:34 -0500 Message-ID: <54D3442C.3090205@linaro.org> Date: Thu, 05 Feb 2015 10:21:32 +0000 From: Daniel Thompson User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: Stephen Boyd CC: Thomas Gleixner , John Stultz , linux-kernel@vger.kernel.org, patches@linaro.org, linaro-kernel@lists.linaro.org, Sumit Semwal , Steven Rostedt , Russell King , Will Deacon , Catalin Marinas Subject: Re: [PATCH v3 2/4] sched_clock: Optimize cache line usage References: <1421859236-19782-1-git-send-email-daniel.thompson@linaro.org> <1422644602-11953-1-git-send-email-daniel.thompson@linaro.org> <1422644602-11953-3-git-send-email-daniel.thompson@linaro.org> <20150205011407.GB30372@codeaurora.org> In-Reply-To: <20150205011407.GB30372@codeaurora.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2808 Lines: 93 On 05/02/15 01:14, Stephen Boyd wrote: > On 01/30, Daniel Thompson wrote: >> diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c >> index 3d21a8719444..cb69a47dfee4 100644 >> --- a/kernel/time/sched_clock.c >> +++ b/kernel/time/sched_clock.c >> @@ -18,28 +18,44 @@ >> #include >> #include >> >> -struct clock_data { >> - ktime_t wrap_kt; >> +/** >> + * struct clock_read_data - data required to read from sched_clock >> + * > > Nitpick: Won't kernel-doc complain that members aren't > documented? It does indeed. I'll add descriptions here... >> + * Care must be taken when updating this structure; it is read by >> + * some very hot code paths. It occupies <=48 bytes and, when combined >> + * with the seqcount used to synchronize access, comfortably fits into >> + * a 64 byte cache line. >> + */ >> +struct clock_read_data { >> u64 epoch_ns; >> u64 epoch_cyc; >> - seqcount_t seq; >> - unsigned long rate; >> + u64 sched_clock_mask; >> + u64 (*read_sched_clock)(void); >> u32 mult; >> u32 shift; >> bool suspended; >> }; >> >> +/** >> + * struct clock_data - all data needed for sched_clock (including >> + * registration of a new clock source) >> + * > > Same comment. ... and here. >> + * The ordering of this structure has been chosen to optimize cache >> + * performance. In particular seq and read_data (combined) should fit >> + * into a single 64 byte cache line. >> + */ >> +struct clock_data { >> + seqcount_t seq; >> + struct clock_read_data read_data; >> + ktime_t wrap_kt; >> + unsigned long rate; >> +}; >> @@ -60,15 +79,16 @@ unsigned long long notrace sched_clock(void) >> { >> u64 cyc, res; >> unsigned long seq; >> + struct clock_read_data *rd = &cd.read_data; >> >> do { >> seq = raw_read_seqcount_begin(&cd.seq); >> >> - res = cd.epoch_ns; >> - if (!cd.suspended) { >> - cyc = read_sched_clock(); >> - cyc = (cyc - cd.epoch_cyc) & sched_clock_mask; >> - res += cyc_to_ns(cyc, cd.mult, cd.shift); >> + res = rd->epoch_ns; >> + if (!rd->suspended) { > > Should this have likely() treatment? It would be really nice if > we could use static branches here to avoid any branch penalty at > all. I guess that would need some sort of special cased > stop_machine() though. Or I wonder if we could replace > rd->read_sched_clock() with a dumb function that returns > cd.epoch_cyc so that the math turns out to be 0? Great idea. Making this code branchless with a special function sounds very much better than using likely(). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/