Date: Wed, 4 Feb 2015 16:50:34 -0800
From: Stephen Boyd
To: Daniel Thompson
Cc: Thomas Gleixner, John Stultz, linux-kernel@vger.kernel.org,
    patches@linaro.org, linaro-kernel@lists.linaro.org,
    Sumit Semwal, Steven Rostedt
Subject: Re: [PATCH v3 0/4] sched_clock: Optimize and avoid deadlock during read from NMI
Message-ID: <20150205005034.GA30372@codeaurora.org>
References: <1421859236-19782-1-git-send-email-daniel.thompson@linaro.org>
    <1422644602-11953-1-git-send-email-daniel.thompson@linaro.org>
In-Reply-To: <1422644602-11953-1-git-send-email-daniel.thompson@linaro.org>
User-Agent: Mutt/1.5.21 (2010-09-15)

On 01/30, Daniel Thompson wrote:
> This patchset optimizes the generic sched_clock implementation to
> significantly reduce the data cache profile. It also makes it safe to
> call sched_clock() from NMI (or FIQ on ARM).
>
> The data cache profile of sched_clock() in both the original code and
> my previous patch was somewhere between 2 and 3 (64-byte) cache lines,
> depending on the alignment of struct clock_data. After patching, the
> cache profile for the normal case should be a single cache line.
>
> NMI safety was tested on i.MX6 with perf drowning the system in FIQs
> and using the perf handler to check that sched_clock() returned
> monotonic values. At the same time I forcefully reduced kt_wrap so
> that update_sched_clock() is being called at >1000Hz.
>
> Without the patches the above system is grossly unstable, surviving
> [9K, 115K, 25K] perf event cycles during three separate runs. With the
> patches I ran for over 9M perf event cycles before getting bored.

I wanted to see if there was any speedup from these changes, so I made
a tight loop around sched_clock() that ran for 10 seconds, and I ran it
10 times before and after this patch series:

	unsigned long long clock, start_clock;
	int count = 0;

	clock = start_clock = sched_clock();
	while ((clock - start_clock) < 10ULL * NSEC_PER_SEC) {
		clock = sched_clock();
		count++;
	}
	pr_info("Made %d calls in %llu ns\n", count, clock - start_clock);

Before
------
Made 19218953 calls in 10000000439 ns
Made 19212790 calls in 10000000438 ns
Made 19217121 calls in 10000000142 ns
Made 19227304 calls in 10000000142 ns
Made 19217559 calls in 10000000142 ns
Made 19230193 calls in 10000000290 ns
Made 19212715 calls in 10000000290 ns
Made 19234446 calls in 10000000438 ns
Made 19226274 calls in 10000000439 ns
Made 19236118 calls in 10000000143 ns

After
-----
Made 19434797 calls in 10000000438 ns
Made 19435733 calls in 10000000439 ns
Made 19434499 calls in 10000000438 ns
Made 19438482 calls in 10000000438 ns
Made 19435604 calls in 10000000142 ns
Made 19438551 calls in 10000000438 ns
Made 19444550 calls in 10000000290 ns
Made 19437580 calls in 10000000290 ns
Made 19439429 calls in 10000048142 ns
Made 19439493 calls in 10000000438 ns

So it seems to be a small improvement: the "after" runs average roughly
1% more calls in the same ten-second window.
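For anyone trying to picture why the read side can now run from NMI,
here is a rough, hypothetical sketch of the banked-data idea (the
names, layout, and update function below are invented for illustration;
the actual patch works on the kernel's struct clock_data and uses the
seqcount primitives with proper memory barriers, which this userspace
sketch omits). The key property is that the reader never waits for the
updater: an NMI that interrupts update_sched_clock() mid-write simply
reads the bank the updater is not touching.

	#include <stdint.h>

	struct clock_bank {
		uint64_t epoch_ns;	/* ns value at the last update      */
		uint64_t epoch_cyc;	/* counter value at the last update */
		uint32_t mult, shift;	/* cyc -> ns scaling factors        */
	};

	static volatile unsigned int cd_seq;	/* bumped around each update */
	static struct clock_bank cd_bank[2];	/* updater alternates banks  */

	/* Updater side: never runs from NMI. While cd_seq is odd the
	 * updater owns bank[0] and readers use bank[1]; while it is even
	 * the updater owns bank[1] and readers use bank[0]. Real code
	 * needs write barriers between these stores. */
	static void update_banks(struct clock_bank fresh)
	{
		cd_seq++;		/* odd: steer readers to bank[1]  */
		cd_bank[0] = fresh;
		cd_seq++;		/* even: steer readers to bank[0] */
		cd_bank[1] = fresh;
	}

	/* Reader side: NMI-safe because it never spins waiting for the
	 * updater. If the NMI interrupted the updater, cd_seq cannot
	 * change underneath us, so the retry loop exits first pass. */
	static uint64_t sched_clock_sketch(uint64_t (*read_cyc)(void))
	{
		const struct clock_bank *b;
		unsigned int seq;
		uint64_t cyc, ns;

		do {
			seq = cd_seq;		 /* snapshot sequence count */
			b = &cd_bank[seq & 1];	 /* pick the stable bank    */
			cyc = read_cyc();
			ns = b->epoch_ns +
			     (((cyc - b->epoch_cyc) * b->mult) >> b->shift);
		} while (seq != cd_seq);	 /* raced with update: retry */

		return ns;
	}

With a plain seqlock, an NMI reader that interrupted the writer would
spin forever waiting for an even sequence count; removing that wait is
the deadlock avoidance the cover letter's subject line refers to.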
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project