2016-11-19 16:01:25

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 00/28] adapt clockevents frequencies to mono clock

Goal: avoid programming ced devices too early for large deltas, for
details, c.f. the description of [23/28].

Previous v7 can be found at [1].

While the runtimes looked Ok for x86, you were concerned about
outliers on ARM:

Thomas Gleixner <[email protected]> writes:
> On Wed, 21 Sep 2016, Nicolai Stange wrote:
>
>> On a Raspberry Pi 2B (bcm2836, ARMv7) with CONFIG_SMP=y, the mean over
>> ~5300 samples is 5.14+/-1.04us with a max of 11.15us.
>
> So why is the variance that high?
> You have an outlier on that intel as well which might be caused by
> NMI, but it might also be a systematic issue depending on the input
> parameters. 11 us on that ARM worries me.

I did those benchmarks on a defconfig'd linux-next with a Raspbian
running in userspace. This turned out to be the worst choice I could
have made.
- The USB host controller driven by dwc2 fired 8000 interrupts per
second.
- Missing cpufreq support in upstream linux caused the core to always
run at its minimum speed.
- The Raspbian userspace was a mostly idle but quite uncontrolled
workload.

So, for the next round of measurements, I
- disabled almost all drivers, USB in particular,
- told the firmware to always run the core at its nominal max of 900MHz
- and used Busybox for the userspace.

While the adjustment process, i.e. the input parameters, had formerly
been driven by a NTP daemon, I switched to injecting deterministic
offsets via adjtimex(2) from time to time. Due to their exponential
decay, the adjustments span a larger range now.

On top of that, I generated various artificial workloads by means of
stress(1) and it turned out that clockevents_adjust_all_freqs()'
runtime is
- insensitive to input parameters,
- insensitive to CPU load (as it should),
- very sensitive to memory load.

Actual numbers can be found in the description of
[23/28] ("timekeeping: inform clockevents about freq adjustments"):
- CPU stressed system
Mean: 1733.75+-81.10
Quantiles:
0% 25% 50% 75% 100%
1458 1671 1717 1800 2083 (ns)

- idle system gives numbers very similar to the CPU stressed case

- Memory stressed system
Mean: 8750.73+-1034.77
Quantiles:
0% 25% 50% 75% 100%
3854 8171 8901 9452 13593 (ns)

Note that the "memory stressed system" is the ultimate worst case
scenario: all TLB, data and instruction caches are presumably cold and
memory is busy.

I did my best to improve on the memory loaded situation, c.f. the new
[27/28] ("clockevents: optimize struct clock_event_device layout")
and
[28/28] ("clockevents: move non-adjustable devices to the tail of the list")

The numbers finally became
Mean: 6505.18+-740.85
Quantiles:
0% 25% 50% 75% 100%
2708 6054 6523 6980 10885 (ns)

These are all >12h runs which accumulated >100000 samples and I think
that I may safely claim these 10.9us being the upper bound under a
worst case workload.

I played around with some other things like
- prefetchw()ing the clock_event_device structs,
- moving the clockevents_lock into .data, namely into the
clockevent_devices list head's cacheline
with no further improvement.

A remaining possibility would be to always_inline
__clockevents_calc_adjust_freq(): for some reason, gcc emits it into a
page different from clockevents_adjust_all_freqs()' one. However, I
didn't do that for now.

Final note: clockevents_adjust_all_freqs()' runtime is
O(#adjustable clockevent devices). That ARM has 5 of them. Due to lack
of hardware access, I can't tell how the performance might look
like on NUMA or machines with hundreds of cores.

So, I think there are three options to proceed from here:
- leave this series (modulo other review comments) as is,
i.e. continue to call clockevents_adjust_all_freqs() from
update_wall_time -> ... -> timekeeping_freqadjust()
in interrupt context,
- move the call to clockevents_adjust_all_freqs() out of interrupt
context into some scheduled work
- or abandon this series because it's too intrusive/complex
when weighed against the improvements it attempts to achieve.


Thanks,

Nicolai

[1] http://lkml.kernel.org/r/[email protected]



Changes to v7:
Rebased against next-20161117.

In order to make some room in struct clock_event_device's first
cacheline,
[20/28] ("clockevents: degrade ->min_delta_ticks to unsigned int")
[21/28] ("clockevents: pack ->state_use_accessors and ->features together")
have been added.

This freed space is used by the new
[27/28] ("clockevents: optimize struct clock_event_device layout")

The also new
[26/28] ("clockevents: degrade ->retries to unsigned int")
allows the just mentioned [27/28] to keep struct clock_event_device
from crossing an extra multiple of the cacheline size on x86_64.

The relative order of former
[14/23] ("clockevents: decouple ->max_delta_ns from ->max_delta_ticks")
and
[15-20/23], the ->min_delta_ns => ->min_delta_ticks transformation,
has been reversed because this allows for a more natural evolution
of struct clock_event_device w.r.t. to its final layout.
The latter subseries can now be found at [13-18/28] while
the "decouple ->max_delta_ns from ->max_delta_ticks" patch is at [22/28].

[28/28] ("clockevents: move non-adjustable devices to the tail of the list")
is new.

Finally, it's worth mentioning that in former
[15/23] ("clockevents: do comparison of delta against minimum in terms of cycles"),
now being [13/28], the superfluous
dev->min_delta_ticks_adjusted = max(dev->min_delta_ticks_adjusted,
dev->min_delta_ticks);
has been purged: cev_delta2ns() always rounds up and this change avoids
an access to the cache-cold ->min_delta_ticks from
clockevents_increase_min_delta().

Per-patch details:
[1-10/28]: unchanged

[11/28] ("clockevents: always initialize ->min_delta_ns and ->max_delta_ns")
- Skip CLOCK_EVT_FEAT_DUMMY devices in __clockevents_update_bounds().

[12/28] ("many clockevent drivers: don't set ->min_delta_ns and ->max_delta_ns")
- Unchanged.

[13/28] ("clockevents: do comparison of delta against minimum in terms of cycles")
- Former [15/23].
- As mentioned above, purge the
dev->min_delta_ticks_adjusted = max(dev->min_delta_ticks_adjusted,
dev->min_delta_ticks);
- Rebase in order to apply before former
[14/23] ("clockevents: decouple ->max_delta_ns from ->max_delta_ticks")
which is now at [22/28]

[14/28] ("clockevents: clockevents_program_min_delta(): don't set ->next_event")
- Former [16/23], otherwise unchanged.

[15/28] ("clockevents: use ->min_delta_ticks_adjusted to program minimum delta")
- Former [17/23], otherwise unchanged.

[16/28] ("clockevents: min delta increment: calculate min_delta_ns from ticks")
- Former [18/23].
- Rebase in order to apply with the max(...) removal in [13/28].

[17/28] ("timer_list: print_tickdevice(): calculate ->min_delta_ns dynamically")
- Former [19/23].
- Rebase in order to apply before former
[14/23] ("clockevents: decouple ->max_delta_ns from ->max_delta_ticks")
which is now at [22/28].

[18/28] ("clockevents: purge ->min_delta_ns")
- Former [20/23].
- Rebase in order to apply before former
[14/23] ("clockevents: decouple ->max_delta_ns from ->max_delta_ticks")
which is now at [22/28].
- Rebase in order to apply to apply with the max(...) removal in [13/28].

[19/28] ("clockevents: introduce CLOCK_EVT_FEAT_NO_ADJUST flag")
- Former [13/23], otherwise unchanged.

[20/28] ("clockevents: degrade ->min_delta_ticks to unsigned int")
- New.

[21/28] ("clockevents: pack ->state_use_accessors and ->features together")
- New.

[22/28] ("clockevents: decouple ->max_delta_ns from ->max_delta_ticks")
- Former [14/23].
- Rebase to apply at this position in the series.

[23/28] ("clockevents: initial support for mono to raw time conversion")
- Former [21/23].
- Changes to clockchips.h rebased in order to apply with the new
layout of struct clock_event_device.
- Rebase in order to apply with the max(...) removal in [13/28].
- Rebase in order to apply after the new
[20/28] ("clockevents: degrade ->min_delta_ticks to unsigned int")
- Skip CLOCK_EVT_FEAT_DUMMY devices in __clockevents_adjust_freq().

[24/28] ("clockevents: make setting of ->mult and ->mult_adjusted atomic")
- Former [22/23], otherwise unchanged.

[25/28] ("timekeeping: inform clockevents about freq adjustments")
- Former [23/23].
- Add benchmark results to description.
- Skip CLOCK_EVT_FEAT_DUMMY devices in clockevents_adjust_all_freqs().

[26/28] ("clockevents: degrade ->retries to unsigned int")
- New.

[27/28] ("clockevents: optimize struct clock_event_device layout")
- New.

[28/28] ("clockevents: move non-adjustable devices to the tail of the list")
- New.


Changes to v6:
Rebased against next-20160916.

[1/23] ("clocksource: sh_cmt: compute rate before registration again")
Do not remove braces at if statement.


Changes to v5:
[21/23] ("clockevents: initial support for mono to raw time conversion")
Replace the max_t() in
adj = max_t(u64, adj, mult_ce_raw / 8);
by min_t(): mult_ce_raw / 8 actually sets an upper bound on the
mult adjustments.

[23/23] ("timekeeping: inform clockevents about freq adjustments")
Move the clockevents_adjust_all_freqs() invocation from
timekeeping_apply_adjustment() to timekeeping_freqadjust().
Reason is given in the patch description.


Changes to v4:
[1-12/23] Unchanged

[13/23] ("clockevents: introduce CLOCK_EVT_FEAT_NO_ADJUST flag")
New.

[14/23] ("clockevents: decouple ->max_delta_ns from ->max_delta_ticks")
New. Solves the overflow problem the former
[13/22] ("clockevents: check a programmed delta's bounds in terms of cycles")
from v4 introduced.

(Note that the former
[14/22] ("clockevents: clockevents_program_event(): turn clc into unsigned long")
from v4 has been purged.)

[15/23] ("clockevents: do comparison of delta against minimum in terms of cycles")
This is the former
[13/22] ("clockevents: check a programmed delta's bounds in terms of cycles"),
but only for the ->min_delta_* -- the ->max_delta_* are handled by [14/23] now.

[16/23] ("clockevents: clockevents_program_min_delta(): don't set ->next_event")
Former [15/22] unchanged.

[17/23] ("clockevents: use ->min_delta_ticks_adjusted to program minimum delta")
Former [16/22]. Trivially fix compilation error with
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n.

[18/22] ("clockevents: min delta increment: calculate min_delta_ns from ticks")
Former [17/22] unchanged.

[19/23] ("timer_list: print_tickdevice(): calculate ->min_delta_ns dynamically")
Corresponds to former
[18/22] ("timer_list: print_tickdevice(): calculate ->*_delta_ns dynamically")
from v4, but only for ->min_delta_ns. The changes required for the display of
->max_delta_ns are now being made in [14/23] already.

[20/23] ("clockevents: purge ->min_delta_ns")
Corresponds to former
[19/22] ("clockevents: purge ->min_delta_ns and ->max_delta_ns"),
but with ->max_delta_ns being kept.

[21/23] ("clockevents: initial support for mono to raw time conversion")
Former [20/22] with the following changes:
- Don't adjust mult for those ced's that have CLOCK_EVT_FEAT_NO_ADJUST set.
- Don't meld __clockevents_update_bounds() into __clockevents_adjust_freq()
anymore: the bounds for those devices having CLOCK_EVT_FEAT_NO_ADJUST set
must have got their bounds set as well.
- In __clockevents_calc_adjust_freq(), make sure that the adjusted mult
doesn't exceed the original by more than 12.5%. C.f. [14/23].
- In timekeeping, define timekeeping_get_mono_mult() only for
CONFIG_GENERIC_CLOCKEVENTS=y.

[22/23] ("clockevents: make setting of ->mult and ->mult_adjusted atomic")
Former [12/22], but with the description updated: previously, it said that
this patch would introduce a new locking dependency. This is not true.

[23/23] ("timekeeping: inform clockevents about freq adjustments")
Former [22/22] with the following changes:
- Don't adjust mult for those ced's that have CLOCK_EVT_FEAT_NO_ADJUST set.
- In clockevents_adjust_all_freqs(), reuse the adjusted cached mult only
if the associated ->shift also matches.
- Introduce noop clockevents_adjust_all_freqs() in order to fix a
compilation error with CONFIG_GENERIC_CLOCKEVENTS=n.


Nicolai Stange (28):
clocksource: sh_cmt: compute rate before registration again
clocksource: sh_tmu: compute rate before registration again
clocksource: em_sti: split clock prepare and enable steps
clocksource: em_sti: compute rate before registration
clocksource: h8300_timer8: don't reset rate in ->set_state_oneshot()
clockevents: make clockevents_config() static
many clockevent drivers: set ->min_delta_ticks and ->max_delta_ticks
arch/s390/kernel/time: set ->min_delta_ticks and ->max_delta_ticks
arch/x86/platform/uv/uv_time: set ->min_delta_ticks and
->max_delta_ticks
arch/tile/kernel/time: set ->min_delta_ticks and ->max_delta_ticks
clockevents: always initialize ->min_delta_ns and ->max_delta_ns
many clockevent drivers: don't set ->min_delta_ns and ->max_delta_ns
clockevents: do comparison of delta against minimum in terms of cycles
clockevents: clockevents_program_min_delta(): don't set ->next_event
clockevents: use ->min_delta_ticks_adjusted to program minimum delta
clockevents: min delta increment: calculate min_delta_ns from ticks
timer_list: print_tickdevice(): calculate ->min_delta_ns dynamically
clockevents: purge ->min_delta_ns
clockevents: introduce CLOCK_EVT_FEAT_NO_ADJUST flag
clockevents: degrade ->min_delta_ticks to unsigned int
clockevents: pack ->state_use_accessors and ->features together
clockevents: decouple ->max_delta_ns from ->max_delta_ticks
clockevents: initial support for mono to raw time conversion
clockevents: make setting of ->mult and ->mult_adjusted atomic
timekeeping: inform clockevents about freq adjustments
clockevents: degrade ->retries to unsigned int
clockevents: optimize struct clock_event_device layout
clockevents: move non-adjustable devices to the tail of the list

arch/avr32/kernel/time.c | 4 +-
arch/blackfin/kernel/time-ts.c | 8 +-
arch/c6x/platforms/timer64.c | 4 +-
arch/hexagon/kernel/time.c | 4 +-
arch/m68k/coldfire/pit.c | 6 +-
arch/microblaze/kernel/timer.c | 6 +-
arch/mips/alchemy/common/time.c | 4 +-
arch/mips/jz4740/time.c | 4 +-
arch/mips/kernel/cevt-bcm1480.c | 4 +-
arch/mips/kernel/cevt-ds1287.c | 4 +-
arch/mips/kernel/cevt-gt641xx.c | 4 +-
arch/mips/kernel/cevt-sb1250.c | 4 +-
arch/mips/kernel/cevt-txx9.c | 5 +-
arch/mips/loongson32/common/time.c | 4 +-
arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c | 4 +-
arch/mips/loongson64/loongson-3/hpet.c | 4 +-
arch/mips/ralink/cevt-rt3352.c | 4 +-
arch/mips/sgi-ip27/ip27-timer.c | 4 +-
arch/mn10300/kernel/cevt-mn10300.c | 4 +-
arch/powerpc/kernel/time.c | 6 +-
arch/s390/kernel/time.c | 4 +-
arch/score/kernel/time.c | 6 +-
arch/sparc/kernel/time_32.c | 4 +-
arch/sparc/kernel/time_64.c | 6 +-
arch/tile/kernel/time.c | 4 +-
arch/um/kernel/time.c | 4 +-
arch/unicore32/kernel/time.c | 6 +-
arch/x86/kernel/apic/apic.c | 12 +-
arch/x86/lguest/boot.c | 4 +-
arch/x86/platform/uv/uv_time.c | 6 +-
arch/x86/xen/time.c | 8 +-
drivers/clocksource/dw_apb_timer.c | 5 +-
drivers/clocksource/em_sti.c | 49 +++--
drivers/clocksource/h8300_timer8.c | 8 -
drivers/clocksource/metag_generic.c | 4 +-
drivers/clocksource/numachip.c | 4 +-
drivers/clocksource/sh_cmt.c | 45 ++--
drivers/clocksource/sh_tmu.c | 26 +--
drivers/clocksource/timer-atlas7.c | 4 +-
include/linux/clockchips.h | 49 +++--
kernel/time/clockevents.c | 244 ++++++++++++++++++----
kernel/time/tick-broadcast-hrtimer.c | 2 -
kernel/time/tick-internal.h | 6 +
kernel/time/timekeeping.c | 16 ++
kernel/time/timer_list.c | 11 +-
45 files changed, 409 insertions(+), 219 deletions(-)

--
2.10.2


2016-11-19 16:01:36

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 03/28] clocksource: em_sti: split clock prepare and enable steps

Currently, the em_sti driver prepares and enables the needed clock in
em_sti_enable(), potentially called through its clockevent device's
->set_state_oneshot().

However, the clk_prepare() step may sleep whereas tick_program_event() and
thus, ->set_state_oneshot(), can be called in atomic context.

Split the clk_prepare_enable() in em_sti_enable() into two steps:
- prepare the clock at device probing via clk_prepare()
- and enable it in em_sti_enable() via clk_enable().
Slightly reorder resource initialization in em_sti_probe() in order to
facilitate error handling.

Signed-off-by: Nicolai Stange <[email protected]>
---

Notes:
Compile-only tested on ARCH=arm.

drivers/clocksource/em_sti.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c
index 19bb179..46750c0 100644
--- a/drivers/clocksource/em_sti.c
+++ b/drivers/clocksource/em_sti.c
@@ -78,7 +78,7 @@ static int em_sti_enable(struct em_sti_priv *p)
int ret;

/* enable clock */
- ret = clk_prepare_enable(p->clk);
+ ret = clk_enable(p->clk);
if (ret) {
dev_err(&p->pdev->dev, "cannot enable clock\n");
return ret;
@@ -107,7 +107,7 @@ static void em_sti_disable(struct em_sti_priv *p)
em_sti_write(p, STI_INTENCLR, 3);

/* stop clock */
- clk_disable_unprepare(p->clk);
+ clk_disable(p->clk);
}

static cycle_t em_sti_count(struct em_sti_priv *p)
@@ -303,6 +303,7 @@ static int em_sti_probe(struct platform_device *pdev)
struct em_sti_priv *p;
struct resource *res;
int irq;
+ int ret;

p = devm_kzalloc(&pdev->dev, sizeof(*p), GFP_KERNEL);
if (p == NULL)
@@ -323,6 +324,13 @@ static int em_sti_probe(struct platform_device *pdev)
if (IS_ERR(p->base))
return PTR_ERR(p->base);

+ if (devm_request_irq(&pdev->dev, irq, em_sti_interrupt,
+ IRQF_TIMER | IRQF_IRQPOLL | IRQF_NOBALANCING,
+ dev_name(&pdev->dev), p)) {
+ dev_err(&pdev->dev, "failed to request low IRQ\n");
+ return -ENOENT;
+ }
+
/* get hold of clock */
p->clk = devm_clk_get(&pdev->dev, "sclk");
if (IS_ERR(p->clk)) {
@@ -330,17 +338,20 @@ static int em_sti_probe(struct platform_device *pdev)
return PTR_ERR(p->clk);
}

- if (devm_request_irq(&pdev->dev, irq, em_sti_interrupt,
- IRQF_TIMER | IRQF_IRQPOLL | IRQF_NOBALANCING,
- dev_name(&pdev->dev), p)) {
- dev_err(&pdev->dev, "failed to request low IRQ\n");
- return -ENOENT;
+ ret = clk_prepare(p->clk);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "cannot prepare clock\n");
+ goto err_clk_put;
}

raw_spin_lock_init(&p->lock);
em_sti_register_clockevent(p);
em_sti_register_clocksource(p);
return 0;
+
+err_clk_put:
+ clk_put(p->clk);
+ return ret;
}

static int em_sti_remove(struct platform_device *pdev)
--
2.10.2

2016-11-19 16:01:44

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 06/28] clockevents: make clockevents_config() static

A clockevent device's rate should be configured before or at registration
and changed afterwards through clockevents_update_freq() only.

For the configuration at registration, we already have
clockevents_config_and_register().

Right now, there are no clockevents_config() users outside of the
clockevents core.

To mitigiate the risk of drivers errorneously reconfiguring their rates
through clockevents_config() *after* device registration, make
clockevents_config() static.

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 1 -
kernel/time/clockevents.c | 2 +-
2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 0d442e3..a116926 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -182,7 +182,6 @@ extern u64 clockevent_delta2ns(unsigned long latch, struct clock_event_device *e
extern void clockevents_register_device(struct clock_event_device *dev);
extern int clockevents_unbind_device(struct clock_event_device *ced, int cpu);

-extern void clockevents_config(struct clock_event_device *dev, u32 freq);
extern void clockevents_config_and_register(struct clock_event_device *dev,
u32 freq, unsigned long min_delta,
unsigned long max_delta);
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 2c5bc77..e73ac7f 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -468,7 +468,7 @@ void clockevents_register_device(struct clock_event_device *dev)
}
EXPORT_SYMBOL_GPL(clockevents_register_device);

-void clockevents_config(struct clock_event_device *dev, u32 freq)
+static void clockevents_config(struct clock_event_device *dev, u32 freq)
{
u64 sec;

--
2.10.2

2016-11-19 16:01:50

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 09/28] arch/x86/platform/uv/uv_time: set ->min_delta_ticks and ->max_delta_ticks

With the yet to come introduction of NTP correction awareness to the
clockevent core, drivers should report their valid ranges in units of
cycles to the latter.

Currently, the x86's uv rtc clockevent device is initialized as follows:

clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
sn_rtc_cycles_per_second;
clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
(NSEC_PER_SEC / sn_rtc_cycles_per_second);

This translates to a ->min_delta_ticks value of 1 and a ->max_delta_ticks
value of clocksource_uv.mask.

Initialize ->min_delta_ticks and ->max_delta_ticks with these values
respectively.

Signed-off-by: Nicolai Stange <[email protected]>
---

Notes:
Compile-only tested through an allmodconfig build on ARCH=x86_64.

arch/x86/platform/uv/uv_time.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/platform/uv/uv_time.c b/arch/x86/platform/uv/uv_time.c
index b333fc4..6410ee3 100644
--- a/arch/x86/platform/uv/uv_time.c
+++ b/arch/x86/platform/uv/uv_time.c
@@ -390,9 +390,11 @@ static __init int uv_rtc_setup_clock(void)

clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
sn_rtc_cycles_per_second;
+ clock_event_device_uv.min_delta_ticks = 1;

clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
(NSEC_PER_SEC / sn_rtc_cycles_per_second);
+ clock_event_device_uv.max_delta_ticks = clocksource_uv.mask;

rc = schedule_on_each_cpu(uv_rtc_register_clockevents);
if (rc) {
--
2.10.2

2016-11-19 16:01:38

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 04/28] clocksource: em_sti: compute rate before registration

With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, em_sti violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final rate through
clockevents_config() called from its ->set_state_oneshot().

This patch moves the setting of the clockevent device's rate to its
registration.

I checked all current em_sti users in arch/arm/mach-shmobile and right now,
none of them changes any rate in any clock tree relevant to em_sti after
their respective time_init(). Since all em_sti instances are created after
time_init(), none of them should ever observe any clock rate changes.

- Determine the ->rate value in em_sti_probe() at device probing rather
than at first usage.
- Set the clockevent device's rate at its registration.
- Although not strictly necessary for the upcoming clockevent core changes,
set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange <[email protected]>
---

Notes:
For a detailed analysis of the current em_sti users, please see
https://nicst.de/ced-clk-rate-change-analysis/em_sti-cgitted.html

Compile-only tested on ARCH=arm.

drivers/clocksource/em_sti.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c
index 46750c0..6168d18 100644
--- a/drivers/clocksource/em_sti.c
+++ b/drivers/clocksource/em_sti.c
@@ -205,13 +205,9 @@ static cycle_t em_sti_clocksource_read(struct clocksource *cs)

static int em_sti_clocksource_enable(struct clocksource *cs)
{
- int ret;
struct em_sti_priv *p = cs_to_em_sti(cs);

- ret = em_sti_start(p, USER_CLOCKSOURCE);
- if (!ret)
- __clocksource_update_freq_hz(cs, p->rate);
- return ret;
+ return em_sti_start(p, USER_CLOCKSOURCE);
}

static void em_sti_clocksource_disable(struct clocksource *cs)
@@ -240,8 +236,7 @@ static int em_sti_register_clocksource(struct em_sti_priv *p)

dev_info(&p->pdev->dev, "used as clock source\n");

- /* Register with dummy 1 Hz value, gets updated in ->enable() */
- clocksource_register_hz(cs, 1);
+ clocksource_register_hz(cs, p->rate);
return 0;
}

@@ -263,7 +258,6 @@ static int em_sti_clock_event_set_oneshot(struct clock_event_device *ced)

dev_info(&p->pdev->dev, "used for oneshot clock events\n");
em_sti_start(p, USER_CLOCKEVENT);
- clockevents_config(&p->ced, p->rate);
return 0;
}

@@ -294,8 +288,7 @@ static void em_sti_register_clockevent(struct em_sti_priv *p)

dev_info(&p->pdev->dev, "used for clock events\n");

- /* Register with dummy 1 Hz value, gets updated in ->set_state_oneshot() */
- clockevents_config_and_register(ced, 1, 2, 0xffffffff);
+ clockevents_config_and_register(ced, p->rate, 2, 0xffffffff);
}

static int em_sti_probe(struct platform_device *pdev)
@@ -344,11 +337,22 @@ static int em_sti_probe(struct platform_device *pdev)
goto err_clk_put;
}

+ ret = clk_enable(p->clk);
+ if (ret < 0) {
+ dev_err(&p->pdev->dev, "cannot enable clock\n");
+ goto err_clk_unprepare;
+ }
+ p->rate = clk_get_rate(p->clk);
+ clk_disable(p->clk);
+
raw_spin_lock_init(&p->lock);
em_sti_register_clockevent(p);
em_sti_register_clocksource(p);
return 0;

+err_clk_unprepare:
+ clk_unprepare(p->clk);
+
err_clk_put:
clk_put(p->clk);
return ret;
--
2.10.2

2016-11-19 16:01:47

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 08/28] arch/s390/kernel/time: set ->min_delta_ticks and ->max_delta_ticks

With the yet to come introduction of NTP correction awareness to the
clockevent core, drivers should report their valid ranges in units of
cycles to the latter.

Currently, the s390's CPU timer clockevent device is initialized as
follows:

cd->min_delta_ns = 1;
cd->max_delta_ns = LONG_MAX;

Note that the device's time to cycle conversion factor, i.e.
cd->mult / (2^cd->shift), is approx. equal to 4.

Hence, this would translate to

cd->min_delta_ticks = 4;
cd->max_delta_ticks = 4 * LONG_MAX;

However, a minimum value of 1ns is in the range of noise anyway and the
clockevent core will take care of this by increasing it to 1us or so.
Furthermore, 4*LONG_MAX will overflow the unsigned long argument the
clockevent devices gets programmed with.

Thus, initialize ->min_delta_ticks with 1 and ->max_delta_ticks with
ULONG_MAX.

Signed-off-by: Nicolai Stange <[email protected]>
---
arch/s390/kernel/time.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 33082f6..46f2073 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -158,7 +158,9 @@ void init_cpu_timer(void)
cd->mult = 16777;
cd->shift = 12;
cd->min_delta_ns = 1;
+ cd->min_delta_ticks = 1;
cd->max_delta_ns = LONG_MAX;
+ cd->max_delta_ticks = ULONG_MAX;
cd->rating = 400;
cd->cpumask = cpumask_of(cpu);
cd->set_next_event = s390_next_event;
--
2.10.2

2016-11-19 16:01:55

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 11/28] clockevents: always initialize ->min_delta_ns and ->max_delta_ns

Now that all clockevent drivers set ->min_delta_ticks and ->max_delta_ticks
independently of whether they use clockevents_config*() or not, the
clockevent core can calculate ->min_delta_ns and ->max_delta_ns from these
unconditionally.

The goal is to prepare the clockevent core for introducing NTP rate
correction awareness: as the clockevent devices' rates will get adjusted,
the ->*_delta_ns won't stay fixed but need to get changed as well.

Thus, make the clockevent core calculate the ->*_delta_ns values from
the invariant ->_delta_ticks ones at device registration.

In order to facilitate this, move the corresponding code from
clockevents_config() into its own helper function,
__clockevents_update_bounds() and invoke this where needed, in
particular from clockevents_register_device().

Note that there is a side effect affecting those drivers
- not using clockevents_config*()
- and manually initializing their ->min_delta_ns to values less than 1us.
In order to avoid "pointless noise", the clockevent core always sets
->min_delta_ns to values >= 1us, and thus, those drivers' ->min_delta_ns
is now increased. I think that this side effect is desired though.

Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/time/clockevents.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index e73ac7f..4fcec08 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -442,6 +442,20 @@ int clockevents_unbind_device(struct clock_event_device *ced, int cpu)
}
EXPORT_SYMBOL_GPL(clockevents_unbind_device);

+static void __clockevents_update_bounds(struct clock_event_device *dev)
+{
+ if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
+ (dev->features & CLOCK_EVT_FEAT_DUMMY))
+ return;
+
+ /*
+ * cev_delta2ns() never returns values less than 1us and thus,
+ * we'll never program any ced with anything less.
+ */
+ dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
+ dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
+}
+
/**
* clockevents_register_device - register a clock event device
* @dev: device to register
@@ -458,6 +472,8 @@ void clockevents_register_device(struct clock_event_device *dev)
dev->cpumask = cpumask_of(smp_processor_id());
}

+ __clockevents_update_bounds(dev);
+
raw_spin_lock_irqsave(&clockevents_lock, flags);

list_add(&dev->list, &clockevent_devices);
@@ -488,8 +504,6 @@ static void clockevents_config(struct clock_event_device *dev, u32 freq)
sec = 600;

clockevents_calc_mult_shift(dev, freq, sec);
- dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
- dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
}

/**
@@ -515,6 +529,7 @@ EXPORT_SYMBOL_GPL(clockevents_config_and_register);
int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
{
clockevents_config(dev, freq);
+ __clockevents_update_bounds(dev);

if (clockevent_state_oneshot(dev))
return clockevents_program_event(dev, dev->next_event, false);
--
2.10.2

2016-11-19 16:01:41

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 05/28] clocksource: h8300_timer8: don't reset rate in ->set_state_oneshot()

With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, h8300_timer8 violates this requirement in that it registers its
clockevent device with the correct rate, but resets its ->mult and ->rate
values in timer8_clock_event_start(), called from its ->set_state_oneshot()
function.

It seems like
commit 4633f4cac85a ("clocksource/drivers/h8300: Cleanup startup and
remove module code."),
which introduced the rate initialization at registration, missed to remove
the manual setting of ->mult and ->shift from timer8_clock_event_start().

Purge the setting of ->mult, ->shift, ->min_delta_ns and ->max_delta_ns
from timer8_clock_event_start().

Signed-off-by: Nicolai Stange <[email protected]>
---

Notes:
Compile-only tested on ARCH=h8300.

drivers/clocksource/h8300_timer8.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/drivers/clocksource/h8300_timer8.c b/drivers/clocksource/h8300_timer8.c
index 546bb18..804c489 100644
--- a/drivers/clocksource/h8300_timer8.c
+++ b/drivers/clocksource/h8300_timer8.c
@@ -101,15 +101,7 @@ static inline struct timer8_priv *ced_to_priv(struct clock_event_device *ced)

static void timer8_clock_event_start(struct timer8_priv *p, unsigned long delta)
{
- struct clock_event_device *ced = &p->ced;
-
timer8_start(p);
-
- ced->shift = 32;
- ced->mult = div_sc(p->rate, NSEC_PER_SEC, ced->shift);
- ced->max_delta_ns = clockevent_delta2ns(0xffff, ced);
- ced->min_delta_ns = clockevent_delta2ns(0x0001, ced);
-
timer8_set_next(p, delta);
}

--
2.10.2

2016-11-19 16:01:33

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 02/28] clocksource: sh_tmu: compute rate before registration again

With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, sh_tmu violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final rate through
clockevents_config() called from its ->set_state_oneshot() and
->set_state_periodic() functions respectively.

This patch moves the setting of the clockevent device's rate to its
registration.

Note that there has been some back and forth regarding this question with
respect to the clocksource also provided by this driver:
commit 66f49121ffa4 ("clocksource: sh_tmu: compute mult and shift before
registration")
moves the rate determination from the clocksource's ->enable() function to
before its registration. OTOH, the later
commit 0aeac458d9eb ("clocksource: sh_tmu: __clocksource_updatefreq_hz()
update")
basically reverts this, saying
"Without this patch the old code uses clocksource_register() together
with a hack that assumes a never changing clock rate."

However, I checked all current sh_tmu users in arch/sh as well as in
arch/arm/mach-shmobile carefully and right now, none of them changes any
rate in any clock tree relevant to sh_tmu after their respective
time_init(). Since all sh_tmu instances are created after time_init(), none
of them should ever observe any clock rate changes.

What's more, both, a clocksource as well as a clockevent device, can
immediately get selected for use at their registration and thus, enabled
at this point already. So it's probably safer to assume a "never changing
clock rate" here.

- Move the struct sh_tmu_channel's ->rate member to struct sh_tmu_device:
it's a property of the underlying clock which is in turn specific to
the sh_tmu_device.
- Determine the ->rate value in sh_tmu_setup() at device probing rather
than at first usage.
- Set the clockevent device's rate at its registration.
- Although not strictly necessary for the upcoming clockevent core changes,
set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange <[email protected]>
---

Notes:
For a detailed analysis of the current sh_tmu users, please see
https://nicst.de/ced-clk-rate-change-analysis/sh_tmu-cgitted.html

Compile-only tested on ARCH=sh and ARCH=arm.

drivers/clocksource/sh_tmu.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/clocksource/sh_tmu.c b/drivers/clocksource/sh_tmu.c
index 469e776..8200042 100644
--- a/drivers/clocksource/sh_tmu.c
+++ b/drivers/clocksource/sh_tmu.c
@@ -46,7 +46,6 @@ struct sh_tmu_channel {
void __iomem *base;
int irq;

- unsigned long rate;
unsigned long periodic;
struct clock_event_device ced;
struct clocksource cs;
@@ -59,6 +58,7 @@ struct sh_tmu_device {

void __iomem *mapbase;
struct clk *clk;
+ unsigned long rate;

enum sh_tmu_model model;

@@ -165,7 +165,6 @@ static int __sh_tmu_enable(struct sh_tmu_channel *ch)
sh_tmu_write(ch, TCNT, 0xffffffff);

/* configure channel to parent clock / 4, irq off */
- ch->rate = clk_get_rate(ch->tmu->clk) / 4;
sh_tmu_write(ch, TCR, TCR_TPSC_CLK4);

/* enable channel */
@@ -271,10 +270,8 @@ static int sh_tmu_clocksource_enable(struct clocksource *cs)
return 0;

ret = sh_tmu_enable(ch);
- if (!ret) {
- __clocksource_update_freq_hz(cs, ch->rate);
+ if (!ret)
ch->cs_enabled = true;
- }

return ret;
}
@@ -334,8 +331,7 @@ static int sh_tmu_register_clocksource(struct sh_tmu_channel *ch,
dev_info(&ch->tmu->pdev->dev, "ch%u: used as clock source\n",
ch->index);

- /* Register with dummy 1 Hz value, gets updated in ->enable() */
- clocksource_register_hz(cs, 1);
+ clocksource_register_hz(cs, ch->tmu->rate);
return 0;
}

@@ -346,14 +342,10 @@ static struct sh_tmu_channel *ced_to_sh_tmu(struct clock_event_device *ced)

static void sh_tmu_clock_event_start(struct sh_tmu_channel *ch, int periodic)
{
- struct clock_event_device *ced = &ch->ced;
-
sh_tmu_enable(ch);

- clockevents_config(ced, ch->rate);
-
if (periodic) {
- ch->periodic = (ch->rate + HZ/2) / HZ;
+ ch->periodic = (ch->tmu->rate + HZ/2) / HZ;
sh_tmu_set_next(ch, ch->periodic, 1);
}
}
@@ -435,7 +427,7 @@ static void sh_tmu_register_clockevent(struct sh_tmu_channel *ch,
dev_info(&ch->tmu->pdev->dev, "ch%u: used for clock events\n",
ch->index);

- clockevents_config_and_register(ced, 1, 0x300, 0xffffffff);
+ clockevents_config_and_register(ced, ch->tmu->rate, 0x300, 0xffffffff);

ret = request_irq(ch->irq, sh_tmu_interrupt,
IRQF_TIMER | IRQF_IRQPOLL | IRQF_NOBALANCING,
@@ -561,6 +553,14 @@ static int sh_tmu_setup(struct sh_tmu_device *tmu, struct platform_device *pdev)
if (ret < 0)
goto err_clk_put;

+ /* Determine clock rate. */
+ ret = clk_enable(tmu->clk);
+ if (ret < 0)
+ goto err_clk_unprepare;
+
+ tmu->rate = clk_get_rate(tmu->clk) / 4;
+ clk_disable(tmu->clk);
+
/* Map the memory resource. */
ret = sh_tmu_map_memory(tmu);
if (ret < 0) {
--
2.10.2

2016-11-19 16:03:06

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 07/28] many clockevent drivers: set ->min_delta_ticks and ->max_delta_ticks

The struct clock_event_device has got the ->min_delta_ns, ->max_delta_ns,
->min_delta_ticks and ->max_delta_ticks members for setting the bounds
of valid deltas to be programmed.

During operation, the clockevent core uses the ->*_delta_ns versions only.
OTOH, the ->*_delta_ticks are currently used solely for the calculations of
the ->*_delta_ns values in clockevents_config().

>From the hardware's point of view, giving the valid ranges in terms of
cycles, i.e. the ->*_delta_ticks values, is the natural choice.

Right now, as the ratio of ns/cycles is fixed, it doesn't matter much which
unit of measurement is actually used by the clockevents core. (Well, apart
from the fact that nearly all drivers which set the ->*_delta_ns manually
don't round towards the correct direction).

This situation is going to change though: as NTP time correction awareness
will be introduced to the clockevents core, the ratio ns/cycles will cease
to stay fixed at all times.

In conclusion, every driver should hand its valid range in terms of an
invariant unit, i.e. cycles, to the clockevent core.

Those drivers which configure their clockevent devices via
clockevents_config_and_register() already do this.

This patch makes the vast majority of the remaining drivers do that, too.
For the sake of bisectability, the manual settings of the ->*delta_ns
by drivers will be removed by a later patch.

This patch was created with the help of the Coccinelle script below. It is
noteworthy that for constructs like

ced.min_delta_ns = clockevent_delta2ns(<cycles>, &ced) + 1;

it is assumed that the +1 addition is done in order to compensate for
any rounding introduced by the conversion of cycles to ns and thus,
is irrelevant for ->min_delta_cycles.

@@
expression ced, cycles;
@@
ced->min_delta_ns = clockevent_delta2ns(cycles, ced);
+ ced->min_delta_ticks = cycles;

@min@
expression ced, cycles;
@@
ced.min_delta_ns = clockevent_delta2ns(cycles, &ced);
+ ced.min_delta_ticks = cycles;

@min_round_up depends on !min@
expression ced, cycles;
@@
ced.min_delta_ns = clockevent_delta2ns(cycles, &ced) + 1;
+ ced.min_delta_ticks = cycles;

@@
identifier ced;
expression ns;
@@
struct clock_event_device ced = {
.mult = 1,
.shift = 0,
.min_delta_ns = ns,
+ .min_delta_ticks = ns,
};

@p_max_neg_const@
expression ced;
constant int cycles;
@@
ced->max_delta_ns = clockevent_delta2ns(-cycles, ced);
+ ced->max_delta_ticks = (unsigned long)-cycles;

@p_max depends on !p_max_neg_const@
expression ced, cycles;
@@
ced->max_delta_ns = clockevent_delta2ns(cycles, ced);
+ ced->max_delta_ticks = cycles;

@@
expression ced, cycles;
@@
ced.max_delta_ns = clockevent_delta2ns(cycles, &ced);
+ ced.max_delta_ticks = cycles;

@@
identifier ced;
expression ns;
@@
struct clock_event_device ced = {
.mult = 1,
.shift = 0,
.max_delta_ns = ns,
+ .max_delta_ticks = ns,
};

This patch differs from what spatch created only in some whitespace
adjustments and an explicit cast added in arch/sparc/kernel/time_32.c.

Signed-off-by: Nicolai Stange <[email protected]>
---
arch/avr32/kernel/time.c | 2 ++
arch/blackfin/kernel/time-ts.c | 4 ++++
arch/c6x/platforms/timer64.c | 2 ++
arch/hexagon/kernel/time.c | 2 ++
arch/m68k/coldfire/pit.c | 2 ++
arch/microblaze/kernel/timer.c | 2 ++
arch/mips/alchemy/common/time.c | 4 +++-
arch/mips/jz4740/time.c | 2 ++
arch/mips/kernel/cevt-bcm1480.c | 2 ++
arch/mips/kernel/cevt-ds1287.c | 2 ++
arch/mips/kernel/cevt-gt641xx.c | 2 ++
arch/mips/kernel/cevt-sb1250.c | 2 ++
arch/mips/kernel/cevt-txx9.c | 2 ++
arch/mips/loongson32/common/time.c | 2 ++
arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c | 2 ++
arch/mips/loongson64/loongson-3/hpet.c | 2 ++
arch/mips/ralink/cevt-rt3352.c | 2 ++
arch/mips/sgi-ip27/ip27-timer.c | 2 ++
arch/mn10300/kernel/cevt-mn10300.c | 2 ++
arch/powerpc/kernel/time.c | 2 ++
arch/score/kernel/time.c | 2 ++
arch/sparc/kernel/time_32.c | 2 ++
arch/sparc/kernel/time_64.c | 2 ++
arch/um/kernel/time.c | 4 +++-
arch/unicore32/kernel/time.c | 2 ++
arch/x86/kernel/apic/apic.c | 4 ++++
arch/x86/lguest/boot.c | 2 ++
arch/x86/xen/time.c | 4 ++++
drivers/clocksource/dw_apb_timer.c | 2 ++
drivers/clocksource/metag_generic.c | 2 ++
drivers/clocksource/numachip.c | 2 ++
drivers/clocksource/sh_cmt.c | 2 ++
drivers/clocksource/timer-atlas7.c | 2 ++
33 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/arch/avr32/kernel/time.c b/arch/avr32/kernel/time.c
index a124c55..3fff4c9 100644
--- a/arch/avr32/kernel/time.c
+++ b/arch/avr32/kernel/time.c
@@ -142,7 +142,9 @@ void __init time_init(void)
/* setup COMPARE clockevent */
comparator.mult = div_sc(counter_hz, NSEC_PER_SEC, comparator.shift);
comparator.max_delta_ns = clockevent_delta2ns((u32)~0, &comparator);
+ comparator.max_delta_ticks = (u32)~0;
comparator.min_delta_ns = clockevent_delta2ns(50, &comparator) + 1;
+ comparator.min_delta_ticks = 50;
comparator.cpumask = cpumask_of(0);

sysreg_write(COMPARE, 0);
diff --git a/arch/blackfin/kernel/time-ts.c b/arch/blackfin/kernel/time-ts.c
index fb9e95f..4c93b6f 100644
--- a/arch/blackfin/kernel/time-ts.c
+++ b/arch/blackfin/kernel/time-ts.c
@@ -230,7 +230,9 @@ static void __init bfin_gptmr0_clockevent_init(struct clock_event_device *evt)
clock_tick = get_sclk();
evt->mult = div_sc(clock_tick, NSEC_PER_SEC, evt->shift);
evt->max_delta_ns = clockevent_delta2ns(-1, evt);
+ evt->max_delta_ticks = (unsigned long)-1;
evt->min_delta_ns = clockevent_delta2ns(100, evt);
+ evt->min_delta_ticks = 100;

evt->cpumask = cpumask_of(0);

@@ -344,7 +346,9 @@ void bfin_coretmr_clockevent_init(void)
clock_tick = get_cclk() / TIME_SCALE;
evt->mult = div_sc(clock_tick, NSEC_PER_SEC, evt->shift);
evt->max_delta_ns = clockevent_delta2ns(-1, evt);
+ evt->max_delta_ticks = (unsigned long)-1;
evt->min_delta_ns = clockevent_delta2ns(100, evt);
+ evt->min_delta_ticks = 100;

evt->cpumask = cpumask_of(cpu);

diff --git a/arch/c6x/platforms/timer64.c b/arch/c6x/platforms/timer64.c
index c19901e..0bd0452 100644
--- a/arch/c6x/platforms/timer64.c
+++ b/arch/c6x/platforms/timer64.c
@@ -234,7 +234,9 @@ void __init timer64_init(void)
clockevents_calc_mult_shift(cd, c6x_core_freq / TIMER_DIVISOR, 5);

cd->max_delta_ns = clockevent_delta2ns(0x7fffffff, cd);
+ cd->max_delta_ticks = 0x7fffffff;
cd->min_delta_ns = clockevent_delta2ns(250, cd);
+ cd->min_delta_ticks = 250;

cd->cpumask = cpumask_of(smp_processor_id());

diff --git a/arch/hexagon/kernel/time.c b/arch/hexagon/kernel/time.c
index a6a1d1f..fbdeac1 100644
--- a/arch/hexagon/kernel/time.c
+++ b/arch/hexagon/kernel/time.c
@@ -199,7 +199,9 @@ void __init time_init_deferred(void)
clockevents_calc_mult_shift(ce_dev, sleep_clk_freq, 4);

ce_dev->max_delta_ns = clockevent_delta2ns(0x7fffffff, ce_dev);
+ ce_dev->max_delta_ticks = 0x7fffffff;
ce_dev->min_delta_ns = clockevent_delta2ns(0xf, ce_dev);
+ ce_dev->min_delta_ticks = 0xf;

#ifdef CONFIG_SMP
setup_percpu_clockdev();
diff --git a/arch/m68k/coldfire/pit.c b/arch/m68k/coldfire/pit.c
index d86a9ff..91850e7 100644
--- a/arch/m68k/coldfire/pit.c
+++ b/arch/m68k/coldfire/pit.c
@@ -149,8 +149,10 @@ void hw_timer_init(irq_handler_t handler)
cf_pit_clockevent.mult = div_sc(FREQ, NSEC_PER_SEC, 32);
cf_pit_clockevent.max_delta_ns =
clockevent_delta2ns(0xFFFF, &cf_pit_clockevent);
+ cf_pit_clockevent.max_delta_ticks = 0xFFFF;
cf_pit_clockevent.min_delta_ns =
clockevent_delta2ns(0x3f, &cf_pit_clockevent);
+ cf_pit_clockevent.min_delta_ticks = 0x3f;
clockevents_register_device(&cf_pit_clockevent);

setup_irq(MCF_IRQ_PIT1, &pit_irq);
diff --git a/arch/microblaze/kernel/timer.c b/arch/microblaze/kernel/timer.c
index 5bbf38b..7ba2ff6 100644
--- a/arch/microblaze/kernel/timer.c
+++ b/arch/microblaze/kernel/timer.c
@@ -177,8 +177,10 @@ static __init int xilinx_clockevent_init(void)
clockevent_xilinx_timer.shift);
clockevent_xilinx_timer.max_delta_ns =
clockevent_delta2ns((u32)~0, &clockevent_xilinx_timer);
+ clockevent_xilinx_timer.max_delta_ticks = (u32)~0;
clockevent_xilinx_timer.min_delta_ns =
clockevent_delta2ns(1, &clockevent_xilinx_timer);
+ clockevent_xilinx_timer.min_delta_ticks = 1;
clockevent_xilinx_timer.cpumask = cpumask_of(0);
clockevents_register_device(&clockevent_xilinx_timer);

diff --git a/arch/mips/alchemy/common/time.c b/arch/mips/alchemy/common/time.c
index f99d3ec..4afdb70 100644
--- a/arch/mips/alchemy/common/time.c
+++ b/arch/mips/alchemy/common/time.c
@@ -138,7 +138,9 @@ static int __init alchemy_time_init(unsigned int m2int)
cd->shift = 32;
cd->mult = div_sc(32768, NSEC_PER_SEC, cd->shift);
cd->max_delta_ns = clockevent_delta2ns(0xffffffff, cd);
- cd->min_delta_ns = clockevent_delta2ns(9, cd); /* ~0.28ms */
+ cd->max_delta_ticks = 0xffffffff;
+ cd->min_delta_ns = clockevent_delta2ns(9, cd);
+ cd->min_delta_ticks = 9; /* ~0.28ms */
clockevents_register_device(cd);
setup_irq(m2int, &au1x_rtcmatch2_irqaction);

diff --git a/arch/mips/jz4740/time.c b/arch/mips/jz4740/time.c
index 1f7ca2c..8cb992f 100644
--- a/arch/mips/jz4740/time.c
+++ b/arch/mips/jz4740/time.c
@@ -145,7 +145,9 @@ void __init plat_time_init(void)

clockevent_set_clock(&jz4740_clockevent, clk_rate);
jz4740_clockevent.min_delta_ns = clockevent_delta2ns(100, &jz4740_clockevent);
+ jz4740_clockevent.min_delta_ticks = 100;
jz4740_clockevent.max_delta_ns = clockevent_delta2ns(0xffff, &jz4740_clockevent);
+ jz4740_clockevent.max_delta_ticks = 0xffff;
jz4740_clockevent.cpumask = cpumask_of(0);

clockevents_register_device(&jz4740_clockevent);
diff --git a/arch/mips/kernel/cevt-bcm1480.c b/arch/mips/kernel/cevt-bcm1480.c
index 940ac00..8f9f2da 100644
--- a/arch/mips/kernel/cevt-bcm1480.c
+++ b/arch/mips/kernel/cevt-bcm1480.c
@@ -123,7 +123,9 @@ void sb1480_clockevent_init(void)
CLOCK_EVT_FEAT_ONESHOT;
clockevent_set_clock(cd, V_SCD_TIMER_FREQ);
cd->max_delta_ns = clockevent_delta2ns(0x7fffff, cd);
+ cd->max_delta_ticks = 0x7fffff;
cd->min_delta_ns = clockevent_delta2ns(2, cd);
+ cd->min_delta_ticks = 2;
cd->rating = 200;
cd->irq = irq;
cd->cpumask = cpumask_of(cpu);
diff --git a/arch/mips/kernel/cevt-ds1287.c b/arch/mips/kernel/cevt-ds1287.c
index 77a5ddf..61ad907 100644
--- a/arch/mips/kernel/cevt-ds1287.c
+++ b/arch/mips/kernel/cevt-ds1287.c
@@ -128,7 +128,9 @@ int __init ds1287_clockevent_init(int irq)
cd->irq = irq;
clockevent_set_clock(cd, 32768);
cd->max_delta_ns = clockevent_delta2ns(0x7fffffff, cd);
+ cd->max_delta_ticks = 0x7fffffff;
cd->min_delta_ns = clockevent_delta2ns(0x300, cd);
+ cd->min_delta_ticks = 0x300;
cd->cpumask = cpumask_of(0);

clockevents_register_device(&ds1287_clockevent);
diff --git a/arch/mips/kernel/cevt-gt641xx.c b/arch/mips/kernel/cevt-gt641xx.c
index 6604005..fd90c82 100644
--- a/arch/mips/kernel/cevt-gt641xx.c
+++ b/arch/mips/kernel/cevt-gt641xx.c
@@ -152,7 +152,9 @@ static int __init gt641xx_timer0_clockevent_init(void)
cd->rating = 200 + gt641xx_base_clock / 10000000;
clockevent_set_clock(cd, gt641xx_base_clock);
cd->max_delta_ns = clockevent_delta2ns(0x7fffffff, cd);
+ cd->max_delta_ticks = 0x7fffffff;
cd->min_delta_ns = clockevent_delta2ns(0x300, cd);
+ cd->min_delta_ticks = 0x300;
cd->cpumask = cpumask_of(0);

clockevents_register_device(&gt641xx_timer0_clockevent);
diff --git a/arch/mips/kernel/cevt-sb1250.c b/arch/mips/kernel/cevt-sb1250.c
index 3d860ef..9d1edb5 100644
--- a/arch/mips/kernel/cevt-sb1250.c
+++ b/arch/mips/kernel/cevt-sb1250.c
@@ -123,7 +123,9 @@ void sb1250_clockevent_init(void)
CLOCK_EVT_FEAT_ONESHOT;
clockevent_set_clock(cd, V_SCD_TIMER_FREQ);
cd->max_delta_ns = clockevent_delta2ns(0x7fffff, cd);
+ cd->max_delta_ticks = 0x7fffff;
cd->min_delta_ns = clockevent_delta2ns(2, cd);
+ cd->min_delta_ticks = 2;
cd->rating = 200;
cd->irq = irq;
cd->cpumask = cpumask_of(cpu);
diff --git a/arch/mips/kernel/cevt-txx9.c b/arch/mips/kernel/cevt-txx9.c
index 537eefd..e651a03 100644
--- a/arch/mips/kernel/cevt-txx9.c
+++ b/arch/mips/kernel/cevt-txx9.c
@@ -196,7 +196,9 @@ void __init txx9_clockevent_init(unsigned long baseaddr, int irq,
clockevent_set_clock(cd, TIMER_CLK(imbusclk));
cd->max_delta_ns =
clockevent_delta2ns(0xffffffff >> (32 - TXX9_TIMER_BITS), cd);
+ cd->max_delta_ticks = 0xffffffff >> (32 - TXX9_TIMER_BITS);
cd->min_delta_ns = clockevent_delta2ns(0xf, cd);
+ cd->min_delta_ticks = 0xf;
cd->irq = irq;
cd->cpumask = cpumask_of(0),
clockevents_register_device(cd);
diff --git a/arch/mips/loongson32/common/time.c b/arch/mips/loongson32/common/time.c
index ff224f0..bf42507 100644
--- a/arch/mips/loongson32/common/time.c
+++ b/arch/mips/loongson32/common/time.c
@@ -199,7 +199,9 @@ static void __init ls1x_time_init(void)

clockevent_set_clock(cd, mips_hpt_frequency);
cd->max_delta_ns = clockevent_delta2ns(0xffffff, cd);
+ cd->max_delta_ticks = 0xffffff;
cd->min_delta_ns = clockevent_delta2ns(0x000300, cd);
+ cd->min_delta_ticks = 0x000300;
cd->cpumask = cpumask_of(smp_processor_id());
clockevents_register_device(cd);

diff --git a/arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c b/arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c
index da77d41..fd9a9f2 100644
--- a/arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c
+++ b/arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c
@@ -123,7 +123,9 @@ void __init setup_mfgpt0_timer(void)
cd->cpumask = cpumask_of(cpu);
clockevent_set_clock(cd, MFGPT_TICK_RATE);
cd->max_delta_ns = clockevent_delta2ns(0xffff, cd);
+ cd->max_delta_ticks = 0xffff;
cd->min_delta_ns = clockevent_delta2ns(0xf, cd);
+ cd->min_delta_ticks = 0xf;

/* Enable MFGPT0 Comparator 2 Output to the Interrupt Mapper */
_wrmsr(DIVIL_MSR_REG(MFGPT_IRQ), 0, 0x100);
diff --git a/arch/mips/loongson64/loongson-3/hpet.c b/arch/mips/loongson64/loongson-3/hpet.c
index 4788bea..94f5410 100644
--- a/arch/mips/loongson64/loongson-3/hpet.c
+++ b/arch/mips/loongson64/loongson-3/hpet.c
@@ -241,7 +241,9 @@ void __init setup_hpet_timer(void)
cd->cpumask = cpumask_of(cpu);
clockevent_set_clock(cd, HPET_FREQ);
cd->max_delta_ns = clockevent_delta2ns(0x7fffffff, cd);
+ cd->max_delta_ticks = 0x7fffffff;
cd->min_delta_ns = clockevent_delta2ns(HPET_MIN_PROG_DELTA, cd);
+ cd->min_delta_ticks = HPET_MIN_PROG_DELTA;

clockevents_register_device(cd);
setup_irq(HPET_T0_IRQ, &hpet_irq);
diff --git a/arch/mips/ralink/cevt-rt3352.c b/arch/mips/ralink/cevt-rt3352.c
index f24eee0..b8a1376 100644
--- a/arch/mips/ralink/cevt-rt3352.c
+++ b/arch/mips/ralink/cevt-rt3352.c
@@ -129,7 +129,9 @@ static int __init ralink_systick_init(struct device_node *np)
systick.dev.name = np->name;
clockevents_calc_mult_shift(&systick.dev, SYSTICK_FREQ, 60);
systick.dev.max_delta_ns = clockevent_delta2ns(0x7fff, &systick.dev);
+ systick.dev.max_delta_ticks = 0x7fff;
systick.dev.min_delta_ns = clockevent_delta2ns(0x3, &systick.dev);
+ systick.dev.min_delta_ticks = 0x3;
systick.dev.irq = irq_of_parse_and_map(np, 0);
if (!systick.dev.irq) {
pr_err("%s: request_irq failed", np->name);
diff --git a/arch/mips/sgi-ip27/ip27-timer.c b/arch/mips/sgi-ip27/ip27-timer.c
index 42d6cb9..c1e0c84 100644
--- a/arch/mips/sgi-ip27/ip27-timer.c
+++ b/arch/mips/sgi-ip27/ip27-timer.c
@@ -113,7 +113,9 @@ void hub_rt_clock_event_init(void)
cd->features = CLOCK_EVT_FEAT_ONESHOT;
clockevent_set_clock(cd, CYCLES_PER_SEC);
cd->max_delta_ns = clockevent_delta2ns(0xfffffffffffff, cd);
+ cd->max_delta_ticks = 0xfffffffffffff;
cd->min_delta_ns = clockevent_delta2ns(0x300, cd);
+ cd->min_delta_ticks = 0x300;
cd->rating = 200;
cd->irq = irq;
cd->cpumask = cpumask_of(cpu);
diff --git a/arch/mn10300/kernel/cevt-mn10300.c b/arch/mn10300/kernel/cevt-mn10300.c
index d9b34dd..2b21bbc 100644
--- a/arch/mn10300/kernel/cevt-mn10300.c
+++ b/arch/mn10300/kernel/cevt-mn10300.c
@@ -98,7 +98,9 @@ int __init init_clockevents(void)

/* Calculate the min / max delta */
cd->max_delta_ns = clockevent_delta2ns(TMJCBR_MAX, cd);
+ cd->max_delta_ticks = TMJCBR_MAX;
cd->min_delta_ns = clockevent_delta2ns(100, cd);
+ cd->min_delta_ticks = 100;

cd->rating = 200;
cd->cpumask = cpumask_of(smp_processor_id());
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index be9751f..b6b13d9 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -974,8 +974,10 @@ static void __init init_decrementer_clockevent(void)

decrementer_clockevent.max_delta_ns =
clockevent_delta2ns(decrementer_max, &decrementer_clockevent);
+ decrementer_clockevent.max_delta_ticks = decrementer_max;
decrementer_clockevent.min_delta_ns =
clockevent_delta2ns(2, &decrementer_clockevent);
+ decrementer_clockevent.min_delta_ticks = 2;

register_decrementer_clockevent(cpu);
}
diff --git a/arch/score/kernel/time.c b/arch/score/kernel/time.c
index 679b8d7..29aafc7 100644
--- a/arch/score/kernel/time.c
+++ b/arch/score/kernel/time.c
@@ -81,8 +81,10 @@ void __init time_init(void)
score_clockevent.shift);
score_clockevent.max_delta_ns = clockevent_delta2ns((u32)~0,
&score_clockevent);
+ score_clockevent.max_delta_ticks = (u32)~0;
score_clockevent.min_delta_ns = clockevent_delta2ns(50,
&score_clockevent) + 1;
+ score_clockevent.min_delta_ticks = 50;
score_clockevent.cpumask = cpumask_of(0);
clockevents_register_device(&score_clockevent);
}
diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index 1affabc..1203c7c 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -228,7 +228,9 @@ void register_percpu_ce(int cpu)
ce->mult = div_sc(sparc_config.clock_rate, NSEC_PER_SEC,
ce->shift);
ce->max_delta_ns = clockevent_delta2ns(sparc_config.clock_rate, ce);
+ ce->max_delta_ticks = (unsigned long)sparc_config.clock_rate;
ce->min_delta_ns = clockevent_delta2ns(100, ce);
+ ce->min_delta_ticks = 100;

clockevents_register_device(ce);
}
diff --git a/arch/sparc/kernel/time_64.c b/arch/sparc/kernel/time_64.c
index c69b21e..83f0f99 100644
--- a/arch/sparc/kernel/time_64.c
+++ b/arch/sparc/kernel/time_64.c
@@ -796,8 +796,10 @@ void __init time_init(void)

sparc64_clockevent.max_delta_ns =
clockevent_delta2ns(0x7fffffffffffffffUL, &sparc64_clockevent);
+ sparc64_clockevent.max_delta_ticks = 0x7fffffffffffffffUL;
sparc64_clockevent.min_delta_ns =
clockevent_delta2ns(0xF, &sparc64_clockevent);
+ sparc64_clockevent.min_delta_ticks = 0xF;

printk("clockevent: mult[%x] shift[%d]\n",
sparc64_clockevent.mult, sparc64_clockevent.shift);
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 25c2366..11b96ed 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -65,7 +65,9 @@ static struct clock_event_device timer_clockevent = {
.set_next_event = itimer_next_event,
.shift = 0,
.max_delta_ns = 0xffffffff,
- .min_delta_ns = TIMER_MIN_DELTA, //microsecond resolution should be enough for anyone, same as 640K RAM
+ .max_delta_ticks = 0xffffffff,
+ .min_delta_ns = TIMER_MIN_DELTA,
+ .min_delta_ticks = TIMER_MIN_DELTA, // microsecond resolution should be enough for anyone, same as 640K RAM
.irq = 0,
.mult = 1,
};
diff --git a/arch/unicore32/kernel/time.c b/arch/unicore32/kernel/time.c
index ac4c544..29c91a9 100644
--- a/arch/unicore32/kernel/time.c
+++ b/arch/unicore32/kernel/time.c
@@ -91,8 +91,10 @@ void __init time_init(void)

ckevt_puv3_osmr0.max_delta_ns =
clockevent_delta2ns(0x7fffffff, &ckevt_puv3_osmr0);
+ ckevt_puv3_osmr0.max_delta_ticks = 0x7fffffff;
ckevt_puv3_osmr0.min_delta_ns =
clockevent_delta2ns(MIN_OSCR_DELTA * 2, &ckevt_puv3_osmr0) + 1;
+ ckevt_puv3_osmr0.min_delta_ticks = MIN_OSCR_DELTA * 2;
ckevt_puv3_osmr0.cpumask = cpumask_of(0);

setup_irq(IRQ_TIMER0, &puv3_timer_irq);
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 2686894..eebb677 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -731,8 +731,10 @@ static int __init calibrate_APIC_clock(void)
TICK_NSEC, lapic_clockevent.shift);
lapic_clockevent.max_delta_ns =
clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
+ lapic_clockevent.max_delta_ticks = 0x7FFFFF;
lapic_clockevent.min_delta_ns =
clockevent_delta2ns(0xF, &lapic_clockevent);
+ lapic_clockevent.min_delta_ticks = 0xF;
lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
return 0;
}
@@ -778,8 +780,10 @@ static int __init calibrate_APIC_clock(void)
lapic_clockevent.shift);
lapic_clockevent.max_delta_ns =
clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
+ lapic_clockevent.max_delta_ticks = 0x7FFFFFFF;
lapic_clockevent.min_delta_ns =
clockevent_delta2ns(0xF, &lapic_clockevent);
+ lapic_clockevent.min_delta_ticks = 0xF;

lapic_timer_frequency = (delta * APIC_DIVISOR) / LAPIC_CAL_LOOPS;

diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 4ca0d78..930dabe 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -994,7 +994,9 @@ static struct clock_event_device lguest_clockevent = {
.mult = 1,
.shift = 0,
.min_delta_ns = LG_CLOCK_MIN_DELTA,
+ .min_delta_ticks = LG_CLOCK_MIN_DELTA,
.max_delta_ns = LG_CLOCK_MAX_DELTA,
+ .max_delta_ticks = LG_CLOCK_MAX_DELTA,
};

/*
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 33d8f6a..cfc2f12 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -209,7 +209,9 @@ static const struct clock_event_device xen_timerop_clockevent = {
.features = CLOCK_EVT_FEAT_ONESHOT,

.max_delta_ns = 0xffffffff,
+ .max_delta_ticks = 0xffffffff,
.min_delta_ns = TIMER_SLOP,
+ .min_delta_ticks = TIMER_SLOP,

.mult = 1,
.shift = 0,
@@ -268,7 +270,9 @@ static const struct clock_event_device xen_vcpuop_clockevent = {
.features = CLOCK_EVT_FEAT_ONESHOT,

.max_delta_ns = 0xffffffff,
+ .max_delta_ticks = 0xffffffff,
.min_delta_ns = TIMER_SLOP,
+ .min_delta_ticks = TIMER_SLOP,

.mult = 1,
.shift = 0,
diff --git a/drivers/clocksource/dw_apb_timer.c b/drivers/clocksource/dw_apb_timer.c
index 797505a..2d68204 100644
--- a/drivers/clocksource/dw_apb_timer.c
+++ b/drivers/clocksource/dw_apb_timer.c
@@ -257,7 +257,9 @@ dw_apb_clockevent_init(int cpu, const char *name, unsigned rating,
clockevents_calc_mult_shift(&dw_ced->ced, freq, APBT_MIN_PERIOD);
dw_ced->ced.max_delta_ns = clockevent_delta2ns(0x7fffffff,
&dw_ced->ced);
+ dw_ced->ced.max_delta_ticks = 0x7fffffff;
dw_ced->ced.min_delta_ns = clockevent_delta2ns(5000, &dw_ced->ced);
+ dw_ced->ced.min_delta_ticks = 5000;
dw_ced->ced.cpumask = cpumask_of(cpu);
dw_ced->ced.features = CLOCK_EVT_FEAT_PERIODIC |
CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_DYNIRQ;
diff --git a/drivers/clocksource/metag_generic.c b/drivers/clocksource/metag_generic.c
index a80ab3e..2b08024 100644
--- a/drivers/clocksource/metag_generic.c
+++ b/drivers/clocksource/metag_generic.c
@@ -114,7 +114,9 @@ static int arch_timer_starting_cpu(unsigned int cpu)

clk->mult = div_sc(hwtimer_freq, NSEC_PER_SEC, clk->shift);
clk->max_delta_ns = clockevent_delta2ns(0x7fffffff, clk);
+ clk->max_delta_ticks = 0x7fffffff;
clk->min_delta_ns = clockevent_delta2ns(0xf, clk);
+ clk->min_delta_ticks = 0xf;
clk->cpumask = cpumask_of(cpu);

clockevents_register_device(clk);
diff --git a/drivers/clocksource/numachip.c b/drivers/clocksource/numachip.c
index 4e0f11f..6a20dc8 100644
--- a/drivers/clocksource/numachip.c
+++ b/drivers/clocksource/numachip.c
@@ -51,7 +51,9 @@ static struct clock_event_device numachip2_clockevent = {
.mult = 1,
.shift = 0,
.min_delta_ns = 1250,
+ .min_delta_ticks = 1250,
.max_delta_ns = LONG_MAX,
+ .max_delta_ticks = LONG_MAX,
};

static void numachip_timer_interrupt(void)
diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index 3038885..97ce6bf 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -815,7 +815,9 @@ static int sh_cmt_register_clockevent(struct sh_cmt_channel *ch,
ced->shift = 32;
ced->mult = div_sc(ch->cmt->rate, NSEC_PER_SEC, ced->shift);
ced->max_delta_ns = clockevent_delta2ns(ch->max_match_value, ced);
+ ced->max_delta_ticks = ch->max_match_value;
ced->min_delta_ns = clockevent_delta2ns(0x1f, ced);
+ ced->min_delta_ticks = 0x1f;

dev_info(&ch->cmt->pdev->dev, "ch%u: used for clock events\n",
ch->index);
diff --git a/drivers/clocksource/timer-atlas7.c b/drivers/clocksource/timer-atlas7.c
index 4334e03..c95f209 100644
--- a/drivers/clocksource/timer-atlas7.c
+++ b/drivers/clocksource/timer-atlas7.c
@@ -192,7 +192,9 @@ static int sirfsoc_local_timer_starting_cpu(unsigned int cpu)
ce->set_next_event = sirfsoc_timer_set_next_event;
clockevents_calc_mult_shift(ce, atlas7_timer_rate, 60);
ce->max_delta_ns = clockevent_delta2ns(-2, ce);
+ ce->max_delta_ticks = (unsigned long)-2;
ce->min_delta_ns = clockevent_delta2ns(2, ce);
+ ce->min_delta_ticks = 2;
ce->cpumask = cpumask_of(cpu);

action->dev_id = ce;
--
2.10.2

2016-11-19 16:03:22

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 01/28] clocksource: sh_cmt: compute rate before registration again

With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, sh_cmt violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final ->mult and ->shift
values from its ->set_state_oneshot() and ->set_state_periodic() functions
respectively.

This patch moves the setting of the clockevent device's ->mult and ->shift
values to before its registration.

Note that there has been some back and forth regarding this question with
respect to the clocksource also provided by this driver:
commit f4d7c3565c16 ("clocksource: sh_cmt: compute mult and shift before
registration")
moves the rate determination from the clocksource's ->enable() function to
before its registration. OTOH, the later
commit 3593f5fe40a1 ("clocksource: sh_cmt: __clocksource_updatefreq_hz()
update")
basically reverts this, saying
"Without this patch the old code uses clocksource_register() together
with a hack that assumes a never changing clock rate."

However, I checked all current sh_cmt users in arch/sh as well as in
arch/arm/mach-shmobile carefully and right now, none of them changes any
rate in any clock tree relevant to sh_cmt after their respective
time_init(). Since all sh_cmt instances are created after time_init(), none
of them should ever observe any clock rate changes.

What's more, both, a clocksource as well as a clockevent device, can
immediately get selected for use at their registration and thus, enabled
at this point already. So it's probably safer to assume a "never changing
clock rate" here.

- Move the struct sh_cmt_channel's ->rate member to struct sh_cmt_device:
it's a property of the underlying clock which is in turn specific to
the sh_cmt_device.
- Determine the ->rate value in sh_cmt_setup() at device probing rather
than at first usage.
- Set the clockevent device's ->mult and ->shift values right before its
registration.
- Although not strictly necessary for the upcoming clockevent core changes,
set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange <[email protected]>
---

Notes:
For a detailed analysis of the current sh_cmt users, please see
https://nicst.de/ced-clk-rate-change-analysis/sh_cmt-cgitted.html

Compile-only tested on ARCH=sh and ARCH=arm.

drivers/clocksource/sh_cmt.c | 45 ++++++++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index 103c493..3038885 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -103,7 +103,6 @@ struct sh_cmt_channel {
unsigned long match_value;
unsigned long next_match_value;
unsigned long max_match_value;
- unsigned long rate;
raw_spinlock_t lock;
struct clock_event_device ced;
struct clocksource cs;
@@ -118,6 +117,7 @@ struct sh_cmt_device {

void __iomem *mapbase;
struct clk *clk;
+ unsigned long rate;

raw_spinlock_t lock; /* Protect the shared start/stop register */

@@ -320,7 +320,7 @@ static void sh_cmt_start_stop_ch(struct sh_cmt_channel *ch, int start)
raw_spin_unlock_irqrestore(&ch->cmt->lock, flags);
}

-static int sh_cmt_enable(struct sh_cmt_channel *ch, unsigned long *rate)
+static int sh_cmt_enable(struct sh_cmt_channel *ch)
{
int k, ret;

@@ -340,11 +340,9 @@ static int sh_cmt_enable(struct sh_cmt_channel *ch, unsigned long *rate)

/* configure channel, periodic mode and maximum timeout */
if (ch->cmt->info->width == 16) {
- *rate = clk_get_rate(ch->cmt->clk) / 512;
sh_cmt_write_cmcsr(ch, SH_CMT16_CMCSR_CMIE |
SH_CMT16_CMCSR_CKS512);
} else {
- *rate = clk_get_rate(ch->cmt->clk) / 8;
sh_cmt_write_cmcsr(ch, SH_CMT32_CMCSR_CMM |
SH_CMT32_CMCSR_CMTOUT_IE |
SH_CMT32_CMCSR_CMR_IRQ |
@@ -572,7 +570,7 @@ static int sh_cmt_start(struct sh_cmt_channel *ch, unsigned long flag)
raw_spin_lock_irqsave(&ch->lock, flags);

if (!(ch->flags & (FLAG_CLOCKEVENT | FLAG_CLOCKSOURCE)))
- ret = sh_cmt_enable(ch, &ch->rate);
+ ret = sh_cmt_enable(ch);

if (ret)
goto out;
@@ -640,10 +638,9 @@ static int sh_cmt_clocksource_enable(struct clocksource *cs)
ch->total_cycles = 0;

ret = sh_cmt_start(ch, FLAG_CLOCKSOURCE);
- if (!ret) {
- __clocksource_update_freq_hz(cs, ch->rate);
+ if (!ret)
ch->cs_enabled = true;
- }
+
return ret;
}

@@ -697,8 +694,7 @@ static int sh_cmt_register_clocksource(struct sh_cmt_channel *ch,
dev_info(&ch->cmt->pdev->dev, "ch%u: used as clock source\n",
ch->index);

- /* Register with dummy 1 Hz value, gets updated in ->enable() */
- clocksource_register_hz(cs, 1);
+ clocksource_register_hz(cs, ch->cmt->rate);
return 0;
}

@@ -709,19 +705,10 @@ static struct sh_cmt_channel *ced_to_sh_cmt(struct clock_event_device *ced)

static void sh_cmt_clock_event_start(struct sh_cmt_channel *ch, int periodic)
{
- struct clock_event_device *ced = &ch->ced;
-
sh_cmt_start(ch, FLAG_CLOCKEVENT);

- /* TODO: calculate good shift from rate and counter bit width */
-
- ced->shift = 32;
- ced->mult = div_sc(ch->rate, NSEC_PER_SEC, ced->shift);
- ced->max_delta_ns = clockevent_delta2ns(ch->max_match_value, ced);
- ced->min_delta_ns = clockevent_delta2ns(0x1f, ced);
-
if (periodic)
- sh_cmt_set_next(ch, ((ch->rate + HZ/2) / HZ) - 1);
+ sh_cmt_set_next(ch, ((ch->cmt->rate + HZ/2) / HZ) - 1);
else
sh_cmt_set_next(ch, ch->max_match_value);
}
@@ -824,6 +811,12 @@ static int sh_cmt_register_clockevent(struct sh_cmt_channel *ch,
ced->suspend = sh_cmt_clock_event_suspend;
ced->resume = sh_cmt_clock_event_resume;

+ /* TODO: calculate good shift from rate and counter bit width */
+ ced->shift = 32;
+ ced->mult = div_sc(ch->cmt->rate, NSEC_PER_SEC, ced->shift);
+ ced->max_delta_ns = clockevent_delta2ns(ch->max_match_value, ced);
+ ced->min_delta_ns = clockevent_delta2ns(0x1f, ced);
+
dev_info(&ch->cmt->pdev->dev, "ch%u: used for clock events\n",
ch->index);
clockevents_register_device(ced);
@@ -996,6 +989,18 @@ static int sh_cmt_setup(struct sh_cmt_device *cmt, struct platform_device *pdev)
if (ret < 0)
goto err_clk_put;

+ /* Determine clock rate. */
+ ret = clk_enable(cmt->clk);
+ if (ret < 0)
+ goto err_clk_unprepare;
+
+ if (cmt->info->width == 16)
+ cmt->rate = clk_get_rate(cmt->clk) / 512;
+ else
+ cmt->rate = clk_get_rate(cmt->clk) / 8;
+
+ clk_disable(cmt->clk);
+
/* Map the memory resource(s). */
ret = sh_cmt_map_memory(cmt);
if (ret < 0)
--
2.10.2

2016-11-19 16:10:52

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 12/28] many clockevent drivers: don't set ->min_delta_ns and ->max_delta_ns

Now that the clockevent core always initializes the ->*_delta_ns values
from their ->_delta_ticks counterparts, there is no point in having the
clockevent devices' drivers doing so as well.

Don't initialize ->min_delta_ns and ->max_delta_ns from the clockevent
devices' drivers.

This patch was created with the help of the Coccinelle script below.
One initialization of ->min_delta_ns in arch/tile/kernel/time.c gets
missed by this Cocci script and its removal has been manually added.

@@
expression ced;
expression ns, cycles;
@@
- ced->min_delta_ns = ns;
...
ced->min_delta_ticks = cycles;

@@
expression ced;
expression ns, cycles;
@@
- ced.min_delta_ns = ns;
...
ced.min_delta_ticks = cycles;

@@
identifier ced;
expression ns, cycles;
@@
struct clock_event_device ced = {
- .min_delta_ns = ns,
.min_delta_ticks = cycles,
};

@@
expression ced;
expression ns, cycles;
@@
- ced->max_delta_ns = ns;
...
ced->max_delta_ticks = cycles;

@@
expression ced;
expression ns, cycles;
@@
- ced.max_delta_ns = ns;
...
ced.max_delta_ticks = cycles;

@@
identifier ced;
expression ns, cycles;
@@
struct clock_event_device ced = {
- .max_delta_ns = ns,
.max_delta_ticks = cycles,
};

Signed-off-by: Nicolai Stange <[email protected]>
---
arch/avr32/kernel/time.c | 2 --
arch/blackfin/kernel/time-ts.c | 4 ----
arch/c6x/platforms/timer64.c | 2 --
arch/hexagon/kernel/time.c | 2 --
arch/m68k/coldfire/pit.c | 4 ----
arch/microblaze/kernel/timer.c | 4 ----
arch/mips/alchemy/common/time.c | 2 --
arch/mips/jz4740/time.c | 2 --
arch/mips/kernel/cevt-bcm1480.c | 2 --
arch/mips/kernel/cevt-ds1287.c | 2 --
arch/mips/kernel/cevt-gt641xx.c | 2 --
arch/mips/kernel/cevt-sb1250.c | 2 --
arch/mips/kernel/cevt-txx9.c | 3 ---
arch/mips/loongson32/common/time.c | 2 --
arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c | 2 --
arch/mips/loongson64/loongson-3/hpet.c | 2 --
arch/mips/ralink/cevt-rt3352.c | 2 --
arch/mips/sgi-ip27/ip27-timer.c | 2 --
arch/mn10300/kernel/cevt-mn10300.c | 2 --
arch/powerpc/kernel/time.c | 4 ----
arch/s390/kernel/time.c | 2 --
arch/score/kernel/time.c | 4 ----
arch/sparc/kernel/time_32.c | 2 --
arch/sparc/kernel/time_64.c | 4 ----
arch/tile/kernel/time.c | 2 --
arch/um/kernel/time.c | 2 --
arch/unicore32/kernel/time.c | 4 ----
arch/x86/kernel/apic/apic.c | 8 --------
arch/x86/lguest/boot.c | 2 --
arch/x86/platform/uv/uv_time.c | 4 ----
arch/x86/xen/time.c | 4 ----
drivers/clocksource/dw_apb_timer.c | 3 ---
drivers/clocksource/metag_generic.c | 2 --
drivers/clocksource/numachip.c | 2 --
drivers/clocksource/sh_cmt.c | 2 --
drivers/clocksource/timer-atlas7.c | 2 --
kernel/time/tick-broadcast-hrtimer.c | 2 --
37 files changed, 100 deletions(-)

diff --git a/arch/avr32/kernel/time.c b/arch/avr32/kernel/time.c
index 3fff4c9..af08261 100644
--- a/arch/avr32/kernel/time.c
+++ b/arch/avr32/kernel/time.c
@@ -141,9 +141,7 @@ void __init time_init(void)

/* setup COMPARE clockevent */
comparator.mult = div_sc(counter_hz, NSEC_PER_SEC, comparator.shift);
- comparator.max_delta_ns = clockevent_delta2ns((u32)~0, &comparator);
comparator.max_delta_ticks = (u32)~0;
- comparator.min_delta_ns = clockevent_delta2ns(50, &comparator) + 1;
comparator.min_delta_ticks = 50;
comparator.cpumask = cpumask_of(0);

diff --git a/arch/blackfin/kernel/time-ts.c b/arch/blackfin/kernel/time-ts.c
index 4c93b6f..be43289 100644
--- a/arch/blackfin/kernel/time-ts.c
+++ b/arch/blackfin/kernel/time-ts.c
@@ -229,9 +229,7 @@ static void __init bfin_gptmr0_clockevent_init(struct clock_event_device *evt)

clock_tick = get_sclk();
evt->mult = div_sc(clock_tick, NSEC_PER_SEC, evt->shift);
- evt->max_delta_ns = clockevent_delta2ns(-1, evt);
evt->max_delta_ticks = (unsigned long)-1;
- evt->min_delta_ns = clockevent_delta2ns(100, evt);
evt->min_delta_ticks = 100;

evt->cpumask = cpumask_of(0);
@@ -345,9 +343,7 @@ void bfin_coretmr_clockevent_init(void)

clock_tick = get_cclk() / TIME_SCALE;
evt->mult = div_sc(clock_tick, NSEC_PER_SEC, evt->shift);
- evt->max_delta_ns = clockevent_delta2ns(-1, evt);
evt->max_delta_ticks = (unsigned long)-1;
- evt->min_delta_ns = clockevent_delta2ns(100, evt);
evt->min_delta_ticks = 100;

evt->cpumask = cpumask_of(cpu);
diff --git a/arch/c6x/platforms/timer64.c b/arch/c6x/platforms/timer64.c
index 0bd0452..67cd790 100644
--- a/arch/c6x/platforms/timer64.c
+++ b/arch/c6x/platforms/timer64.c
@@ -233,9 +233,7 @@ void __init timer64_init(void)

clockevents_calc_mult_shift(cd, c6x_core_freq / TIMER_DIVISOR, 5);

- cd->max_delta_ns = clockevent_delta2ns(0x7fffffff, cd);
cd->max_delta_ticks = 0x7fffffff;
- cd->min_delta_ns = clockevent_delta2ns(250, cd);
cd->min_delta_ticks = 250;

cd->cpumask = cpumask_of(smp_processor_id());
diff --git a/arch/hexagon/kernel/time.c b/arch/hexagon/kernel/time.c
index fbdeac1..426764d 100644
--- a/arch/hexagon/kernel/time.c
+++ b/arch/hexagon/kernel/time.c
@@ -198,9 +198,7 @@ void __init time_init_deferred(void)
*/
clockevents_calc_mult_shift(ce_dev, sleep_clk_freq, 4);

- ce_dev->max_delta_ns = clockevent_delta2ns(0x7fffffff, ce_dev);
ce_dev->max_delta_ticks = 0x7fffffff;
- ce_dev->min_delta_ns = clockevent_delta2ns(0xf, ce_dev);
ce_dev->min_delta_ticks = 0xf;

#ifdef CONFIG_SMP
diff --git a/arch/m68k/coldfire/pit.c b/arch/m68k/coldfire/pit.c
index 91850e7..56ff421 100644
--- a/arch/m68k/coldfire/pit.c
+++ b/arch/m68k/coldfire/pit.c
@@ -147,11 +147,7 @@ void hw_timer_init(irq_handler_t handler)
{
cf_pit_clockevent.cpumask = cpumask_of(smp_processor_id());
cf_pit_clockevent.mult = div_sc(FREQ, NSEC_PER_SEC, 32);
- cf_pit_clockevent.max_delta_ns =
- clockevent_delta2ns(0xFFFF, &cf_pit_clockevent);
cf_pit_clockevent.max_delta_ticks = 0xFFFF;
- cf_pit_clockevent.min_delta_ns =
- clockevent_delta2ns(0x3f, &cf_pit_clockevent);
cf_pit_clockevent.min_delta_ticks = 0x3f;
clockevents_register_device(&cf_pit_clockevent);

diff --git a/arch/microblaze/kernel/timer.c b/arch/microblaze/kernel/timer.c
index 7ba2ff6..8b82d10 100644
--- a/arch/microblaze/kernel/timer.c
+++ b/arch/microblaze/kernel/timer.c
@@ -175,11 +175,7 @@ static __init int xilinx_clockevent_init(void)
clockevent_xilinx_timer.mult =
div_sc(timer_clock_freq, NSEC_PER_SEC,
clockevent_xilinx_timer.shift);
- clockevent_xilinx_timer.max_delta_ns =
- clockevent_delta2ns((u32)~0, &clockevent_xilinx_timer);
clockevent_xilinx_timer.max_delta_ticks = (u32)~0;
- clockevent_xilinx_timer.min_delta_ns =
- clockevent_delta2ns(1, &clockevent_xilinx_timer);
clockevent_xilinx_timer.min_delta_ticks = 1;
clockevent_xilinx_timer.cpumask = cpumask_of(0);
clockevents_register_device(&clockevent_xilinx_timer);
diff --git a/arch/mips/alchemy/common/time.c b/arch/mips/alchemy/common/time.c
index 4afdb70..5f0edac 100644
--- a/arch/mips/alchemy/common/time.c
+++ b/arch/mips/alchemy/common/time.c
@@ -137,9 +137,7 @@ static int __init alchemy_time_init(unsigned int m2int)

cd->shift = 32;
cd->mult = div_sc(32768, NSEC_PER_SEC, cd->shift);
- cd->max_delta_ns = clockevent_delta2ns(0xffffffff, cd);
cd->max_delta_ticks = 0xffffffff;
- cd->min_delta_ns = clockevent_delta2ns(9, cd);
cd->min_delta_ticks = 9; /* ~0.28ms */
clockevents_register_device(cd);
setup_irq(m2int, &au1x_rtcmatch2_irqaction);
diff --git a/arch/mips/jz4740/time.c b/arch/mips/jz4740/time.c
index 8cb992f..4bf158e 100644
--- a/arch/mips/jz4740/time.c
+++ b/arch/mips/jz4740/time.c
@@ -144,9 +144,7 @@ void __init plat_time_init(void)
jz4740_jiffies_per_tick = DIV_ROUND_CLOSEST(clk_rate, HZ);

clockevent_set_clock(&jz4740_clockevent, clk_rate);
- jz4740_clockevent.min_delta_ns = clockevent_delta2ns(100, &jz4740_clockevent);
jz4740_clockevent.min_delta_ticks = 100;
- jz4740_clockevent.max_delta_ns = clockevent_delta2ns(0xffff, &jz4740_clockevent);
jz4740_clockevent.max_delta_ticks = 0xffff;
jz4740_clockevent.cpumask = cpumask_of(0);

diff --git a/arch/mips/kernel/cevt-bcm1480.c b/arch/mips/kernel/cevt-bcm1480.c
index 8f9f2da..35b8702 100644
--- a/arch/mips/kernel/cevt-bcm1480.c
+++ b/arch/mips/kernel/cevt-bcm1480.c
@@ -122,9 +122,7 @@ void sb1480_clockevent_init(void)
cd->features = CLOCK_EVT_FEAT_PERIODIC |
CLOCK_EVT_FEAT_ONESHOT;
clockevent_set_clock(cd, V_SCD_TIMER_FREQ);
- cd->max_delta_ns = clockevent_delta2ns(0x7fffff, cd);
cd->max_delta_ticks = 0x7fffff;
- cd->min_delta_ns = clockevent_delta2ns(2, cd);
cd->min_delta_ticks = 2;
cd->rating = 200;
cd->irq = irq;
diff --git a/arch/mips/kernel/cevt-ds1287.c b/arch/mips/kernel/cevt-ds1287.c
index 61ad907..3084bdb0 100644
--- a/arch/mips/kernel/cevt-ds1287.c
+++ b/arch/mips/kernel/cevt-ds1287.c
@@ -127,9 +127,7 @@ int __init ds1287_clockevent_init(int irq)
cd->rating = 100;
cd->irq = irq;
clockevent_set_clock(cd, 32768);
- cd->max_delta_ns = clockevent_delta2ns(0x7fffffff, cd);
cd->max_delta_ticks = 0x7fffffff;
- cd->min_delta_ns = clockevent_delta2ns(0x300, cd);
cd->min_delta_ticks = 0x300;
cd->cpumask = cpumask_of(0);

diff --git a/arch/mips/kernel/cevt-gt641xx.c b/arch/mips/kernel/cevt-gt641xx.c
index fd90c82..d86975e 100644
--- a/arch/mips/kernel/cevt-gt641xx.c
+++ b/arch/mips/kernel/cevt-gt641xx.c
@@ -151,9 +151,7 @@ static int __init gt641xx_timer0_clockevent_init(void)
cd = &gt641xx_timer0_clockevent;
cd->rating = 200 + gt641xx_base_clock / 10000000;
clockevent_set_clock(cd, gt641xx_base_clock);
- cd->max_delta_ns = clockevent_delta2ns(0x7fffffff, cd);
cd->max_delta_ticks = 0x7fffffff;
- cd->min_delta_ns = clockevent_delta2ns(0x300, cd);
cd->min_delta_ticks = 0x300;
cd->cpumask = cpumask_of(0);

diff --git a/arch/mips/kernel/cevt-sb1250.c b/arch/mips/kernel/cevt-sb1250.c
index 9d1edb5..8fb7723 100644
--- a/arch/mips/kernel/cevt-sb1250.c
+++ b/arch/mips/kernel/cevt-sb1250.c
@@ -122,9 +122,7 @@ void sb1250_clockevent_init(void)
cd->features = CLOCK_EVT_FEAT_PERIODIC |
CLOCK_EVT_FEAT_ONESHOT;
clockevent_set_clock(cd, V_SCD_TIMER_FREQ);
- cd->max_delta_ns = clockevent_delta2ns(0x7fffff, cd);
cd->max_delta_ticks = 0x7fffff;
- cd->min_delta_ns = clockevent_delta2ns(2, cd);
cd->min_delta_ticks = 2;
cd->rating = 200;
cd->irq = irq;
diff --git a/arch/mips/kernel/cevt-txx9.c b/arch/mips/kernel/cevt-txx9.c
index e651a03..acb56fe 100644
--- a/arch/mips/kernel/cevt-txx9.c
+++ b/arch/mips/kernel/cevt-txx9.c
@@ -194,10 +194,7 @@ void __init txx9_clockevent_init(unsigned long baseaddr, int irq,
txx9_clock_event_device.tmrptr = tmrptr;

clockevent_set_clock(cd, TIMER_CLK(imbusclk));
- cd->max_delta_ns =
- clockevent_delta2ns(0xffffffff >> (32 - TXX9_TIMER_BITS), cd);
cd->max_delta_ticks = 0xffffffff >> (32 - TXX9_TIMER_BITS);
- cd->min_delta_ns = clockevent_delta2ns(0xf, cd);
cd->min_delta_ticks = 0xf;
cd->irq = irq;
cd->cpumask = cpumask_of(0),
diff --git a/arch/mips/loongson32/common/time.c b/arch/mips/loongson32/common/time.c
index bf42507..4e8e6bd 100644
--- a/arch/mips/loongson32/common/time.c
+++ b/arch/mips/loongson32/common/time.c
@@ -198,9 +198,7 @@ static void __init ls1x_time_init(void)
ls1x_pwmtimer_init();

clockevent_set_clock(cd, mips_hpt_frequency);
- cd->max_delta_ns = clockevent_delta2ns(0xffffff, cd);
cd->max_delta_ticks = 0xffffff;
- cd->min_delta_ns = clockevent_delta2ns(0x000300, cd);
cd->min_delta_ticks = 0x000300;
cd->cpumask = cpumask_of(smp_processor_id());
clockevents_register_device(cd);
diff --git a/arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c b/arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c
index fd9a9f2..6bc905c 100644
--- a/arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c
+++ b/arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c
@@ -122,9 +122,7 @@ void __init setup_mfgpt0_timer(void)

cd->cpumask = cpumask_of(cpu);
clockevent_set_clock(cd, MFGPT_TICK_RATE);
- cd->max_delta_ns = clockevent_delta2ns(0xffff, cd);
cd->max_delta_ticks = 0xffff;
- cd->min_delta_ns = clockevent_delta2ns(0xf, cd);
cd->min_delta_ticks = 0xf;

/* Enable MFGPT0 Comparator 2 Output to the Interrupt Mapper */
diff --git a/arch/mips/loongson64/loongson-3/hpet.c b/arch/mips/loongson64/loongson-3/hpet.c
index 94f5410..2d407a1 100644
--- a/arch/mips/loongson64/loongson-3/hpet.c
+++ b/arch/mips/loongson64/loongson-3/hpet.c
@@ -240,9 +240,7 @@ void __init setup_hpet_timer(void)
cd->irq = HPET_T0_IRQ;
cd->cpumask = cpumask_of(cpu);
clockevent_set_clock(cd, HPET_FREQ);
- cd->max_delta_ns = clockevent_delta2ns(0x7fffffff, cd);
cd->max_delta_ticks = 0x7fffffff;
- cd->min_delta_ns = clockevent_delta2ns(HPET_MIN_PROG_DELTA, cd);
cd->min_delta_ticks = HPET_MIN_PROG_DELTA;

clockevents_register_device(cd);
diff --git a/arch/mips/ralink/cevt-rt3352.c b/arch/mips/ralink/cevt-rt3352.c
index b8a1376..3657f98 100644
--- a/arch/mips/ralink/cevt-rt3352.c
+++ b/arch/mips/ralink/cevt-rt3352.c
@@ -128,9 +128,7 @@ static int __init ralink_systick_init(struct device_node *np)
systick_irqaction.name = np->name;
systick.dev.name = np->name;
clockevents_calc_mult_shift(&systick.dev, SYSTICK_FREQ, 60);
- systick.dev.max_delta_ns = clockevent_delta2ns(0x7fff, &systick.dev);
systick.dev.max_delta_ticks = 0x7fff;
- systick.dev.min_delta_ns = clockevent_delta2ns(0x3, &systick.dev);
systick.dev.min_delta_ticks = 0x3;
systick.dev.irq = irq_of_parse_and_map(np, 0);
if (!systick.dev.irq) {
diff --git a/arch/mips/sgi-ip27/ip27-timer.c b/arch/mips/sgi-ip27/ip27-timer.c
index c1e0c84..aad2a6e 100644
--- a/arch/mips/sgi-ip27/ip27-timer.c
+++ b/arch/mips/sgi-ip27/ip27-timer.c
@@ -112,9 +112,7 @@ void hub_rt_clock_event_init(void)
cd->name = name;
cd->features = CLOCK_EVT_FEAT_ONESHOT;
clockevent_set_clock(cd, CYCLES_PER_SEC);
- cd->max_delta_ns = clockevent_delta2ns(0xfffffffffffff, cd);
cd->max_delta_ticks = 0xfffffffffffff;
- cd->min_delta_ns = clockevent_delta2ns(0x300, cd);
cd->min_delta_ticks = 0x300;
cd->rating = 200;
cd->irq = irq;
diff --git a/arch/mn10300/kernel/cevt-mn10300.c b/arch/mn10300/kernel/cevt-mn10300.c
index 2b21bbc..276336b 100644
--- a/arch/mn10300/kernel/cevt-mn10300.c
+++ b/arch/mn10300/kernel/cevt-mn10300.c
@@ -97,9 +97,7 @@ int __init init_clockevents(void)
clockevents_calc_mult_shift(cd, MN10300_JCCLK, 1);

/* Calculate the min / max delta */
- cd->max_delta_ns = clockevent_delta2ns(TMJCBR_MAX, cd);
cd->max_delta_ticks = TMJCBR_MAX;
- cd->min_delta_ns = clockevent_delta2ns(100, cd);
cd->min_delta_ticks = 100;

cd->rating = 200;
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b6b13d9..aefb9dd 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -972,11 +972,7 @@ static void __init init_decrementer_clockevent(void)

clockevents_calc_mult_shift(&decrementer_clockevent, ppc_tb_freq, 4);

- decrementer_clockevent.max_delta_ns =
- clockevent_delta2ns(decrementer_max, &decrementer_clockevent);
decrementer_clockevent.max_delta_ticks = decrementer_max;
- decrementer_clockevent.min_delta_ns =
- clockevent_delta2ns(2, &decrementer_clockevent);
decrementer_clockevent.min_delta_ticks = 2;

register_decrementer_clockevent(cpu);
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 46f2073..5673630 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -157,9 +157,7 @@ void init_cpu_timer(void)
cd->features = CLOCK_EVT_FEAT_ONESHOT;
cd->mult = 16777;
cd->shift = 12;
- cd->min_delta_ns = 1;
cd->min_delta_ticks = 1;
- cd->max_delta_ns = LONG_MAX;
cd->max_delta_ticks = ULONG_MAX;
cd->rating = 400;
cd->cpumask = cpumask_of(cpu);
diff --git a/arch/score/kernel/time.c b/arch/score/kernel/time.c
index 29aafc7..50455e6 100644
--- a/arch/score/kernel/time.c
+++ b/arch/score/kernel/time.c
@@ -79,11 +79,7 @@ void __init time_init(void)
/* setup COMPARE clockevent */
score_clockevent.mult = div_sc(SYSTEM_CLOCK, NSEC_PER_SEC,
score_clockevent.shift);
- score_clockevent.max_delta_ns = clockevent_delta2ns((u32)~0,
- &score_clockevent);
score_clockevent.max_delta_ticks = (u32)~0;
- score_clockevent.min_delta_ns = clockevent_delta2ns(50,
- &score_clockevent) + 1;
score_clockevent.min_delta_ticks = 50;
score_clockevent.cpumask = cpumask_of(0);
clockevents_register_device(&score_clockevent);
diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index 1203c7c..247c229 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -227,9 +227,7 @@ void register_percpu_ce(int cpu)
ce->shift = 32;
ce->mult = div_sc(sparc_config.clock_rate, NSEC_PER_SEC,
ce->shift);
- ce->max_delta_ns = clockevent_delta2ns(sparc_config.clock_rate, ce);
ce->max_delta_ticks = (unsigned long)sparc_config.clock_rate;
- ce->min_delta_ns = clockevent_delta2ns(100, ce);
ce->min_delta_ticks = 100;

clockevents_register_device(ce);
diff --git a/arch/sparc/kernel/time_64.c b/arch/sparc/kernel/time_64.c
index 83f0f99..76e1594 100644
--- a/arch/sparc/kernel/time_64.c
+++ b/arch/sparc/kernel/time_64.c
@@ -794,11 +794,7 @@ void __init time_init(void)
sparc64_clockevent.name = tick_ops->name;
clockevents_calc_mult_shift(&sparc64_clockevent, freq, 4);

- sparc64_clockevent.max_delta_ns =
- clockevent_delta2ns(0x7fffffffffffffffUL, &sparc64_clockevent);
sparc64_clockevent.max_delta_ticks = 0x7fffffffffffffffUL;
- sparc64_clockevent.min_delta_ns =
- clockevent_delta2ns(0xF, &sparc64_clockevent);
sparc64_clockevent.min_delta_ticks = 0xF;

printk("clockevent: mult[%x] shift[%d]\n",
diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
index c2fd280..3022733 100644
--- a/arch/tile/kernel/time.c
+++ b/arch/tile/kernel/time.c
@@ -153,7 +153,6 @@ static int tile_timer_shutdown(struct clock_event_device *evt)
static DEFINE_PER_CPU(struct clock_event_device, tile_timer) = {
.name = "tile timer",
.features = CLOCK_EVT_FEAT_ONESHOT,
- .min_delta_ns = 1000,
.min_delta_ticks = 1,
.rating = 100,
.irq = -1,
@@ -169,7 +168,6 @@ void setup_tile_timer(void)

/* Fill in fields that are speed-specific. */
clockevents_calc_mult_shift(evt, cycles_per_sec, TILE_MINSEC);
- evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);
evt->max_delta_ticks = MAX_TICK;

/* Mark as being for this cpu only. */
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 11b96ed..6023d4c 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -64,9 +64,7 @@ static struct clock_event_device timer_clockevent = {
.set_state_oneshot = itimer_one_shot,
.set_next_event = itimer_next_event,
.shift = 0,
- .max_delta_ns = 0xffffffff,
.max_delta_ticks = 0xffffffff,
- .min_delta_ns = TIMER_MIN_DELTA,
.min_delta_ticks = TIMER_MIN_DELTA, // microsecond resolution should be enough for anyone, same as 640K RAM
.irq = 0,
.mult = 1,
diff --git a/arch/unicore32/kernel/time.c b/arch/unicore32/kernel/time.c
index 29c91a9..d2a1c50 100644
--- a/arch/unicore32/kernel/time.c
+++ b/arch/unicore32/kernel/time.c
@@ -89,11 +89,7 @@ void __init time_init(void)

clockevents_calc_mult_shift(&ckevt_puv3_osmr0, CLOCK_TICK_RATE, 5);

- ckevt_puv3_osmr0.max_delta_ns =
- clockevent_delta2ns(0x7fffffff, &ckevt_puv3_osmr0);
ckevt_puv3_osmr0.max_delta_ticks = 0x7fffffff;
- ckevt_puv3_osmr0.min_delta_ns =
- clockevent_delta2ns(MIN_OSCR_DELTA * 2, &ckevt_puv3_osmr0) + 1;
ckevt_puv3_osmr0.min_delta_ticks = MIN_OSCR_DELTA * 2;
ckevt_puv3_osmr0.cpumask = cpumask_of(0);

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index eebb677..6522d6c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -729,11 +729,7 @@ static int __init calibrate_APIC_clock(void)
lapic_timer_frequency);
lapic_clockevent.mult = div_sc(lapic_timer_frequency/APIC_DIVISOR,
TICK_NSEC, lapic_clockevent.shift);
- lapic_clockevent.max_delta_ns =
- clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
lapic_clockevent.max_delta_ticks = 0x7FFFFF;
- lapic_clockevent.min_delta_ns =
- clockevent_delta2ns(0xF, &lapic_clockevent);
lapic_clockevent.min_delta_ticks = 0xF;
lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
return 0;
@@ -778,11 +774,7 @@ static int __init calibrate_APIC_clock(void)
/* Calculate the scaled math multiplication factor */
lapic_clockevent.mult = div_sc(delta, TICK_NSEC * LAPIC_CAL_LOOPS,
lapic_clockevent.shift);
- lapic_clockevent.max_delta_ns =
- clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
lapic_clockevent.max_delta_ticks = 0x7FFFFFFF;
- lapic_clockevent.min_delta_ns =
- clockevent_delta2ns(0xF, &lapic_clockevent);
lapic_clockevent.min_delta_ticks = 0xF;

lapic_timer_frequency = (delta * APIC_DIVISOR) / LAPIC_CAL_LOOPS;
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 930dabe..8566f85 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -993,9 +993,7 @@ static struct clock_event_device lguest_clockevent = {
.rating = INT_MAX,
.mult = 1,
.shift = 0,
- .min_delta_ns = LG_CLOCK_MIN_DELTA,
.min_delta_ticks = LG_CLOCK_MIN_DELTA,
- .max_delta_ns = LG_CLOCK_MAX_DELTA,
.max_delta_ticks = LG_CLOCK_MAX_DELTA,
};

diff --git a/arch/x86/platform/uv/uv_time.c b/arch/x86/platform/uv/uv_time.c
index 6410ee3..670a545 100644
--- a/arch/x86/platform/uv/uv_time.c
+++ b/arch/x86/platform/uv/uv_time.c
@@ -388,12 +388,8 @@ static __init int uv_rtc_setup_clock(void)
clock_event_device_uv.mult = div_sc(sn_rtc_cycles_per_second,
NSEC_PER_SEC, clock_event_device_uv.shift);

- clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
- sn_rtc_cycles_per_second;
clock_event_device_uv.min_delta_ticks = 1;

- clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
- (NSEC_PER_SEC / sn_rtc_cycles_per_second);
clock_event_device_uv.max_delta_ticks = clocksource_uv.mask;

rc = schedule_on_each_cpu(uv_rtc_register_clockevents);
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index cfc2f12..d2b9439 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -208,9 +208,7 @@ static const struct clock_event_device xen_timerop_clockevent = {
.name = "xen",
.features = CLOCK_EVT_FEAT_ONESHOT,

- .max_delta_ns = 0xffffffff,
.max_delta_ticks = 0xffffffff,
- .min_delta_ns = TIMER_SLOP,
.min_delta_ticks = TIMER_SLOP,

.mult = 1,
@@ -269,9 +267,7 @@ static const struct clock_event_device xen_vcpuop_clockevent = {
.name = "xen",
.features = CLOCK_EVT_FEAT_ONESHOT,

- .max_delta_ns = 0xffffffff,
.max_delta_ticks = 0xffffffff,
- .min_delta_ns = TIMER_SLOP,
.min_delta_ticks = TIMER_SLOP,

.mult = 1,
diff --git a/drivers/clocksource/dw_apb_timer.c b/drivers/clocksource/dw_apb_timer.c
index 2d68204..8f89b90 100644
--- a/drivers/clocksource/dw_apb_timer.c
+++ b/drivers/clocksource/dw_apb_timer.c
@@ -255,10 +255,7 @@ dw_apb_clockevent_init(int cpu, const char *name, unsigned rating,
dw_ced->timer.freq = freq;

clockevents_calc_mult_shift(&dw_ced->ced, freq, APBT_MIN_PERIOD);
- dw_ced->ced.max_delta_ns = clockevent_delta2ns(0x7fffffff,
- &dw_ced->ced);
dw_ced->ced.max_delta_ticks = 0x7fffffff;
- dw_ced->ced.min_delta_ns = clockevent_delta2ns(5000, &dw_ced->ced);
dw_ced->ced.min_delta_ticks = 5000;
dw_ced->ced.cpumask = cpumask_of(cpu);
dw_ced->ced.features = CLOCK_EVT_FEAT_PERIODIC |
diff --git a/drivers/clocksource/metag_generic.c b/drivers/clocksource/metag_generic.c
index 2b08024..5f86bdb 100644
--- a/drivers/clocksource/metag_generic.c
+++ b/drivers/clocksource/metag_generic.c
@@ -113,9 +113,7 @@ static int arch_timer_starting_cpu(unsigned int cpu)
clk->set_next_event = metag_timer_set_next_event,

clk->mult = div_sc(hwtimer_freq, NSEC_PER_SEC, clk->shift);
- clk->max_delta_ns = clockevent_delta2ns(0x7fffffff, clk);
clk->max_delta_ticks = 0x7fffffff;
- clk->min_delta_ns = clockevent_delta2ns(0xf, clk);
clk->min_delta_ticks = 0xf;
clk->cpumask = cpumask_of(cpu);

diff --git a/drivers/clocksource/numachip.c b/drivers/clocksource/numachip.c
index 6a20dc8..8ad51a0 100644
--- a/drivers/clocksource/numachip.c
+++ b/drivers/clocksource/numachip.c
@@ -50,9 +50,7 @@ static struct clock_event_device numachip2_clockevent = {
.features = CLOCK_EVT_FEAT_ONESHOT,
.mult = 1,
.shift = 0,
- .min_delta_ns = 1250,
.min_delta_ticks = 1250,
- .max_delta_ns = LONG_MAX,
.max_delta_ticks = LONG_MAX,
};

diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index 97ce6bf..e8bc1e6 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -814,9 +814,7 @@ static int sh_cmt_register_clockevent(struct sh_cmt_channel *ch,
/* TODO: calculate good shift from rate and counter bit width */
ced->shift = 32;
ced->mult = div_sc(ch->cmt->rate, NSEC_PER_SEC, ced->shift);
- ced->max_delta_ns = clockevent_delta2ns(ch->max_match_value, ced);
ced->max_delta_ticks = ch->max_match_value;
- ced->min_delta_ns = clockevent_delta2ns(0x1f, ced);
ced->min_delta_ticks = 0x1f;

dev_info(&ch->cmt->pdev->dev, "ch%u: used for clock events\n",
diff --git a/drivers/clocksource/timer-atlas7.c b/drivers/clocksource/timer-atlas7.c
index c95f209..5ef980e 100644
--- a/drivers/clocksource/timer-atlas7.c
+++ b/drivers/clocksource/timer-atlas7.c
@@ -191,9 +191,7 @@ static int sirfsoc_local_timer_starting_cpu(unsigned int cpu)
ce->tick_resume = sirfsoc_timer_shutdown;
ce->set_next_event = sirfsoc_timer_set_next_event;
clockevents_calc_mult_shift(ce, atlas7_timer_rate, 60);
- ce->max_delta_ns = clockevent_delta2ns(-2, ce);
ce->max_delta_ticks = (unsigned long)-2;
- ce->min_delta_ns = clockevent_delta2ns(2, ce);
ce->min_delta_ticks = 2;
ce->cpumask = cpumask_of(cpu);

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index 690b797..d0fbbf8 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -83,8 +83,6 @@ static struct clock_event_device ce_broadcast_hrtimer = {
CLOCK_EVT_FEAT_HRTIMER,
.rating = 0,
.bound_on = -1,
- .min_delta_ns = 1,
- .max_delta_ns = KTIME_MAX,
.min_delta_ticks = 1,
.max_delta_ticks = ULONG_MAX,
.mult = 1,
--
2.10.2

2016-11-19 16:11:04

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 13/28] clockevents: do comparison of delta against minimum in terms of cycles

Before converting the given delta from ns to cycles by means of the
mult/shift pair, clockevents_program_event() enforces it to be greater or
equal than ->max_delta_ns. Simplified, this reads as

delta = max(delta, dev->min_delta_ns);
clc = (delta * dev->mult) >> dev->shift;

Note that ->min_delta_ns is more or less ->min_delta_ticks converted to
nanoseconds and thus, it depends on the device's mult/shift pair as well.
Thus, any update to ->mult would require a corresponding change of
->min_delta_ns as well and these two had to get consumed in the clockevent
programming path atomically. This would have a negative performance impact
as soon as we start adjusting ->mult from a different CPU with upcoming
changes making the clockevents core NTP correction aware: some kind of
locking would have been needed in the event programming path.

These costly atomic reads in the event programming path can be avoided by
doing the comparison against the lower bound in terms of cycles instead of
nanoseconds.

Now, as it stands, ->min_delta_ns is not always completely equivalent
to ->min_delta_ticks: first, it is forced to be >= 1us and second,
clockevents_program_min_delta() can increase it if
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y.

In order to account for this, introduce ->min_delta_ticks_adjusted which
is supposed to be equivalent to ->min_delta_ns. Keep the original
->min_delta_ticks for reconfiguration.

The invariant ->min_delta_ticks <= ->min_delta_ticks_adjusted will be
guaranteed to hold at all times. This will allow for non-atomic updates of
->mult and ->min_delta_ticks_adjusted -- as long as we stay within a
device's allowed bounds, we don't care for small deviations.

Make clockevents_program_event() use ->min_delta_ticks_adjusted for the
enforcement of the lower bound on the delta value.

Finally, move ->min_delta_ns out of struct clock_event_device's first
cacheline.

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 6 ++++--
kernel/time/clockevents.c | 9 ++++++++-
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index a116926..6342b76 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -74,7 +74,7 @@ enum clock_event_state {
* @set_next_ktime: set next event function using a direct ktime value
* @next_event: local storage for the next event in oneshot mode
* @max_delta_ns: maximum delta value in ns
- * @min_delta_ns: minimum delta value in ns
+ * @min_delta_ticks_adjusted: minimum delta value, increased as needed
* @mult: nanosecond to cycles multiplier
* @shift: nanoseconds to cycles divisor (power of two)
* @state_use_accessors:current state of the device, assigned by the core code
@@ -88,6 +88,7 @@ enum clock_event_state {
* @broadcast: function to broadcast events
* @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
* @max_delta_ticks: maximum delta value in ticks stored for reconfiguration
+ * @min_delta_ns: minimum delta value in ns
* @name: ptr to clock event name
* @rating: variable to rate clock event devices
* @irq: IRQ number (only for non CPU local devices)
@@ -102,7 +103,7 @@ struct clock_event_device {
int (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
ktime_t next_event;
u64 max_delta_ns;
- u64 min_delta_ns;
+ unsigned long min_delta_ticks_adjusted;
u32 mult;
u32 shift;
enum clock_event_state state_use_accessors;
@@ -120,6 +121,7 @@ struct clock_event_device {
void (*resume)(struct clock_event_device *);
unsigned long min_delta_ticks;
unsigned long max_delta_ticks;
+ u64 min_delta_ns;

const char *name;
int rating;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 4fcec08..5d54928 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -225,6 +225,9 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev)
if (dev->min_delta_ns > MIN_DELTA_LIMIT)
dev->min_delta_ns = MIN_DELTA_LIMIT;

+ dev->min_delta_ticks_adjusted = (unsigned long)((dev->min_delta_ns *
+ dev->mult) >> dev->shift);
+
printk_deferred(KERN_WARNING
"CE: %s increased min_delta_ns to %llu nsec\n",
dev->name ? dev->name : "?",
@@ -333,9 +336,11 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
return force ? clockevents_program_min_delta(dev) : -ETIME;

delta = min(delta, (int64_t) dev->max_delta_ns);
- delta = max(delta, (int64_t) dev->min_delta_ns);

clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
+
+ clc = max_t(unsigned long, clc, dev->min_delta_ticks_adjusted);
+
rc = dev->set_next_event((unsigned long) clc, dev);

return (rc && force) ? clockevents_program_min_delta(dev) : rc;
@@ -454,6 +459,8 @@ static void __clockevents_update_bounds(struct clock_event_device *dev)
*/
dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
+ dev->min_delta_ticks_adjusted = (unsigned long)((dev->min_delta_ns *
+ dev->mult) >> dev->shift);
}

/**
--
2.10.2

2016-11-19 16:11:13

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 14/28] clockevents: clockevents_program_min_delta(): don't set ->next_event

Currently, clockevents_program_min_delta() sets a clockevent device's
->next_event to the point in time where the minimum delta would actually
expire:

delta = dev->min_delta_ns;
dev->next_event = ktime_add_ns(ktime_get(), delta);

For your reference, this is so since the initial advent of
clockevents_program_min_delta() with
commit d1748302f70b ("clockevents: Make minimum delay adjustments
configurable").

clockevents_program_min_delta() is called from clockevents_program_event()
only. More specifically, it is called if the latter's force argument is set
and, neglecting the case of device programming failure for the moment, if
the requested expiry is in the past.

On the contrary, if the expiry requested from clockevents_program_event()
is in the future, but less than ->min_delta_ns behind, then
- ->next_event gets set to that expiry verbatim
- but the clockevent device gets silently programmed to fire after
->min_delta_ns only.

Thus, in the extreme cases of expires == ktime_get() and
expires == ktime_get() + 1, the respective values of ->next_event would
differ by ->min_delta_ns while the clockevent device would actually get
programmed to fire at (almost) the same times (with force being set,
of course).

While this discontinuity of ->next_event at expires == ktime_get() is not
a problem by itself, the mere use of ->min_delta_ns in the event
programming path hinders upcoming changes making the clockevent core
NTP correction aware: both, ->mult and ->min_delta_ns would need to get
updated as well as consumed atomically and we'd rather like to avoid any
locking here.

Thus, let clockevents_program_event() unconditionally set ->next_event to
the expiry time actually requested by its caller, i.e. don't set
->next_event from clockevents_program_min_delta().

A few notes on why this change is safe with the current consumers of
->next_event:
1.
Note that a clockevents_program_event() with a requested expiry in the
past and force being set basically means: "fire ASAP". Now, consider this
so programmed event getting handed once again to
clockevents_program_event(), i.e. that a

clockevents_program_event(dev, dev->next_event, false)

as in __clockevents_update_freq() is done.
With this change applied, clockevents_program_event() would now properly
detect the expiry being in the past and, due to the force argument being
unset, wouldn't actually do anything.
Before this change OTOH, there would be the (very unlikely) possibility
that the requested event is still somewhere in the future and
clockevents_program_event() would silently delay the event expiration by
another ->min_delta_ns.

2.
The periodic tick handlers on oneshot-only devices use ->next_event
to calculate the followup expiry time.
tick_handle_periodic() spins on reprogramming the clockevent device
until some expiry in the future has been reached:

ktime_t next = dev->next_event;
...
for(;;) {
next = ktime_add(next, tick_period);
if (!clockevents_program_event(dev, next, false))
return;
...
}

Thus, tick_handle_periodic() isn't affected by this change.
For tick_handle_periodic_broadcast(), the situation is different since

commit 2951d5c031a3 ("tick: broadcast: Prevent livelock from event
handler")

though: a loop similar to the one from tick_handle_periodic() above got
replaced by a single

ktime_t next = ktime_add(dev->next_event, tick_period);
clockevents_program_event(dev, next, true);

In the case that dev->next_event + tick_period happens to be less than
ktime_get() + ->min_delta_ns, without this change applied, ->next_event
would get recovered to some point in the future after a single
tick_handle_periodic_broadcast() event.
On the contrary, with this patch applied, it could potentially take some
number of tick_handle_periodic_broadcast() events, each separated by
->min_delta_ns only, until ->next_event is able to catch up with the
current ktime_get(). However, if this turns out to become a problem,
the reprogramming loop in tick_handle_periodic_broadcast() can probably
be restored easily.

3.
In kernel/time/tick-broadcast.c, the broadcast receiving clockevent
devices' ->next_event is read multiple times in order to determine who's
next or who must be pinged. These uses all continue to work. Moreover,
clockevent devices getting programmed to something less than
ktime_get() + ->min_delta_ns
might not be the best candidates for a transition into C3 anyway.

4.
Finally, a "sleep length" is calculated at the very end of
tick_nohz_stop_sched_tick() as follows:

ts->sleep_length = ktime_sub(dev->next_event, now);

AFAICS, this can happen to be negative w/o this change applied already: in
NOHZ_MODE_HIGHRES mode there can be some overdue hrtimers whose removal is
blocked because tick_nohz_stop_sched_tick() gets called with interrupts
disabled. Unfortunately, the only user, the menu cpuidle governor,
can't cope with negative sleep lengths as it casts the return value
of the tick_nohz_get_sleep_length() getter to an unsigned int.
This change can very well make things worse here. A followup patch
will force this ->sleep_length to >= 0.

Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/time/clockevents.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 5d54928..8d32c2c 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -249,7 +249,6 @@ static int clockevents_program_min_delta(struct clock_event_device *dev)

for (i = 0;;) {
delta = dev->min_delta_ns;
- dev->next_event = ktime_add_ns(ktime_get(), delta);

if (clockevent_state_shutdown(dev))
return 0;
@@ -286,7 +285,6 @@ static int clockevents_program_min_delta(struct clock_event_device *dev)
int64_t delta;

delta = dev->min_delta_ns;
- dev->next_event = ktime_add_ns(ktime_get(), delta);

if (clockevent_state_shutdown(dev))
return 0;
--
2.10.2

2016-11-19 16:11:33

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 15/28] clockevents: use ->min_delta_ticks_adjusted to program minimum delta

The use of a clockevent device's ->min_delta_ns in the event programming
path hinders upcoming changes to the clockevent core making it NTP
correction aware: both, ->mult and ->min_delta_ns would need to get
updated as well as consumed atomically and we'd rather like to avoid any
locking here.

We already have got ->min_delta_ticks_adjusted which
- resembles the value of ->min_delta_ns
- and is guaranteed to be always >= the hardware's hard limit
->min_delta_ticks and thus, can be used w/o locking as we don't care
for small deviations.

In both implementations of clockevents_program_min_delta(),
don't calculate the event's deadline from ->min_delta_ns, but use its
drop-in replacement ->min_delta_ticks_adjusted.

Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/time/clockevents.c | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 8d32c2c..74b05dd 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -243,19 +243,14 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev)
*/
static int clockevents_program_min_delta(struct clock_event_device *dev)
{
- unsigned long long clc;
- int64_t delta;
int i;

for (i = 0;;) {
- delta = dev->min_delta_ns;
-
if (clockevent_state_shutdown(dev))
return 0;

dev->retries++;
- clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
- if (dev->set_next_event((unsigned long) clc, dev) == 0)
+ if (!dev->set_next_event(dev->min_delta_ticks_adjusted, dev))
return 0;

if (++i > 2) {
@@ -281,17 +276,11 @@ static int clockevents_program_min_delta(struct clock_event_device *dev)
*/
static int clockevents_program_min_delta(struct clock_event_device *dev)
{
- unsigned long long clc;
- int64_t delta;
-
- delta = dev->min_delta_ns;
-
if (clockevent_state_shutdown(dev))
return 0;

dev->retries++;
- clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
- return dev->set_next_event((unsigned long) clc, dev);
+ return dev->set_next_event(dev->min_delta_ticks_adjusted, dev);
}

#endif /* CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST */
--
2.10.2

2016-11-19 16:11:45

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 17/28] timer_list: print_tickdevice(): calculate ->min_delta_ns dynamically

print_tickdevice(), assembling the per-tick device sections in
/proc/timer_list, is the last user of struct clock_event_device's
->min_delta_ns member.

In order to make this one fully obsolete while retaining userspace ABI,
calculate the displayed value of 'min_delta_ns' on the fly from
->min_delta_ticks_adjusted.

Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/time/timer_list.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index ba7d8b2..12a27cb 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -206,6 +206,9 @@ static void
print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
{
struct clock_event_device *dev = td->evtdev;
+ u64 min_delta_ns;
+
+ min_delta_ns = clockevent_delta2ns(dev->min_delta_ticks_adjusted, dev);

SEQ_printf(m, "Tick Device: mode: %d\n", td->mode);
if (cpu < 0)
@@ -222,7 +225,7 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
SEQ_printf(m, " max_delta_ns: %llu\n",
(unsigned long long) dev->max_delta_ns);
SEQ_printf(m, " min_delta_ns: %llu\n",
- (unsigned long long) dev->min_delta_ns);
+ (unsigned long long) min_delta_ns);
SEQ_printf(m, " mult: %u\n", dev->mult);
SEQ_printf(m, " shift: %u\n", dev->shift);
SEQ_printf(m, " mode: %d\n", clockevent_get_state(dev));
--
2.10.2

2016-11-19 16:11:50

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 18/28] clockevents: purge ->min_delta_ns

The struct clock_event_device's ->min_delta_ns member isn't used anymore.

Purge it.

In __clockevents_update_bounds(), shortcut the
->min_delta_ticks => ->min_delta_ns => ->min_delta_ticks_adjusted
calculation detour -- it had been made solely for the purpose of ensuring
that ->min_delta_ticks_adjusted corresponds to something >= 1us.

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 2 --
kernel/time/clockevents.c | 13 +++++++------
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 6342b76..5c4670a 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -88,7 +88,6 @@ enum clock_event_state {
* @broadcast: function to broadcast events
* @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
* @max_delta_ticks: maximum delta value in ticks stored for reconfiguration
- * @min_delta_ns: minimum delta value in ns
* @name: ptr to clock event name
* @rating: variable to rate clock event devices
* @irq: IRQ number (only for non CPU local devices)
@@ -121,7 +120,6 @@ struct clock_event_device {
void (*resume)(struct clock_event_device *);
unsigned long min_delta_ticks;
unsigned long max_delta_ticks;
- u64 min_delta_ns;

const char *name;
int rating;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 36dd024..57b00b0 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -443,14 +443,15 @@ static void __clockevents_update_bounds(struct clock_event_device *dev)
(dev->features & CLOCK_EVT_FEAT_DUMMY))
return;

+ dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
+
/*
- * cev_delta2ns() never returns values less than 1us and thus,
- * we'll never program any ced with anything less.
+ * Enforce ->min_delta_ticks_adjusted to correspond to a value
+ * >= 1us.
*/
- dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
- dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
- dev->min_delta_ticks_adjusted = (unsigned long)((dev->min_delta_ns *
- dev->mult) >> dev->shift);
+ dev->min_delta_ticks_adjusted =
+ max(dev->min_delta_ticks,
+ (unsigned long)((1000ULL * dev->mult) >> dev->shift));
}

/**
--
2.10.2

2016-11-19 16:11:58

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 19/28] clockevents: introduce CLOCK_EVT_FEAT_NO_ADJUST flag

Upcoming changes to the clockevent core will make it adjusting a clockevent
device's mult/shift pair in order to compensate for NTP corrections made
by the timekeeping core.

For certain devices this behaviour is unwanted. Introduce the
CLOCK_EVT_FEAT_NO_ADJUST flag that, when being set, will cause a clockevent
device to be excluded from those adjustments.

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 5c4670a..76fc51b 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -66,6 +66,12 @@ enum clock_event_state {
*/
# define CLOCK_EVT_FEAT_HRTIMER 0x000080

+/*
+ * Clockevent device's mult/shift pair shall not get adjusted in order
+ * to compensate for NTP corrections made in the timekeeping core.
+ */
+# define CLOCK_EVT_FEAT_NO_ADJUST 0x000100
+
/**
* struct clock_event_device - clock event device descriptor
* @event_handler: Assigned by the framework to be called by the low
--
2.10.2

2016-11-19 16:12:06

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 21/28] clockevents: pack ->state_use_accessors and ->features together

Right now, the members ->state_use_accessors and ->features of
struct clock_event_device occupy an int each which is way more than needed:
the former can only take one of the five different values from
enum clock_event_state while the latter is a bitmask with the highest
CLOCK_EVT_FEAT_* bit defined so far being the ninth.

Both members are located in the first cacheline and thus, packing them into
a single int would free some precious space there.

Turn both members into appropriately sized bitfields.

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index c8dfc7a..548f4fe 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -111,8 +111,8 @@ struct clock_event_device {
unsigned int min_delta_ticks_adjusted;
u32 mult;
u32 shift;
- enum clock_event_state state_use_accessors;
- unsigned int features;
+ enum clock_event_state state_use_accessors:8;
+ unsigned int features:24;
unsigned long retries;

int (*set_state_periodic)(struct clock_event_device *);
--
2.10.2

2016-11-19 16:12:17

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 23/28] clockevents: initial support for mono to raw time conversion

With NOHZ_FULL and one single well-isolated, CPU consumptive task, one
would expect approximately one clockevent interrupt per second. However, on
my Intel Haswell where the monotonic clock is the TSC monotonic clock and
the clockevent device is the TSC deadline device, it turns out that every
second, there are two such interrupts: the first one arrives always
approximately ~50us before the scheduled deadline as programmed by
tick_nohz_stop_sched_tick() through the hrtimer API. The
__hrtimer_run_queues() called in this interrupt detects that the queued
tick_sched_timer hasn't expired yet and simply does nothing except
reprogramming the clock event device to fire shortly after again.

These too early programmed deadlines are explained as follows:
clockevents_program_event() programs the clockevent device to fire
after
f_event * delta_t_progr
clockevent device cycles where f_event is the clockevent device's hardware
frequency and delta_t_progr is the requested time interval. After that many
clockevent device cycles have elapsed, the device underlying the monotonic
clock, that is the monotonic raw clock has seen f_raw / f_event as many
cycles.
The ktime_get() called from __hrtimer_run_queues() interprets those
cycles to run at the frequency of the monotonic clock. Summarizing:
delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
= f_raw / f_mono * delta_t_progr
with f_mono being the monotonic clock's frequency and delta_t_perc being
the elapsed time interval as perceived by __hrtimer_run_queues().

Now, f_mono is not a constant, but is dynamically adjusted in
timekeeping_adjust() in order to compensate for the NTP error. With the
large values of delta_t_progr of 10^9ns with NOHZ_FULL, the error made
becomes significant and results in the double timer interrupts described
above.

Compensate for this error by multiplying the clockevent device's f_event
by f_mono/f_raw.

Namely:
- Introduce a ->mult_adjusted member to the struct clock_event_device. Its
value is supposed to be equal to ->mult * f_mono/f_raw for devices
which don't have the CLOCK_EVT_FEAT_NO_ADJUST flag set, equal to ->mult
otherwise.
- Introduce the timekeeping_get_mono_mult() helper which provides
the clockevent core with access to the timekeeping's current f_mono
and f_raw.
- Introduce the helper __clockevents_adjust_freq() which
sets a clockevent device's ->mult_adjusted member as appropriate. It is
implemented with the help of the new __clockevents_calc_adjust_freq().
- Call __clockevents_adjust_freq() at clockevent device registration time
as well as at frequency updates through clockevents_update_freq().
- Use the ->mult_adjusted rather than ->mult in the ns to cycle
conversion made in clockevents_program_event() as well as in the
cycle to ns conversion in cev_delta2ns().
- Finally, move ->mult out of struct clock_event_device's first cacheline.

Note that future adjustments of the monotonic clock are not taken into
account yet. Furthemore, this patch assumes that after a clockevent
device's registration, its ->mult changes only through calls to
clockevents_update_freq().

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 6 ++--
kernel/time/clockevents.c | 80 ++++++++++++++++++++++++++++++++++++++-------
kernel/time/tick-internal.h | 1 +
kernel/time/timekeeping.c | 14 ++++++++
4 files changed, 88 insertions(+), 13 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 8fc5469..e07f421 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -82,7 +82,7 @@ enum clock_event_state {
* @max_delta_ns: maximum delta value in ns
* @max_delta_ticks: maximum delta value in ticks
* @min_delta_ticks_adjusted: minimum delta value, increased as needed
- * @mult: nanosecond to cycles multiplier
+ * @mult_adjusted: adjusted multiplier compensating for NTP adjustments
* @shift: nanoseconds to cycles divisor (power of two)
* @state_use_accessors:current state of the device, assigned by the core code
* @features: features
@@ -95,6 +95,7 @@ enum clock_event_state {
* @broadcast: function to broadcast events
* @name: ptr to clock event name
* @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
+ * @mult: ns to cycles multiplier stored for reconfiguration
* @rating: variable to rate clock event devices
* @irq: IRQ number (only for non CPU local devices)
* @bound_on: Bound on CPU
@@ -110,7 +111,7 @@ struct clock_event_device {
u64 max_delta_ns;
unsigned long max_delta_ticks;
unsigned int min_delta_ticks_adjusted;
- u32 mult;
+ u32 mult_adjusted;
u32 shift;
enum clock_event_state state_use_accessors:8;
unsigned int features:24;
@@ -128,6 +129,7 @@ struct clock_event_device {

const char *name;
unsigned int min_delta_ticks;
+ u32 mult;
int rating;
int irq;
int bound_on;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index ff44140..d9a53af 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -34,17 +34,19 @@ struct ce_unbind {
int res;
};

+static void __clockevents_adjust_freq(struct clock_event_device *dev);
+
static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt,
bool ismax)
{
u64 clc = (u64) latch << evt->shift;
u64 rnd;

- if (unlikely(!evt->mult)) {
- evt->mult = 1;
+ if (unlikely(!evt->mult_adjusted)) {
+ evt->mult_adjusted = 1;
WARN_ON(1);
}
- rnd = (u64) evt->mult - 1;
+ rnd = (u64) evt->mult_adjusted - 1;

/*
* Upper bound sanity check. If the backwards conversion is
@@ -73,10 +75,10 @@ static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt,
* Also omit the add if it would overflow the u64 boundary.
*/
if ((~0ULL - clc > rnd) &&
- (!ismax || evt->mult <= (1ULL << evt->shift)))
+ (!ismax || evt->mult_adjusted <= (1ULL << evt->shift)))
clc += rnd;

- do_div(clc, evt->mult);
+ do_div(clc, evt->mult_adjusted);

/* Deltas less than 1usec are pointless noise */
return clc > 1000 ? clc : 1000;
@@ -165,8 +167,8 @@ void clockevents_switch_state(struct clock_event_device *dev,
* on it, so fix it up and emit a warning:
*/
if (clockevent_state_oneshot(dev)) {
- if (unlikely(!dev->mult)) {
- dev->mult = 1;
+ if (unlikely(!dev->mult_adjusted)) {
+ dev->mult_adjusted = 1;
WARN_ON(1);
}
}
@@ -229,8 +231,9 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev)
if (min_delta_ns > MIN_DELTA_LIMIT)
min_delta_ns = MIN_DELTA_LIMIT;

- dev->min_delta_ticks_adjusted = (unsigned int)((min_delta_ns *
- dev->mult) >> dev->shift);
+ dev->min_delta_ticks_adjusted =
+ (unsigned int)((min_delta_ns * dev->mult_adjusted) >>
+ dev->shift);

printk_deferred(KERN_WARNING
"CE: %s increased min_delta_ns to %llu nsec\n",
@@ -328,7 +331,7 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,

delta = min(delta, (int64_t) dev->max_delta_ns);

- clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
+ clc = ((unsigned long long) delta * dev->mult_adjusted) >> dev->shift;

clc = min_t(unsigned long, clc, dev->max_delta_ticks);
clc = max_t(unsigned long, clc, dev->min_delta_ticks_adjusted);
@@ -496,7 +499,8 @@ static void __clockevents_update_bounds(struct clock_event_device *dev)
*/
dev->min_delta_ticks_adjusted =
max(dev->min_delta_ticks,
- (unsigned int)((1000ULL * dev->mult) >> dev->shift));
+ (unsigned int)((1000ULL * dev->mult_adjusted) >>
+ dev->shift));
}

/**
@@ -515,6 +519,7 @@ void clockevents_register_device(struct clock_event_device *dev)
dev->cpumask = cpumask_of(smp_processor_id());
}

+ __clockevents_adjust_freq(dev);
__clockevents_update_bounds(dev);

raw_spin_lock_irqsave(&clockevents_lock, flags);
@@ -569,9 +574,62 @@ void clockevents_config_and_register(struct clock_event_device *dev,
}
EXPORT_SYMBOL_GPL(clockevents_config_and_register);

+static u32 __clockevents_calc_adjust_freq(u32 mult_ce_raw, u32 mult_cs_mono,
+ u32 mult_cs_raw)
+{
+ u64 adj;
+ int sign;
+
+ if (mult_cs_raw >= mult_cs_mono) {
+ sign = 0;
+ adj = mult_cs_raw - mult_cs_mono;
+ } else {
+ sign = 1;
+ adj = mult_cs_mono - mult_cs_raw;
+ }
+
+ adj *= mult_ce_raw;
+ adj += mult_cs_mono / 2;
+ do_div(adj, mult_cs_mono);
+
+ if (!sign) {
+ /*
+ * Never increase mult by more than 12.5%,
+ * c.f. __clockevents_update_bounds().
+ */
+ adj = min_t(u64, adj, mult_ce_raw / 8);
+ if (U32_MAX - mult_ce_raw < adj)
+ return U32_MAX;
+ return mult_ce_raw + (u32)adj;
+ }
+ if (adj >= mult_ce_raw)
+ return 1;
+ return mult_ce_raw - (u32)adj;
+}
+
+void __clockevents_adjust_freq(struct clock_event_device *dev)
+{
+ u32 mult_cs_mono, mult_cs_raw;
+
+ if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
+ (dev->features & CLOCK_EVT_FEAT_DUMMY))
+ return;
+
+ if (dev->features & CLOCK_EVT_FEAT_NO_ADJUST) {
+ dev->mult_adjusted = dev->mult;
+ return;
+ }
+
+ timekeeping_get_mono_mult(&mult_cs_mono, &mult_cs_raw);
+ dev->mult_adjusted = __clockevents_calc_adjust_freq(dev->mult,
+ mult_cs_mono,
+ mult_cs_raw);
+}
+
int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
{
clockevents_config(dev, freq);
+ __clockevents_adjust_freq(dev);
__clockevents_update_bounds(dev);

if (clockevent_state_oneshot(dev))
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index f738251..0b29d23 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -56,6 +56,7 @@ extern int clockevents_program_event(struct clock_event_device *dev,
ktime_t expires, bool force);
extern void clockevents_handle_noop(struct clock_event_device *dev);
extern int __clockevents_update_freq(struct clock_event_device *dev, u32 freq);
+extern void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw);
extern ssize_t sysfs_get_uname(const char *buf, char *dst, size_t cnt);

/* Broadcasting support */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 37dec7e..e0471e0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -329,6 +329,20 @@ static inline s64 timekeeping_cycles_to_ns(struct tk_read_base *tkr,
return timekeeping_delta_to_ns(tkr, delta);
}

+void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw)
+{
+ unsigned int seq;
+ struct tk_read_base *tkr_mono = &tk_core.timekeeper.tkr_mono;
+
+ /* The seqlock protects us from a racing change_clocksource(). */
+ do {
+ seq = read_seqcount_begin(&tk_core.seq);
+
+ *mult_cs_mono = tkr_mono->mult;
+ *mult_cs_raw = tkr_mono->clock->mult;
+ } while (read_seqcount_retry(&tk_core.seq, seq));
+}
+
/**
* update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
* @tkr: Timekeeping readout base from which we take the update
--
2.10.2

2016-11-19 16:12:11

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 20/28] clockevents: degrade ->min_delta_ticks to unsigned int

While being of type unsigned long right now, the ->min_delta_ticks and
->min_delta_ticks_adjusted members of struct clock_event_device aren't
expected to ever overflow an unsigned int.

The latter sits in the first cacheline. For the case of
sizeof(long) > sizeof(int), this is a waste of precious space needed by
later patches.

Degrade the types of ->min_delta_ticks_adjusted and ->min_delta_ticks
from unsigned long to unsigned int.
Change casts in the clockevents core were needed.
Adapt the signature of clockevents_config_and_register() accordingly.
Finally, move ->min_delta_ticks farther down the struct in order to
allow for alignment with other ints, potentially avoiding padding.

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 8 ++++----
kernel/time/clockevents.c | 6 +++---
2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 76fc51b..c8dfc7a 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -92,9 +92,9 @@ enum clock_event_state {
* @set_state_shutdown: switch state to shutdown
* @tick_resume: resume clkevt device
* @broadcast: function to broadcast events
- * @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
* @max_delta_ticks: maximum delta value in ticks stored for reconfiguration
* @name: ptr to clock event name
+ * @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
* @rating: variable to rate clock event devices
* @irq: IRQ number (only for non CPU local devices)
* @bound_on: Bound on CPU
@@ -108,7 +108,7 @@ struct clock_event_device {
int (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
ktime_t next_event;
u64 max_delta_ns;
- unsigned long min_delta_ticks_adjusted;
+ unsigned int min_delta_ticks_adjusted;
u32 mult;
u32 shift;
enum clock_event_state state_use_accessors;
@@ -124,10 +124,10 @@ struct clock_event_device {
void (*broadcast)(const struct cpumask *mask);
void (*suspend)(struct clock_event_device *);
void (*resume)(struct clock_event_device *);
- unsigned long min_delta_ticks;
unsigned long max_delta_ticks;

const char *name;
+ unsigned int min_delta_ticks;
int rating;
int irq;
int bound_on;
@@ -189,7 +189,7 @@ extern void clockevents_register_device(struct clock_event_device *dev);
extern int clockevents_unbind_device(struct clock_event_device *ced, int cpu);

extern void clockevents_config_and_register(struct clock_event_device *dev,
- u32 freq, unsigned long min_delta,
+ u32 freq, unsigned int min_delta,
unsigned long max_delta);

extern int clockevents_update_freq(struct clock_event_device *ce, u32 freq);
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 57b00b0..84639a1 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -228,7 +228,7 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev)
if (min_delta_ns > MIN_DELTA_LIMIT)
min_delta_ns = MIN_DELTA_LIMIT;

- dev->min_delta_ticks_adjusted = (unsigned long)((min_delta_ns *
+ dev->min_delta_ticks_adjusted = (unsigned int)((min_delta_ns *
dev->mult) >> dev->shift);

printk_deferred(KERN_WARNING
@@ -451,7 +451,7 @@ static void __clockevents_update_bounds(struct clock_event_device *dev)
*/
dev->min_delta_ticks_adjusted =
max(dev->min_delta_ticks,
- (unsigned long)((1000ULL * dev->mult) >> dev->shift));
+ (unsigned int)((1000ULL * dev->mult) >> dev->shift));
}

/**
@@ -514,7 +514,7 @@ static void clockevents_config(struct clock_event_device *dev, u32 freq)
* min/max_delta can be 0 for devices which do not support oneshot mode.
*/
void clockevents_config_and_register(struct clock_event_device *dev,
- u32 freq, unsigned long min_delta,
+ u32 freq, unsigned int min_delta,
unsigned long max_delta)
{
dev->min_delta_ticks = min_delta;
--
2.10.2

2016-11-19 16:12:21

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 24/28] clockevents: make setting of ->mult and ->mult_adjusted atomic

In order to avoid races between setting a struct clock_event_device's
->mult_adjusted in clockevents_update_freq() and yet to be implemented
updates triggered from the timekeeping core, the setting of ->mult and
->mult_adjusted should be made atomic.

Protect the update in clockevents_update_freq() by locking the
clockevents_lock spinlock. Frequency updates are expected to be done
seldomly and thus, taking this subsystem lock should not have any impact
on performance.

Note that the call to tick_broadcast_update_freq() is also protected
by this clockevents_lock and thus, this patch can take the
tick_broadcast_lock with the clockevents_lock being held. However,
this is supposed to be safe since the sequence
__clockevents_unbind() -> clockevents_replace() ->
tick_install_replacement() -> tick_setup_device() ->
tick_device_uses_broadcast()
already does this, too.

Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/time/clockevents.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index d9a53af..7dd71bb 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -658,11 +658,11 @@ int clockevents_update_freq(struct clock_event_device *dev, u32 freq)
unsigned long flags;
int ret;

- local_irq_save(flags);
+ raw_spin_lock_irqsave(&clockevents_lock, flags);
ret = tick_broadcast_update_freq(dev, freq);
if (ret == -ENODEV)
ret = __clockevents_update_freq(dev, freq);
- local_irq_restore(flags);
+ raw_spin_unlock_irqrestore(&clockevents_lock, flags);
return ret;
}

--
2.10.2

2016-11-19 16:12:27

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 25/28] timekeeping: inform clockevents about freq adjustments

Upon adjustments of the monotonic clock's frequencies from the
timekeeping core, the clockevents devices' ->mult_adjusted should be
changed accordingly, too.

Introduce clockevents_adjust_all_freqs() which traverses all registered
clockevent devices and, if the CLOCK_EVT_FEAT_NO_ADJUST flag is not set,
recalculates their ->mult_adjusted based on the monotonic clock's current
frequency.

Call clockevents_adjust_all_freqs() from timekeeping_freqadjust().

Note that it might look like as if timekeeping_apply_adjustment() was the
more natural candidate to trigger the clockevent devices' frequency updates
from: it's the single place where the mono clock's ->mult is changed.
However, timekeeping_apply_adjustment() is also invoked for the
on-off-controlled adjustments made to the mono clock's ->mult from
timekeeping_adjust(). These adjustments are very small in magnitude and,
more importantly, exhibit some oscillatory behaviour once the NTP error
becomes small. We don't want the clockevent devices' ->mult values to
follow these oscillations because they're negligible and because the
process of updating them would periodically destroy what
clockevents_increase_min_delta() might have built up.

Performance impact:
The following measurements have been carried out on a Raspberry Pi 2B
(armv7, 4 cores, 900MHz). The adjustment process has been driven by
periodically injecting a certain offset via adjtimex(2). The runtime of
clockevents_adjust_all_freqs() has been measured for three workloads.

- CPU stressed system: no interrupts except for the timer interrupts. An
adjtimex(2) every 3.7h keeps the adjustment process running. A
'stress --cpu 8' hogs the four cores.
Mean: 1733.75+-81.10
Quantiles:
0% 25% 50% 75% 100%
1458 1671 1717 1800 2083 (ns)

- Memory stressed system: same setup but with some memory load generated
by 'stress --vm 8 --vm-bytes 32M' instead.
Mean: 8750.73+-1034.77
Quantiles:
0% 25% 50% 75% 100%
3854 8171 8901 9452 13593 (ns)

- Idle case: same setup as above but with no stress at all. The results
are very similar to the ones obtained for the case of CPU stress.

Since clockevents_adjust_all_freqs() runs with interrupts disabled, the
differences in CPU vs. memory loaded are to be explained with cache hits
vs. misses as well as busy memory in the latter case.

Future patches will partly mitigate the memory stressed situation.

Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/time/clockevents.c | 34 ++++++++++++++++++++++++++++++++++
kernel/time/tick-internal.h | 5 +++++
kernel/time/timekeeping.c | 2 ++
3 files changed, 41 insertions(+)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 7dd71bb..6146d2f 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -626,6 +626,40 @@ void __clockevents_adjust_freq(struct clock_event_device *dev)
mult_cs_raw);
}

+void clockevents_adjust_all_freqs(u32 mult_cs_mono, u32 mult_cs_raw)
+{
+ u32 last_mult_raw = 0, last_shift = 0, last_mult_adjusted = 0;
+ u32 mult_raw, shift;
+ unsigned long flags;
+ struct clock_event_device *dev;
+
+ raw_spin_lock_irqsave(&clockevents_lock, flags);
+ list_for_each_entry(dev, &clockevent_devices, list) {
+ if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
+ (dev->features & CLOCK_EVT_FEAT_DUMMY) ||
+ (dev->features & CLOCK_EVT_FEAT_NO_ADJUST))
+ continue;
+
+ /*
+ * The cached last_mult_adjusted is only valid if
+ * shift == last_shift. Otherwise, it could exceed
+ * what is allowed by ->max_delta_ns.
+ */
+ mult_raw = dev->mult;
+ shift = dev->shift;
+ if (mult_raw != last_mult_raw || shift != last_shift) {
+ last_mult_raw = mult_raw;
+ last_shift = shift;
+ last_mult_adjusted =
+ __clockevents_calc_adjust_freq(mult_raw,
+ mult_cs_mono,
+ mult_cs_raw);
+ }
+ dev->mult_adjusted = last_mult_adjusted;
+ }
+ raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+}
+
int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
{
clockevents_config(dev, freq);
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 0b29d23..2d97c42 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -56,6 +56,7 @@ extern int clockevents_program_event(struct clock_event_device *dev,
ktime_t expires, bool force);
extern void clockevents_handle_noop(struct clock_event_device *dev);
extern int __clockevents_update_freq(struct clock_event_device *dev, u32 freq);
+extern void clockevents_adjust_all_freqs(u32 mult_cs_mono, u32 mult_cs_raw);
extern void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw);
extern ssize_t sysfs_get_uname(const char *buf, char *dst, size_t cnt);

@@ -95,6 +96,10 @@ static inline void tick_set_periodic_handler(struct clock_event_device *dev, int
#else /* !GENERIC_CLOCKEVENTS: */
static inline void tick_suspend(void) { }
static inline void tick_resume(void) { }
+
+static inline void clockevents_adjust_all_freqs(u32 mult_cs_mono,
+ u32 mult_cs_raw)
+{}
#endif /* !GENERIC_CLOCKEVENTS */

/* Oneshot related functions */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e0471e0..3d0ebc3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1913,6 +1913,8 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,

/* scale the corrections */
timekeeping_apply_adjustment(tk, offset, negative, adj_scale);
+ clockevents_adjust_all_freqs(tk->tkr_mono.mult,
+ tk->tkr_mono.clock->mult);
}

/*
--
2.10.2

2016-11-19 16:12:36

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 27/28] clockevents: optimize struct clock_event_device layout

Benchmarks have shown that clockevents_adjust_all_freqs() is memory bound.
The members of struct clock_event_device accessed therein, namely
->features, ->mult_adjusted, ->shift, ->mult and ->list are split across
two cachelines.

In what follows, a cacheline size of 64 bytes will be assumed.

I.) sizeof(long) == 4
At least for sizeof(long) == sizeof(void*) == 4, e.g. on ARM, this
situation can be improved by reordering the members of
struct clock_event_device.

This patch exploits the fact that with the current layout, there is some
padding inserted within the first cacheline on ARM: right now, the first
three members, which are of pointer type and 12 bytes in total, are
followed by a ktime_t. Baically being a s64, the latter gets aligned on
an 8 byte boundary. Thus, bytes 12 to 16 are occupied by unused padding.

Furthermore, for sizeof(unsigned long) == 4, this patch series has achieved
another net saving of 4 bytes within the first cacheline so far.

Summarizing: through reordering of the first cacheline's members, a total
space of 8 bytes can be made available, compared to the state before this
series. This allows for the placement of the 12 bytes made up by ->list and
->mult within the first cacheline while moving only one 4 byte member out.

This victim will be ->set_state_periodic() which will be moved to its
->set_state_*() relatives within the second cacheline.

II.) sizeof(long) == 8
For the case of sizeof(long) == sizeof(void*) == 8, the first cacheline's
contents will be unchanged.
The ->list and ->mult members are moved from the third to the start of
the second cacheline. Compared to the situation before this series,
this will move ->broadcast() and ->suspend() from the end of the second
cacheline to the start of the third one.
A nice side effect is that due to the elimination of padding in the third
cacheline, together with the patch ("clockevents: degrade ->retries to
unsigned int"), the struct clock_event_device now fits exactly into a
multiple of three cachelines.

Thus, in struct clock_event_device:
- move the ->event_handler(), ->set_next_event() and ->set_next_ktime()
in order to avoid padding in front of ->next_event on ARM.
- move the ->list and ->mult members right before the ->retries member.

Benchmark results:
The following measurements have been carried out on a Raspberry Pi 2B
(armv7, 4 cores, 900MHz). The adjustment process has been driven by
periodically injecting a certain offset via adjtimex(2) approximately
every 3.7h. A 'stress --vm 8 --vm-bytes 32M' was running. The runtime of
clockevents_adjust_all_freqs() has been measured.

- Before this patch:
Mean: 8750.73+-1034.77
Quantiles:
0% 25% 50% 75% 100%
3854 8171 8901 9452 13593 (ns)

- After this patch:
Mean: 6916.90+-782.63
Quantiles:
0% 25% 50% 75% 100%
3072 6412 6927 7430 10989 (ns)

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 878a802..17ef940 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -74,10 +74,6 @@ enum clock_event_state {

/**
* struct clock_event_device - clock event device descriptor
- * @event_handler: Assigned by the framework to be called by the low
- * level handler of the event source
- * @set_next_event: set next event function using a clocksource delta
- * @set_next_ktime: set next event function using a direct ktime value
* @next_event: local storage for the next event in oneshot mode
* @max_delta_ns: maximum delta value in ns
* @max_delta_ticks: maximum delta value in ticks
@@ -86,6 +82,12 @@ enum clock_event_state {
* @shift: nanoseconds to cycles divisor (power of two)
* @state_use_accessors:current state of the device, assigned by the core code
* @features: features
+ * @event_handler: Assigned by the framework to be called by the low
+ * level handler of the event source
+ * @set_next_event: set next event function using a clocksource delta
+ * @set_next_ktime: set next event function using a direct ktime value
+ * @list: list head for the management code
+ * @mult: ns to cycles multiplier stored for reconfiguration
* @retries: number of forced programming retries
* @set_state_periodic: switch state to periodic
* @set_state_oneshot: switch state to oneshot
@@ -95,18 +97,13 @@ enum clock_event_state {
* @broadcast: function to broadcast events
* @name: ptr to clock event name
* @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
- * @mult: ns to cycles multiplier stored for reconfiguration
* @rating: variable to rate clock event devices
* @irq: IRQ number (only for non CPU local devices)
* @bound_on: Bound on CPU
* @cpumask: cpumask to indicate for which CPUs this device works
- * @list: list head for the management code
* @owner: module reference
*/
struct clock_event_device {
- void (*event_handler)(struct clock_event_device *);
- int (*set_next_event)(unsigned long evt, struct clock_event_device *);
- int (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
ktime_t next_event;
u64 max_delta_ns;
unsigned long max_delta_ticks;
@@ -115,6 +112,11 @@ struct clock_event_device {
u32 shift;
enum clock_event_state state_use_accessors:8;
unsigned int features:24;
+ void (*event_handler)(struct clock_event_device *);
+ int (*set_next_event)(unsigned long evt, struct clock_event_device *);
+ int (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
+ struct list_head list;
+ u32 mult;
unsigned int retries;

int (*set_state_periodic)(struct clock_event_device *);
@@ -129,12 +131,10 @@ struct clock_event_device {

const char *name;
unsigned int min_delta_ticks;
- u32 mult;
int rating;
int irq;
int bound_on;
const struct cpumask *cpumask;
- struct list_head list;
struct module *owner;
} ____cacheline_aligned;

--
2.10.2

2016-11-19 16:12:38

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 26/28] clockevents: degrade ->retries to unsigned int

Being of type unsigned long, the struct clock_event_device's ->retries
member is incremented in clockevents_program_min_delta() and used for
diagnostic purposes in /proc/timer_list only.

Turning ->retries' type into unsigned int avoids the introduction of
some padding with future reorderings of struct clock_event_device for the
case of sizeof(unsigned long) == sizeof(void*) == 8. This is turn prevents
these reorganizations from making sizeof(struct clock_event_device) cross
a multiple of the cacheline size, assumed to be equal to 64 bytes.

For the case of HZ=1000, this change still allows for
2^32 / HZ / (3600*24) = 49 days without overwrap, even in the highly
unrealistic case of one clockevents_program_min_delta() invocation per
tick.

Thus, change ->retries' type from unsigned long to unsigned int. Adapt
the format specifier or /proc/timer_list accordingly.

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 2 +-
kernel/time/timer_list.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index e07f421..878a802 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -115,7 +115,7 @@ struct clock_event_device {
u32 shift;
enum clock_event_state state_use_accessors:8;
unsigned int features:24;
- unsigned long retries;
+ unsigned int retries;

int (*set_state_periodic)(struct clock_event_device *);
int (*set_state_oneshot)(struct clock_event_device *);
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index 9067760..6a8d54f 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -271,7 +271,7 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
SEQ_printf(m, " event_handler: ");
print_name_offset(m, dev->event_handler);
SEQ_printf(m, "\n");
- SEQ_printf(m, " retries: %lu\n", dev->retries);
+ SEQ_printf(m, " retries: %u\n", dev->retries);
SEQ_printf(m, "\n");
}

--
2.10.2

2016-11-19 16:12:51

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 28/28] clockevents: move non-adjustable devices to the tail of the list

clockevents_adjust_all_freqs() iterates over all devices in the
clockevent_devices list and adjusts frequencies as appropriate, skipping
over those that have either of CLOCK_EVT_FEAT_DUMMY and
CLOCK_EVT_FEAT_NO_ADJUST set or aren't oneshot capable.

This results in unnecessary memory accesses to these list members.

Avoid this by moving all such devices to the end of the clockevents_devices
list and making clockevents_adjust_all_freqs() return as soon as it
encounters the first of them.

For the list insertion part, introduce the ced_list_add() helper and
use it where appropriate.

Benchmark results:
The following measurements have been carried out on a Raspberry Pi 2B
(armv7, 4 cores, 900MHz). The adjustment process has been driven by
periodically injecting a certain offset via adjtimex(2) approximately
every 3.7h. A 'stress --vm 8 --vm-bytes 32M' was running. The runtime of
clockevents_adjust_all_freqs() has been measured.

- Before this patch:
Mean: 6916.90+-782.63
Quantiles:
0% 25% 50% 75% 100%
3072 6412 6927 7430 10989 (ns)

- After this patch:
Mean: 6505.18+-740.85
Quantiles:
0% 25% 50% 75% 100%
2708 6054 6523 6980 10885 (ns)

Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/time/clockevents.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 6146d2f..038fa82 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -341,6 +341,22 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
return (rc && force) ? clockevents_program_min_delta(dev) : rc;
}

+static void ced_list_add(struct clock_event_device *dev)
+{
+ /*
+ * Insert all devices which aren't candidates for NTP
+ * frequency adjustments at the end of the list such that
+ * clockevents_adjust_all_freqs() can skip the tail once
+ * encountering the first of them.
+ */
+ if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
+ (dev->features & CLOCK_EVT_FEAT_DUMMY) ||
+ (dev->features & CLOCK_EVT_FEAT_NO_ADJUST))
+ list_add_tail(&dev->list, &clockevent_devices);
+ else
+ list_add(&dev->list, &clockevent_devices);
+}
+
/*
* Called after a notify add to make devices available which were
* released from the notifier call.
@@ -353,7 +369,7 @@ static void clockevents_notify_released(void)
dev = list_entry(clockevents_released.next,
struct clock_event_device, list);
list_del(&dev->list);
- list_add(&dev->list, &clockevent_devices);
+ ced_list_add(dev);
tick_check_new_device(dev);
}
}
@@ -524,7 +540,7 @@ void clockevents_register_device(struct clock_event_device *dev)

raw_spin_lock_irqsave(&clockevents_lock, flags);

- list_add(&dev->list, &clockevent_devices);
+ ced_list_add(dev);
tick_check_new_device(dev);
clockevents_notify_released();

@@ -638,7 +654,7 @@ void clockevents_adjust_all_freqs(u32 mult_cs_mono, u32 mult_cs_raw)
if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
(dev->features & CLOCK_EVT_FEAT_DUMMY) ||
(dev->features & CLOCK_EVT_FEAT_NO_ADJUST))
- continue;
+ break;

/*
* The cached last_mult_adjusted is only valid if
--
2.10.2

2016-11-19 16:13:36

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 22/28] clockevents: decouple ->max_delta_ns from ->max_delta_ticks

Before converting the given delta from ns to cycles by means of the
mult/shift pair, clockevents_program_event() enforces it to be less or
equal than ->max_delta_ns. Simplified, this reads as

delta = min(delta, dev->max_delta_ns);
clc = (delta * dev->mult) >> dev->shift;

A device's ->max_delta_ns is chosen such that
1.) The multiplication does not overflow 64 bits.
2.) clc fits into an unsigned long.
3.) clc <= the ->max_delta_ticks allowed by hardware.

Item 3.) is responsible for making ->max_delta_ns depend tightly on
->max_delta_ticks and the device's frequency, i.e. its mult/shift pair.
As it stands, any update to ->mult would require a corresponding change of
->max_delta_ns as well and these two had to get consumed in the clockevent
programming path atomically. This would have a negative performance impact
as soon as we start adjusting ->mult from a different CPU with upcoming
changes making the clockevents core NTP correction aware: some kind of
locking would have been needed in the event programming path.

In order to circumvent this necessity, ->max_delta_ns could be made small
enough to cover the whole range of possible adjustments to ->mult,
eliminating the need to update it at all.

The timekeeping core never adjusts its mono clock's inverse frequency by
more than 11% (c.f. timekeeping_adjust()). This means that a clockevent
device's adjusted mult will never grow beyond 1 / (1-11%) = 112.4% of its
initial value. Thus, reducing ->max_delta_ns unconditionally to
1/112.4% = 89% would do the trick. (Actually, setting it to 8/9 = 88.89%
thereof allows for mult increases of up to 12.5% = 1/8 which is good for
binary arithmetic.)

However, such a decrease of ->max_delta_ns could become problematic for
devices whose ->max_delta_ticks is small, i.e. those for whom item 3.)
prevails.

In order to work around this, I chose to make ->max_delta_ns completely
independent of ->max_delta_ticks. It is set to 8/9 of the maximum value
satisfying conditions 1.) and 2.) for the given ->mult. Item 3.) is made
explicit by introducing an extra min(clc, ->max_delta_ticks) after the ns
to cycles conversion in clockevents_program_event(). This way, the maximum
programmable delta is reduced only for those devices which have got a large
->max_delta_ticks anyway. This comes at the cost of an extra min() in the
event programming path though.

So,
- In the case of CLOCK_EVT_FEAT_NO_ADJUST not being enabled, set
->max_delta_ns to 8/9 of the maximum possible value such that items 1.)
and 2.) hold for the given ->mult. OTOH, if CLOCK_EVT_FEAT_NO_ADJUST
is not set, set ->max_delta_ns to the full such value.

- Add a

clc = min(clc, dev->max_delta_ticks)

after the ns to cycle conversion in clockevents_program_event().

- Move ->max_delta_ticks into struct clock_event_device's first cacheline.

- Finally, preserve the semantics of /proc/timer_list by letting it display
the minimum of ->max_delta_ns and ->max_delta_ticks converted to ns at
the 'max_delta_ns' field.

Signed-off-by: Nicolai Stange <[email protected]>
---
include/linux/clockchips.h | 4 ++--
kernel/time/clockevents.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-
kernel/time/timer_list.c | 6 ++++--
3 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 548f4fe..8fc5469 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -80,6 +80,7 @@ enum clock_event_state {
* @set_next_ktime: set next event function using a direct ktime value
* @next_event: local storage for the next event in oneshot mode
* @max_delta_ns: maximum delta value in ns
+ * @max_delta_ticks: maximum delta value in ticks
* @min_delta_ticks_adjusted: minimum delta value, increased as needed
* @mult: nanosecond to cycles multiplier
* @shift: nanoseconds to cycles divisor (power of two)
@@ -92,7 +93,6 @@ enum clock_event_state {
* @set_state_shutdown: switch state to shutdown
* @tick_resume: resume clkevt device
* @broadcast: function to broadcast events
- * @max_delta_ticks: maximum delta value in ticks stored for reconfiguration
* @name: ptr to clock event name
* @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
* @rating: variable to rate clock event devices
@@ -108,6 +108,7 @@ struct clock_event_device {
int (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
ktime_t next_event;
u64 max_delta_ns;
+ unsigned long max_delta_ticks;
unsigned int min_delta_ticks_adjusted;
u32 mult;
u32 shift;
@@ -124,7 +125,6 @@ struct clock_event_device {
void (*broadcast)(const struct cpumask *mask);
void (*suspend)(struct clock_event_device *);
void (*resume)(struct clock_event_device *);
- unsigned long max_delta_ticks;

const char *name;
unsigned int min_delta_ticks;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 84639a1..ff44140 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -17,6 +17,7 @@
#include <linux/module.h>
#include <linux/smp.h>
#include <linux/device.h>
+#include <asm/bitsperlong.h>

#include "tick-internal.h"

@@ -329,6 +330,7 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,

clc = ((unsigned long long) delta * dev->mult) >> dev->shift;

+ clc = min_t(unsigned long, clc, dev->max_delta_ticks);
clc = max_t(unsigned long, clc, dev->min_delta_ticks_adjusted);

rc = dev->set_next_event((unsigned long) clc, dev);
@@ -439,11 +441,54 @@ EXPORT_SYMBOL_GPL(clockevents_unbind_device);

static void __clockevents_update_bounds(struct clock_event_device *dev)
{
+ u64 max_delta_ns;
+
if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
(dev->features & CLOCK_EVT_FEAT_DUMMY))
return;

- dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
+ /*
+ * max_delta_ns is <= the maximum value such that the ns to
+ * cycles conversion in clockevents_program_event() doesn't
+ * overflow. It follows that is has to fulfill the following
+ * constraints:
+ * 1.) mult * max_delta_ns <= ULLONG_MAX
+ * 2.) (mult * max_delta_ns) >> shift <= ULONG_MAX
+ * On 32 bit archs, 2.) is stricter because shift <= 32 always
+ * holds, otherwise 1.) is.
+ *
+ * If CLOCK_EVT_FEAT_NO_ADJUST is not set, the actual
+ * ->max_delta_ns is set to a value equal to 8/9 of the
+ * maximum one possible. This gives some room for increasing
+ * mult by up to 12.5% which is just enough to compensate for
+ * the up to 11% mono clock frequency adjustments made by the
+ * timekeeping core, c.f. timekeeping_adjust().
+ *
+ */
+#if BITS_PER_LONG == 32
+ /*
+ * Set the lower BITS_PER_LONG + dev->shift bits.
+ * Note: dev->shift can be 32, so avoid undefined behaviour.
+ * The ^= below is really a |= here. However the former is the
+ * common idiom and recognizable by gcc.
+ */
+ max_delta_ns = (u64)1 << (BITS_PER_LONG + dev->shift - 1);
+ max_delta_ns ^= (dev->max_delta_ns - 1);
+#else
+ max_delta_ns = U64_MAX;
+#endif
+ if (unlikely(!dev->mult)) {
+ dev->mult = 1;
+ WARN_ON(1);
+ }
+ do_div(max_delta_ns, dev->mult);
+ dev->max_delta_ns = max_delta_ns;
+ if (!(dev->features & CLOCK_EVT_FEAT_NO_ADJUST)) {
+ /* reduce by 1/9 */
+ do_div(max_delta_ns, 9);
+ ++max_delta_ns; /* round up */
+ dev->max_delta_ns -= max_delta_ns;
+ }

/*
* Enforce ->min_delta_ticks_adjusted to correspond to a value
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index 12a27cb..9067760 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -206,8 +206,10 @@ static void
print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
{
struct clock_event_device *dev = td->evtdev;
- u64 min_delta_ns;
+ u64 max_delta_ns, min_delta_ns;

+ max_delta_ns = clockevent_delta2ns(dev->max_delta_ticks, dev);
+ max_delta_ns = min(max_delta_ns, dev->max_delta_ns);
min_delta_ns = clockevent_delta2ns(dev->min_delta_ticks_adjusted, dev);

SEQ_printf(m, "Tick Device: mode: %d\n", td->mode);
@@ -223,7 +225,7 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
}
SEQ_printf(m, "%s\n", dev->name);
SEQ_printf(m, " max_delta_ns: %llu\n",
- (unsigned long long) dev->max_delta_ns);
+ (unsigned long long) max_delta_ns);
SEQ_printf(m, " min_delta_ns: %llu\n",
(unsigned long long) min_delta_ns);
SEQ_printf(m, " mult: %u\n", dev->mult);
--
2.10.2

2016-11-19 16:14:04

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 16/28] clockevents: min delta increment: calculate min_delta_ns from ticks

The use of a clockevent device's ->min_delta_ns in the event programming
path hinders upcoming changes to the clockevent core making it NTP
correction aware: both, ->mult and ->min_delta_ns would need to get
updated as well as consumed atomically and we'd rather like to avoid any
locking here.

We already have got ->min_delta_ticks_adjusted which
- resembles the value of ->min_delta_ns
- and is guaranteed to be always >= the hardware's hard limit
->min_delta_ticks and thus, can be used w/o locking as we don't care
for small deviations.

In clockevents_increase_min_delta(), don't use ->min_delta_ns but
calculate it dynamically from ->min_delta_ticks_adjusted.

As clockevents_increase_min_delta() gets invoked only rarely, the
additional division should not be an issue from a performance standpoint.

Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/time/clockevents.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 74b05dd..36dd024 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -209,29 +209,32 @@ int clockevents_tick_resume(struct clock_event_device *dev)
*/
static int clockevents_increase_min_delta(struct clock_event_device *dev)
{
+ u64 min_delta_ns = cev_delta2ns(dev->min_delta_ticks_adjusted, dev,
+ false);
+
/* Nothing to do if we already reached the limit */
- if (dev->min_delta_ns >= MIN_DELTA_LIMIT) {
+ if (min_delta_ns >= MIN_DELTA_LIMIT) {
printk_deferred(KERN_WARNING
"CE: Reprogramming failure. Giving up\n");
dev->next_event.tv64 = KTIME_MAX;
return -ETIME;
}

- if (dev->min_delta_ns < 5000)
- dev->min_delta_ns = 5000;
+ if (min_delta_ns < 5000)
+ min_delta_ns = 5000;
else
- dev->min_delta_ns += dev->min_delta_ns >> 1;
+ min_delta_ns += min_delta_ns >> 1;

- if (dev->min_delta_ns > MIN_DELTA_LIMIT)
- dev->min_delta_ns = MIN_DELTA_LIMIT;
+ if (min_delta_ns > MIN_DELTA_LIMIT)
+ min_delta_ns = MIN_DELTA_LIMIT;

- dev->min_delta_ticks_adjusted = (unsigned long)((dev->min_delta_ns *
+ dev->min_delta_ticks_adjusted = (unsigned long)((min_delta_ns *
dev->mult) >> dev->shift);

printk_deferred(KERN_WARNING
"CE: %s increased min_delta_ns to %llu nsec\n",
dev->name ? dev->name : "?",
- (unsigned long long) dev->min_delta_ns);
+ (unsigned long long) min_delta_ns);
return 0;
}

--
2.10.2

2016-11-19 16:18:52

by Nicolai Stange

[permalink] [raw]
Subject: [RFC v8 10/28] arch/tile/kernel/time: set ->min_delta_ticks and ->max_delta_ticks

With the yet to come introduction of NTP correction awareness to the
clockevent core, drivers should report their valid ranges in units of
cycles to the latter.

Currently, the tile's timer clockevent device is initialized as follows:

evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);

and

.min_delta_ns = 1000,

The first one translates to a ->max_delta_ticks value of MAX_TICK.
For the latter, note that the clockevent core will superimpose a
minimum of 1us by itself -- setting ->min_delta_ticks to 1 is safe here.

Initialize ->min_delta_ticks and ->max_delta_ticks with these values.

Signed-off-by: Nicolai Stange <[email protected]>
---
arch/tile/kernel/time.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
index 178989e..c2fd280 100644
--- a/arch/tile/kernel/time.c
+++ b/arch/tile/kernel/time.c
@@ -154,6 +154,7 @@ static DEFINE_PER_CPU(struct clock_event_device, tile_timer) = {
.name = "tile timer",
.features = CLOCK_EVT_FEAT_ONESHOT,
.min_delta_ns = 1000,
+ .min_delta_ticks = 1,
.rating = 100,
.irq = -1,
.set_next_event = tile_timer_set_next_event,
@@ -169,6 +170,7 @@ void setup_tile_timer(void)
/* Fill in fields that are speed-specific. */
clockevents_calc_mult_shift(evt, cycles_per_sec, TILE_MINSEC);
evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);
+ evt->max_delta_ticks = MAX_TICK;

/* Mark as being for this cpu only. */
evt->cpumask = cpumask_of(smp_processor_id());
--
2.10.2