2014-01-31 23:20:04

by Andy Lutomirski

Subject: [PATCH] x86: Remove hpet vclock support

The HPET is so amazingly slow that this is barely a win. It adds
complexity, and it scares me a tiny bit to map a piece of crappy
hardware where every userspace program can poke at it (even if said
poking is read-only). Let's just remove it.

This is probably a tiny speedup on kvmclock systems.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/include/asm/clocksource.h | 3 +--
arch/x86/include/asm/fixmap.h | 1 -
arch/x86/kernel/hpet.c | 7 -------
arch/x86/kvm/trace.h | 4 ++--
arch/x86/vdso/vclock_gettime.c | 8 --------
5 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index 16a57f4..00d2e6f 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -7,8 +7,7 @@

#define VCLOCK_NONE 0 /* No vDSO clock available. */
#define VCLOCK_TSC 1 /* vDSO should use vread_tsc. */
-#define VCLOCK_HPET 2 /* vDSO should use vread_hpet. */
-#define VCLOCK_PVCLOCK 3 /* vDSO should use vread_pvclock. */
+#define VCLOCK_PVCLOCK 2 /* vDSO should use vread_pvclock. */

struct arch_clocksource_data {
int vclock_mode;
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 7252cd3..a1c741c 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -80,7 +80,6 @@ enum fixed_addresses {
VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
+ ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1,
VVAR_PAGE,
- VSYSCALL_HPET,
#ifdef CONFIG_PARAVIRT_CLOCK
PVCLOCK_FIXMAP_BEGIN,
PVCLOCK_FIXMAP_END = PVCLOCK_FIXMAP_BEGIN+PVCLOCK_VSYSCALL_NR_PAGES-1,
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index da85a8e..b578e07 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -12,7 +12,6 @@
#include <linux/pm.h>
#include <linux/io.h>

-#include <asm/fixmap.h>
#include <asm/hpet.h>
#include <asm/time.h>

@@ -74,9 +73,6 @@ static inline void hpet_writel(unsigned int d, unsigned int a)
static inline void hpet_set_mapping(void)
{
hpet_virt_address = ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
-#ifdef CONFIG_X86_64
- __set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VVAR_NOCACHE);
-#endif
}

static inline void hpet_clear_mapping(void)
@@ -752,9 +748,6 @@ static struct clocksource clocksource_hpet = {
.mask = HPET_MASK,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
.resume = hpet_resume_counter,
-#ifdef CONFIG_X86_64
- .archdata = { .vclock_mode = VCLOCK_HPET },
-#endif
};

static int hpet_clocksource_register(void)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 545245d..02d1a84 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -781,8 +781,8 @@ TRACE_EVENT(kvm_write_tsc_offset,

#define host_clocks \
{VCLOCK_NONE, "none"}, \
- {VCLOCK_TSC, "tsc"}, \
- {VCLOCK_HPET, "hpet"} \
+ {VCLOCK_TSC, "tsc"}
+

TRACE_EVENT(kvm_update_master_clock,
TP_PROTO(bool use_master_clock, unsigned int host_clock, bool offset_matched),
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index eb5d7a5..3095446 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -19,7 +19,6 @@
#include <asm/fixmap.h>
#include <asm/vgtod.h>
#include <asm/timex.h>
-#include <asm/hpet.h>
#include <asm/unistd.h>
#include <asm/io.h>
#include <asm/pvclock.h>
@@ -58,11 +57,6 @@ notrace static cycle_t vread_tsc(void)
return last;
}

-static notrace cycle_t vread_hpet(void)
-{
- return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + HPET_COUNTER);
-}
-
#ifdef CONFIG_PARAVIRT_CLOCK

static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
@@ -157,8 +151,6 @@ notrace static inline u64 vgetsns(int *mode)
cycles_t cycles;
if (gtod->clock.vclock_mode == VCLOCK_TSC)
cycles = vread_tsc();
- else if (gtod->clock.vclock_mode == VCLOCK_HPET)
- cycles = vread_hpet();
#ifdef CONFIG_PARAVIRT_CLOCK
else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
cycles = vread_pvclock(mode);
--
1.8.5.3


2014-02-01 15:44:32

by Clemens Ladisch

Subject: Re: [PATCH] x86: Remove hpet vclock support

Andy Lutomirski wrote:
> The HPET is so amazingly slow that this is barely a win.

What happens on CPUs where the TSC cannot be used for the clock?

> it scares me a tiny bit to map a piece of crappy hardware where every
> userspace program can poke at it (even if said poking is read-only).
> Let's just remove it.

So this mapping is a backdoor to bypass the restrictions on /dev/hpet? ;-)


Regards,
Clemens

2014-02-01 17:11:14

by Andy Lutomirski

Subject: Re: [PATCH] x86: Remove hpet vclock support

On Sat, Feb 1, 2014 at 7:43 AM, Clemens Ladisch <[email protected]> wrote:
> Andy Lutomirski wrote:
>> The HPET is so amazingly slow that this is barely a win.
>
> What happens on CPUs where the TSC cannot be used for the clock?

vclock_gettime, etc. will fall back to using syscalls, just as they
would on systems using the PIT timer.
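
(For reference, the fallback is just a raw syscall made from the vDSO
-- roughly the following, quoting vclock_gettime.c from memory, so the
details may differ:)

notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
{
	long ret;

	/* Plain clock_gettime(2); nothing clever, just "syscall". */
	asm("syscall" : "=a" (ret) :
	    "0" (__NR_clock_gettime), "D" (clock), "S" (ts) : "memory");
	return ret;
}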

>
>> it scares me a tiny bit to map a piece of crappy hardware where every
>> userspace program can poke at it (even if said poking is read-only).
>> Let's just remove it.
>
> So this mapping is a backdoor to bypass the restrictions on /dev/hpet? ;-)
>

Yes, except that it's read-only.

At some point, someone will probably want to run code in a sandbox
that can't tell time at all. That will be messy, but this is a decent
first little step.
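
(A seccomp filter can already deny the syscall -- untested sketch
below -- but the vDSO read never enters the kernel at all, which is
exactly why it gets messy:)

/* Sketch: deny clock_gettime(2) with seccomp. A real filter should
 * also check seccomp_data->arch, and none of this stops a direct
 * read of the vDSO/vvar pages. */
#include <stddef.h>
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

static int deny_clock_gettime(void)
{
	struct sock_filter filter[] = {
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
			 __NR_clock_gettime, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}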

--Andy

2014-02-04 19:31:47

by Thomas Gleixner

Subject: Re: [PATCH] x86: Remove hpet vclock support

On Fri, 31 Jan 2014, Andy Lutomirski wrote:

> The HPET is so amazingly slow that this is barely a win. It adds

That's nonsense. It's definitely a win to access HPET directly
especially on older systems with TSC wreckage. Why do you want to
enforce a syscall if we can read out the data straight from user
space? The systems which are forced to use HPET have slower syscalls
than those which have a proper TSC.

> complexity, and it scares me a tiny bit to map a piece of crappy
> hardware where every userspace program can poke at it (even if said
> poking is read-only). Let's just remove it.

What's scary about granting user space access to a read only resource?

> This is probably a tiny speedup on kvmclock systems.

Oh, now you are coming to the real point of your change:

You want to save a few cycles in a guest system.

So you justify that by completely unbacked assumptions and a pointless
blurb about scariness.

We are not removing existing functionality which is useful for parts
of our userbase just to make financial online gambling faster.

> #ifdef CONFIG_PARAVIRT_CLOCK
>
> static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
> @@ -157,8 +151,6 @@ notrace static inline u64 vgetsns(int *mode)
> cycles_t cycles;
> if (gtod->clock.vclock_mode == VCLOCK_TSC)
> cycles = vread_tsc();
> - else if (gtod->clock.vclock_mode == VCLOCK_HPET)
> - cycles = vread_hpet();
> #ifdef CONFIG_PARAVIRT_CLOCK
> else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
> cycles = vread_pvclock(mode);

I really doubt that your tiny speedup is even measurable, simply
because the branch prediction won't fail and the cache footprint is
not that excitingly smaller. I could be wrong as usual, but it's up to
you to provide proper numbers which prove that:

- there is no impact for the systems actually forced to use HPET

- there is a measurable improvement for the PV clock case

and not before you have tried to reverse the evaluation order of
VCLOCK_HPET and VCLOCK_PVCLOCK.
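
IOW, something like this (completely untested sketch of the vgetsns()
body with the order swapped):

	if (gtod->clock.vclock_mode == VCLOCK_TSC)
		cycles = vread_tsc();
#ifdef CONFIG_PARAVIRT_CLOCK
	/* Test the PV clock before the HPET ... */
	else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
		cycles = vread_pvclock(mode);
#endif
	/* ... so the HPET check is out of the guest's hot path. */
	else if (gtod->clock.vclock_mode == VCLOCK_HPET)
		cycles = vread_hpet();
	else
		return 0;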

Thanks,

tglx

2014-02-04 19:41:48

by Andy Lutomirski

Subject: Re: [PATCH] x86: Remove hpet vclock support

On Tue, Feb 4, 2014 at 11:31 AM, Thomas Gleixner <[email protected]> wrote:
> On Fri, 31 Jan 2014, Andy Lutomirski wrote:
>
>> The HPET is so amazingly slow that this is barely a win. It adds
>
> That's nonsense. It's definitely a win to access HPET directly
> especially on older systems with TSC wreckage. Why do you want to
> enforce a syscall if we can read out the data straight from user
> space? The systems which are forced to use HPET have slower syscalls
> than those which have a proper TSC.
>

I'm actually curious whether anyone cares about this particular
performance difference. On all my HPET systems, the actual HPET read
takes ~500ns, whereas the overhead from a syscall is ~50ns. (This is
ignoring the CONFIG_AUDIT_SYSCALL wreckage, which I'm trying to fix.)
I certainly care about 10% performance changes in clock_gettime, but
that's only because it's already fast enough to call very frequently.
If it took anywhere near 500ns, I would just stop using it, so the
50ns difference wouldn't matter for my application.
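
(Those numbers come from a dumb timing loop along these lines -- a
sketch, not my exact harness:)

/* Rough clock_gettime() latency measurement; the numbers will vary
 * with the machine and the current clocksource. */
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec start, end, scratch;
	const long iters = 1000000;
	long i;
	double ns;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iters; i++)
		clock_gettime(CLOCK_MONOTONIC, &scratch);
	clock_gettime(CLOCK_MONOTONIC, &end);

	ns = (end.tv_sec - start.tv_sec) * 1e9 +
	     (end.tv_nsec - start.tv_nsec);
	printf("%.1f ns per clock_gettime() call\n", ns / iters);
	return 0;
}

(Build it with something like "gcc -O2 bench.c", adding -lrt on older
glibc, and flip
/sys/devices/system/clocksource/clocksource0/current_clocksource
between tsc and hpet to compare.)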

It's certainly true that, on older hardware, syscalls are slower, but
I suspect that the HPET is at least as slow, and old enough hardware
won't even have a usable HPET.

On newish things (probably Nehalem and up that have non-buggy BIOS),
HPET is AFAIK completely pointless.

>> complexity, and it scares me a tiny bit to map a piece of crappy
>> hardware where every userspace program can poke at it (even if said
>> poking is read-only). Let's just remove it.
>
> What's scary about granting user space access to a read only resource?

I only said it was a tiny bit scary, not that it was meaningfully scary :)

>
>> This is probably a tiny speedup on kvmclock systems.
>
> Oh, now you are coming to the real point of your change:
>
> You want to save a few cycles in a guest system.
>
> So you justify that by completely unbacked assumptions and a pointless
> blurb about scariness.
>
> We are not removing existing functionality which is useful for parts
> of our userbase just to make financial online gambling faster.

For the record, while I could be accused of financial online gambling,
I do not use kvmclock. In fact, I couldn't even get kvmclock to work
on my box (to test this patch) without manually patching out a bunch
of apparently buggy TSC stability checks. I'm not at all convinced
that this thing is actually functional on any normal system.

(My interest in clock_gettime performance is to keep the overhead of
userspace profiling low. I do this on bare metal.)

That being said, lots of people probably do care about kvmclock
performance. It's probably quite popular in the cloud, and the cloud
seems to be all the rage these days.

>
>> #ifdef CONFIG_PARAVIRT_CLOCK
>>
>> static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
>> @@ -157,8 +151,6 @@ notrace static inline u64 vgetsns(int *mode)
>> cycles_t cycles;
>> if (gtod->clock.vclock_mode == VCLOCK_TSC)
>> cycles = vread_tsc();
>> - else if (gtod->clock.vclock_mode == VCLOCK_HPET)
>> - cycles = vread_hpet();
>> #ifdef CONFIG_PARAVIRT_CLOCK
>> else if (gtod->clock.vclock_mode == VCLOCK_PVCLOCK)
>> cycles = vread_pvclock(mode);
>
> I really doubt that your tiny speedup is even measurable, simply
> because the branch prediction won't fail and the cache footprint is
> not that excitingly smaller. I could be wrong as usual, but it's up to
> you to provide proper numbers which prove that:
>
> - there is no impact for the systems actually forced to use HPET
>
> - there is a measurable improvement for the PV clock case
>
> and not before you have tried to reverse the evaluation order of
> VCLOCK_HPET and VCLOCK_PVCLOCK.

Heh. If I actually cared about speeding up VCLOCK_PVCLOCK, I think
I'd be aiming for much bigger fish.

That being said, you're the maintainer. If you don't like this patch,
feel free to drop it. (But please don't drop my other vDSO patch --
that one fixes a real bug. It was kind of alarming to see English
text in my vvar page.)

--Andy

2014-02-04 20:29:37

by Thomas Gleixner

Subject: Re: [PATCH] x86: Remove hpet vclock support

On Tue, 4 Feb 2014, Andy Lutomirski wrote:
> On Tue, Feb 4, 2014 at 11:31 AM, Thomas Gleixner <[email protected]> wrote:
> > On Fri, 31 Jan 2014, Andy Lutomirski wrote:
> >
> >> The HPET is so amazingly slow that this is barely a win. It adds
> >
> > That's nonsense. It's definitely a win to access HPET directly
> > especially on older systems with TSC wreckage. Why do you want to
> > enforce a syscall if we can read out the data straight from user
> > space? The systems which are forced to use HPET have slower syscalls
> > than those which have a proper TSC.
> >
>
> I'm actually curious whether anyone cares about this particular
> performance difference. On all my HPET systems, the actual HPET read
> takes ~500ns, whereas the overhead from a syscall is ~50ns. (This is
> ignoring the CONFIG_AUDIT_SYSCALL wreckage, which I'm trying to fix.)
> I certainly care about 10% performance changes in clock_gettime, but
> that's only because it's already fast enough to call very frequently.
> If it took anywhere near 500ns, I would just stop using it, so the
> 50ns difference wouldn't matter for my application.
>
> It's certainly true that, on older hardware, syscalls are slower, but
> I suspect that the HPET is at least as slow, and old enough hardware
> won't even have a usable HPET.

Well, on one reference system which is forced to use HPET, the
syscall overhead with your patch amounts to a whopping 20% for a
real-world application, versus the vDSO-based HPET access.

> On newish things (probably Nehalem and up that have non-buggy BIOS),
> HPET is AFAIK completely pointless.

True, but there is a world outside of the "we have access to the
latest hardware" universe. Linux has served that world very well and I
see no reason why we should not continue to do so.

Thanks,

tglx