2023-05-08 22:05:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 7/9] x86/tsc: Provide sched_clock_noinstr()

On Mon, May 08, 2023 at 11:19:58PM +0200, Peter Zijlstra wrote:

> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -408,9 +408,9 @@ static u64 notrace read_hv_clock_tsc_cs(
> return read_hv_clock_tsc();
> }
>
> -static u64 notrace read_hv_sched_clock_tsc(void)
> +static u64 noinstr read_hv_sched_clock_tsc(void)
> {
> - return (read_hv_clock_tsc() - hv_sched_clock_offset) *
> + return (hv_read_tsc_page(hv_get_tsc_page()) - hv_sched_clock_offset) *
> (NSEC_PER_SEC / HV_CLOCK_HZ);
> }
>
> --- a/include/clocksource/hyperv_timer.h
> +++ b/include/clocksource/hyperv_timer.h
> @@ -38,7 +38,7 @@ extern void hv_remap_tsc_clocksource(voi
> extern unsigned long hv_get_tsc_pfn(void);
> extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
>
> -static inline notrace u64
> +static __always_inline notrace u64
> hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
> {
> u64 scale, offset;
> @@ -85,7 +85,7 @@ hv_read_tsc_page_tsc(const struct ms_hyp
> return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
> }
>
> -static inline notrace u64
> +static __always_inline notrace u64
> hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
> {
> u64 cur_tsc;

Hyper-V folks!

While reviewing all this I found the following 'gem':

hv_init_clocksource()
hv_setup_sched_clock()
paravirt_set_sched_clock(read_hv_sched_clock_msr)

read_hv_sched_clock_msr() [notrace]
read_hv_clock_msr() [notrace]
hv_get_register() *traced*
hv_get_non_nested_register() ...
hv_ghcb_msr_read()
WARN_ON(in_nmi())
...
local_irq_save()


Note that:

a) sched_clock() is used in NMI context a *LOT*
b) sched_clock() is notrace (or even noinstr with these patches)
and local_irq_save() implies tracing


Can you pretty please:

1) delete all this; or,
2) fix it in a hurry?

Thanks!


2023-05-09 00:18:49

by Wei Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 7/9] x86/tsc: Provide sched_clock_noinstr()

On Mon, May 08, 2023 at 11:44:19PM +0200, Peter Zijlstra wrote:
> On Mon, May 08, 2023 at 11:19:58PM +0200, Peter Zijlstra wrote:
>
> > --- a/drivers/clocksource/hyperv_timer.c
> > +++ b/drivers/clocksource/hyperv_timer.c
> > @@ -408,9 +408,9 @@ static u64 notrace read_hv_clock_tsc_cs(
> > return read_hv_clock_tsc();
> > }
> >
> > -static u64 notrace read_hv_sched_clock_tsc(void)
> > +static u64 noinstr read_hv_sched_clock_tsc(void)
> > {
> > - return (read_hv_clock_tsc() - hv_sched_clock_offset) *
> > + return (hv_read_tsc_page(hv_get_tsc_page()) - hv_sched_clock_offset) *
> > (NSEC_PER_SEC / HV_CLOCK_HZ);
> > }
> >
> > --- a/include/clocksource/hyperv_timer.h
> > +++ b/include/clocksource/hyperv_timer.h
> > @@ -38,7 +38,7 @@ extern void hv_remap_tsc_clocksource(voi
> > extern unsigned long hv_get_tsc_pfn(void);
> > extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
> >
> > -static inline notrace u64
> > +static __always_inline notrace u64
> > hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
> > {
> > u64 scale, offset;
> > @@ -85,7 +85,7 @@ hv_read_tsc_page_tsc(const struct ms_hyp
> > return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
> > }
> >
> > -static inline notrace u64
> > +static __always_inline notrace u64
> > hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
> > {
> > u64 cur_tsc;
>
> Hyper-V folks!
>
> While reviewing all this I found the following 'gem':
>
> hv_init_clocksource()
> hv_setup_sched_clock()
> paravirt_set_sched_clock(read_hv_sched_clock_msr)
>
> read_hv_sched_clock_msr() [notrace]
> read_hv_clock_msr() [notrace]
> hv_get_register() *traced*
> hv_get_non_nested_register() ...
> hv_ghcb_msr_read()
> WARN_ON(in_nmi())
> ...
> local_irq_save()
>
>
> Note that:
>
> a) sched_clock() is used in NMI context a *LOT*
> b) sched_clock() is notrace (or even noinstr with these patches)
> and local_irq_save() implies tracing
>

Tianyu and Michael, what's your thought on this?

Is the MSR-based GHCB usable at this point?

What other clock source can be used?

Thanks,
Wei.

>
> Can you pretty please:
>
> 1) delete all this; or,
> 2) fix it in a hurry?
>
> Thanks!

2023-05-11 20:42:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 7/9] x86/tsc: Provide sched_clock_noinstr()

On Mon, May 08, 2023 at 11:30:43PM +0000, Wei Liu wrote:
> On Mon, May 08, 2023 at 11:44:19PM +0200, Peter Zijlstra wrote:
> > On Mon, May 08, 2023 at 11:19:58PM +0200, Peter Zijlstra wrote:
> >
> > > --- a/drivers/clocksource/hyperv_timer.c
> > > +++ b/drivers/clocksource/hyperv_timer.c
> > > @@ -408,9 +408,9 @@ static u64 notrace read_hv_clock_tsc_cs(
> > > return read_hv_clock_tsc();
> > > }
> > >
> > > -static u64 notrace read_hv_sched_clock_tsc(void)
> > > +static u64 noinstr read_hv_sched_clock_tsc(void)
> > > {
> > > - return (read_hv_clock_tsc() - hv_sched_clock_offset) *
> > > + return (hv_read_tsc_page(hv_get_tsc_page()) - hv_sched_clock_offset) *
> > > (NSEC_PER_SEC / HV_CLOCK_HZ);
> > > }
> > >
> > > --- a/include/clocksource/hyperv_timer.h
> > > +++ b/include/clocksource/hyperv_timer.h
> > > @@ -38,7 +38,7 @@ extern void hv_remap_tsc_clocksource(voi
> > > extern unsigned long hv_get_tsc_pfn(void);
> > > extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
> > >
> > > -static inline notrace u64
> > > +static __always_inline notrace u64
> > > hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
> > > {
> > > u64 scale, offset;
> > > @@ -85,7 +85,7 @@ hv_read_tsc_page_tsc(const struct ms_hyp
> > > return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
> > > }
> > >
> > > -static inline notrace u64
> > > +static __always_inline notrace u64
> > > hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
> > > {
> > > u64 cur_tsc;
> >
> > Hyper-V folks!
> >
> > While reviewing all this I found the following 'gem':
> >
> > hv_init_clocksource()
> > hv_setup_sched_clock()
> > paravirt_set_sched_clock(read_hv_sched_clock_msr)
> >
> > read_hv_sched_clock_msr() [notrace]
> > read_hv_clock_msr() [notrace]
> > hv_get_register() *traced*
> > hv_get_non_nested_register() ...
> > hv_ghcb_msr_read()
> > WARN_ON(in_nmi())
> > ...
> > local_irq_save()
> >
> >
> > Note that:
> >
> > a) sched_clock() is used in NMI context a *LOT*
> > b) sched_clock() is notrace (or even noinstr with these patches)
> > and local_irq_save() implies tracing
> >
>
> Tianyu and Michael, what's your thought on this?
>
> Is the MSR-based GHCB usable at this point?
>
> What other clock source can be used?

You do have TSC support -- which is what I fixed for you. It's just the
whole MSR thing that is comically broken.

You could do a read_hv_clock_msr() implementation using
__rdmsr() and add some sanity checking that anything GHCB using (SEV?)
*will* use TSC.

Anyway, will you guys do that, or should I pull out the chainsaw and fix
it for you?

2023-05-11 23:15:02

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC][PATCH 7/9] x86/tsc: Provide sched_clock_noinstr()

From: Peter Zijlstra <[email protected]> Sent: Thursday, May 11, 2023 1:24 PM
>
> On Mon, May 08, 2023 at 11:30:43PM +0000, Wei Liu wrote:
> > On Mon, May 08, 2023 at 11:44:19PM +0200, Peter Zijlstra wrote:
> > > On Mon, May 08, 2023 at 11:19:58PM +0200, Peter Zijlstra wrote:
> > >
> > > > --- a/drivers/clocksource/hyperv_timer.c
> > > > +++ b/drivers/clocksource/hyperv_timer.c
> > > > @@ -408,9 +408,9 @@ static u64 notrace read_hv_clock_tsc_cs(
> > > > return read_hv_clock_tsc();
> > > > }
> > > >
> > > > -static u64 notrace read_hv_sched_clock_tsc(void)
> > > > +static u64 noinstr read_hv_sched_clock_tsc(void)
> > > > {
> > > > - return (read_hv_clock_tsc() - hv_sched_clock_offset) *
> > > > + return (hv_read_tsc_page(hv_get_tsc_page()) - hv_sched_clock_offset) *
> > > > (NSEC_PER_SEC / HV_CLOCK_HZ);
> > > > }
> > > >
> > > > --- a/include/clocksource/hyperv_timer.h
> > > > +++ b/include/clocksource/hyperv_timer.h
> > > > @@ -38,7 +38,7 @@ extern void hv_remap_tsc_clocksource(voi
> > > > extern unsigned long hv_get_tsc_pfn(void);
> > > > extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
> > > >
> > > > -static inline notrace u64
> > > > +static __always_inline notrace u64
> > > > hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
> > > > {
> > > > u64 scale, offset;
> > > > @@ -85,7 +85,7 @@ hv_read_tsc_page_tsc(const struct ms_hyp
> > > > return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
> > > > }
> > > >
> > > > -static inline notrace u64
> > > > +static __always_inline notrace u64
> > > > hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
> > > > {
> > > > u64 cur_tsc;
> > >
> > > Hyper-V folks!
> > >
> > > While reviewing all this I found the following 'gem':
> > >
> > > hv_init_clocksource()
> > > hv_setup_sched_clock()
> > > paravirt_set_sched_clock(read_hv_sched_clock_msr)
> > >
> > > read_hv_sched_clock_msr() [notrace]
> > > read_hv_clock_msr() [notrace]
> > > hv_get_register() *traced*
> > > hv_get_non_nested_register() ...
> > > hv_ghcb_msr_read()
> > > WARN_ON(in_nmi())
> > > ...
> > > local_irq_save()
> > >
> > >
> > > Note that:
> > >
> > > a) sched_clock() is used in NMI context a *LOT*
> > > b) sched_clock() is notrace (or even noinstr with these patches)
> > > and local_irq_save() implies tracing
> > >
> >
> > Tianyu and Michael, what's your thought on this?
> >
> > Is the MSR-based GHCB usable at this point?
> >
> > What other clock source can be used?
>
> You do have TSC support -- which is what I fixed for you. It's just the
> whole MSR thing that is comically broken.
>
> You could do a read_hv_clock_msr() implementation using
> __rdmsr() and add some sanity checking that anything GHCB using (SEV?)
> *will* use TSC.
>
> Anyway, will you guys do that, or should I pull out the chainsaw and fix
> it for you?

Peter -- I'll work on a fix. But it will be the first half of next week before
I can do it.

Michael

2023-05-12 06:22:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 7/9] x86/tsc: Provide sched_clock_noinstr()

On Thu, May 11, 2023 at 11:11:07PM +0000, Michael Kelley (LINUX) wrote:
> From: Peter Zijlstra <[email protected]> Sent: Thursday, May 11, 2023 1:24 PM

> > > Tianyu and Michael, what's your thought on this?
> > >
> > > Is the MSR-based GHCB usable at this point?
> > >
> > > What other clock source can be used?
> >
> > You do have TSC support -- which is what I fixed for you. It's just the
> > whole MSR thing that is comically broken.
> >
> > You could do a read_hv_clock_msr() implementation using
> > __rdmsr() and add some sanity checking that anything GHCB using (SEV?)
> > *will* use TSC.
> >
> > Anyway, will you guys do that, or should I pull out the chainsaw and fix
> > it for you?
>
> Peter -- I'll work on a fix. But it will be the first half of next week before
> I can do it.

OK, Thanks!

2023-05-17 03:04:46

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC][PATCH 7/9] x86/tsc: Provide sched_clock_noinstr()

From: Peter Zijlstra <[email protected]> Sent: Monday, May 8, 2023 2:44 PM
>
> On Mon, May 08, 2023 at 11:19:58PM +0200, Peter Zijlstra wrote:
>
> > --- a/drivers/clocksource/hyperv_timer.c
> > +++ b/drivers/clocksource/hyperv_timer.c
> > @@ -408,9 +408,9 @@ static u64 notrace read_hv_clock_tsc_cs(
> > return read_hv_clock_tsc();
> > }
> >
> > -static u64 notrace read_hv_sched_clock_tsc(void)
> > +static u64 noinstr read_hv_sched_clock_tsc(void)
> > {
> > - return (read_hv_clock_tsc() - hv_sched_clock_offset) *
> > + return (hv_read_tsc_page(hv_get_tsc_page()) - hv_sched_clock_offset) *
> > (NSEC_PER_SEC / HV_CLOCK_HZ);
> > }
> >
> > --- a/include/clocksource/hyperv_timer.h
> > +++ b/include/clocksource/hyperv_timer.h
> > @@ -38,7 +38,7 @@ extern void hv_remap_tsc_clocksource(voi
> > extern unsigned long hv_get_tsc_pfn(void);
> > extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
> >
> > -static inline notrace u64
> > +static __always_inline notrace u64
> > hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
> > {
> > u64 scale, offset;
> > @@ -85,7 +85,7 @@ hv_read_tsc_page_tsc(const struct ms_hyp
> > return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
> > }
> >
> > -static inline notrace u64
> > +static __always_inline notrace u64
> > hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
> > {
> > u64 cur_tsc;
>
> Hyper-V folks!
>
> While reviewing all this I found the following 'gem':
>
> hv_init_clocksource()
> hv_setup_sched_clock()
> paravirt_set_sched_clock(read_hv_sched_clock_msr)
>
> read_hv_sched_clock_msr() [notrace]
> read_hv_clock_msr() [notrace]
> hv_get_register() *traced*
> hv_get_non_nested_register() ...
> hv_ghcb_msr_read()
> WARN_ON(in_nmi())
> ...
> local_irq_save()
>
>
> Note that:
>
> a) sched_clock() is used in NMI context a *LOT*
> b) sched_clock() is notrace (or even noinstr with these patches)
> and local_irq_save() implies tracing
>
>
> Can you pretty please:
>
> 1) delete all this; or,
> 2) fix it in a hurry?
>

Peter -- I've sent you an RFC patch to incorporate into your broader
patch set. I think it probably makes sense for all the Hyper-V
stuff to be a separate patch.

I haven't previously worked with the details of notrace vs. noinstr,
but I followed the patterns elsewhere in patch set. Please review
to see if it seems correct.

One thing: In the cases where I added __always_inline, I dropped
any notrace or noinstr annotations. I presume such code always
takes on the attributes of the caller. If that's not correct, let me know.

Michael

2023-05-17 11:52:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 7/9] x86/tsc: Provide sched_clock_noinstr()

On Wed, May 17, 2023 at 02:26:35AM +0000, Michael Kelley (LINUX) wrote:

> Peter -- I've sent you an RFC patch to incorporate into your broader
> patch set. I think it probably makes sense for all the Hyper-V
> stuff to be a separate patch.

Perhaps, it's not that much.

> I haven't previously worked with the details of notrace vs. noinstr,
> but I followed the patterns elsewhere in patch set. Please review
> to see if it seems correct.

notrace inhibits the "call __fentry__" at the start of the symbol.

The __fentry__ call is mostly for ftrace, there's a few sites where
inhibiting tracing is critical -- stuff that happens before the ftrace
recursion handling, but mostly it's about performance these days,
constantly hitting the recusion code isn't very good.

noinstr inhibits any and all compiler generated 'extra' -- it is for the
C as a portable assembler usage. This very much includes notrace, but it
also covers all the *SAN nonsense. Basically, if it does not directly
reflect the code as written, it shouldn't be emitted.

Additionally, and for validation purposes, it also ensures all these
symbols end up in a special text section.

But yeah, you seem to have gotten it right.

> One thing: In the cases where I added __always_inline, I dropped
> any notrace or noinstr annotations. I presume such code always
> takes on the attributes of the caller. If that's not correct, let me know.

Correct; noinstr actually has an explicit noinline because compilers
suck :/