2019-03-14 13:15:07

by Peter Zijlstra

Subject: [PATCH 1/8] perf/x86/intel: Fix memory corruption

Through:

validate_event()
x86_pmu.get_event_constraints(.idx=-1)
tfa_get_event_constraints()
dyn_constraint()

We use cpuc->constraint_list[-1], which is an obvious out-of-bound
access.

In this case, simply skip the TFA constraint code, there is no event
constraint with just PMC3, therefore the code will never result in the
empty set.

Reported-by: Tony Jones <[email protected]>
Reported-by: "DSouza, Nelson" <[email protected]>
Tested-by: Tony Jones <[email protected]>
Tested-by: "DSouza, Nelson" <[email protected]>
Cc: [email protected]
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/events/intel/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3410,7 +3410,7 @@ tfa_get_event_constraints(struct cpu_hw_
/*
* Without TFA we must not use PMC3.
*/
- if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
+ if (!allow_tsx_force_abort && test_bit(3, c->idxmsk) && idx >= 0) {
c = dyn_constraint(cpuc, c, idx);
c->idxmsk64 &= ~(1ULL << 3);
c->weight--;




Subject: [tip:perf/urgent] perf/x86/intel: Fix memory corruption

Commit-ID: ede271b059463731cbd6dffe55ffd70d7dbe8392
Gitweb: https://git.kernel.org/tip/ede271b059463731cbd6dffe55ffd70d7dbe8392
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 14 Mar 2019 14:01:14 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Fri, 15 Mar 2019 12:22:51 +0100

perf/x86/intel: Fix memory corruption

Through:

validate_event()
x86_pmu.get_event_constraints(.idx=-1)
tfa_get_event_constraints()
dyn_constraint()

cpuc->constraint_list[-1] is used, which is an obvious out-of-bound access.

In this case, simply skip the TFA constraint code, there is no event
constraint with just PMC3, therefore the code will never result in the
empty set.

Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Tony Jones <[email protected]>
Reported-by: "DSouza, Nelson" <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Tony Jones <[email protected]>
Tested-by: "DSouza, Nelson" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/events/intel/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 35102ecdfc8d..92dfeb343a6a 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3410,7 +3410,7 @@ tfa_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
/*
* Without TFA we must not use PMC3.
*/
- if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
+ if (!allow_tsx_force_abort && test_bit(3, c->idxmsk) && idx >= 0) {
c = dyn_constraint(cpuc, c, idx);
c->idxmsk64 &= ~(1ULL << 3);
c->weight--;

2019-03-19 06:30:34

by Stephane Eranian

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, Mar 14, 2019 at 6:11 AM Peter Zijlstra <[email protected]> wrote:
>
> Through:
>
> validate_event()
> x86_pmu.get_event_constraints(.idx=-1)
> tfa_get_event_constraints()
> dyn_constraint()
>
> We use cpuc->constraint_list[-1], which is an obvious out-of-bound
> access.
>
> In this case, simply skip the TFA constraint code, there is no event
> constraint with just PMC3, therefore the code will never result in the
> empty set.
>
> Reported-by: Tony Jones <[email protected]>
> Reported-by: "DSouza, Nelson" <[email protected]>
> Tested-by: Tony Jones <[email protected]>
> Tested-by: "DSouza, Nelson" <[email protected]>
> Cc: [email protected]
> Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/events/intel/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3410,7 +3410,7 @@ tfa_get_event_constraints(struct cpu_hw_
> /*
> * Without TFA we must not use PMC3.
> */
> - if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
> + if (!allow_tsx_force_abort && test_bit(3, c->idxmsk) && idx >= 0) {
> c = dyn_constraint(cpuc, c, idx);
> c->idxmsk64 &= ~(1ULL << 3);
> c->weight--;
>
>
I was not cc'd on the patch that added allow_tsx_force_abort, so I
will give some comments here.
If I understand the goal of the control parameter, it is to turn the
TFA workaround on/off and thus determine whether or not PMC3 is
available. I don't know why you would need to make this a runtime
tunable. That seems a bit dodgy. But given the code you have here
right now, we have to deal with it. A sysadmin could flip the control
at any time, including when PMC3 is already in use by some events. I
do not see the code that schedules out all the events on all CPUs
once PMC3 becomes unavailable. You cannot just rely on the next
context-switch or timer tick for multiplexing.

2019-03-19 11:06:57

by Peter Zijlstra

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Mon, Mar 18, 2019 at 11:29:25PM -0700, Stephane Eranian wrote:

> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -3410,7 +3410,7 @@ tfa_get_event_constraints(struct cpu_hw_
> > /*
> > * Without TFA we must not use PMC3.
> > */
> > - if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
> > + if (!allow_tsx_force_abort && test_bit(3, c->idxmsk) && idx >= 0) {
> > c = dyn_constraint(cpuc, c, idx);
> > c->idxmsk64 &= ~(1ULL << 3);
> > c->weight--;
> >
> >

> I was not cc'd on the patch that added allow_tsx_force_abort, so I

Yeah, that never was public :-( I didn't particularly like that, but
that's the way it is.

> will give some comments here.

> If I understand the goal of the control parameter it is to turn on/off
> the TFA workaround and thus determine whether or not PMC3 is
> available. I don't know why you would need to make this a runtime
> tunable.

Not quite; the control on its own doesn't directly write the MSR. And
even when the work-around is allowed, we'll not set the MSR unless there
is also demand for PMC3.

It is a runtime tunable because boot parameters suck.

> That seems a bit dodgy. But given the code you have here right now, we
> have to deal with it. A sysadmin could flip the control at any time,
> including when PMC3 is already in used by some events. I do not see
> the code that schedules out all the events on all CPUs once PMC3
> becomes unavailable. You cannot just rely on the next context-switch
> or timer tick for multiplexing.

Yeah, meh. You're admin, you can 'fix' it. In practice I don't expect
most people to care about the knob, and the few people that do should
be able to make it work.

2019-03-19 17:53:19

by Stephane Eranian

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Tue, Mar 19, 2019 at 4:05 AM Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Mar 18, 2019 at 11:29:25PM -0700, Stephane Eranian wrote:
>
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -3410,7 +3410,7 @@ tfa_get_event_constraints(struct cpu_hw_
> > > /*
> > > * Without TFA we must not use PMC3.
> > > */
> > > - if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
> > > + if (!allow_tsx_force_abort && test_bit(3, c->idxmsk) && idx >= 0) {
> > > c = dyn_constraint(cpuc, c, idx);
> > > c->idxmsk64 &= ~(1ULL << 3);
> > > c->weight--;
> > >
> > >
>
> > I was not cc'd on the patch that added allow_tsx_force_abort, so I
>
> Yeah, that never was public :-( I didn't particularly like that, but
> that's the way it is.
>
> > will give some comments here.
>
> > If I understand the goal of the control parameter it is to turn on/off
> > the TFA workaround and thus determine whether or not PMC3 is
> > available. I don't know why you would need to make this a runtime
> > tunable.
>
> Not quite; the control on its own doesn't directly write the MSR. And
> even when the work-around is allowed, we'll not set the MSR unless there
> is also demand for PMC3.
>
Trying to understand this better here. When the workaround is enabled
(tfa=0), you lose PMC3 and transactions operate normally. When it is
disabled (tfa=1), transactions are all aborted and PMC3 is available.
If you are saying that when there is a PMU event requesting PMC3,
then you need PMC3 available, so you set the MSR so that tfa=1,
forcing all transactions to abort. But in that case, you are
modifying the execution of the workload when you are monitoring it,
assuming it uses TSX. You want the lowest overhead and no
modifications to how the workload operates, otherwise how
representative is the data you are collecting? I understand that
there is no impact on apps not using TSX, well, except on context
switch where you have to toggle that MSR. But for workloads using
TSX, there is potentially an impact.

> It is a runtime tunable because boot parameters suck.
>
> > That seems a bit dodgy. But given the code you have here right now, we
> > have to deal with it. A sysadmin could flip the control at any time,
> > including when PMC3 is already in used by some events. I do not see
> > the code that schedules out all the events on all CPUs once PMC3
> > becomes unavailable. You cannot just rely on the next context-switch
> > or timer tick for multiplexing.
>
> Yeah, meh. You're admin, you can 'fix' it. In practise I don't expect
> most people to care about the knob, and the few people that do, should
> be able to make it work.

I don't understand how this can work reliably. You have a knob to toggle
that MSR. Then, you have another one inside perf_events and then the sysadmin
has to make sure nobody (incl. NMI watchdog) is using the PMU when
this all happens.
How can this be a practical solution? Am I missing something here?

2019-03-19 18:23:06

by Peter Zijlstra

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Tue, Mar 19, 2019 at 10:52:01AM -0700, Stephane Eranian wrote:
> > Not quite; the control on its own doesn't directly write the MSR. And
> > even when the work-around is allowed, we'll not set the MSR unless there
> > is also demand for PMC3.
> >
> Trying to understand this better here. When the workaround is enabled
> (tfa=0), you lose PMC3 and transactions operate normally.

> When it is disabled (tfa=1), transactions are all aborted and PMC3 is
> available.

Right, but we don't expose tfa.

> If you are saying that when there is a PMU event requesting PMC3, then
> you need PMC3 avail, so you set the MSR so that tfa=1 forcing all
> transactions to abort.

Right, so when allow_tfa=1 (default), we only set tfa=1 when PMC3 is
requested.

This has the advantage that,

TSX only workload -> works
perf 4 counters -> works

Only when you need both of them, do you get 'trouble'.

> But in that case, you are modifying the execution of the workload when
> you are monitoring it, assuming it uses TSX.

We assume you are not in fact using TSX; not a lot of code does. If
you do use TSX a lot, and you don't want to interfere, you have to
set allow_tfa=0 and live with one counter less.

Any which way around you turn this stone, it sucks.

> You want lowest overhead and no modifications to how the workload
> operates, otherwise how representative is the data you are collecting?

Sure; but there are no good choices here. This 'fix' will break
something. We figured TSX+4-counter-perf was the least common scenario.

We know of people that rely on 4 counters being present; do you want
to explain to them why, after an update, their program suddenly
doesn't work anymore?

Or you want to default to tfa=1; but then you have to explain to those
people relying on TSX why their workload stopped working.

> I understand that there is no impact on apps not using TSX, well,
> except on context switch where you have to toggle that MSR.

There is no additional code in the context switch; only the perf event
scheduling code prods at the MSR.

> But for workloads using TSX, there is potentially an impact.

Yes, well, if you're a TSX _and_ perf user, you now have an extra knob
to play with; that's not something I can do anything about. We're forced
to make a choice here.

> > Yeah, meh. You're admin, you can 'fix' it. In practise I don't expect
> > most people to care about the knob, and the few people that do, should
> > be able to make it work.
>
> I don't understand how this can work reliably.

> You have a knob to toggle that MSR.

No, we don't have this knob.

> Then, you have another one inside perf_events

Only this one knob exists: allow_tfa.

> and then the sysadmin has to make sure nobody (incl. NMI watchdog) is
> using the PMU when this all happens.

You're very unlucky if the watchdog runs on PMC3; normally it runs on
Fixed1 or something, especially early after boot. (Remember, we
schedule fixed counters first, and then general purpose counters,
with a preference for lower counters.)

Anyway, you can trivially switch it off if you want.

> How can this be a practical solution? Am I missing something here?

It works just fine; it is unfortunate that we have this interaction but
that's not something we can do anything about. We're forced to deal with
this.

But if you're a TSX+perf user, have your boot scripts do:

echo 0 > /sys/bus/event_source/devices/cpu/allow_tsx_force_abort

and you'll not use PMC3 and TSX will be 'awesome'. If you don't give a
crap about TSX (most people), just boot and be happy.

If you do care about TSX+perf and want to dynamically toggle for some
reason, you just have to be a little careful.

2019-03-20 20:50:03

by Stephane Eranian

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Tue, Mar 19, 2019 at 11:20 AM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Mar 19, 2019 at 10:52:01AM -0700, Stephane Eranian wrote:
> > > Not quite; the control on its own doesn't directly write the MSR. And
> > > even when the work-around is allowed, we'll not set the MSR unless there
> > > is also demand for PMC3.
> > >
> > Trying to understand this better here. When the workaround is enabled
> > (tfa=0), you lose PMC3 and transactions operate normally.
>
> > When it is disabled (tfa=1), transactions are all aborted and PMC3 is
> > available.
>
> Right, but we don't expose tfa.
>
> > If you are saying that when there is a PMU event requesting PMC3, then
> > you need PMC3 avail, so you set the MSR so that tfa=1 forcing all
> > transactions to abort.
>
> Right, so when allow_tfa=1 (default), we only set tfa=1 when PMC3 is
> requested.
>
> This has the advantage that,
>
> TSX only workload -> works
> perf 4 counters -> works
>
> Only when you need both of them, do you get 'trouble'.
>
> > But in that case, you are modifying the execution of the workload when
> > you are monitoring it, assuming it uses TSX.
>
> We assume you are not in fact using TSX, not a lot of code does. If you
> do use TSX a lot, and you don't want to interfere, you have to set
> allow_tfa=0 and live with one counter less.
>
> Any which way around you turn this stone, it sucks.
>
> > You want lowest overhead and no modifications to how the workload
> > operates, otherwise how representative is the data you are collecting?
>
> Sure; but there are no good choices here. This 'fix' will break
> something. We figured TSX+4-counter-perf was the least common scenario.
>
> We know of people that rely on 4 counters being present; do you want
> to explain to them why, after an update, their program suddenly
> doesn't work anymore?
>
> Or you want to default to tfa=1; but then you have to explain to those
> people relying on TSX why their workload stopped working.
>
> > I understand that there is no impact on apps not using TSX, well,
> > except on context switch where you have to toggle that MSR.
>
> There is no additional code in the context switch; only the perf event
> scheduling code prods at the MSR.
>
> > But for workloads using TSX, there is potentially an impact.
>
> Yes, well, if you're a TSX _and_ perf user, you now have an extra knob
> to play with; that's not something I can do anything about. We're forced
> to make a choice here.
>
> > > Yeah, meh. You're admin, you can 'fix' it. In practise I don't expect
> > > most people to care about the knob, and the few people that do, should
> > > be able to make it work.
> >
> > I don't understand how this can work reliably.
>
> > You have a knob to toggle that MSR.
>
> No, we don't have this knob.
>
> > Then, you have another one inside perf_events
>
> Only this knob exists allow_tfa.
>
> > and then the sysadmin has to make sure nobody (incl. NMI watchdog) is
> > using the PMU when this all happens.
>
> You're very unlucky if the watchdog runs on PMC3, normally it runs on
> Fixed1 or something. Esp early after boot. (Remember, we schedule fixed
> counters first, and then general purpose counters, with a preference for
> lower counters).
>
> Anyway, you can trivially switch it off if you want.
>
> > How can this be a practical solution? Am I missing something here?
>
> It works just fine; it is unfortunate that we have this interaction but
> that's not something we can do anything about. We're forced to deal with
> this.
>
Right now, if I do:

echo 0 > /sys/bus/event_source/devices/cpu/allow_tsx_force_abort

Then I don't have a guarantee on when there will be no aborts when I
return from the echo; the MSR is accessed only on PMU scheduling. I
would expect a sysadmin to want some guarantee if this is to be
switched on/off at runtime. If not, then having a boot time option is
better in my opinion.

This other bit I noticed is that cpuc->tfa_shadow is used to avoid
the wrmsr(), but I don't see the code that makes sure the init value
(0) matches the value of the MSR. Is this MSR guaranteed to be zero
on reset? How about on kexec()?

> But if you're a TSX+perf user, have your boot scripts do:
>
> echo 0 > /sys/bus/event_source/devices/cpu/allow_tsx_force_abort
>
> and you'll not use PMC3 and TSX will be 'awesome'. If you don't give a
> crap about TSX (most people), just boot and be happy.
>
> If you do care about TSX+perf and want to dynamically toggle for some
> reason, you just have to be a little careful.

2019-03-20 20:54:52

by Stephane Eranian

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Wed, Mar 20, 2019 at 1:47 PM Stephane Eranian <[email protected]> wrote:
>
> On Tue, Mar 19, 2019 at 11:20 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Tue, Mar 19, 2019 at 10:52:01AM -0700, Stephane Eranian wrote:
> > > > Not quite; the control on its own doesn't directly write the MSR. And
> > > > even when the work-around is allowed, we'll not set the MSR unless there
> > > > is also demand for PMC3.
> > > >
> > > Trying to understand this better here. When the workaround is enabled
> > > (tfa=0), you lose PMC3 and transactions operate normally.
> >
> > > When it is disabled (tfa=1), transactions are all aborted and PMC3 is
> > > available.
> >
> > Right, but we don't expose tfa.
> >
> > > If you are saying that when there is a PMU event requesting PMC3, then
> > > you need PMC3 avail, so you set the MSR so that tfa=1 forcing all
> > > transactions to abort.
> >
> > Right, so when allow_tfa=1 (default), we only set tfa=1 when PMC3 is
> > requested.
> >
> > This has the advantage that,
> >
> > TSX only workload -> works
> > perf 4 counters -> works
> >
> > Only when you need both of them, do you get 'trouble'.
> >
> > > But in that case, you are modifying the execution of the workload when
> > > you are monitoring it, assuming it uses TSX.
> >
> > We assume you are not in fact using TSX, not a lot of code does. If you
> > do use TSX a lot, and you don't want to interfere, you have to set
> > allow_tfa=0 and live with one counter less.
> >
> > Any which way around you turn this stone, it sucks.
> >
> > > You want lowest overhead and no modifications to how the workload
> > > operates, otherwise how representative is the data you are collecting?
> >
> > Sure; but there are no good choices here. This 'fix' will break
> > something. We figured TSX+4-counter-perf was the least common scenario.
> >
> > We know of people that rely on 4 counters being present; do you want
> > to explain to them why, after an update, their program suddenly
> > doesn't work anymore?
> >
> > Or you want to default to tfa=1; but then you have to explain to those
> > people relying on TSX why their workload stopped working.
> >
> > > I understand that there is no impact on apps not using TSX, well,
> > > except on context switch where you have to toggle that MSR.
> >
> > There is no additional code in the context switch; only the perf event
> > scheduling code prods at the MSR.
> >
> > > But for workloads using TSX, there is potentially an impact.
> >
> > Yes, well, if you're a TSX _and_ perf user, you now have an extra knob
> > to play with; that's not something I can do anything about. We're forced
> > to make a choice here.
> >
> > > > Yeah, meh. You're admin, you can 'fix' it. In practise I don't expect
> > > > most people to care about the knob, and the few people that do, should
> > > > be able to make it work.
> > >
> > > I don't understand how this can work reliably.
> >
> > > You have a knob to toggle that MSR.
> >
> > No, we don't have this knob.
> >
> > > Then, you have another one inside perf_events
> >
> > Only this knob exists allow_tfa.
> >
> > > and then the sysadmin has to make sure nobody (incl. NMI watchdog) is
> > > using the PMU when this all happens.
> >
> > You're very unlucky if the watchdog runs on PMC3, normally it runs on
> > Fixed1 or something. Esp early after boot. (Remember, we schedule fixed
> > counters first, and then general purpose counters, with a preference for
> > lower counters).
> >
> > Anyway, you can trivially switch it off if you want.
> >
> > > How can this be a practical solution? Am I missing something here?
> >
> > It works just fine; it is unfortunate that we have this interaction but
> > that's not something we can do anything about. We're forced to deal with
> > this.
> >
> Right now, if I do:
>
> echo 0 > /sys/bus/event_source/devices/cpu/allow_tsx_force_abort
>
> Then I don't have the guarantee on when there will be no abort when I
> return from the echo.
> the MSR is accessed only on PMU scheduling. I would expect a sysadmin
> to want some guarantee
> if this is to be switched on/off at runtime. If not, then having a
> boot time option is better in my opinion.
>
> This other bit I noticed is that cpuc->tfa_shadow is used to avoid the
> wrmsr(), but I don't see the
> code that makes sure the init value (0) matches the value of the MSR.
> Is this MSR guarantee to be
> zero on reset? How about on kexec()?
>

Furthermore, depending on what is measured on each CPU, i.e., whether
PMC3 is used or not, TFA may be set to true or false dynamically. So
if you have a TSX workload, it may be impacted depending on which CPU
it runs on. I don't think users would like that approach.


> > But if you're a TSX+perf user, have your boot scripts do:
> >
> > echo 0 > /sys/bus/event_source/devices/cpu/allow_tsx_force_abort
> >
> > and you'll not use PMC3 and TSX will be 'awesome'. If you don't give a
> > crap about TSX (most people), just boot and be happy.
> >
> > If you do care about TSX+perf and want to dynamically toggle for some
> > reason, you just have to be a little careful.

2019-03-20 22:23:39

by Peter Zijlstra

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Wed, Mar 20, 2019 at 01:47:28PM -0700, Stephane Eranian wrote:

> Right now, if I do:
>
> echo 0 > /sys/bus/event_source/devices/cpu/allow_tsx_force_abort
>
> Then I don't have the guarantee on when there will be no abort when I
> return from the echo. the MSR is accessed only on PMU scheduling. I
> would expect a sysadmin to want some guarantee if this is to be
> switched on/off at runtime. If not, then having a boot time option is
> better in my opinion.

Something like cycling the nmi watchdog or:

perf stat -a -e cycles sleep 1

should be enough to force reschedule the events on every CPU.

Again, I'm not averse to 'fixing' this if it can be done with limited
LoC. But I don't really see this as critical.

> This other bit I noticed is that cpuc->tfa_shadow is used to avoid the
> wrmsr(), but I don't see the code that makes sure the init value (0)
> matches the value of the MSR. Is this MSR guarantee to be zero on
> reset?

That was my understanding.

> How about on kexec()?

Good point, we might want to fix that.

2019-03-21 12:40:10

by Peter Zijlstra

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Wed, Mar 20, 2019 at 11:22:20PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 20, 2019 at 01:47:28PM -0700, Stephane Eranian wrote:
>
> > Right now, if I do:
> >
> > echo 0 > /sys/bus/event_source/devices/cpu/allow_tsx_force_abort
> >
> > Then I don't have the guarantee on when there will be no abort when I
> > return from the echo. the MSR is accessed only on PMU scheduling. I
> > would expect a sysadmin to want some guarantee if this is to be
> > switched on/off at runtime. If not, then having a boot time option is
> > better in my opinion.
>
> Something like cycling the nmi watchdog or:
>
> perf stat -a -e cycles sleep 1
>
> should be enough to force reschedule the events on every CPU.
>
> Again, I'm not averse to 'fixing' this if it can be done with limited
> LoC. But I don't really see this as critical.
>
> > This other bit I noticed is that cpuc->tfa_shadow is used to avoid the
> > wrmsr(), but I don't see the code that makes sure the init value (0)
> > matches the value of the MSR. Is this MSR guarantee to be zero on
> > reset?
>
> That was my understanding.
>
> > How about on kexec()?
>
> Good point, we might want to fix that.

Something like the below perhaps?

---
Subject: perf/x86/intel: Initialize TFA MSR

Stephane reported that we don't initialize the TFA MSR, which could lead
to trouble if the RESET value is not 0 or on kexec.

Reported-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/events/intel/core.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8baa441d8000..2d3caf2d1384 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3575,6 +3575,12 @@ static void intel_pmu_cpu_starting(int cpu)

cpuc->lbr_sel = NULL;

+ if (x86_pmu.flags & PMU_FL_TFA) {
+ WARN_ON_ONCE(cpuc->tfa_shadow);
+ cpuc->tfa_shadow = ~0ULL;
+ intel_set_tfa(cpuc, false);
+ }
+
if (x86_pmu.version > 1)
flip_smm_bit(&x86_pmu.attr_freeze_on_smi);


2019-03-21 16:46:53

by Thomas Gleixner

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> Subject: perf/x86/intel: Initialize TFA MSR
>
> Stephane reported that we don't initialize the TFA MSR, which could lead
> to trouble if the RESET value is not 0 or on kexec.

That sentence doesn't parse.

Stephane reported that the TFA MSR is not initialized by the kernel,
but the TFA bit could be set by firmware or as a leftover from a
kexec, which makes the state inconsistent.

> Reported-by: Stephane Eranian <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/events/intel/core.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 8baa441d8000..2d3caf2d1384 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3575,6 +3575,12 @@ static void intel_pmu_cpu_starting(int cpu)
>
> cpuc->lbr_sel = NULL;
>
> + if (x86_pmu.flags & PMU_FL_TFA) {
> + WARN_ON_ONCE(cpuc->tfa_shadow);

Hmm. I wouldn't warn here as this is a legit state for kexec/kdump and I
don't think we can figure out whether this comes directly from the
firmware.

Thanks,

tglx

2019-03-21 17:13:41

by Peter Zijlstra

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, Mar 21, 2019 at 05:45:41PM +0100, Thomas Gleixner wrote:
> On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> > Subject: perf/x86/intel: Initialize TFA MSR
> >
> > Stephane reported that we don't initialize the TFA MSR, which could lead
> > to trouble if the RESET value is not 0 or on kexec.
>
> That sentence doesn't parse.
>
> Stephane reported that the TFA MSR is not initialized by the kernel, but
> the TFA bit could be set by firmware or as a leftover from a kexec, which
> makes the state inconsistent.
>
> > Reported-by: Stephane Eranian <[email protected]>
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > ---
> > arch/x86/events/intel/core.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index 8baa441d8000..2d3caf2d1384 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -3575,6 +3575,12 @@ static void intel_pmu_cpu_starting(int cpu)
> >
> > cpuc->lbr_sel = NULL;
> >
> > + if (x86_pmu.flags & PMU_FL_TFA) {
> > + WARN_ON_ONCE(cpuc->tfa_shadow);
>
> Hmm. I wouldn't warn here as this is a legit state for kexec/kdump and I
> don't think we can figure out whether this comes directly from the
> firmware.

Even on kexec, the cpuc will be freshly allocated and zeroed, I think.
We only inherit hardware state, not software state.

2019-03-21 17:18:18

by Thomas Gleixner

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> On Thu, Mar 21, 2019 at 05:45:41PM +0100, Thomas Gleixner wrote:
> > On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> > > Subject: perf/x86/intel: Initialize TFA MSR
> > >
> > > Stephane reported that we don't initialize the TFA MSR, which could lead
> > > to trouble if the RESET value is not 0 or on kexec.
> >
> > That sentence doesn't parse.
> >
> > Stephane reported that the TFA MSR is not initialized by the kernel, but
> > the TFA bit could be set by firmware or as a leftover from a kexec, which
> > makes the state inconsistent.
> >
> > > Reported-by: Stephane Eranian <[email protected]>
> > > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > > ---
> > > arch/x86/events/intel/core.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > index 8baa441d8000..2d3caf2d1384 100644
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -3575,6 +3575,12 @@ static void intel_pmu_cpu_starting(int cpu)
> > >
> > > cpuc->lbr_sel = NULL;
> > >
> > > + if (x86_pmu.flags & PMU_FL_TFA) {
> > > + WARN_ON_ONCE(cpuc->tfa_shadow);
> >
> > Hmm. I wouldn't warn here as this is a legit state for kexec/kdump and I
> > don't think we can figure out whether this comes directly from the
> > firmware.
>
> Even on kexec, the cpuc will be freshly allocated and zeroed, I think.
> We only inherit hardware state, not software state.

Ouch, misread the patch of course ... What about cpu hotplug? Does that
free/alloc or reuse?

Thanks,

tglx


2019-03-21 17:25:24

by Stephane Eranian

Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, Mar 21, 2019 at 9:45 AM Thomas Gleixner <[email protected]> wrote:
>
> On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> > Subject: perf/x86/intel: Initialize TFA MSR
> >
> > Stephane reported that we don't initialize the TFA MSR, which could lead
> > to trouble if the RESET value is not 0 or on kexec.
>
> That sentence doesn't parse.
>
> Stephane reported that the TFA MSR is not initialized by the kernel, but
> the TFA bit could be set by firmware or as a leftover from a kexec, which
> makes the state inconsistent.
>
Correct. This is what I meant.
The issue is what does the kernel guarantee when it boots?

I see:
static bool allow_tsx_force_abort = true;

Therefore you must ensure the MSR is set to reflect that state on boot.
So you have to force it to that value to be in sync, which is what your
new patch is doing.

> > Reported-by: Stephane Eranian <[email protected]>
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > ---
> > arch/x86/events/intel/core.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index 8baa441d8000..2d3caf2d1384 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -3575,6 +3575,12 @@ static void intel_pmu_cpu_starting(int cpu)
> >
> > cpuc->lbr_sel = NULL;
> >
> > + if (x86_pmu.flags & PMU_FL_TFA) {
> > + WARN_ON_ONCE(cpuc->tfa_shadow);
>
> Hmm. I wouldn't warn here as this is a legit state for kexec/kdump and I
> don't think we can figure out whether this comes directly from the
> firmware.
>
> Thanks,
>
> tglx

2019-03-21 17:54:43

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, 21 Mar 2019, Stephane Eranian wrote:
> On Thu, Mar 21, 2019 at 9:45 AM Thomas Gleixner <[email protected]> wrote:
> >
> > On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> > > Subject: perf/x86/intel: Initialize TFA MSR
> > >
> > > Stephane reported that we don't initialize the TFA MSR, which could lead
> > > to trouble if the RESET value is not 0 or on kexec.
> >
> > That sentence doesn't parse.
> >
> > Stephane reported that the TFA MSR is not initialized by the kernel, but
> > > the TFA bit could be set by firmware or as a leftover from a kexec, which
> > makes the state inconsistent.
> >
> Correct. This is what I meant.
> The issue is what does the kernel guarantee when it boots?
>
> I see:
> static bool allow_tsx_force_abort = true;
>
> Therefore you must ensure the MSR is set to reflect that state on boot.
> So you have to force it to that value to be in sync which is what your
> new patch is doing.

The initial state should be that the MSR TFA bit is 0. The software state
is a different beast.

allow_tsx_force_abort

false Do not set MSR TFA bit (Make TSX work with PMC3) and
exclude PMC3 from being used.

true Set the MSR TFA bit when PMC3 is used by perf, clear it
when PMC3 is no longer in use.

Now, if the firmware or the kexec has the TFA bit set in the MSR and PMC3
is not in use, then TSX always aborts pointlessly. It's not a fatal issue,
but it's inconsistent.

So independent of the state of allow_tsx_force_abort the kernel has to
clear the MSR TFA bit when the CPUs are brought up.

The state of allow_tsx_force_abort is solely relevant after the CPUs come up,
to decide whether PMC3 can be used by perf or not.

Thanks,

tglx


2019-03-21 18:22:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, Mar 21, 2019 at 06:17:01PM +0100, Thomas Gleixner wrote:
> On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> > On Thu, Mar 21, 2019 at 05:45:41PM +0100, Thomas Gleixner wrote:
> > > On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> > > > Subject: perf/x86/intel: Initialize TFA MSR
> > > >
> > > > Stephane reported that we don't initialize the TFA MSR, which could lead
> > > > to trouble if the RESET value is not 0 or on kexec.
> > >
> > > That sentence doesn't parse.
> > >
> > > Stephane reported that the TFA MSR is not initialized by the kernel, but
> > > the TFA bit could be set by firmware or as a leftover from a kexec, which
> > > makes the state inconsistent.
> > >
> > > > Reported-by: Stephane Eranian <[email protected]>
> > > > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > > > ---
> > > > arch/x86/events/intel/core.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > > index 8baa441d8000..2d3caf2d1384 100644
> > > > --- a/arch/x86/events/intel/core.c
> > > > +++ b/arch/x86/events/intel/core.c
> > > > @@ -3575,6 +3575,12 @@ static void intel_pmu_cpu_starting(int cpu)
> > > >
> > > > cpuc->lbr_sel = NULL;
> > > >
> > > > + if (x86_pmu.flags & PMU_FL_TFA) {
> > > > + WARN_ON_ONCE(cpuc->tfa_shadow);
> > >
> > > Hmm. I wouldn't warn here as this is a legit state for kexec/kdump and I
> > > don't think we can figure out whether this comes directly from the
> > > firmware.
> >
> > Even on kexec, the cpuc will be freshly allocated and zeroed, I think. We
> > only inherit hardware state, not software state.
>
> Ouch, misread the patch of course ... What about cpu hotplug? Does that
> free/alloc or reuse?

Re-use, I think, but since we kill all events before taking the CPU down,
it _should_ have cleared the MSR while doing that.

Might be best to have someone test this.. someone with ucode and a
machine etc.. :-)

2019-03-21 19:44:58

by Tony Jones

[permalink] [raw]
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On 3/21/19 11:20 AM, Peter Zijlstra wrote:

> Might be best to have someone test this.. someone with ucode and a
> machine etc.. :-)

If your own company can't oblige you here Peter :-) I'm sure I can make some time to test it.

2019-03-21 19:48:59

by DSouza, Nelson

[permalink] [raw]
Subject: RE: [PATCH 1/8] perf/x86/intel: Fix memory corruption

Is the request to check whether the MSR gets reset to its default upon reboot of the machine?

-----Original Message-----
From: Tony Jones [mailto:[email protected]]
Sent: Thursday, March 21, 2019 12:43 PM
To: Peter Zijlstra <[email protected]>; Thomas Gleixner <[email protected]>
Cc: Stephane Eranian <[email protected]>; Ingo Molnar <[email protected]>; Jiri Olsa <[email protected]>; LKML <[email protected]>; DSouza, Nelson <[email protected]>
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On 3/21/19 11:20 AM, Peter Zijlstra wrote:

> Might be best to have someone test this.. someone with ucode and a
> machine etc.. :-)

If your own company can't oblige you here Peter :-) I'm sure I can make some time to test it.

2019-03-21 20:08:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, Mar 21, 2019 at 07:47:50PM +0000, DSouza, Nelson wrote:
> Is the request to check whether the MSR gets reset to its default upon reboot of the machine?

basically:

- apply patch
- start a workload with 4 counters (on all CPUs), such that tfa-msr=1
- try the following:
o cpu hotplug
o kexec

to see if the new WARN will trigger -- it should not.

2019-03-21 23:18:47

by DSouza, Nelson

[permalink] [raw]
Subject: RE: [PATCH 1/8] perf/x86/intel: Fix memory corruption

Attached cpu hotplug test output while perf is running in the background. No WARN messages seen.

When I run the kexec command, it boots to BIOS. Haven't used kexec before. Still trying to figure that one out.

-----Original Message-----
From: Peter Zijlstra [mailto:[email protected]]
Sent: Thursday, March 21, 2019 1:07 PM
To: DSouza, Nelson <[email protected]>
Cc: Tony Jones <[email protected]>; Thomas Gleixner <[email protected]>; Stephane Eranian <[email protected]>; Ingo Molnar <[email protected]>; Jiri Olsa <[email protected]>; LKML <[email protected]>
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, Mar 21, 2019 at 07:47:50PM +0000, DSouza, Nelson wrote:
> Is the request to check whether the MSR gets reset to its default upon reboot of the machine?

basically:

- apply patch
- start a workload with 4 counters (on all CPUs), such that tfa-msr=1
- try the following:
o cpu hotplug
o kexec

to see if the new WARN will trigger -- it should not.


Attachments:
cpu_hotplug.txt (921 B)

2019-03-22 19:06:21

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, Mar 21, 2019 at 10:51 AM Thomas Gleixner <[email protected]> wrote:
>
> On Thu, 21 Mar 2019, Stephane Eranian wrote:
> > On Thu, Mar 21, 2019 at 9:45 AM Thomas Gleixner <[email protected]> wrote:
> > >
> > > On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> > > > Subject: perf/x86/intel: Initialize TFA MSR
> > > >
> > > > Stephane reported that we don't initialize the TFA MSR, which could lead
> > > > to trouble if the RESET value is not 0 or on kexec.
> > >
> > > That sentence doesn't parse.
> > >
> > > Stephane reported that the TFA MSR is not initialized by the kernel, but
> > > the TFA bit could be set by firmware or as a leftover from a kexec, which
> > > makes the state inconsistent.
> > >
> > Correct. This is what I meant.
> > The issue is what does the kernel guarantee when it boots?
> >
> > I see:
> > static bool allow_tsx_force_abort = true;
> >
> > Therefore you must ensure the MSR is set to reflect that state on boot.
> > So you have to force it to that value to be in sync which is what your
> > new patch is doing.
>
> The initial state should be that the MSR TFA bit is 0. The software state
> is a different beast.
>
> allow_tsx_force_abort
>
> false Do not set MSR TFA bit (Make TSX work with PMC3) and
> exclude PMC3 from being used.
>
> true Set the MSR TFA bit when PMC3 is used by perf, clear it
> when PMC3 is no longer in use.
>
I would expect this description to be included in the source code where the
allow_tsx_force_abort variable is defined, and somewhere in the kernel
Documentation, because it is not trivial to understand what the control
actually does and the guarantees you have when you toggle it.

> Now, if the firmware or the kexec has the TFA bit set in the MSR and PMC3
> is not in use, then TSX always aborts pointlessly. It's not a fatal issue,
> but it's inconsistent.
>
> So independent of the state of allow_tsx_force_abort the kernel has to
> clear the MSR TFA bit when the CPUs are brought up.
>
> The state of allow_tsx_force_abort is solely relevant after the CPUs come up,
> to decide whether PMC3 can be used by perf or not.
>
> Thanks,
>
> tglx
>

2019-03-22 22:15:19

by DSouza, Nelson

[permalink] [raw]
Subject: RE: [PATCH 1/8] perf/x86/intel: Fix memory corruption

kexec was borked in the 5.0+ kernel I was using.

Switched to 5.0.3 stable and kexec works as expected.
No warnings seen with a kexec boot.

-----Original Message-----
From: DSouza, Nelson
Sent: Thursday, March 21, 2019 4:16 PM
To: Peter Zijlstra <[email protected]>
Cc: Tony Jones <[email protected]>; Thomas Gleixner <[email protected]>; Stephane Eranian <[email protected]>; Ingo Molnar <[email protected]>; Jiri Olsa <[email protected]>; LKML <[email protected]>
Subject: RE: [PATCH 1/8] perf/x86/intel: Fix memory corruption

Attached cpu hotplug test output while perf is running in the background. No WARN messages seen.

When I run the kexec command, it boots to BIOS. Haven't used kexec before. Still trying to figure that one out.

-----Original Message-----
From: Peter Zijlstra [mailto:[email protected]]
Sent: Thursday, March 21, 2019 1:07 PM
To: DSouza, Nelson <[email protected]>
Cc: Tony Jones <[email protected]>; Thomas Gleixner <[email protected]>; Stephane Eranian <[email protected]>; Ingo Molnar <[email protected]>; Jiri Olsa <[email protected]>; LKML <[email protected]>
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Thu, Mar 21, 2019 at 07:47:50PM +0000, DSouza, Nelson wrote:
> Is the request to check whether the MSR gets reset to its default upon reboot of the machine?

basically:

- apply patch
- start a workload with 4 counters (on all CPUs), such that tfa-msr=1
- try the following:
o cpu hotplug
o kexec

to see if the new WARN will trigger -- it should not.

2019-04-03 07:33:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption

On Fri, Mar 22, 2019 at 12:04:55PM -0700, Stephane Eranian wrote:
> On Thu, Mar 21, 2019 at 10:51 AM Thomas Gleixner <[email protected]> wrote:
> >
> > On Thu, 21 Mar 2019, Stephane Eranian wrote:
> > > On Thu, Mar 21, 2019 at 9:45 AM Thomas Gleixner <[email protected]> wrote:
> > > >
> > > > On Thu, 21 Mar 2019, Peter Zijlstra wrote:
> > > > > Subject: perf/x86/intel: Initialize TFA MSR
> > > > >
> > > > > Stephane reported that we don't initialize the TFA MSR, which could lead
> > > > > to trouble if the RESET value is not 0 or on kexec.
> > > >
> > > > That sentence doesn't parse.
> > > >
> > > > Stephane reported that the TFA MSR is not initialized by the kernel, but
> > > > the TFA bit could be set by firmware or as a leftover from a kexec, which
> > > > makes the state inconsistent.
> > > >
> > > Correct. This is what I meant.
> > > The issue is what does the kernel guarantee when it boots?
> > >
> > > I see:
> > > static bool allow_tsx_force_abort = true;
> > >
> > > Therefore you must ensure the MSR is set to reflect that state on boot.
> > > So you have to force it to that value to be in sync which is what your
> > > new patch is doing.
> >
> > The initial state should be that the MSR TFA bit is 0. The software state
> > is a different beast.
> >
> > allow_tsx_force_abort
> >
> > false Do not set MSR TFA bit (Make TSX work with PMC3) and
> > exclude PMC3 from being used.
> >
> > true Set the MSR TFA bit when PMC3 is used by perf, clear it
> > when PMC3 is no longer in use.
> >
> I would expect this description to be included in the source code where the
> allow_tsx_force_abort variable is defined

That part is easy I suppose.

> and somewhere in the kernel Documentation because it is not trivial to
> understand what the control actually does and the guarantees you have
> when you toggle it.

But here we seem to be sorely lacking, that is, there is no sysfs/perf
documentation at all.

In any case, feel free to send a patch :-)

Subject: [tip:perf/urgent] perf/x86/intel: Initialize TFA MSR

Commit-ID: d7262457e35dbe239659e62654e56f8ddb814bed
Gitweb: https://git.kernel.org/tip/d7262457e35dbe239659e62654e56f8ddb814bed
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 21 Mar 2019 13:38:49 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 3 Apr 2019 11:40:32 +0200

perf/x86/intel: Initialize TFA MSR

Stephane reported that the TFA MSR is not initialized by the kernel,
but the TFA bit could be set by firmware or as a leftover from a kexec,
which makes the state inconsistent.

Reported-by: Stephane Eranian <[email protected]>
Tested-by: Nelson DSouza <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/intel/core.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1539647ea39d..f61dcbef20ff 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3575,6 +3575,12 @@ static void intel_pmu_cpu_starting(int cpu)

cpuc->lbr_sel = NULL;

+ if (x86_pmu.flags & PMU_FL_TFA) {
+ WARN_ON_ONCE(cpuc->tfa_shadow);
+ cpuc->tfa_shadow = ~0ULL;
+ intel_set_tfa(cpuc, false);
+ }
+
if (x86_pmu.version > 1)
flip_smm_bit(&x86_pmu.attr_freeze_on_smi);

2019-04-03 11:32:01

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [tip:perf/urgent] perf/x86/intel: Initialize TFA MSR

On Wed, 3 Apr 2019, tip-bot for Peter Zijlstra wrote:

> Commit-ID: d7262457e35dbe239659e62654e56f8ddb814bed
> Gitweb: https://git.kernel.org/tip/d7262457e35dbe239659e62654e56f8ddb814bed
> Author: Peter Zijlstra <[email protected]>
> AuthorDate: Thu, 21 Mar 2019 13:38:49 +0100
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Wed, 3 Apr 2019 11:40:32 +0200
>
> perf/x86/intel: Initialize TFA MSR
>
> Stephane reported that the TFA MSR is not initialized by the kernel,
> but the TFA bit could be set by firmware or as a leftover from a kexec,
> which makes the state inconsistent.
>
> Reported-by: Stephane Eranian <[email protected]>
> Tested-by: Nelson DSouza <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Vince Weaver <[email protected]>
> Cc: [email protected]
> Link: https://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>

This lacks:

1) Fixes tag

2) Cc: stable ....

Sigh.

> ---
> arch/x86/events/intel/core.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 1539647ea39d..f61dcbef20ff 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3575,6 +3575,12 @@ static void intel_pmu_cpu_starting(int cpu)
>
> cpuc->lbr_sel = NULL;
>
> + if (x86_pmu.flags & PMU_FL_TFA) {
> + WARN_ON_ONCE(cpuc->tfa_shadow);
> + cpuc->tfa_shadow = ~0ULL;
> + intel_set_tfa(cpuc, false);
> + }
> +
> if (x86_pmu.version > 1)
> flip_smm_bit(&x86_pmu.attr_freeze_on_smi);
>
>

2019-04-03 12:24:54

by Vince Weaver

[permalink] [raw]
Subject: Re: [tip:perf/urgent] perf/x86/intel: Initialize TFA MSR

On Wed, 3 Apr 2019, Thomas Gleixner wrote:

> On Wed, 3 Apr 2019, tip-bot for Peter Zijlstra wrote:
>
> > Commit-ID: d7262457e35dbe239659e62654e56f8ddb814bed
> > Gitweb: https://git.kernel.org/tip/d7262457e35dbe239659e62654e56f8ddb814bed
> > Author: Peter Zijlstra <[email protected]>
> > AuthorDate: Thu, 21 Mar 2019 13:38:49 +0100
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Wed, 3 Apr 2019 11:40:32 +0200
> >
> > perf/x86/intel: Initialize TFA MSR
> >
> > Stephane reported that the TFA MSR is not initialized by the kernel,
> > but the TFA bit could be set by firmware or as a leftover from a kexec,
> > which makes the state inconsistent.
> >
> > Reported-by: Stephane Eranian <[email protected]>
> > Tested-by: Nelson DSouza <[email protected]>
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Alexander Shishkin <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Jiri Olsa <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Vince Weaver <[email protected]>
> > Cc: [email protected]
> > Link: https://lkml.kernel.org/r/[email protected]
> > Signed-off-by: Ingo Molnar <[email protected]>
>
> This lacks:
>
> 1) Fixes tag
>
> 2) Cc: stable ....
>
> Sigh.

It would also be nice to know what a "TFA" bit is without having to go
find a copy of the Intel documentation.

Vince