2023-01-05 20:17:23

by Atish Patra

[permalink] [raw]
Subject: Expected rdpmc behavior during context swtich and a RISC-V conundrum

Hi All,
There was a recent uabi update[1] for RISC-V that allows the users to
read cycle and instruction count without any checks.
We tried to restrict that behavior to address security concerns
earlier but it resulted in breakage for some user space
applications[2].
Thus, previous behavior was restored where a user on RISC-V platforms
can directly read cycle or instruction count[3].

Comparison with other ISAs w.r.t user space access of counters:
ARM64
-- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
-- Only for task bound events configured via perf.

X86
--- rdpmc instruction
--- Enable/Disable via “/sys/devices/cpu/rdpmc”
-- Before v4.0
-- any process (even without active perf event) rdpmc
After v4.0
-- Default behavior changed to support only active events in a
process’s context.
-- Configured through perf similar to ARM64
-- Continue to maintain backward compatibility for unrestricted access
by writing 2 to “/sys/devices/cpu/rdpmc”

IMO, RISC-V should only enable user space access through perf similar
to ARM64 and x86 (post v4.0).
However, we do have to support the legacy behavior to avoid
application breakage.
As per my understanding a direct user space access can lead to the
following problems:

1) There is no context switch support, so counts from other contexts are exposed
2) If a perf user is allocated one of these counters, the counter
value will be written

Looking at the x86 code as it continues to allow the above behavior,
rdpmc_always_available_key is enabled in the above case. However,
during the context switch (cr4_update_pce_mm)
only dirty counters are cleared. It only prevents leakage from perf
task to rdpmc task.

How does the context switch of counters work for users who enable
unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
Otherwise, rdpmc users likely get noise from other applications. Is
that expected ?
This can be a security concern also where a rogue rdpmc user
application can monitor other critical applications to initiate side
channel attack.

Am I missing something? Please correct my understanding of the x86
implementation if it is wrong.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
[3] https://lore.kernel.org/all/[email protected]/T/

--
Regards,
Atish


2023-01-06 12:10:54

by Mark Rutland

[permalink] [raw]
Subject: Re: Expected rdpmc behavior during context swtich and a RISC-V conundrum

On Thu, Jan 05, 2023 at 11:59:24AM -0800, Atish Patra wrote:
> Hi All,
> There was a recent uabi update[1] for RISC-V that allows the users to
> read cycle and instruction count without any checks.
> We tried to restrict that behavior to address security concerns
> earlier but it resulted in breakage for some user space
> applications[2].
> Thus, previous behavior was restored where a user on RISC-V platforms
> can directly read cycle or instruction count[3].
>
> Comparison with other ISAs w.r.t user space access of counters:
> ARM64
> -- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
> -- Only for task bound events configured via perf.
>
> X86
> --- rdpmc instruction
> --- Enable/Disable via “/sys/devices/cpu/rdpmc”
> -- Before v4.0
> -- any process (even without active perf event) rdpmc
> After v4.0
> -- Default behavior changed to support only active events in a
> process’s context.
> -- Configured through perf similar to ARM64
> -- Continue to maintain backward compatibility for unrestricted access
> by writing 2 to “/sys/devices/cpu/rdpmc”
>
> IMO, RISC-V should only enable user space access through perf similar
> to ARM64 and x86 (post v4.0).
> However, we do have to support the legacy behavior to avoid
> application breakage.
> As per my understanding a direct user space access can lead to the
> following problems:
>
> 1) There is no context switch support, so counts from other contexts are exposed
> 2) If a perf user is allocated one of these counters, the counter
> value will be written
>
> Looking at the x86 code as it continues to allow the above behavior,
> rdpmc_always_available_key is enabled in the above case. However,
> during the context switch (cr4_update_pce_mm)
> only dirty counters are cleared. It only prevents leakage from perf
> task to rdpmc task.
>
> How does the context switch of counters work for users who enable
> unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
> Otherwise, rdpmc users likely get noise from other applications. Is
> that expected ?

Regardless of leakage, they're also going to get random jumps in the visible
values of the cycle count and instruction count as the task is context-switched
(and/or if those values get reset across idle, as can happen on arm64), so
those aren't going to be useful unless a number of other constraints apply.

AFAICT the affected package was actually a library of intrinsics; does this
affect a real application, or was this just in tests? If it's the latter there
might still be scope to properly lock this down...

Thanks,
Mark.

> This can be a security concern also where a rogue rdpmc user
> application can monitor other critical applications to initiate side
> channel attack.
>
> Am I missing something? Please correct my understanding of the x86
> implementation if it is wrong.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
> [3] https://lore.kernel.org/all/[email protected]/T/
>
> --
> Regards,
> Atish

2023-01-09 09:45:11

by Atish Patra

[permalink] [raw]
Subject: Re: Expected rdpmc behavior during context swtich and a RISC-V conundrum

On Fri, Jan 6, 2023 at 4:02 AM Mark Rutland <[email protected]> wrote:
>
> On Thu, Jan 05, 2023 at 11:59:24AM -0800, Atish Patra wrote:
> > Hi All,
> > There was a recent uabi update[1] for RISC-V that allows the users to
> > read cycle and instruction count without any checks.
> > We tried to restrict that behavior to address security concerns
> > earlier but it resulted in breakage for some user space
> > applications[2].
> > Thus, previous behavior was restored where a user on RISC-V platforms
> > can directly read cycle or instruction count[3].
> >
> > Comparison with other ISAs w.r.t user space access of counters:
> > ARM64
> > -- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
> > -- Only for task bound events configured via perf.
> >
> > X86
> > --- rdpmc instruction
> > --- Enable/Disable via “/sys/devices/cpu/rdpmc”
> > -- Before v4.0
> > -- any process (even without active perf event) rdpmc
> > After v4.0
> > -- Default behavior changed to support only active events in a
> > process’s context.
> > -- Configured through perf similar to ARM64
> > -- Continue to maintain backward compatibility for unrestricted access
> > by writing 2 to “/sys/devices/cpu/rdpmc”
> >
> > IMO, RISC-V should only enable user space access through perf similar
> > to ARM64 and x86 (post v4.0).
> > However, we do have to support the legacy behavior to avoid
> > application breakage.
> > As per my understanding a direct user space access can lead to the
> > following problems:
> >
> > 1) There is no context switch support, so counts from other contexts are exposed
> > 2) If a perf user is allocated one of these counters, the counter
> > value will be written
> >
> > Looking at the x86 code as it continues to allow the above behavior,
> > rdpmc_always_available_key is enabled in the above case. However,
> > during the context switch (cr4_update_pce_mm)
> > only dirty counters are cleared. It only prevents leakage from perf
> > task to rdpmc task.
> >
> > How does the context switch of counters work for users who enable
> > unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
> > Otherwise, rdpmc users likely get noise from other applications. Is
> > that expected ?
>
> Regardless of leakage, they're also going to get random jumps in the visible
> values of the cycle count and instruction count as the task is context-switched
> (and/or if those values get reset across idle, as can happen on arm64), so
> those aren't going to be useful unless a number of other constraints apply.
>

Agreed.

> AFAICT the affected package was actually a library of intrinsics; does this
> affect a real application, or was this just in tests? If it's the latter there
> might still be scope to properly lock this down...
>

Unfortunately, there are real applications In RISC-V started using
cycle counters due to legacy reasons.

Here is the short list from debian repo pointed out in [1]
https://codesearch.debian.net/search?q=%22rdcycle+%250%22

Looking at aarch64 code in one of the application, it seems they rely
on reading "pmccntr_el0" to read time
https://sources.debian.org/src/chromium/108.0.5359.124-1/third_party/ffmpeg/libavutil/aarch64/timer.h/

AFAIK, any counter access from EL0 is disabled by default in
reset_pmuserenr_el0 and should be enabled via the
proc/sys/perf_user_access
in armv8pmu_enable_user_access. Is that correct ?

I couldn't find any application actually enabling the access using
perf_user_access. Maybe I am missing something?
Otherwise, the above application would trap on access to pmccntr_el0.

[1] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
> Thanks,
> Mark.
>
> > This can be a security concern also where a rogue rdpmc user
> > application can monitor other critical applications to initiate side
> > channel attack.
> >
> > Am I missing something? Please correct my understanding of the x86
> > implementation if it is wrong.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
> > [2] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
> > [3] https://lore.kernel.org/all/[email protected]/T/
> >
> > --
> > Regards,
> > Atish



--
Regards,
Atish

2023-01-09 13:03:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Expected rdpmc behavior during context swtich and a RISC-V conundrum

On Thu, Jan 05, 2023 at 11:59:24AM -0800, Atish Patra wrote:
> Hi All,
> There was a recent uabi update[1] for RISC-V that allows the users to
> read cycle and instruction count without any checks.
> We tried to restrict that behavior to address security concerns
> earlier but it resulted in breakage for some user space
> applications[2].
> Thus, previous behavior was restored where a user on RISC-V platforms
> can directly read cycle or instruction count[3].
>
> Comparison with other ISAs w.r.t user space access of counters:
> ARM64
> -- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
> -- Only for task bound events configured via perf.
>
> X86
> --- rdpmc instruction
> --- Enable/Disable via “/sys/devices/cpu/rdpmc”
> -- Before v4.0
> -- any process (even without active perf event) rdpmc
> After v4.0
> -- Default behavior changed to support only active events in a
> process’s context.
> -- Configured through perf similar to ARM64
> -- Continue to maintain backward compatibility for unrestricted access
> by writing 2 to “/sys/devices/cpu/rdpmc”
>
> IMO, RISC-V should only enable user space access through perf similar
> to ARM64 and x86 (post v4.0).
> However, we do have to support the legacy behavior to avoid
> application breakage.
> As per my understanding a direct user space access can lead to the
> following problems:
>
> 1) There is no context switch support, so counts from other contexts are exposed
> 2) If a perf user is allocated one of these counters, the counter
> value will be written
>
> Looking at the x86 code as it continues to allow the above behavior,
> rdpmc_always_available_key is enabled in the above case. However,
> during the context switch (cr4_update_pce_mm)
> only dirty counters are cleared. It only prevents leakage from perf
> task to rdpmc task.
>
> How does the context switch of counters work for users who enable
> unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
> Otherwise, rdpmc users likely get noise from other applications. Is
> that expected ?
> This can be a security concern also where a rogue rdpmc user
> application can monitor other critical applications to initiate side
> channel attack.
>
> Am I missing something? Please correct my understanding of the x86
> implementation if it is wrong.

So on x86 we have RDTSC and RDPMC instructions. RDTSC reads the
Time-Stamp-Counter which is a globally synchronized monotonic increasing
counter at some 'random' rate (idealized, don't ask). This thing is used
for time-keeping etc..

And then there's RDPMC which (optionally) allows reading the PMU
counters which are normally disabled and all 0.

Even if RDPMC is unconditionally allowed from userspace (the 2 option
you refer to) userspace will only be able to read these 0s unless
someone also programs the PMU. Linux only supports a single means of
doing so: perf (some people use /dev/msr to poke directly to the MSRs
but they get to keep all pieces).

RDPMC is only useful if you read counters you own on yourself -- IOW
selfmonitoring, using the interface outlined in uapi/linux/perf_events.h
near struct perf_event_mmap_page.

Any other usage -- you get to keep the pieces.

Can you observe random other counters, yes, unavoidably so. The sysfs
control you mention was instituted to restrict this somewhat.

If the RISC-V counters are fundamentally the PMU counters that need to
be reset to trigger events, then you've managed to paint yourself into a
tight spot :/

Either you must dis-allow userspace access to these things (and break
them) or limit the PMU usage -- both options suck.


Now, I'm thinking that esp. something like instruction count is not
synchronized between cores (seems fundamentally impossible) and can only
be reasonably be consumed (and compared) when strictly affine to a
particular CPU, you can argue that applications doing this without also
strictly managing their affinity mask are broken anyway and therefore
your breakage is not in fact a breaking them -- you can't break
something that's already broken.


Anyway, given RISC-V being a very young platform, I would try really
*really* *REALLY* hard to stomp on these applications and get them to
change in order to reclaim the PMU usage.

2023-01-09 15:50:11

by Mark Rutland

[permalink] [raw]
Subject: Re: Expected rdpmc behavior during context swtich and a RISC-V conundrum

On Mon, Jan 09, 2023 at 01:41:15PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 05, 2023 at 11:59:24AM -0800, Atish Patra wrote:
> > Hi All,
> > There was a recent uabi update[1] for RISC-V that allows the users to
> > read cycle and instruction count without any checks.
> > We tried to restrict that behavior to address security concerns
> > earlier but it resulted in breakage for some user space
> > applications[2].
> > Thus, previous behavior was restored where a user on RISC-V platforms
> > can directly read cycle or instruction count[3].
> >
> > Comparison with other ISAs w.r.t user space access of counters:
> > ARM64
> > -- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
> > -- Only for task bound events configured via perf.
> >
> > X86
> > --- rdpmc instruction
> > --- Enable/Disable via “/sys/devices/cpu/rdpmc”
> > -- Before v4.0
> > -- any process (even without active perf event) rdpmc
> > After v4.0
> > -- Default behavior changed to support only active events in a
> > process’s context.
> > -- Configured through perf similar to ARM64
> > -- Continue to maintain backward compatibility for unrestricted access
> > by writing 2 to “/sys/devices/cpu/rdpmc”
> >
> > IMO, RISC-V should only enable user space access through perf similar
> > to ARM64 and x86 (post v4.0).
> > However, we do have to support the legacy behavior to avoid
> > application breakage.
> > As per my understanding a direct user space access can lead to the
> > following problems:
> >
> > 1) There is no context switch support, so counts from other contexts are exposed
> > 2) If a perf user is allocated one of these counters, the counter
> > value will be written
> >
> > Looking at the x86 code as it continues to allow the above behavior,
> > rdpmc_always_available_key is enabled in the above case. However,
> > during the context switch (cr4_update_pce_mm)
> > only dirty counters are cleared. It only prevents leakage from perf
> > task to rdpmc task.
> >
> > How does the context switch of counters work for users who enable
> > unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
> > Otherwise, rdpmc users likely get noise from other applications. Is
> > that expected ?
> > This can be a security concern also where a rogue rdpmc user
> > application can monitor other critical applications to initiate side
> > channel attack.
> >
> > Am I missing something? Please correct my understanding of the x86
> > implementation if it is wrong.
>
> So on x86 we have RDTSC and RDPMC instructions. RDTSC reads the
> Time-Stamp-Counter which is a globally synchronized monotonic increasing
> counter at some 'random' rate (idealized, don't ask). This thing is used
> for time-keeping etc..

For context, the arm64 equivalent would be CNTVCT_EL0, which is a constant-rate
always-on free-running counter which is (architecturally) consistent across
CPUs, whereas PMCCNTR_EL0 is not any of those things.

> And then there's RDPMC which (optionally) allows reading the PMU
> counters which are normally disabled and all 0.
>
> Even if RDPMC is unconditionally allowed from userspace (the 2 option
> you refer to) userspace will only be able to read these 0s unless
> someone also programs the PMU. Linux only supports a single means of
> doing so: perf (some people use /dev/msr to poke directly to the MSRs
> but they get to keep all pieces).
>
> RDPMC is only useful if you read counters you own on yourself -- IOW
> selfmonitoring, using the interface outlined in uapi/linux/perf_events.h
> near struct perf_event_mmap_page.
>
> Any other usage -- you get to keep the pieces.

Yup.

> Can you observe random other counters, yes, unavoidably so. The sysfs
> control you mention was instituted to restrict this somewhat.
>
> If the RISC-V counters are fundamentally the PMU counters that need to
> be reset to trigger events, then you've managed to paint yourself into a
> tight spot :/
>
> Either you must dis-allow userspace access to these things (and break
> them) or limit the PMU usage -- both options suck.

> Now, I'm thinking that esp. something like instruction count is not
> synchronized between cores (seems fundamentally impossible) and can only
> be reasonably be consumed (and compared) when strictly affine to a
> particular CPU, you can argue that applications doing this without also
> strictly managing their affinity mask are broken anyway and therefore
> your breakage is not in fact a breaking them -- you can't break
> something that's already broken.

Yup, that was my thinking too.

The intermediate option is to trap-and-emulate (as zero or some other fixed
value), which highlghts the bug without crashing applications.

>
> Anyway, given RISC-V being a very young platform, I would try really
> *really* *REALLY* hard to stomp on these applications and get them to
> change in order to reclaim the PMU usage.

Agreed.

Thanks,
Mark.

2023-01-09 16:08:02

by Mark Rutland

[permalink] [raw]
Subject: Re: Expected rdpmc behavior during context swtich and a RISC-V conundrum

On Mon, Jan 09, 2023 at 01:06:45AM -0800, Atish Patra wrote:
> On Fri, Jan 6, 2023 at 4:02 AM Mark Rutland <[email protected]> wrote:
> >
> > On Thu, Jan 05, 2023 at 11:59:24AM -0800, Atish Patra wrote:
> > > Hi All,
> > > There was a recent uabi update[1] for RISC-V that allows the users to
> > > read cycle and instruction count without any checks.
> > > We tried to restrict that behavior to address security concerns
> > > earlier but it resulted in breakage for some user space
> > > applications[2].
> > > Thus, previous behavior was restored where a user on RISC-V platforms
> > > can directly read cycle or instruction count[3].
> > >
> > > Comparison with other ISAs w.r.t user space access of counters:
> > > ARM64
> > > -- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
> > > -- Only for task bound events configured via perf.
> > >
> > > X86
> > > --- rdpmc instruction
> > > --- Enable/Disable via “/sys/devices/cpu/rdpmc”
> > > -- Before v4.0
> > > -- any process (even without active perf event) rdpmc
> > > After v4.0
> > > -- Default behavior changed to support only active events in a
> > > process’s context.
> > > -- Configured through perf similar to ARM64
> > > -- Continue to maintain backward compatibility for unrestricted access
> > > by writing 2 to “/sys/devices/cpu/rdpmc”
> > >
> > > IMO, RISC-V should only enable user space access through perf similar
> > > to ARM64 and x86 (post v4.0).
> > > However, we do have to support the legacy behavior to avoid
> > > application breakage.
> > > As per my understanding a direct user space access can lead to the
> > > following problems:
> > >
> > > 1) There is no context switch support, so counts from other contexts are exposed
> > > 2) If a perf user is allocated one of these counters, the counter
> > > value will be written
> > >
> > > Looking at the x86 code as it continues to allow the above behavior,
> > > rdpmc_always_available_key is enabled in the above case. However,
> > > during the context switch (cr4_update_pce_mm)
> > > only dirty counters are cleared. It only prevents leakage from perf
> > > task to rdpmc task.
> > >
> > > How does the context switch of counters work for users who enable
> > > unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
> > > Otherwise, rdpmc users likely get noise from other applications. Is
> > > that expected ?
> >
> > Regardless of leakage, they're also going to get random jumps in the visible
> > values of the cycle count and instruction count as the task is context-switched
> > (and/or if those values get reset across idle, as can happen on arm64), so
> > those aren't going to be useful unless a number of other constraints apply.
> >
>
> Agreed.
>
> > AFAICT the affected package was actually a library of intrinsics; does this
> > affect a real application, or was this just in tests? If it's the latter there
> > might still be scope to properly lock this down...
> >
>
> Unfortunately, there are real applications In RISC-V started using
> cycle counters due to legacy reasons.
>
> Here is the short list from debian repo pointed out in [1]
> https://codesearch.debian.net/search?q=%22rdcycle+%250%22

The first of those is GRUB, when running bare metal.

The second is this library again, which is not a whole application.

I see that it's used in some benchmarks in real applications, e.g. firefox,
chrome. However, as above we know that's *broken* today.

Is that code actually run?

> Looking at aarch64 code in one of the application, it seems they rely
> on reading "pmccntr_el0" to read time
> https://sources.debian.org/src/chromium/108.0.5359.124-1/third_party/ffmpeg/libavutil/aarch64/timer.h/

That part is under ifdefs for mac os:

| #if defined(__APPLE__)

... and it's nonsensical anyway, pmccntr_el0 is *not* a timer, and I don't know
if Mac OS would bother to context-switch the value, so it's very likely broken
anyhow.

> AFAIK, any counter access from EL0 is disabled by default in
> reset_pmuserenr_el0 and should be enabled via the
> proc/sys/perf_user_access
> in armv8pmu_enable_user_access. Is that correct ?

Yes, we *only* enable access for tasks doing self-monitoring via perf.

No other useage makes sense, since the value is arbitrarily reset, and it's not
consistent across CPUs. It *cannot* be used as a timer.

> I couldn't find any application actually enabling the access using
> perf_user_access. Maybe I am missing something?
> Otherwise, the above application would trap on access to pmccntr_el0.

As above, that's on Mac OS, not Linux.

Thanks,
Mark.

>
> [1] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
> > Thanks,
> > Mark.
> >
> > > This can be a security concern also where a rogue rdpmc user
> > > application can monitor other critical applications to initiate side
> > > channel attack.
> > >
> > > Am I missing something? Please correct my understanding of the x86
> > > implementation if it is wrong.
> > >
> > > [1] https://lore.kernel.org/lkml/[email protected]/
> > > [2] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
> > > [3] https://lore.kernel.org/all/[email protected]/T/
> > >
> > > --
> > > Regards,
> > > Atish
>
>
>
> --
> Regards,
> Atish

2023-01-09 20:14:00

by Atish Patra

[permalink] [raw]
Subject: Re: Expected rdpmc behavior during context swtich and a RISC-V conundrum

On Mon, Jan 9, 2023 at 7:26 AM Mark Rutland <[email protected]> wrote:
>
> On Mon, Jan 09, 2023 at 01:06:45AM -0800, Atish Patra wrote:
> > On Fri, Jan 6, 2023 at 4:02 AM Mark Rutland <[email protected]> wrote:
> > >
> > > On Thu, Jan 05, 2023 at 11:59:24AM -0800, Atish Patra wrote:
> > > > Hi All,
> > > > There was a recent uabi update[1] for RISC-V that allows the users to
> > > > read cycle and instruction count without any checks.
> > > > We tried to restrict that behavior to address security concerns
> > > > earlier but it resulted in breakage for some user space
> > > > applications[2].
> > > > Thus, previous behavior was restored where a user on RISC-V platforms
> > > > can directly read cycle or instruction count[3].
> > > >
> > > > Comparison with other ISAs w.r.t user space access of counters:
> > > > ARM64
> > > > -- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
> > > > -- Only for task bound events configured via perf.
> > > >
> > > > X86
> > > > --- rdpmc instruction
> > > > --- Enable/Disable via “/sys/devices/cpu/rdpmc”
> > > > -- Before v4.0
> > > > -- any process (even without active perf event) rdpmc
> > > > After v4.0
> > > > -- Default behavior changed to support only active events in a
> > > > process’s context.
> > > > -- Configured through perf similar to ARM64
> > > > -- Continue to maintain backward compatibility for unrestricted access
> > > > by writing 2 to “/sys/devices/cpu/rdpmc”
> > > >
> > > > IMO, RISC-V should only enable user space access through perf similar
> > > > to ARM64 and x86 (post v4.0).
> > > > However, we do have to support the legacy behavior to avoid
> > > > application breakage.
> > > > As per my understanding a direct user space access can lead to the
> > > > following problems:
> > > >
> > > > 1) There is no context switch support, so counts from other contexts are exposed
> > > > 2) If a perf user is allocated one of these counters, the counter
> > > > value will be written
> > > >
> > > > Looking at the x86 code as it continues to allow the above behavior,
> > > > rdpmc_always_available_key is enabled in the above case. However,
> > > > during the context switch (cr4_update_pce_mm)
> > > > only dirty counters are cleared. It only prevents leakage from perf
> > > > task to rdpmc task.
> > > >
> > > > How does the context switch of counters work for users who enable
> > > > unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
> > > > Otherwise, rdpmc users likely get noise from other applications. Is
> > > > that expected ?
> > >
> > > Regardless of leakage, they're also going to get random jumps in the visible
> > > values of the cycle count and instruction count as the task is context-switched
> > > (and/or if those values get reset across idle, as can happen on arm64), so
> > > those aren't going to be useful unless a number of other constraints apply.
> > >
> >
> > Agreed.
> >
> > > AFAICT the affected package was actually a library of intrinsics; does this
> > > affect a real application, or was this just in tests? If it's the latter there
> > > might still be scope to properly lock this down...
> > >
> >
> > Unfortunately, there are real applications In RISC-V started using
> > cycle counters due to legacy reasons.
> >
> > Here is the short list from debian repo pointed out in [1]
> > https://codesearch.debian.net/search?q=%22rdcycle+%250%22
>
> The first of those is GRUB, when running bare metal.
>
> The second is this library again, which is not a whole application.
>
> I see that it's used in some benchmarks in real applications, e.g. firefox,
> chrome. However, as above we know that's *broken* today.
>
> Is that code actually run?
>
> > Looking at aarch64 code in one of the application, it seems they rely
> > on reading "pmccntr_el0" to read time
> > https://sources.debian.org/src/chromium/108.0.5359.124-1/third_party/ffmpeg/libavutil/aarch64/timer.h/
>
> That part is under ifdefs for mac os:
>
> | #if defined(__APPLE__)
>

Yes. Sorry I missed that earlier. I checked few other application code
which correctly
use cntvct_el0 as you pointed out.

> ... and it's nonsensical anyway, pmccntr_el0 is *not* a timer, and I don't know
> if Mac OS would bother to context-switch the value, so it's very likely broken
> anyhow.
>
> > AFAIK, any counter access from EL0 is disabled by default in
> > reset_pmuserenr_el0 and should be enabled via the
> > proc/sys/perf_user_access
> > in armv8pmu_enable_user_access. Is that correct ?
>
> Yes, we *only* enable access for tasks doing self-monitoring via perf.
>
> No other useage makes sense, since the value is arbitrarily reset, and it's not
> consistent across CPUs. It *cannot* be used as a timer.
>

Yes. That's what we were trying to convince the user space application
maintainers too.

> > I couldn't find any application actually enabling the access using
> > perf_user_access. Maybe I am missing something?
> > Otherwise, the above application would trap on access to pmccntr_el0.
>
> As above, that's on Mac OS, not Linux.
>

Yeah. My bad. Sorry for the noise :)

> Thanks,
> Mark.
>
> >
> > [1] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
> > > Thanks,
> > > Mark.
> > >
> > > > This can be a security concern also where a rogue rdpmc user
> > > > application can monitor other critical applications to initiate side
> > > > channel attack.
> > > >
> > > > Am I missing something? Please correct my understanding of the x86
> > > > implementation if it is wrong.
> > > >
> > > > [1] https://lore.kernel.org/lkml/[email protected]/
> > > > [2] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
> > > > [3] https://lore.kernel.org/all/[email protected]/T/
> > > >
> > > > --
> > > > Regards,
> > > > Atish
> >
> >
> >
> > --
> > Regards,
> > Atish



--
Regards,
Atish

2023-01-09 20:18:37

by Atish Patra

[permalink] [raw]
Subject: Re: Expected rdpmc behavior during context swtich and a RISC-V conundrum

On Mon, Jan 9, 2023 at 4:41 AM Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Jan 05, 2023 at 11:59:24AM -0800, Atish Patra wrote:
> > Hi All,
> > There was a recent uabi update[1] for RISC-V that allows the users to
> > read cycle and instruction count without any checks.
> > We tried to restrict that behavior to address security concerns
> > earlier but it resulted in breakage for some user space
> > applications[2].
> > Thus, previous behavior was restored where a user on RISC-V platforms
> > can directly read cycle or instruction count[3].
> >
> > Comparison with other ISAs w.r.t user space access of counters:
> > ARM64
> > -- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
> > -- Only for task bound events configured via perf.
> >
> > X86
> > --- rdpmc instruction
> > --- Enable/Disable via “/sys/devices/cpu/rdpmc”
> > -- Before v4.0
> > -- any process (even without active perf event) rdpmc
> > After v4.0
> > -- Default behavior changed to support only active events in a
> > process’s context.
> > -- Configured through perf similar to ARM64
> > -- Continue to maintain backward compatibility for unrestricted access
> > by writing 2 to “/sys/devices/cpu/rdpmc”
> >
> > IMO, RISC-V should only enable user space access through perf similar
> > to ARM64 and x86 (post v4.0).
> > However, we do have to support the legacy behavior to avoid
> > application breakage.
> > As per my understanding a direct user space access can lead to the
> > following problems:
> >
> > 1) There is no context switch support, so counts from other contexts are exposed
> > 2) If a perf user is allocated one of these counters, the counter
> > value will be written
> >
> > Looking at the x86 code as it continues to allow the above behavior,
> > rdpmc_always_available_key is enabled in the above case. However,
> > during the context switch (cr4_update_pce_mm)
> > only dirty counters are cleared. It only prevents leakage from perf
> > task to rdpmc task.
> >
> > How does the context switch of counters work for users who enable
> > unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
> > Otherwise, rdpmc users likely get noise from other applications. Is
> > that expected ?
> > This can be a security concern also where a rogue rdpmc user
> > application can monitor other critical applications to initiate side
> > channel attack.
> >
> > Am I missing something? Please correct my understanding of the x86
> > implementation if it is wrong.
>
> So on x86 we have RDTSC and RDPMC instructions. RDTSC reads the
> Time-Stamp-Counter which is a globally synchronized monotonic increasing
> counter at some 'random' rate (idealized, don't ask). This thing is used
> for time-keeping etc..
>
> And then there's RDPMC which (optionally) allows reading the PMU
> counters which are normally disabled and all 0.
>
> Even if RDPMC is unconditionally allowed from userspace (the 2 option
> you refer to) userspace will only be able to read these 0s unless
> someone also programs the PMU. Linux only supports a single means of
> doing so: perf (some people use /dev/msr to poke directly to the MSRs
> but they get to keep all pieces).
>

It makes sense now. Thanks!!

AFAIK, the /dev/msr interface is also allowed for root users only. So that
covers the security concerns I was asking about.

> RDPMC is only useful if you read counters you own on yourself -- IOW
> selfmonitoring, using the interface outlined in uapi/linux/perf_events.h
> near struct perf_event_mmap_page.
>
> Any other usage -- you get to keep the pieces.
>
> Can you observe random other counters, yes, unavoidably so. The sysfs
> control you mention was instituted to restrict this somewhat.
>
> If the RISC-V counters are fundamentally the PMU counters that need to
> be reset to trigger events, then you've managed to paint yourself into a
> tight spot :/
>
> Either you must dis-allow userspace access to these things (and break
> them) or limit the PMU usage -- both options suck.
>
>
> Now, I'm thinking that esp. something like instruction count is not
> synchronized between cores (seems fundamentally impossible) and can only
> be reasonably be consumed (and compared) when strictly affine to a
> particular CPU, you can argue that applications doing this without also
> strictly managing their affinity mask are broken anyway and therefore
> your breakage is not in fact a breaking them -- you can't break
> something that's already broken.
>

I think most broken applications were using rdcycle to measure time
which was wrong anyways.
It probably happened because there was no "time" CSR in the early
hardwares. Thus, the rdtime would
trap & emulated by the firmware which was slow. This lead to user
space application to use rdcycle which
was not correct either. So the existing applications are broken for
using rdcycle as well.

Since both cycle & instret behave similarly (fixed counters), they get
enabled/disabled together.

>
> Anyway, given RISC-V being a very young platform, I would try really
> *really* *REALLY* hard to stomp on these applications and get them to
> change in order to reclaim the PMU usage.

Yes. Thanks for your valuable input.

--
Regards,
Atish

2023-01-10 06:41:23

by Anup Patel

[permalink] [raw]
Subject: Re: Expected rdpmc behavior during context swtich and a RISC-V conundrum

+linux-riscv +kvm-riscv

On Tue, Jan 10, 2023 at 1:26 AM Atish Patra <[email protected]> wrote:
>
> On Mon, Jan 9, 2023 at 4:41 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Thu, Jan 05, 2023 at 11:59:24AM -0800, Atish Patra wrote:
> > > Hi All,
> > > There was a recent uabi update[1] for RISC-V that allows the users to
> > > read cycle and instruction count without any checks.
> > > We tried to restrict that behavior to address security concerns
> > > earlier but it resulted in breakage for some user space
> > > applications[2].
> > > Thus, previous behavior was restored where a user on RISC-V platforms
> > > can directly read cycle or instruction count[3].
> > >
> > > Comparison with other ISAs w.r.t user space access of counters:
> > > ARM64
> > > -- Enabled/Disabled via (/proc/sys/kernel/perf_user_access)
> > > -- Only for task bound events configured via perf.
> > >
> > > X86
> > > --- rdpmc instruction
> > > --- Enable/Disable via “/sys/devices/cpu/rdpmc”
> > > -- Before v4.0
> > > -- any process (even without active perf event) rdpmc
> > > After v4.0
> > > -- Default behavior changed to support only active events in a
> > > process’s context.
> > > -- Configured through perf similar to ARM64
> > > -- Continue to maintain backward compatibility for unrestricted access
> > > by writing 2 to “/sys/devices/cpu/rdpmc”
> > >
> > > IMO, RISC-V should only enable user space access through perf similar
> > > to ARM64 and x86 (post v4.0).
> > > However, we do have to support the legacy behavior to avoid
> > > application breakage.
> > > As per my understanding a direct user space access can lead to the
> > > following problems:
> > >
> > > 1) There is no context switch support, so counts from other contexts are exposed
> > > 2) If a perf user is allocated one of these counters, the counter
> > > value will be written
> > >
> > > Looking at the x86 code as it continues to allow the above behavior,
> > > rdpmc_always_available_key is enabled in the above case. However,
> > > during the context switch (cr4_update_pce_mm)
> > > only dirty counters are cleared. It only prevents leakage from perf
> > > task to rdpmc task.
> > >
> > > How does the context switch of counters work for users who enable
> > > unrestricted access by writing 2 to “/sys/devices/cpu/rdpmc” ?
> > > Otherwise, rdpmc users likely get noise from other applications. Is
> > > that expected ?
> > > This can be a security concern also where a rogue rdpmc user
> > > application can monitor other critical applications to initiate side
> > > channel attack.
> > >
> > > Am I missing something? Please correct my understanding of the x86
> > > implementation if it is wrong.
> >
> > So on x86 we have RDTSC and RDPMC instructions. RDTSC reads the
> > Time-Stamp-Counter which is a globally synchronized monotonic increasing
> > counter at some 'random' rate (idealized, don't ask). This thing is used
> > for time-keeping etc..
> >
> > And then there's RDPMC which (optionally) allows reading the PMU
> > counters which are normally disabled and all 0.
> >
> > Even if RDPMC is unconditionally allowed from userspace (the 2 option
> > you refer to) userspace will only be able to read these 0s unless
> > someone also programs the PMU. Linux only supports a single means of
> > doing so: perf (some people use /dev/msr to poke directly to the MSRs
> > but they get to keep all pieces).
> >
>
> It makes sense now. Thanks!!
>
> AFAIK, the /dev/msr interface is also allowed for root users only. So that
> covers the security concerns I was asking about.
>
> > RDPMC is only useful if you read counters you own on yourself -- IOW
> > selfmonitoring, using the interface outlined in uapi/linux/perf_events.h
> > near struct perf_event_mmap_page.
> >
> > Any other usage -- you get to keep the pieces.
> >
> > Can you observe random other counters, yes, unavoidably so. The sysfs
> > control you mention was instituted to restrict this somewhat.
> >
> > If the RISC-V counters are fundamentally the PMU counters that need to
> > be reset to trigger events, then you've managed to paint yourself into a
> > tight spot :/
> >
> > Either you must dis-allow userspace access to these things (and break
> > them) or limit the PMU usage -- both options suck.
> >
> >
> > Now, I'm thinking that esp. something like instruction count is not
> > synchronized between cores (seems fundamentally impossible) and can only
> > be reasonably be consumed (and compared) when strictly affine to a
> > particular CPU, you can argue that applications doing this without also
> > strictly managing their affinity mask are broken anyway and therefore
> > your breakage is not in fact a breaking them -- you can't break
> > something that's already broken.
> >
>
> I think most broken applications were using rdcycle to measure time
> which was wrong anyways.
> It probably happened because there was no "time" CSR in the early
> hardwares. Thus, the rdtime would
> trap & emulated by the firmware which was slow. This lead to user
> space application to use rdcycle which
> was not correct either. So the existing applications are broken for
> using rdcycle as well.
>
> Since both cycle & instret behave similarly (fixed counters), they get
> enabled/disabled together.
>
> >
> > Anyway, given RISC-V being a very young platform, I would try really
> > *really* *REALLY* hard to stomp on these applications and get them to
> > change in order to reclaim the PMU usage.
>
> Yes. Thanks for your valuable input.

I agree with Peter Z. We had a similar discussion in the
Performance Analysis SIG of RISC-V international as well.

This also forces KVM (and other hypervisors) to save-n-restore
CYCLE and INSTRUCTION counters so that one Guest/VM
can't see cycle/instruction counts from another Guest/VM.

Only a few applications are affected and RISC-V ecosystem
is young so it is better to change these applications rather
than making CYCLE and INSTRUCTION counters as part
of uABI.

Regards,
Anup