2019-12-23 13:05:27

by Srinivas Ramana

Subject: [PATCH] arm64: Set SSBS for user threads while creation

The current SSBS implementation sets the SSBS bit for user threads
in start_thread(). This works for tasks launched with fork/clone
followed by execve, but where userspace only calls fork (e.g., Java
applications) the SSBS bit is left unset, which causes a performance
regression for such tasks.

Commit cbdf8a189a66 ("arm64: Force SSBS on context switch") masks
this issue, but it was written for a different reason: heterogeneous
systems where only some CPUs support SSBS. It is more appropriate to
set the SSBS bit for all user threads at creation time.

Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
Signed-off-by: Srinivas Ramana <[email protected]>
---
arch/arm64/kernel/process.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 71f788cd2b18..a8f05cc39261 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -399,6 +399,13 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 		 */
 		if (clone_flags & CLONE_SETTLS)
 			p->thread.uw.tp_value = childregs->regs[3];
+
+		if (arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE) {
+			if (is_compat_thread(task_thread_info(p)))
+				set_compat_ssbs_bit(childregs);
+			else
+				set_ssbs_bit(childregs);
+		}
 	} else {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->pstate = PSR_MODE_EL1h;
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


2019-12-24 07:06:50

by Anshuman Khandual

Subject: Re: [PATCH] arm64: Set SSBS for user threads while creation



On 12/23/2019 06:32 PM, Srinivas Ramana wrote:
> Current SSBS implementation takes care of setting the
> SSBS bit in start_thread() for user threads. While this works
> for tasks launched with fork/clone followed by execve, for cases
> where userspace would just call fork (eg, Java applications) this
> leaves the SSBS bit unset. This results in performance
> regression for such tasks.
>
> It is understood that commit cbdf8a189a66 ("arm64: Force SSBS
> on context switch") masks this issue, but that was done for a
> different reason where heterogeneous CPUs(both SSBS supported
> and unsupported) are present. It is appropriate to take care
> of the SSBS bit for all threads while creation itself.

So this fixes the situation (i.e. low performance) from the creation time
of a task with fork() which will never see a subsequent execve, until it
gets context switched for the very first time?

>
> Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
> Signed-off-by: Srinivas Ramana <[email protected]>
> ---
> arch/arm64/kernel/process.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 71f788cd2b18..a8f05cc39261 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -399,6 +399,13 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  		 */
>  		if (clone_flags & CLONE_SETTLS)
>  			p->thread.uw.tp_value = childregs->regs[3];
> +
> +		if (arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE) {
> +			if (is_compat_thread(task_thread_info(p)))
> +				set_compat_ssbs_bit(childregs);
> +			else
> +				set_ssbs_bit(childregs);
> +		}
>  	} else {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childregs->pstate = PSR_MODE_EL1h;
>

2019-12-24 08:33:05

by Srinivas Ramana

Subject: Re: [PATCH] arm64: Set SSBS for user threads while creation

On 12/24/2019 12:36 PM, Anshuman Khandual wrote:
>
>
> On 12/23/2019 06:32 PM, Srinivas Ramana wrote:
>> Current SSBS implementation takes care of setting the
>> SSBS bit in start_thread() for user threads. While this works
>> for tasks launched with fork/clone followed by execve, for cases
>> where userspace would just call fork (eg, Java applications) this
>> leaves the SSBS bit unset. This results in performance
>> regression for such tasks.
>>
>> It is understood that commit cbdf8a189a66 ("arm64: Force SSBS
>> on context switch") masks this issue, but that was done for a
>> different reason where heterogeneous CPUs(both SSBS supported
>> and unsupported) are present. It is appropriate to take care
>> of the SSBS bit for all threads while creation itself.
>
> So this fixes the situation (i.e low performance) from the creation time
> of a task with fork() which will never see a subsequent execve, till it
> gets context switched for the very first time ?
>
Yes, that is correct.

>>
>> Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
>> Signed-off-by: Srinivas Ramana <[email protected]>
>> ---
>> arch/arm64/kernel/process.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index 71f788cd2b18..a8f05cc39261 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -399,6 +399,13 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>>  		 */
>>  		if (clone_flags & CLONE_SETTLS)
>>  			p->thread.uw.tp_value = childregs->regs[3];
>> +
>> +		if (arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE) {
>> +			if (is_compat_thread(task_thread_info(p)))
>> +				set_compat_ssbs_bit(childregs);
>> +			else
>> +				set_ssbs_bit(childregs);
>> +		}
>>  	} else {
>>  		memset(childregs, 0, sizeof(struct pt_regs));
>>  		childregs->pstate = PSR_MODE_EL1h;
>>

Thanks,
-- Srinivas R

2020-01-02 18:03:19

by Catalin Marinas

Subject: Re: [PATCH] arm64: Set SSBS for user threads while creation

On Mon, Dec 23, 2019 at 06:32:26PM +0530, Srinivas Ramana wrote:
> Current SSBS implementation takes care of setting the
> SSBS bit in start_thread() for user threads. While this works
> for tasks launched with fork/clone followed by execve, for cases
> where userspace would just call fork (eg, Java applications) this
> leaves the SSBS bit unset. This results in performance
> regression for such tasks.
>
> It is understood that commit cbdf8a189a66 ("arm64: Force SSBS
> on context switch") masks this issue, but that was done for a
> different reason where heterogeneous CPUs(both SSBS supported
> and unsupported) are present. It is appropriate to take care
> of the SSBS bit for all threads while creation itself.
>
> Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
> Signed-off-by: Srinivas Ramana <[email protected]>

I suppose the parent process cleared SSBS explicitly. Isn't the child
after fork() supposed to be nearly identical to the parent? If we did as
you suggest, someone else might complain that SSBS has been set in the
child after fork().

I think the fix is for user space to set SSBS in the child if it no
longer needs it.

--
Catalin

2020-01-09 19:07:13

by Will Deacon

Subject: Re: [PATCH] arm64: Set SSBS for user threads while creation

On Thu, Jan 02, 2020 at 06:01:45PM +0000, Catalin Marinas wrote:
> On Mon, Dec 23, 2019 at 06:32:26PM +0530, Srinivas Ramana wrote:
> > Current SSBS implementation takes care of setting the
> > SSBS bit in start_thread() for user threads. While this works
> > for tasks launched with fork/clone followed by execve, for cases
> > where userspace would just call fork (eg, Java applications) this
> > leaves the SSBS bit unset. This results in performance
> > regression for such tasks.
> >
> > It is understood that commit cbdf8a189a66 ("arm64: Force SSBS
> > on context switch") masks this issue, but that was done for a
> > different reason where heterogeneous CPUs(both SSBS supported
> > and unsupported) are present. It is appropriate to take care
> > of the SSBS bit for all threads while creation itself.
> >
> > Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
> > Signed-off-by: Srinivas Ramana <[email protected]>
>
> I suppose the parent process cleared SSBS explicitly. Isn't the child
> after fork() supposed to be nearly identical to the parent? If we did as
> you suggest, someone else might complain that SSBS has been set in the
> child after fork().

Right, I'd expect the parent SSBS to be inherited when we copy the pstate
field along with the other regs, and I think this is the correct behaviour.

Is that broken somehow?

Will
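
The inheritance described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct model_regs`, `model_copy_user_regs()` and `model_ssbs_is_set()` are hypothetical names invented for the example, and the two bit values are what I understand the arm64 headers to define for PSTATE.SSBS and its AArch32 counterpart. It assumes the user-thread path of copy_thread() copies the parent's saved registers wholesale and only zeroes x0 for the child.

```c
#include <stdint.h>
#include <string.h>

/* Bit values assumed from the arm64 ptrace headers. */
#define PSR_SSBS_BIT      0x00001000u /* AArch64 PSTATE.SSBS */
#define PSR_AA32_SSBS_BIT 0x00800000u /* AArch32 (compat) SSBS */

/* Hypothetical, stripped-down stand-in for struct pt_regs. */
struct model_regs {
	uint64_t regs[31];
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
};

/*
 * Model of the user-thread branch of copy_thread(): the child starts
 * with a copy of the parent's saved registers, except that x0 is
 * zeroed so fork() returns 0 in the child. pstate, including the
 * SSBS bit, is copied unchanged.
 */
static void model_copy_user_regs(struct model_regs *child,
				 const struct model_regs *parent)
{
	memcpy(child, parent, sizeof(*child));
	child->regs[0] = 0;
}

/* Report whether the SSBS bit is set for a native or compat task. */
static int model_ssbs_is_set(const struct model_regs *r, int compat)
{
	uint64_t bit = compat ? PSR_AA32_SSBS_BIT : PSR_SSBS_BIT;

	return (r->pstate & bit) != 0;
}
```

In this model a parent with SSBS set always produces a child with SSBS set, and a parent with SSBS clear produces a child with SSBS clear, which is the behaviour being asked about.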

2020-01-29 11:50:24

by Srinivas Ramana

Subject: Re: [PATCH] arm64: Set SSBS for user threads while creation

On 1/2/2020 11:31 PM, Catalin Marinas wrote:
> On Mon, Dec 23, 2019 at 06:32:26PM +0530, Srinivas Ramana wrote:
>> Current SSBS implementation takes care of setting the
>> SSBS bit in start_thread() for user threads. While this works
>> for tasks launched with fork/clone followed by execve, for cases
>> where userspace would just call fork (eg, Java applications) this
>> leaves the SSBS bit unset. This results in performance
>> regression for such tasks.
>>
>> It is understood that commit cbdf8a189a66 ("arm64: Force SSBS
>> on context switch") masks this issue, but that was done for a
>> different reason where heterogeneous CPUs(both SSBS supported
>> and unsupported) are present. It is appropriate to take care
>> of the SSBS bit for all threads while creation itself.
>>
>> Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
>> Signed-off-by: Srinivas Ramana <[email protected]>
>
> I suppose the parent process cleared SSBS explicitly. Isn't the child

Actually, we observe that the parent (on Android, the zygote that launches
the app) does have the SSBS bit set. However, the child doesn't have the
bit set.

> after fork() supposed to be nearly identical to the parent? If we did as
> you suggest, someone else might complain that SSBS has been set in the
> child after fork().

I am also wondering why a userspace process would clear the SSBS bit,
losing the performance benefit.
>
> I think the fix is for user space to set SSBS in the child if it no
> longer needs it.
>

Sorry for the late response on this.

Thanks,
-- Srinivas R

2020-01-29 16:23:06

by Will Deacon

Subject: Re: [PATCH] arm64: Set SSBS for user threads while creation

On Wed, Jan 29, 2020 at 05:18:53PM +0530, Srinivas Ramana wrote:
> On 1/2/2020 11:31 PM, Catalin Marinas wrote:
> > On Mon, Dec 23, 2019 at 06:32:26PM +0530, Srinivas Ramana wrote:
> > > Current SSBS implementation takes care of setting the
> > > SSBS bit in start_thread() for user threads. While this works
> > > for tasks launched with fork/clone followed by execve, for cases
> > > where userspace would just call fork (eg, Java applications) this
> > > leaves the SSBS bit unset. This results in performance
> > > regression for such tasks.
> > >
> > > It is understood that commit cbdf8a189a66 ("arm64: Force SSBS
> > > on context switch") masks this issue, but that was done for a
> > > different reason where heterogeneous CPUs(both SSBS supported
> > > and unsupported) are present. It is appropriate to take care
> > > of the SSBS bit for all threads while creation itself.
> > >
> > > Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
> > > Signed-off-by: Srinivas Ramana <[email protected]>
> >
> > I suppose the parent process cleared SSBS explicitly. Isn't the child
>
> Actually we observe that parent(in case of android, zygote that launches the
> app) does have SSBS bit set. However child doesn't have the bit set.

On which SoC? Your commit message talks about heterogeneous systems (wrt
SSBS) as though they don't apply in your case. Could you provide us with
a reproducer?

> > after fork() supposed to be nearly identical to the parent? If we did as
> > you suggest, someone else might complain that SSBS has been set in the
> > child after fork().
>
> I am also wondering why a userspace process would clear the SSBS bit,
> losing the performance benefit.

I guess it could happen during sigreturn if the signal handler wasn't
careful about preserving bits in pstate, although it doesn't feel like
something you'd regularly run into.

But hang on a sec -- it looks like the context switch logic in
cbdf8a189a66 actually does the wrong thing for systems where all of the
CPUs implement SSBS. I don't think it explains the behaviour you're seeing,
but I do think it could end up in situations where SSBS is unexpectedly
*set*.

Diff below.

Will

--->8

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index bbb0f0c145f6..e38284c9fb7b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -466,6 +466,13 @@ static void ssbs_thread_switch(struct task_struct *next)
 	if (unlikely(next->flags & PF_KTHREAD))
 		return;
 
+	/*
+	 * If all CPUs implement the SSBS instructions, then we just
+	 * need to context-switch the PSTATE field.
+	 */
+	if (cpu_have_feature(cpu_feature(SSBS)))
+		return;
+
 	/* If the mitigation is enabled, then we leave SSBS clear. */
 	if ((arm64_get_ssbd_state() == ARM64_SSBD_FORCE_ENABLE) ||
 	    test_tsk_thread_flag(next, TIF_SSBD))
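
The effect of the early return in the diff above can be sketched with a user-space model. This is an illustration under stated assumptions, not the kernel implementation: `struct model_task`, `struct model_system` and `model_ssbs_thread_switch()` are hypothetical names, only the 64-bit case is modelled, and the part of ssbs_thread_switch() that follows the hunk is assumed to set the SSBS bit in the saved pstate of the incoming user task.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_SSBS_BIT 0x00001000u /* AArch64 PSTATE.SSBS (assumed value) */

/* Hypothetical stand-in for the per-task state the hunk looks at. */
struct model_task {
	bool is_kthread;  /* models next->flags & PF_KTHREAD */
	bool tif_ssbd;    /* models test_tsk_thread_flag(next, TIF_SSBD) */
	uint64_t pstate;  /* saved user pstate */
};

/* Hypothetical stand-in for system-wide state. */
struct model_system {
	bool force_enable;       /* ssbd state == ARM64_SSBD_FORCE_ENABLE */
	bool all_cpus_have_ssbs; /* models the new SSBS feature check */
};

/*
 * Model of ssbs_thread_switch(). With with_fix == false it follows the
 * logic before the diff; with with_fix == true it adds the early return
 * for systems where every CPU implements SSBS.
 */
static void model_ssbs_thread_switch(const struct model_system *sys,
				     struct model_task *next, bool with_fix)
{
	if (next->is_kthread)
		return;

	/* New check: pstate is simply context-switched as saved. */
	if (with_fix && sys->all_cpus_have_ssbs)
		return;

	/* If the mitigation is enabled, leave SSBS clear. */
	if (sys->force_enable || next->tif_ssbd)
		return;

	/* Assumed remainder of the function: force SSBS on. */
	next->pstate |= PSR_SSBS_BIT;
}
```

In this model, a user task that deliberately cleared SSBS on a system where all CPUs implement it has the bit set again at context switch without the fix, and keeps it clear with the fix.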