2024-01-27 19:08:38

by a-development

Subject: [RFC][PATCH 00/17] Fix up the recent SRSO patches

Putting srso=off on the cmdline fixed my FUSE-related issues.
Basically, I could not suspend anymore.
Kernel 6.7.1.

This is the behavior with srso enabled...
https://paste.cachyos.org/p/bae7257


2024-01-27 19:20:29

by Borislav Petkov

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

On Sat, Jan 27, 2024 at 06:58:37PM +0000, [email protected] wrote:
> putting srso=off in the cmdline fixed up my FUSE related issues.
> Basically, I could not suspend anymore.
> kernel 6.7.1.
>
> This is the behavior with srso enabled...
> https://paste.cachyos.org/p/bae7257

Can you disable, if possible, whatever's doing FUSE and try suspending
then?

Also, can you share full dmesg, .config and /proc/cpuinfo from the
machine?

Thx.


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-01-27 19:28:04

by a-development

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

Oh that was quick :)

I can umount the FUSE mounts and it will work fine.
Previously I didn't even suspend.
Also, in the log I had provided, I was on a cachyos kernel, but it
didn't matter, even the most recent arch kernel had the same issues.

Full dmesg is no problem - I can do that the next day, when I start up
the server again.
The full ~/.config folder I don't want to share.
Here is /proc/cpuinfo: https://paste.cachyos.org/p/158b767

Thanks
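
Since unmounting the FUSE mounts is what makes suspend work here, it can
help to see which mounts are FUSE-backed before suspending. A minimal
sketch, assuming the usual /proc/mounts layout; the function name
list_fuse_mounts is made up for illustration:

```shell
# Print the mount point of every FUSE filesystem listed in a mounts table.
# Usage: list_fuse_mounts [mounts_file]   (defaults to /proc/mounts)
# NOTE: list_fuse_mounts is a hypothetical helper, not an existing tool.
list_fuse_mounts() {
    local mounts_file="${1:-/proc/mounts}"
    # Field 2 is the mount point, field 3 the filesystem type.
    # sshfs mounts show up as "fuse.sshfs"; "fusectl" is only the control
    # filesystem and is deliberately not matched.
    awk '$3 == "fuse" || $3 == "fuseblk" || $3 ~ /^fuse\./ { print $2 }' \
        "$mounts_file"
}
```

The output could then be fed, one mount point per call, to
`fusermount -u` (or `fusermount3 -u` with libfuse 3). Mount points that
contain spaces appear escaped (\040) in /proc/mounts and would need
decoding first.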

On 27.01.2024 20:19, Borislav Petkov wrote:
> On Sat, Jan 27, 2024 at 06:58:37PM +0000, [email protected]
> wrote:
>> putting srso=off in the cmdline fixed up my FUSE related issues.
>> Basically, I could not suspend anymore.
>> kernel 6.7.1.
>>
>> This is the behavior with srso enabled...
>> https://paste.cachyos.org/p/bae7257
>
> Can you disable, if possible, whatever's doing FUSE and try suspending
> then?
>
> Also, can you share full dmesg, .config and /proc/cpuinfo from the
> machine?
>
> Thx.

2024-01-27 19:42:11

by Borislav Petkov

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

On Sat, Jan 27, 2024 at 07:27:45PM +0000, [email protected] wrote:
> I can umount the FUSE mounts and it will work fine.

Aha, so it is FUSE-related.

How do I trigger it here? What are the steps to reproduce? Suspend while
I have a FUSE mount? How do I set it up so that it is as close to yours
as possible?

> Previously I didn't even suspend. Also, in the log I had provided,
> I was on a cachyos kernel, but it didn't matter, even the most recent
> arch kernel had the same issues.

You should try an upstream kernel to confirm it reproduces there - no
distro kernels.

> full dmesg is no problem - I can do that the next day, when I startup
> the server again full ~/.config folder I don't want to share

Not the full .config folder - just the kernel .config of the kernel
you're triggering this with so that I can try to do it here too.

> here is /proc/cpuinfo https://paste.cachyos.org/p/158b767

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
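
For reference, the kernel .config asked for here is the kernel build
configuration, not the ~/.config directory. A rough sketch for locating
it, assuming one of the common locations is present (/proc/config.gz
needs CONFIG_IKCONFIG_PROC); the function name and its `root` argument
are hypothetical and exist only to make the lookup easy to test:

```shell
# Print the path of a kernel build configuration, if one can be found.
# Usage: kernel_config_path [root] [release]
#   root    - filesystem prefix, empty by default (hypothetical, for testing)
#   release - kernel release string, defaults to the output of `uname -r`
kernel_config_path() {
    local root="${1:-}"
    local release="${2:-$(uname -r)}"
    local candidate
    for candidate in \
        "$root/proc/config.gz" \
        "$root/boot/config-$release" \
        "$root/lib/modules/$release/build/.config"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing found in the locations this sketch knows about.
    return 1
}
```

If the result is /proc/config.gz, `zcat /proc/config.gz > kernel.config`
produces a plain-text copy that can be shared.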

2024-01-29 18:18:21

by a-development

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

Hello.

I have the feeling that something else is amiss.
Currently under 6.7.2-2-cachyos with srso=off.
https://0x0.st/HDqP.txt

Now I feel that further communication is rather selfish, as a clean
environment is hard to provide.
In any case, my FUSE arguments are:

sshfs -o kernel_cache -o auto_cache -o reconnect \
      -o compression=yes -o cache_timeout=600 -o ServerAliveInterval=30 \
      "$source" "$target" -o idmap=user

With this line, I somehow managed to keep the FUSE mount mounted
indefinitely, even if the device was offline for a couple of days.
A subsequent suspend would fail to freeze.
With srso=off it would reproducibly work.

Please provide me a specific version of a kernel I should try in my
configuration to try and reproduce.
I'd prefer a pre-compiled one; if not tell me...
I use archlinux.

Please give me a reason to not feel bad about myself.

All the best







On 27.01.2024 20:41, Borislav Petkov wrote:
> On Sat, Jan 27, 2024 at 07:27:45PM +0000, [email protected]
> wrote:
>> I can umount the FUSE mounts and it will work fine.
>
> Aha, so it is FUSE-related.
>
> How do I trigger it here? What are the steps to reproduce? Suspend
> while
> I have a FUSE mount? How do I set it up so that it is as close to yours
> as possible?
>
>> Previously I didn't even suspend. Also, in the log I had provided,
>> I was on a cachyos kernel, but it didn't matter, even the most recent
>> arch kernel had the same issues.
>
> You should try an upstream kernel to confirm it reproduces there - no
> distro kernels.
>
>> full dmesg is no problem - I can do that the next day, when I startup
>> the server again full ~/.config folder I don't want to share
>
> Not the full .config folder - just the kernel .config of the kernel
> you're triggering this with so that I can try to do it here too.
>
>> here is /proc/cpuinfo https://paste.cachyos.org/p/158b767
>
> Thx.

2024-03-26 22:22:25

by Borislav Petkov

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

Whoops,

this fell through the cracks. Sorry about that.

On Mon, Jan 29, 2024 at 06:18:00PM +0000, [email protected] wrote:
> I have the feeling that something else is amiss.
> Currently under 6.7.2-2-cachyos with srso=off.
> https://0x0.st/HDqP.txt

Yah, your tasks refuse to freeze on suspend and they have this fuse
stuff in the stacktrace:

[ 6346.492593] task:btop state:D stack:0 pid:279617 tgid:1548 ppid:1531 flags:0x00004006
[ 6346.492600] Call Trace:
[ 6346.492602] <TASK>
[ 6346.492607] __schedule+0xd44/0x1af0
[ 6346.492614] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6346.492617] ? __wake_up+0x9d/0xc0
[ 6346.492622] schedule+0x32/0xd0
[ 6346.492627] request_wait_answer+0xd0/0x2a0 [fuse db37c699d94393e946cf93306449ea0f307959a1]
[ 6346.492638] ? __pfx_autoremove_wake_function+0x10/0x10
[ 6346.492643] fuse_simple_request+0x21c/0x390 [fuse db37c699d94393e946cf93306449ea0f307959a1]
[ 6346.492653] fuse_statfs+0xf2/0x160 [fuse db37c699d94393e946cf93306449ea0f307959a1]
[ 6346.492667] statfs_by_dentry+0x67/0x90
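
The btop task in this trace is in state D (uninterruptible sleep),
waiting on a FUSE request. A small sketch for spotting such tasks on a
live system, assuming procps `ps`; filter_d_state is an illustrative
name, not an existing command:

```shell
# Read "PID STAT COMMAND" lines on stdin (the format printed by
# `ps -eo pid=,stat=,comm=`) and keep only tasks whose state starts
# with D, printing "PID COMMAND" for each.
filter_d_state() {
    awk '$2 ~ /^D/ {
        pid = $1
        $1 = ""; $2 = ""        # drop the PID and STAT fields
        sub(/^ +/, "")          # trim the separators left behind
        print pid, $0
    }'
}
```

Typical use would be `ps -eo pid=,stat=,comm= | filter_d_state`. As
root, `cat /proc/<pid>/stack` then shows where such a task is blocked,
provided the kernel exposes that file.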

>
> Now I feel, further communication is rather selfish, as a clean environment
> is hard to provide.
> In any case, my FUSE arguments are sshfs -o kernel_cache -o auto_cache -o
> reconnect \
> -o compression=yes -o cache_timeout=600 -o
> ServerAliveInterval=30 \
> "$source" "$target" -o idmap=user
>
> With this line, I somehow managed to have the FUSE mount infinitely mounted,
> even if the device was offline for couple of days.
> A followed suspend would fail to freeze.
> srso=off would reproducibly work.

Not in your example above. It would fail after a couple of suspend
cycles.

And looking at your splats

[ 6366.524953] ? switch_fpu_return+0x50/0xe0
[ 6366.524956] ? srso_alias_return_thunk+0x5/0xfbef5
                 ^^^^^^^^^^^^^^^^^^^^^^^^
[ 6366.524958] ? exit_to_user_mode_prepare+0x132/0x1f

the right cmdline option to disable it is:

spec_rstack_overflow=off

not

srso=off

:-)

> Please provide me a specific version of a kernel I should try in my
> configuration to try and reproduce.
> I'd prefer a pre-compiled one; if not tell me...
> I use archlinux.

Just build the latest released kernel, which is 6.8 now. Take your
.config and use it to build it. The net is full of tutorials on how to
do so.

And then try with spec_rstack_overflow=off and let's see what that does.

> Please give me a reason to not feel bad about myself.

Don't worry - it's just a machine. :)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
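
To double-check which spelling actually ended up on the running
kernel's command line, something like the following may help. It is
only a sketch: it assumes /proc/cmdline is readable and looks solely
for the spec_rstack_overflow= parameter named above;
srso_cmdline_param is a made-up helper name:

```shell
# Print the spec_rstack_overflow=... token from a kernel command line.
# Exits non-zero when no such token is present (for example when only
# "srso=off" was passed).
# Usage: srso_cmdline_param [cmdline_file]   (defaults to /proc/cmdline)
srso_cmdline_param() {
    local cmdline_file="${1:-/proc/cmdline}"
    # Put each space-separated token on its own line, then keep the one
    # that starts with the parameter name.
    tr ' ' '\n' < "$cmdline_file" | grep -E '^spec_rstack_overflow='
}
```

With `srso=off` on the command line this prints nothing, which matches
the point made in the mail: that spelling is not the parameter being
looked for.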

2024-04-16 06:59:06

by a-development

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

It worked, it worked!

https://up.tail.ws/txt/working-suspend.txt

I've been testing it for quite some time now.
However, I also had to switch to 6.6.26-1-lts because my Magewell
capture card wouldn't work otherwise.

Thanks again!

On 26.03.2024 23:21, Borislav Petkov wrote:
> Whoops,
>
> this fell through the cracks. Sorry about that.
>
> On Mon, Jan 29, 2024 at 06:18:00PM +0000, [email protected]
> wrote:
>> I have the feeling that something else is amiss.
>> Currently under 6.7.2-2-cachyos with srso=off.
>> https://0x0.st/HDqP.txt
>
> Yah, your tasks refuse to freeze on suspend and they have this fuse
> stuff in the stacktrace:
>
> [ 6346.492593] task:btop state:D stack:0 pid:279617
> tgid:1548 ppid:1531 flags:0x00004006
> [ 6346.492600] Call Trace:
> [ 6346.492602] <TASK>
> [ 6346.492607] __schedule+0xd44/0x1af0
> [ 6346.492614] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 6346.492617] ? __wake_up+0x9d/0xc0
> [ 6346.492622] schedule+0x32/0xd0
> [ 6346.492627] request_wait_answer+0xd0/0x2a0 [fuse
> db37c699d94393e946cf93306449ea0f307959a1]
> [ 6346.492638] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 6346.492643] fuse_simple_request+0x21c/0x390 [fuse
> db37c699d94393e946cf93306449ea0f307959a1]
> [ 6346.492653] fuse_statfs+0xf2/0x160 [fuse
> db37c699d94393e946cf93306449ea0f307959a1]
> [ 6346.492667] statfs_by_dentry+0x67/0x90
>
>>
>> Now I feel, further communication is rather selfish, as a clean
>> environment
>> is hard to provide.
>> In any case, my FUSE arguments are sshfs -o kernel_cache -o auto_cache
>> -o
>> reconnect \
>> -o compression=yes -o cache_timeout=600 -o
>> ServerAliveInterval=30 \
>> "$source" "$target" -o idmap=user
>>
>> With this line, I somehow managed to have the FUSE mount infinitely
>> mounted,
>> even if the device was offline for couple of days.
>> A followed suspend would fail to freeze.
>> srso=off would reproducibly work.
>
> Not in your example above. It would fail after a couple of suspend
> cycles.
>
> And looking at your splats
>
> [ 6366.524953] ? switch_fpu_return+0x50/0xe0
> [ 6366.524956] ? srso_alias_return_thunk+0x5/0xfbef5
> ^^^^^^^^^^^^^^^^^^^^^^^^
> [ 6366.524958] ? exit_to_user_mode_prepare+0x132/0x1f
>
> the right cmdline option to disable it is:
>
> spec_rstack_overflow=off
>
> not
>
> srso=off
>
> :-)
>
>> Please provide me a specific version of a kernel I should try in my
>> configuration to try and reproduce.
>> I'd prefer a pre-compiled one; if not tell me...
>> I use archlinux.
>
> Just build the latest released kernel, which is 6.8 now. Take your
> .config and use it to build it. The net is full of tutorials how to do
> so.
>
> And then try with spec_rstack_overflow=off and let's see what that
> does.
>
>> Please give me a reason to not feel bad about myself.
>
> Don't worry - it's just a machine. :)

2024-04-16 08:46:19

by Borislav Petkov

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

On Tue, Apr 16, 2024 at 06:48:54AM +0000, [email protected] wrote:
> It worked, it worked!
>
> https://up.tail.ws/txt/working-suspend.txt
>
> I've tested it now quite some time.
> But, I also had to start using 6.6.26-1-lts because my magewell capture card
> wouldn't without.

Right, that thing I guess:

[Mon Apr 15 18:37:58 2024] ProCapture: loading out-of-tree module taints kernel.
[Mon Apr 15 18:37:58 2024] ProCapture: module verification failed: signature and/or required key missing - tainting kernel

So, machines do suspend even with SRSO enabled and since your machine is
affected, you probably should try without spec_rstack_overflow=off to
see if it works with the new kernel.

Then, the other thing you could try is whether suspend works without
that proprietary crap.

And then we can see.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-16 20:14:27

by a-development

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

Now that it is deactivated, the machine no longer suspends!

https://up.tail.ws/txt/non-working-suspend.txt

> Then, the other thing you could try is whether suspend works without
> that proprietary crap.

I refuse. I can explain. I tried lots of capture cards that claimed to
support uvcvideo and Linux.
This problem existed previously and I need it for work on this machine.
But none of them worked reliably; some would straight up glitch out.
That's because they do not implement it properly.

It had to be a product from Magewell, who maintain an array of bash
scripts, and the AUR maintainer gets updates if something breaks, too.
Why do I use a PCIe HDMI capture card?
I need to use cameras and displays.

As for USB cameras, unless it's a product from e.g. Logitech, they kept
giving me similar headaches.
And that included an older setup that ran an Intel i7 8700K as well.

Thx.



On 16.04.2024 10:45, Borislav Petkov wrote:
> On Tue, Apr 16, 2024 at 06:48:54AM +0000, [email protected]
> wrote:
>> It worked, it worked!
>>
>> https://up.tail.ws/txt/working-suspend.txt
>>
>> I've tested it now quite some time.
>> But, I also had to start using 6.6.26-1-lts because my magewell
>> capture card
>> wouldn't without.
>
> Right, that thing I guess:
>
> [Mon Apr 15 18:37:58 2024] ProCapture: loading out-of-tree module
> taints kernel.
> [Mon Apr 15 18:37:58 2024] ProCapture: module verification failed:
> signature and/or required key missing - tainting kernel
>
> So, machines do suspend even with SRSO enabled and since your machine
> is
> affected, you probably should try without spec_rstack_overflow=off to
> see if it works with the new kernel.
>
> Then, the other thing you could try is whether suspend works without
> that proprietary crap.
>
> And then we can see.
>
> Thx.

2024-04-17 08:09:12

by a-development

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

Today I failed to suspend, and the spec_rstack thing was off.

https://up.tail.ws/txt/non-working-suspend-2.txt




On 16.04.2024 22:14, [email protected] wrote:
> Now that it is deactivated, the machine no longer suspends!
>
> https://up.tail.ws/txt/non-working-suspend.txt
>
>> Then, the other thing you could try is whether suspend works without
>> that proprietary crap.
>
> I refuse. I can explain. I tried lots of capture cards that stated
> they support uvcvideo and linux.
> This problem existed prior and I need it for work on this machine.
> But none of them worked reliably or would straight up glitch out.
> Thats because they do not implement it properly.
>
> It had to be a product from Magewell, who manage an array of bash
> scripts and the AUR maintainer gets updates if something breaks, too.
> Why do I use a PCIe HDMI Capture Card?
> I need to use Cameras and Displays.
>
> As for USB Cameras, unless its a product from e.g Logitech, they kept
> giving me similar headaches.
> And that included an older setup that ran a Intel i7 8700K as well.
>
> Thx.
>
>
>
> On 16.04.2024 10:45, Borislav Petkov wrote:
>> On Tue, Apr 16, 2024 at 06:48:54AM +0000, [email protected]
>> wrote:
>>> It worked, it worked!
>>>
>>> https://up.tail.ws/txt/working-suspend.txt
>>>
>>> I've tested it now quite some time.
>>> But, I also had to start using 6.6.26-1-lts because my magewell
>>> capture card
>>> wouldn't without.
>>
>> Right, that thing I guess:
>>
>> [Mon Apr 15 18:37:58 2024] ProCapture: loading out-of-tree module
>> taints kernel.
>> [Mon Apr 15 18:37:58 2024] ProCapture: module verification failed:
>> signature and/or required key missing - tainting kernel
>>
>> So, machines do suspend even with SRSO enabled and since your machine
>> is
>> affected, you probably should try without spec_rstack_overflow=off to
>> see if it works with the new kernel.
>>
>> Then, the other thing you could try is whether suspend works without
>> that proprietary crap.
>>
>> And then we can see.
>>
>> Thx.

2024-04-17 09:12:36

by Borislav Petkov

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

On Wed, Apr 17, 2024 at 08:08:53AM +0000, [email protected] wrote:
> Today I failed to suspend, and the spec_rstack thing was off.
>
> https://up.tail.ws/txt/non-working-suspend-2.txt

Ok, but please do not top-post. Put your reply underneath the text
you're replying to and remove the rest of the quoted text like I just
did.

So this could be caused by the proprietary module or something else.

If you want this debugged, you'd have to try to reproduce it with the
latest upstream kernel from here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

after having removed the proprietary module.

HTH.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-22 07:01:21

by a-development

Subject: Re: [RFC][PATCH 00/17] Fix up the recent SRSO patches

Hello. I have installed the kernel through
https://aur.archlinux.org/packages/linux-mainline and noticed that SRSO
is disabled. "Speculative Return Stack Overflow: IBPB-extending
microcode not applied!"

cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
Vulnerable: Safe RET, no microcode
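
The status line can also be checked from a script. A hedged sketch that
classifies it by its leading words only, assuming the usual
"Not affected" / "Vulnerable" / "Mitigation:" prefixes used by the
files under /sys/devices/system/cpu/vulnerabilities; srso_status is a
hypothetical name:

```shell
# Classify the one-line status reported for spec_rstack_overflow.
# Prints one of: not-affected, mitigated, vulnerable, unknown.
# Usage: srso_status [status_file]
srso_status() {
    local status_file="${1:-/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow}"
    local line=""
    # Read the first line; return 2 if the file is missing or empty.
    IFS= read -r line < "$status_file" || [ -n "$line" ] || return 2
    case "$line" in
        "Not affected"*) echo "not-affected" ;;
        Mitigation:*)    echo "mitigated" ;;
        Vulnerable*)     echo "vulnerable" ;;
        *)               echo "unknown" ;;
    esac
}
```

For the output shown above this prints "vulnerable"; the detail after
the colon ("Safe RET, no microcode") is not interpreted by the sketch.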

So far, suspend has worked successfully for the one night I used it.

Assuming this is the default, I've installed the kernel module for my
PCIe capture card and am testing it.

Any new instructions?

Thanks

On 17.04.2024 11:12, Borislav Petkov wrote:
> On Wed, Apr 17, 2024 at 08:08:53AM +0000, [email protected]
> wrote:
>> Today I failed to suspend, and the spec_rstack thing was off.
>>
>> https://up.tail.ws/txt/non-working-suspend-2.txt
>
> Ok, but please do not top-post. Put your reply underneath the next
> you're replying to and remove the rest of the quoted text like I just
> did.
>
> So this could be caused by the proprietary module or something else.
>
> If you want this debugged, you'd have to try to reproduce it with the
> latest upstream kernel from here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> after having removed the propietary module.
>
> HTH.