putting srso=off in the cmdline fixed up my FUSE related issues.
Basically, I could not suspend anymore.
kernel 6.7.1.
This is the behavior with srso enabled...
https://paste.cachyos.org/p/bae7257
On Sat, Jan 27, 2024 at 06:58:37PM +0000, [email protected] wrote:
> putting srso=off in the cmdline fixed up my FUSE related issues.
> Basically, I could not suspend anymore.
> kernel 6.7.1.
>
> This is the behavior with srso enabled...
> https://paste.cachyos.org/p/bae7257
Can you disable, if possible, whatever's doing FUSE and try suspending
then?
Also, can you share full dmesg, .config and /proc/cpuinfo from the
machine?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Oh that was quick :)
I can umount the FUSE mounts and it will work fine.
Previously I didn't even suspend.
Also, in the log I had provided, I was on a cachyos kernel, but it
didn't matter, even the most recent arch kernel had the same issues.
A full dmesg is no problem; I can send it tomorrow, when I start up
the server again.
I'd rather not share my full ~/.config folder.
Here is /proc/cpuinfo: https://paste.cachyos.org/p/158b767
Thanks
On 27.01.2024 20:19, Borislav Petkov wrote:
> On Sat, Jan 27, 2024 at 06:58:37PM +0000, [email protected]
> wrote:
>> putting srso=off in the cmdline fixed up my FUSE related issues.
>> Basically, I could not suspend anymore.
>> kernel 6.7.1.
>>
>> This is the behavior with srso enabled...
>> https://paste.cachyos.org/p/bae7257
>
> Can you disable, if possible, whatever's doing FUSE and try suspending
> then?
>
> Also, can you share full dmesg, .config and /proc/cpuinfo from the
> machine?
>
> Thx.
On Sat, Jan 27, 2024 at 07:27:45PM +0000, [email protected] wrote:
> I can umount the FUSE mounts and it will work fine.
Aha, so it is FUSE-related.
How do I trigger it here? What are the steps to reproduce? Suspend while
I have a FUSE mount? How do I set it up so that it is as close to yours
as possible?
> Previously I didn't even suspend. Also, in the log I had provided,
> I was on a cachyos kernel, but it didn't matter, even the most recent
> arch kernel had the same issues.
You should try an upstream kernel to confirm it reproduces there - no
distro kernels.
> full dmesg is no problem - I can do that the next day, when I startup
> the server again full ~/.config folder I don't want to share
Not the full .config folder - just the kernel .config of the kernel
you're triggering this with so that I can try to do it here too.
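The kernel .config meant here is the build configuration of the running kernel, not the ~/.config directory. A minimal sketch for grabbing it, assuming the kernel either exposes /proc/config.gz (which needs CONFIG_IKCONFIG_PROC) or the distro ships /boot/config-$(uname -r). The helper name and its arguments are made up for illustration:

```shell
# dump_kernel_config [DEST] [PROC_GZ] [BOOT_FILE]
# Hypothetical helper: write the running kernel's build config to DEST.
# The two source paths are parameters only so the function can be tried
# against sample files; the defaults are the usual locations.
dump_kernel_config() {
    dest=${1:-kernel.config}
    proc_gz=${2:-/proc/config.gz}
    boot_file=${3:-/boot/config-$(uname -r)}

    if [ -r "$proc_gz" ]; then
        # Present only when the kernel was built with CONFIG_IKCONFIG_PROC.
        gzip -dc "$proc_gz" > "$dest"
    elif [ -r "$boot_file" ]; then
        cp "$boot_file" "$dest"
    else
        echo "no kernel config found in $proc_gz or $boot_file" >&2
        return 1
    fi
}
```

Calling dump_kernel_config with no arguments should leave a kernel.config file in the current directory that can be attached to a reply.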
> here is /proc/cpuinfo https://paste.cachyos.org/p/158b767
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hello.
I have the feeling that something else is amiss.
Currently under 6.7.2-2-cachyos with srso=off.
https://0x0.st/HDqP.txt
Now I feel, further communication is rather selfish, as a clean
environment is hard to provide.
In any case, my FUSE arguments are:
    sshfs -o kernel_cache -o auto_cache -o reconnect \
        -o compression=yes -o cache_timeout=600 -o ServerAliveInterval=30 \
        "$source" "$target" -o idmap=user
With this line, the FUSE mount somehow stayed mounted indefinitely,
even when the remote device was offline for a couple of days.
A subsequent suspend would then fail to freeze the tasks.
With srso=off, suspend would reproducibly work.
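A rough way to see which FUSE mounts are still active before suspending, as a sketch only: it reads a mounts table in /proc/mounts format and prints the mount points whose filesystem type starts with "fuse" (sshfs shows up as fuse.sshfs). The function name is hypothetical, and the optional argument exists only so it can be pointed at a sample file:

```shell
# list_fuse_mounts [MOUNTS_FILE]
# Hypothetical helper: print the mount point of every FUSE filesystem.
list_fuse_mounts() {
    mounts=${1:-/proc/mounts}
    # Fields: device, mount point, fstype, options, dump, pass.
    # fusectl is the FUSE control filesystem, not a user mount, so skip it.
    # Mount points containing spaces appear escaped (\040) in this file.
    awk '$3 ~ /^fuse/ && $3 != "fusectl" { print $2 }' "$mounts"
}
```

Each printed path could then be unmounted with fusermount3 -u (or fusermount -u on FUSE 2 setups) before trying to suspend.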
Please provide me a specific version of a kernel I should try in my
configuration to try and reproduce.
I'd prefer a pre-compiled one; if not tell me...
I use archlinux.
Please give me a reason to not feel bad about myself.
All the best
On 27.01.2024 20:41, Borislav Petkov wrote:
> On Sat, Jan 27, 2024 at 07:27:45PM +0000, [email protected]
> wrote:
>> I can umount the FUSE mounts and it will work fine.
>
> Aha, so it is FUSE-related.
>
> How do I trigger it here? What are the steps to reproduce? Suspend
> while
> I have a FUSE mount? How do I set it up so that it is as close to yours
> as possible?
>
>> Previously I didn't even suspend. Also, in the log I had provided,
>> I was on a cachyos kernel, but it didn't matter, even the most recent
>> arch kernel had the same issues.
>
> You should try an upstream kernel to confirm it reproduces there - no
> distro kernels.
>
>> full dmesg is no problem - I can do that the next day, when I startup
>> the server again full ~/.config folder I don't want to share
>
> Not the full .config folder - just the kernel .config of the kernel
> you're triggering this with so that I can try to do it here too.
>
>> here is /proc/cpuinfo https://paste.cachyos.org/p/158b767
>
> Thx.
Whoops,
this fell through the cracks. Sorry about that.
On Mon, Jan 29, 2024 at 06:18:00PM +0000, [email protected] wrote:
> I have the feeling that something else is amiss.
> Currently under 6.7.2-2-cachyos with srso=off.
> https://0x0.st/HDqP.txt
Yah, your tasks refuse to freeze on suspend and they have this fuse
stuff in the stacktrace:
[ 6346.492593] task:btop state:D stack:0 pid:279617 tgid:1548 ppid:1531 flags:0x00004006
[ 6346.492600] Call Trace:
[ 6346.492602] <TASK>
[ 6346.492607] __schedule+0xd44/0x1af0
[ 6346.492614] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6346.492617] ? __wake_up+0x9d/0xc0
[ 6346.492622] schedule+0x32/0xd0
[ 6346.492627] request_wait_answer+0xd0/0x2a0 [fuse db37c699d94393e946cf93306449ea0f307959a1]
[ 6346.492638] ? __pfx_autoremove_wake_function+0x10/0x10
[ 6346.492643] fuse_simple_request+0x21c/0x390 [fuse db37c699d94393e946cf93306449ea0f307959a1]
[ 6346.492653] fuse_statfs+0xf2/0x160 [fuse db37c699d94393e946cf93306449ea0f307959a1]
[ 6346.492667] statfs_by_dentry+0x67/0x90
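To pick such tasks out of a long log, a small filter along these lines may help. It is a sketch under the assumption that the log has the same layout as the excerpt above: a "task:NAME state:..." line opens each dump, and frames from the fuse module carry a "[fuse ...]" tag. The function name is made up, and task names containing spaces would be cut short:

```shell
# fuse_blocked_tasks: read dmesg-style text on stdin and print, once each,
# the names of dumped tasks that have a fuse frame in their call trace.
fuse_blocked_tasks() {
    awk '
        # A "task:NAME state:..." line starts a new per-task dump.
        match($0, /task:[^ ]+/) {
            task = substr($0, RSTART + 5, RLENGTH - 5)
            next
        }
        # Frames that live in the fuse module are tagged "[fuse ...]".
        task != "" && /\[fuse/ && !seen[task]++ { print task }
    '
}
```

It would be used as: dmesg | fuse_blocked_tasks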
>
> Now I feel, further communication is rather selfish, as a clean environment
> is hard to provide.
> In any case, my FUSE arguments are sshfs -o kernel_cache -o auto_cache -o
> reconnect \
> -o compression=yes -o cache_timeout=600 -o
> ServerAliveInterval=30 \
> "$source" "$target" -o idmap=user
>
> With this line, I somehow managed to have the FUSE mount infinitely mounted,
> even if the device was offline for couple of days.
> A followed suspend would fail to freeze.
> srso=off would reproducibly work.
Not in your example above. It would fail after a couple of suspend
cycles.
And looking at your splats
[ 6366.524953] ? switch_fpu_return+0x50/0xe0
[ 6366.524956] ? srso_alias_return_thunk+0x5/0xfbef5
^^^^^^^^^^^^^^^^^^^^^^^^
[ 6366.524958] ? exit_to_user_mode_prepare+0x132/0x1f
the right cmdline option to disable it is:
spec_rstack_overflow=off
not
srso=off
:-)
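To double-check what actually reached the kernel, something like the following can help. It is only a sketch: param_value is a made-up helper that looks up a NAME=value word on the kernel command line, and the optional second argument is there just so it can be tried on a sample string instead of /proc/cmdline:

```shell
# param_value NAME [CMDLINE]
# Hypothetical helper: print the value of NAME=value from the kernel
# command line and return 0, or return 1 if NAME= is not present.
# Quoted values containing spaces are not handled.
param_value() {
    name=$1
    cmdline=${2-$(cat /proc/cmdline)}
    set -f    # no globbing while the command line is split into words
    for word in $cmdline; do
        case $word in
            "$name"=*)
                set +f
                printf '%s\n' "${word#*=}"
                return 0
                ;;
        esac
    done
    set +f
    return 1
}
```

After booting with the right option, param_value spec_rstack_overflow should print off; with only srso=off on the command line it prints nothing and returns non-zero.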
> Please provide me a specific version of a kernel I should try in my
> configuration to try and reproduce.
> I'd prefer a pre-compiled one; if not tell me...
> I use archlinux.
Just build the latest released kernel, which is 6.8 now. Take your
.config and use it to build it. The net is full of tutorials on how to
do so.
And then try with spec_rstack_overflow=off and let's see what that does.
> Please give me a reason to not feel bad about myself.
Don't worry - it's just a machine. :)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
It worked, it worked!
https://up.tail.ws/txt/working-suspend.txt
I've tested it now quite some time.
However, I also had to switch to 6.6.26-1-lts, because my Magewell capture
card wouldn't work otherwise.
Thanks again!
On 26.03.2024 23:21, Borislav Petkov wrote:
> Whoops,
>
> this fell through the cracks. Sorry about that.
>
> On Mon, Jan 29, 2024 at 06:18:00PM +0000, [email protected]
> wrote:
>> I have the feeling that something else is amiss.
>> Currently under 6.7.2-2-cachyos with srso=off.
>> https://0x0.st/HDqP.txt
>
> Yah, your tasks refuse to freeze on suspend and they have this fuse
> stuff in the stacktrace:
>
> [ 6346.492593] task:btop state:D stack:0 pid:279617
> tgid:1548 ppid:1531 flags:0x00004006
> [ 6346.492600] Call Trace:
> [ 6346.492602] <TASK>
> [ 6346.492607] __schedule+0xd44/0x1af0
> [ 6346.492614] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 6346.492617] ? __wake_up+0x9d/0xc0
> [ 6346.492622] schedule+0x32/0xd0
> [ 6346.492627] request_wait_answer+0xd0/0x2a0 [fuse
> db37c699d94393e946cf93306449ea0f307959a1]
> [ 6346.492638] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 6346.492643] fuse_simple_request+0x21c/0x390 [fuse
> db37c699d94393e946cf93306449ea0f307959a1]
> [ 6346.492653] fuse_statfs+0xf2/0x160 [fuse
> db37c699d94393e946cf93306449ea0f307959a1]
> [ 6346.492667] statfs_by_dentry+0x67/0x90
>
>>
>> Now I feel, further communication is rather selfish, as a clean
>> environment
>> is hard to provide.
>> In any case, my FUSE arguments are sshfs -o kernel_cache -o auto_cache
>> -o
>> reconnect \
>> -o compression=yes -o cache_timeout=600 -o
>> ServerAliveInterval=30 \
>> "$source" "$target" -o idmap=user
>>
>> With this line, I somehow managed to have the FUSE mount infinitely
>> mounted,
>> even if the device was offline for couple of days.
>> A followed suspend would fail to freeze.
>> srso=off would reproducibly work.
>
> Not in your example above. It would fail after a couple of suspend
> cycles.
>
> And looking at your splats
>
> [ 6366.524953] ? switch_fpu_return+0x50/0xe0
> [ 6366.524956] ? srso_alias_return_thunk+0x5/0xfbef5
> ^^^^^^^^^^^^^^^^^^^^^^^^
> [ 6366.524958] ? exit_to_user_mode_prepare+0x132/0x1f
>
> the right cmdline option to disable it is:
>
> spec_rstack_overflow=off
>
> not
>
> srso=off
>
> :-)
>
>> Please provide me a specific version of a kernel I should try in my
>> configuration to try and reproduce.
>> I'd prefer a pre-compiled one; if not tell me...
>> I use archlinux.
>
> Just build the latest released kernel, which is 6.8 now. Take your
> .config and use it to build it. The net is full of tutorials how to do
> so.
>
> And then try with spec_rstack_overflow=off and let's see what that
> does.
>
>> Please give me a reason to not feel bad about myself.
>
> Don't worry - it's just a machine. :)
On Tue, Apr 16, 2024 at 06:48:54AM +0000, [email protected] wrote:
> It worked, it worked!
>
> https://up.tail.ws/txt/working-suspend.txt
>
> I've tested it now quite some time.
> But, I also had to start using 6.6.26-1-lts because my magewell capture card
> wouldn't without.
Right, that thing I guess:
[Mon Apr 15 18:37:58 2024] ProCapture: loading out-of-tree module taints kernel.
[Mon Apr 15 18:37:58 2024] ProCapture: module verification failed: signature and/or required key missing - tainting kernel
So, machines do suspend even with SRSO enabled and since your machine is
affected, you probably should try without spec_rstack_overflow=off to
see if it works with the new kernel.
Then, the other thing you could try is whether suspend works without
that proprietary crap.
And then we can see.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Now that it is deactivated, the machine no longer suspends!
https://up.tail.ws/txt/non-working-suspend.txt
> Then, the other thing you could try is whether suspend works without
> that proprietary crap.
I refuse, and I can explain. I tried lots of capture cards that claimed
to support uvcvideo and Linux.
This problem existed before, and I need the card for work on this machine.
But none of them worked reliably; some would straight up glitch out.
That's because they do not implement it properly.
It had to be a product from Magewell, who maintain an array of bash
scripts, and the AUR maintainer gets updates if something breaks, too.
Why do I use a PCIe HDMI capture card?
I need to use cameras and displays.
As for USB cameras, unless it's a product from e.g. Logitech, they kept
giving me similar headaches.
And that included an older setup that ran an Intel i7 8700K as well.
Thx.
On 16.04.2024 10:45, Borislav Petkov wrote:
> On Tue, Apr 16, 2024 at 06:48:54AM +0000, [email protected]
> wrote:
>> It worked, it worked!
>>
>> https://up.tail.ws/txt/working-suspend.txt
>>
>> I've tested it now quite some time.
>> But, I also had to start using 6.6.26-1-lts because my magewell
>> capture card
>> wouldn't without.
>
> Right, that thing I guess:
>
> [Mon Apr 15 18:37:58 2024] ProCapture: loading out-of-tree module
> taints kernel.
> [Mon Apr 15 18:37:58 2024] ProCapture: module verification failed:
> signature and/or required key missing - tainting kernel
>
> So, machines do suspend even with SRSO enabled and since your machine
> is
> affected, you probably should try without spec_rstack_overflow=off to
> see if it works with the new kernel.
>
> Then, the other thing you could try is whether suspend works without
> that proprietary crap.
>
> And then we can see.
>
> Thx.
Today I failed to suspend, and the spec_rstack thing was off.
https://up.tail.ws/txt/non-working-suspend-2.txt
On 16.04.2024 22:14, [email protected] wrote:
> Now that it is deactivated, the machine no longer suspends!
>
> https://up.tail.ws/txt/non-working-suspend.txt
>
>> Then, the other thing you could try is whether suspend works without
>> that proprietary crap.
>
> I refuse. I can explain. I tried lots of capture cards that stated
> they support uvcvideo and linux.
> This problem existed prior and I need it for work on this machine.
> But none of them worked reliably or would straight up glitch out.
> Thats because they do not implement it properly.
>
> It had to be a product from Magewell, who manage an array of bash
> scripts and the AUR maintainer gets updates if something breaks, too.
> Why do I use a PCIe HDMI Capture Card?
> I need to use Cameras and Displays.
>
> As for USB Cameras, unless its a product from e.g Logitech, they kept
> giving me similar headaches.
> And that included an older setup that ran a Intel i7 8700K as well.
>
> Thx.
On Wed, Apr 17, 2024 at 08:08:53AM +0000, [email protected] wrote:
> Today I failed to suspend, and the spec_rstack thing was off.
>
> https://up.tail.ws/txt/non-working-suspend-2.txt
Ok, but please do not top-post. Put your reply underneath the text
you're replying to and remove the rest of the quoted text like I just
did.
So this could be caused by the proprietary module or something else.
If you want this debugged, you'd have to try to reproduce it with the
latest upstream kernel from here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
after having removed the proprietary module.
HTH.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hello. I have installed the kernel through
https://aur.archlinux.org/packages/linux-mainline and noticed that SRSO
is disabled. "Speculative Return Stack Overflow: IBPB-extending
microcode not applied!"
cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
Vulnerable: Safe RET, no microcode
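For reading that sysfs file in a script, a small sketch follows. It assumes the status line begins with one of the three prefixes the kernel uses for these vulnerability files ("Not affected", "Vulnerable", "Mitigation"); the function name and the short labels it prints are invented here:

```shell
# srso_state [FILE]
# Hypothetical helper: reduce the sysfs status line to a short label.
srso_state() {
    file=${1:-/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow}
    [ -r "$file" ] || { echo "unknown"; return 1; }
    IFS= read -r status < "$file" || :
    case $status in
        "Not affected"*) echo "not-affected" ;;
        Mitigation*)     echo "mitigated" ;;
        Vulnerable*)     echo "vulnerable" ;;
        *)               echo "unknown" ;;
    esac
}
```

For the output quoted above this would print vulnerable, since the line starts with "Vulnerable" even though it also mentions Safe RET.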
So far, suspend has worked successfully for the one night I have used it.
Assuming this is the default, I've installed the kernel module for my
PCIe capture card and am testing it.
Any new instructions?
Thanks
On 17.04.2024 11:12, Borislav Petkov wrote:
> On Wed, Apr 17, 2024 at 08:08:53AM +0000, [email protected]
> wrote:
>> Today I failed to suspend, and the spec_rstack thing was off.
>>
>> https://up.tail.ws/txt/non-working-suspend-2.txt
>
> Ok, but please do not top-post. Put your reply underneath the text
> you're replying to and remove the rest of the quoted text like I just
> did.
>
> So this could be caused by the proprietary module or something else.
>
> If you want this debugged, you'd have to try to reproduce it with the
> latest upstream kernel from here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> after having removed the proprietary module.
>
> HTH.