2022-06-28 05:41:27

by Alexandre Messier

Subject: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+

Hello,

I tested 5.19-rc4 on my system that is currently running 5.18.0, and came
across an issue when unlocking the encrypted rootfs disk at startup. The error
message is:

device-mapper: reload ioctl on nvme0n1p3_crypt (254:0) failed: No such file or directory

The kernel log shows:

device-mapper: table: 254:0: crypt: Error allocating crypto tfm (-ENOENT)
device-mapper: ioctl: error adding target to table

I tested the previous 5.19-rcX, and the issue started happening with 5.19-rc1.
A bisection between 5.18.0 and 5.19-rc1 identifies the following commit:

8ad7e8f69695 ("x86/fpu/xsave: Support XSAVEC in the kernel")

I reverted that commit on top of 5.19-rc4, and unlocking the encrypted disk
works again.

Some more information about the system:
- CPU is AMD Ryzen 7 5700G
- Userspace is Debian Sid
- The encrypted disk setup is a default encrypted rootfs, as configured by the
standard Debian installer

Please let me know if more information is needed, or if there are tests I
should run.

Thanks,
Alex

#regzbot introduced 8ad7e8f69695


2022-06-28 09:47:46

by Borislav Petkov

Subject: Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+

On Tue, Jun 28, 2022 at 01:13:30AM -0400, Alexandre Messier wrote:
> Please let me know if more information is needed, or if there are tests I
> should run.

Yeah, pls send /proc/cpuinfo and full dmesg - privately is fine too.

Also, it would be lovely if I were able to reproduce this on a machine
here but mine doesn't have a crypto rootfs.

Perhaps you can point me to the exact instructions you're running to
decrypt your rootfs and I can try to create a usb crypto disk and try to
reproduce it with them...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-06-28 17:00:48

by Dave Hansen

Subject: Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+

First of all, thank you for bisecting this! I know those are a lot of work.

That XSAVEC patch modifies the AVX register save/restore code. There is
a set of x86 AES acceleration instructions called AES-NI. Those
instructions use the AVX registers. So, it's at least a plausible
connection between that patch and your symptoms. But, I don't think
anyone's been able to reproduce what you're seeing yet.

The kernel XSAVE buffer formats also differ slightly between AMD and
Intel. That *should* be OK, but it might explain why I can't reproduce
this.

If you get a chance, could you apply this (ugly hackish) patch to the
userspace 'cryptsetup' utility and run it?

https://sr71.net/~dave/intel/cryptsetup-memcmp.patch

On Ubuntu at least, it was as simple as:

apt-get source cryptsetup
apt-get build-dep cryptsetup
cd cryptsetup-1.6.6
./configure
make

Then I could run:

./src/cryptsetup benchmark --cipher=aes-xts --key-size=512
and
./src/cryptsetup benchmark --cipher=aes-xts --key-size=256

With that patch applied, you should see some output like:

# ./src/cryptsetup benchmark --cipher=aes-xts --key-size=512
# Tests are approximate using memory only (no storage IO).
memcmp12: 0
memcmp23: 0
memcmp13: 0
memcmp12: -173
memcmp23: 173
memcmp13: 0
#     Algorithm |       Key |      Encryption |      Decryption
        aes-xts        512b      4592.2 MiB/s      4192.0 MiB/s

The "memcmp13:" lines should both be 0. That means that an encryption
and decryption cycle didn't change the data. You *might* have to run
this in a loop if there's some kind of bad timing involved in triggering
the bug.

If you see a "memcmp13:" with something other than 0, that will narrow
things down and mean we have a pretty quick reproducer that doesn't
involve LUKS, which should speed things along.
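
For the "run this in a loop" case, a minimal shell sketch might look like the
following. The helper names memcmp13_ok and run_until_mismatch are made up for
illustration, and the sketch assumes the patched binary prints "memcmp13:"
lines in exactly the form shown in the sample output above.

```shell
# Hypothetical helper: reads benchmark output on stdin and succeeds only
# if every "memcmp13:" line reports 0 (data unchanged after a round trip).
memcmp13_ok() {
    awk '$1 == "memcmp13:" && $2 != 0 { bad = 1 } END { exit bad + 0 }'
}

# Hypothetical helper: repeats the patched benchmark up to N times
# (default 100) and stops at the first iteration with a mismatch.
run_until_mismatch() {
    i=0
    while [ "$i" -lt "${1:-100}" ]; do
        if ! ./src/cryptsetup benchmark --cipher=aes-xts --key-size=512 \
                | memcmp13_ok; then
            echo "memcmp13 mismatch on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no mismatch in $i iterations"
}
```

Running "run_until_mismatch 1000" from the cryptsetup build directory would
then stop at the first bad round trip, if there is one.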

2022-06-28 21:37:37

by Alexandre Messier

Subject: Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+

On 2022-06-28 05:20, Borislav Petkov wrote:
> On Tue, Jun 28, 2022 at 01:13:30AM -0400, Alexandre Messier wrote:
>> Please let me know if more information is needed, or if there are tests I
>> should run.
>
> Yeah, pls send /proc/cpuinfo and full dmesg - privately is fine too.

Here is the cpuinfo output:

processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 80
model name : AMD Ryzen 7 5700G with Radeon Graphics
stepping : 0
microcode : 0xa50000c
cpu MHz : 3514.072
cache size : 512 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl
nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq
monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave
avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm
sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce
topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb
cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall
fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed
adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1
xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv
svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
decodeassists pausefilter pfthreshold avic v_vmsave_vmload
vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid
overflow_recov succor smca fsrm
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7585.33
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

And here is the dmesg output of 5.19-rc4 without the revert (taken from the
initramfs). I put it on a paste service since it is too big for email:

https://paste.debian.net/1245491/

>
> Also, it would be lovely if I were able to reproduce this on a machine
> here but mine doesn't have a crypto rootfs.
>
> Perhaps you can point me to the exact instructions you're running to
> decrypt your rootfs and I can try to create a usb crypto disk and try to
> reproduce it with them...

I set up an unencrypted Debian installation on another drive to be able to run
cryptsetup commands in userspace while using 5.19-rc4, and was able to see the
issue. In an up-to-date Debian Sid installation (important, more on this below),
running these commands makes it possible to reproduce the issue:

dd if=/dev/zero bs=1M count=20 of=./test.img
sudo cryptsetup luksFormat ./test.img
sudo cryptsetup luksOpen ./test.img test_crypt

The "luksOpen" will fail with the same error message I get on my main system.

It seems using the latest Debian Sid is important. At first, I was trying with
Debian Bullseye, but everything was working, even unlocking my main drive.

Could it be a difference due to the cryptsetup version? Sid is using 2.4.3,
while Bullseye is based on 2.3.7. I will try to compile cryptsetup 2.4.3 and
use it in a Bullseye system with kernel 5.19-rc4, to see if the issue occurs
in that setup.
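
One way to compare the two setups would be to check which cipher each LUKS
header asks for. The sketch below is only an illustration: luks_ciphers is a
made-up helper name, and it simply filters the text printed by
"cryptsetup luksDump", whose exact field names differ between LUKS1 and LUKS2
headers.

```shell
# Hypothetical helper: reads "cryptsetup luksDump" output on stdin and
# prints the distinct lines that mention a cipher, with leading
# whitespace removed.
luks_ciphers() {
    grep -i 'cipher' | sed 's/^[[:space:]]*//' | sort -u
}
```

For the test image this would be run as
"sudo cryptsetup luksDump ./test.img | luks_ciphers" on both Sid and Bullseye.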

Thanks,
Alex

>
> Thx.
>

2022-06-28 23:17:17

by Thomas Gleixner

Subject: Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+

Alexandre,

On Tue, Jun 28 2022 at 17:31, Alexandre Messier wrote:
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
> fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl
> nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq
> monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave
> avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm
> sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce
> topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb
> cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall
> fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed
> adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1
> xsaves cqm_llc cqm_occup_llc cqm_mbm_total
> cqm_mbm_local

So this CPU supports both XSAVEC and XSAVES, which means the kernel uses
XSAVES, just as it did before that commit.

> And here is the dmesg output of 5.19-rc4 without the revert (taken from the
> initramfs). I put it on a paste service since it is too big for email:
>
> https://paste.debian.net/1245491/

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[9]: 832, xstate_sizes[9]: 8
[ 0.000000] x86/fpu: Enabled xstate features 0x207, context size is 840 bytes, using 'compacted' format.

This is correct. Is there any difference on a 5.18 kernel or on 5.19-rc
with the commit reverted? I doubt that.
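
As a cross-check of the log lines above, the feature mask and the context size
can be decoded by hand. This is only a sketch: decode_xstate_mask and
xstate_end are hypothetical helper names, and the bit names are copied from
the dmesg output rather than from a complete list of XSAVE features.

```shell
# Hypothetical helper: prints one line per set bit in an xstate feature
# mask. Only the four features named in the dmesg output are labelled.
decode_xstate_mask() {
    mask=$(($1))
    bit=0
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            case $bit in
                0) name='x87 floating point registers' ;;
                1) name='SSE registers' ;;
                2) name='AVX registers' ;;
                9) name='Protection Keys User registers' ;;
                *) name='(not listed in this log)' ;;
            esac
            printf 'bit %d (0x%03x): %s\n' "$bit" "$((1 << bit))" "$name"
        fi
        mask=$((mask >> 1))
        bit=$((bit + 1))
    done
}

# Hypothetical helper: end of a component, given its offset and size.
xstate_end() {
    echo $(($1 + $2))
}
```

"decode_xstate_mask 0x207" lists bits 0, 1, 2 and 9, matching the four
"Supporting XSAVE feature" lines, and "xstate_end 832 8" gives 840, which is
consistent with the reported context size. The AVX offset of 576 is the 512
byte legacy area plus the 64 byte XSAVE header.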

I'm completely puzzled and stared at the commit in question on and off,
but I can't spot the fail.

> I set up an unencrypted Debian installation on another drive to be able to run
> cryptsetup commands in userspace while using 5.19-rc4, and was able to see the
> issue. In an up-to-date Debian Sid installation (important, more on this below),
> running these commands makes it possible to reproduce the issue:
>
> dd if=/dev/zero bs=1M count=20 of=./test.img
> sudo cryptsetup luksFormat ./test.img
> sudo cryptsetup luksOpen ./test.img test_crypt
>
> The "luksOpen" will fail with the same error message I get on my main system.
>
> It seems using the latest Debian Sid is important. At first, I was trying with
> Debian Bullseye, but everything was working, even unlocking my main drive.
>
> Could it be a difference due to the cryptsetup version? Sid is using 2.4.3,
> while Bullseye is based on 2.3.7. I will try to compile cryptsetup 2.4.3 and
> use it in a Bullseye system with kernel 5.19-rc4, to see if the issue occurs
> in that setup.

It might use a different crypto algorithm.

Still confused....

I'll have another look tomorrow morning with brain awake.

Thanks,

tglx

2022-06-28 23:37:24

by Alexandre Messier

Subject: Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+

On 2022-06-28 18:59, Thomas Gleixner wrote:
> Alexandre,
>
> On Tue, Jun 28 2022 at 17:31, Alexandre Messier wrote:
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
>> fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl
>> nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq
>> monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave
>> avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm
>> sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce
>> topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb
>> cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall
>> fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed
>> adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1
>> xsaves cqm_llc cqm_occup_llc cqm_mbm_total
>> cqm_mbm_local
>
> So this CPU supports both XSAVEC and XSAVES, which means the kernel uses
> XSAVES, just as it did before that commit.
>
>> And here is the dmesg output of 5.19-rc4 without the revert (taken from the
>> initramfs). I put it on a paste service since it is too big for email:
>>
>> https://paste.debian.net/1245491/
>
> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
> [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
> [ 0.000000] x86/fpu: xstate_offset[9]: 832, xstate_sizes[9]: 8
> [ 0.000000] x86/fpu: Enabled xstate features 0x207, context size is 840 bytes, using 'compacted' format.
>
> This is correct. Is there any difference on a 5.18 kernel or on 5.19-rc
> with the commit reverted? I doubt that.
>
> I'm completely puzzled and stared at the commit in question on and off,
> but I can't spot the fail.
>
>> I set up an unencrypted Debian installation on another drive to be able to run
>> cryptsetup commands in userspace while using 5.19-rc4, and was able to see the
>> issue. In an up-to-date Debian Sid installation (important, more on this below),
>> running these commands makes it possible to reproduce the issue:
>>
>> dd if=/dev/zero bs=1M count=20 of=./test.img
>> sudo cryptsetup luksFormat ./test.img
>> sudo cryptsetup luksOpen ./test.img test_crypt
>>
>> The "luksOpen" will fail with the same error message I get on my main system.
>>
>> It seems using the latest Debian Sid is important. At first, I was trying with
>> Debian Bullseye, but everything was working, even unlocking my main drive.
>>
>> Could it be a difference due to the cryptsetup version? Sid is using 2.4.3,
>> while Bullseye is based on 2.3.7. I will try to compile cryptsetup 2.4.3 and
>> use it in a Bullseye system with kernel 5.19-rc4, to see if the issue occurs
>> in that setup.
>
> It might use a different crypto algorithm.
>
> Still confused....
>
> I'll have another look tomorrow morning with brain awake.

Thomas, Borislav,

Well this is embarrassing... I ran the test Dave sent in his email, and when
running it on that unencrypted Debian Sid installation with kernel 5.19-rc4, it
failed too, but indicated that "aes-xts" was not available... It was right.

I forgot to mention I am using a custom kernel config, and indeed CRYPTO_XTS
was not enabled. When I enabled it, the cryptsetup benchmark worked, along with
the test that previously failed with the test file.

So I enabled that option too on my main installation and I am now able to
unlock the drive like before. I don't know why it is needed now, but that fixed
the issue.
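
For anyone who hits the same "Error allocating crypto tfm (-ENOENT)" message,
which indicates that the kernel could not find the requested cipher, a quick
check along these lines might save some time. It is a minimal sketch: the
helper names config_enabled and crypto_has_alg are made up, and the location
of the kernel config file is an assumption that depends on the distribution
and on how the kernel was built.

```shell
# Hypothetical helper: succeeds if OPTION is set to y or m in a kernel
# config file.  Usage: config_enabled FILE OPTION
config_enabled() {
    grep -q -e "^${2}=y\$" -e "^${2}=m\$" "$1"
}

# Hypothetical helper: succeeds if an algorithm with exactly this name
# is listed in a /proc/crypto style file (default: /proc/crypto).
# Usage: crypto_has_alg NAME [FILE]
crypto_has_alg() {
    awk -v alg="$1" '
        $1 == "name" && $3 == alg { found = 1 }
        END { exit !found }
    ' "${2:-/proc/crypto}"
}
```

Typical calls would be
config_enabled "/boot/config-$(uname -r)" CONFIG_CRYPTO_XTS
(or the .config in the build tree for a custom kernel) and
crypto_has_alg 'xts(aes)'. An algorithm that is missing from /proc/crypto is
not conclusive on its own, because it may only be registered once the module
is loaded or the cipher is first requested.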

Sorry again for the trouble, this was not a kernel regression, but my error.

Thanks,
Alex

#regzbot invalid: Missing kernel config, not kernel regression

>
> Thanks,
>
> tglx

2022-06-29 16:02:09

by Dave Hansen

Subject: Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+

On 6/28/22 16:24, Alexandre Messier wrote:
> Sorry again for the trouble, this was not a kernel regression, but my error.

Been there, done that! I'm just glad we don't have anything to fix. :)