2023-12-03 16:31:21

by Juhyung Park

Subject: Weird EROFS data corruption

(Cc'ing f2fs and crypto, as I noticed something similar with f2fs a
while ago, which may mean that this is not specific to EROFS:
https://lore.kernel.org/all/CAD14+f2nBZtLfLC6CwNjgCOuRRRjwzttp3D3iK4Of+1EEjK+cw@mail.gmail.com/
)

Hi.

I'm encountering a very weird EROFS data corruption.

I noticed that when I build an EROFS image for AOSP development, the
device would randomly fail to boot from a certain build.
After inspecting the log, I found that a file had been corrupted.

After adding a hash check during the build flow, I noticed that EROFS
would randomly read data wrong.

I now have a reliable method of reproducing the issue, but here's the
funny/weird part: it's only happening on my laptop (i7-1185G7). This
is not happening on my 128-core build-farm machine (Threadripper
3990X).

I first suspected a hardware issue, but:
a. The laptop had its motherboard replaced recently (due to a failing
physical Type-C port).
b. The laptop passes memory test (memtest86).
c. This happens on all kernel versions from v5.4 to the latest v6.6
including my personal custom builds and Canonical's official Ubuntu
kernels.
d. This happens on different host SSDs and file-system combinations.
e. This only happens on LZ4. LZ4HC doesn't trigger the issue.
f. This only happens when the image is mounted natively by the
kernel. Mounting via FUSE with erofsfuse is fine.

This is how I'm reproducing the issue:

# mkfs.erofs -zlz4 -T0 --ignore-mtime tmp.img /mnt/lib64/
mkfs.erofs 1.7
Build completed.
------
Filesystem UUID: 3a7e1f90-5450-40f9-92a2-945bacdb51c3
Filesystem total blocks: 53075 (of 4096-byte blocks)
Filesystem total inodes: 973
Filesystem total metadata blocks: 73
Filesystem total deduplicated bytes (of source files): 0
# mount tmp.img /mnt
# for i in {1..30}; do echo 3 > /proc/sys/vm/drop_caches; find /mnt -type f -exec xxh64sum {} + | sort -k2 | xxh64sum -; done
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
293a8e7de2a53019 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
293a8e7de2a53019 stdin
293a8e7de2a53019 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin
0b40f1abfbb6e9a8 stdin

As you can see, most runs produce 0b40f1abfbb6e9a8, but some produce 293a8e7de2a53019.

This is when I manually inspect the failing file:

# echo 3 > /proc/sys/vm/drop_caches; xxh64sum /mnt/[email protected]
dc96f35f015a0e5d /mnt/[email protected]
# xxd < /mnt/[email protected] > /tmp/1
[ several more attempts until I get a different hash... ]
# echo 3 > /proc/sys/vm/drop_caches; xxh64sum /mnt/[email protected]
1cfe5d69c28fff6c /mnt/[email protected]
# xxd < /mnt/[email protected] > /tmp/2
# diff /tmp/[12]
3741c3741
< 0000e9c0: f40e 0000 b46b 0000 ac5c 0000 140e 0000 .....k...\......
---
> 0000e9c0: 445a 0000 e40d 0000 ac5c 0000 140e 0000 DZ.......\......

This could still very well be a hardware issue on my end, but I
strongly suspect something is wrong in the kernel code that just
happens to only trigger on my hardware configuration.

I've uploaded the generated image here:
https://arter97.com/.erofs/
but I'm not sure it'll be reproducible on other machines.

I've also tried updating the in-kernel LZ4 library (lib/lz4) to the
latest v1.9.4 and the latest dev trunk (4032c8c787e6). I managed to
get it building with the Linux kernel, but the corruption still happens.

Let me know if there's anything I can help to narrow down the culprit.

Thanks,


2023-12-03 18:33:29

by Gao Xiang

Subject: Re: Weird EROFS data corruption

Hi Juhyung,

On 2023/12/4 00:22, Juhyung Park wrote:
> (Cc'ing f2fs and crypto as I've noticed something similar with f2fs a
> while ago, which may mean that this is not specific to EROFS:
> https://lore.kernel.org/all/CAD14+f2nBZtLfLC6CwNjgCOuRRRjwzttp3D3iK4Of+1EEjK+cw@mail.gmail.com/
> )
>
> Hi.
>
> I'm encountering a very weird EROFS data corruption.
>
> I noticed when I build an EROFS image for AOSP development, the device
> would randomly not boot from a certain build.
> After inspecting the log, I noticed that a file got corrupted.

Is this observed on your laptop (i7-1185G7), or on some other arm64
device?

>
> After adding a hash check during the build flow, I noticed that EROFS
> would randomly read data wrong.
>
> I now have a reliable method of reproducing the issue, but here's the
> funny/weird part: it's only happening on my laptop (i7-1185G7). This
> is not happening with my 128 cores buildfarm machine (Threadripper
> 3990X).>
> I first suspected a hardware issue, but:
> a. The laptop had its motherboard replaced recently (due to a failing
> physical Type-C port).
> b. The laptop passes memory test (memtest86).
> c. This happens on all kernel versions from v5.4 to the latest v6.6
> including my personal custom builds and Canonical's official Ubuntu
> kernels.
> d. This happens on different host SSDs and file-system combinations.
> e. This only happens on LZ4. LZ4HC doesn't trigger the issue.
> f. This only happens when mounting the image natively by the kernel.
> Using fuse with erofsfuse is fine.

I think it's a weird issue with in-place decompression, since you said
it depends on the hardware. Sadly, I cannot reproduce it with your
dataset on my local server (Xeon(R) CPU E5-2682 v4).

What is the difference between these two machines? Just different
CPUs, or do they have some other difference like different compilers?

Thanks,
Gao Xiang

2023-12-03 18:33:36

by Juhyung Park

Subject: Re: Weird EROFS data corruption

Hi Gao,

On Mon, Dec 4, 2023 at 1:52 AM Gao Xiang <[email protected]> wrote:
>
> Hi Juhyung,
>
> On 2023/12/4 00:22, Juhyung Park wrote:
> > (Cc'ing f2fs and crypto as I've noticed something similar with f2fs a
> > while ago, which may mean that this is not specific to EROFS:
> > https://lore.kernel.org/all/CAD14+f2nBZtLfLC6CwNjgCOuRRRjwzttp3D3iK4Of+1EEjK+cw@mail.gmail.com/
> > )
> >
> > Hi.
> >
> > I'm encountering a very weird EROFS data corruption.
> >
> > I noticed when I build an EROFS image for AOSP development, the device
> > would randomly not boot from a certain build.
> > After inspecting the log, I noticed that a file got corrupted.
>
> Is it observed on your laptop (i7-1185G7), yes? or some other arm64
> device?

Yes, only on my laptop. The arm64 device seems fine.
The reason it would not boot was that the host machine (my laptop)
was repacking the EROFS image incorrectly.

The workflow is something like this:
Server-built EROFS AOSP image -> Image copied to laptop -> Laptop
mounts the EROFS image -> Copies the entire content to a scratch
directory (CORRUPT!) -> Changes some files -> mkfs.erofs

So the device is not responsible for the corruption; the laptop is.

>
> >
> > After adding a hash check during the build flow, I noticed that EROFS
> > would randomly read data wrong.
> >
> > I now have a reliable method of reproducing the issue, but here's the
> > funny/weird part: it's only happening on my laptop (i7-1185G7). This
> > is not happening with my 128 cores buildfarm machine (Threadripper
> > 3990X).>
> > I first suspected a hardware issue, but:
> > a. The laptop had its motherboard replaced recently (due to a failing
> > physical Type-C port).
> > b. The laptop passes memory test (memtest86).
> > c. This happens on all kernel versions from v5.4 to the latest v6.6
> > including my personal custom builds and Canonical's official Ubuntu
> > kernels.
> > d. This happens on different host SSDs and file-system combinations.
> > e. This only happens on LZ4. LZ4HC doesn't trigger the issue.
> > f. This only happens when mounting the image natively by the kernel.
> > Using fuse with erofsfuse is fine.
>
> I think it's a weird issue with inplace decompression because you said
> it depends on the hardware. In addition, with your dataset sadly I
> cannot reproduce on my local server (Xeon(R) CPU E5-2682 v4).

As I feared. Bummer :(

>
> What is the difference between these two machines? just different CPU or
> they have some other difference like different compliers?

I fully and exclusively control both devices, and the setup is almost the same.
Same Ubuntu version, kernel/compiler version.

But as I said, on my laptop, the issue happens on kernels that someone
else (Canonical) built, so I don't think it matters.

>
> Thanks,
> Gao Xiang

2023-12-03 18:33:44

by Gao Xiang

Subject: Re: Weird EROFS data corruption



On 2023/12/4 01:01, Juhyung Park wrote:
> Hi Gao,
>
> On Mon, Dec 4, 2023 at 1:52 AM Gao Xiang <[email protected]> wrote:
>>
>> Hi Juhyung,
>>
>> On 2023/12/4 00:22, Juhyung Park wrote:
>>> (Cc'ing f2fs and crypto as I've noticed something similar with f2fs a
>>> while ago, which may mean that this is not specific to EROFS:
>>> https://lore.kernel.org/all/CAD14+f2nBZtLfLC6CwNjgCOuRRRjwzttp3D3iK4Of+1EEjK+cw@mail.gmail.com/
>>> )
>>>
>>> Hi.
>>>
>>> I'm encountering a very weird EROFS data corruption.
>>>
>>> I noticed when I build an EROFS image for AOSP development, the device
>>> would randomly not boot from a certain build.
>>> After inspecting the log, I noticed that a file got corrupted.
>>
>> Is it observed on your laptop (i7-1185G7), yes? or some other arm64
>> device?
>
> Yes, only on my laptop. The arm64 device seems fine.
> The reason that it would not boot was that the host machine (my
> laptop) was repacking the EROFS image wrongfully.
>
> The workflow is something like this:
> Server-built EROFS AOSP image -> Image copied to laptop -> Laptop
> mounts the EROFS image -> Copies the entire content to a scratch
> directory (CORRUPT!) -> Changes some files -> mkfs.erofs
>
> So the device is not responsible for the corruption, the laptop is.

Ok.

>
>>
>>>
>>> After adding a hash check during the build flow, I noticed that EROFS
>>> would randomly read data wrong.
>>>
>>> I now have a reliable method of reproducing the issue, but here's the
>>> funny/weird part: it's only happening on my laptop (i7-1185G7). This
>>> is not happening with my 128 cores buildfarm machine (Threadripper
>>> 3990X).>
>>> I first suspected a hardware issue, but:
>>> a. The laptop had its motherboard replaced recently (due to a failing
>>> physical Type-C port).
>>> b. The laptop passes memory test (memtest86).
>>> c. This happens on all kernel versions from v5.4 to the latest v6.6
>>> including my personal custom builds and Canonical's official Ubuntu
>>> kernels.
>>> d. This happens on different host SSDs and file-system combinations.
>>> e. This only happens on LZ4. LZ4HC doesn't trigger the issue.
>>> f. This only happens when mounting the image natively by the kernel.
>>> Using fuse with erofsfuse is fine.
>>
>> I think it's a weird issue with inplace decompression because you said
>> it depends on the hardware. In addition, with your dataset sadly I
>> cannot reproduce on my local server (Xeon(R) CPU E5-2682 v4).
>
> As I feared. Bummer :(
>
>>
>> What is the difference between these two machines? just different CPU or
>> they have some other difference like different compliers?
>
> I fully and exclusively control both devices, and the setup is almost the same.
> Same Ubuntu version, kernel/compiler version.
>
> But as I said, on my laptop, the issue happens on kernels that someone
> else (Canonical) built, so I don't think it matters.

The only thing I can say is that, compared to fuse, the kernel side
has optimized in-place decompression: it reuses the same buffer for
decompression, but with a safe margin (according to the current LZ4
decompression implementation). It shouldn't behave differently just
due to different CPUs. Let me find more clues later; maybe we should
also introduce a way for users to turn this off if needed.

Thanks,
Gao Xiang

2023-12-03 18:33:51

by Juhyung Park

Subject: Re: Weird EROFS data corruption

Hi Gao,

On Mon, Dec 4, 2023 at 2:22 AM Gao Xiang <[email protected]> wrote:
>
>
>
> On 2023/12/4 01:01, Juhyung Park wrote:
> > Hi Gao,
> >
> > On Mon, Dec 4, 2023 at 1:52 AM Gao Xiang <[email protected]> wrote:
> >>
> >> Hi Juhyung,
> >>
> >> On 2023/12/4 00:22, Juhyung Park wrote:
> >>> (Cc'ing f2fs and crypto as I've noticed something similar with f2fs a
> >>> while ago, which may mean that this is not specific to EROFS:
> >>> https://lore.kernel.org/all/CAD14+f2nBZtLfLC6CwNjgCOuRRRjwzttp3D3iK4Of+1EEjK+cw@mail.gmail.com/
> >>> )
> >>>
> >>> Hi.
> >>>
> >>> I'm encountering a very weird EROFS data corruption.
> >>>
> >>> I noticed when I build an EROFS image for AOSP development, the device
> >>> would randomly not boot from a certain build.
> >>> After inspecting the log, I noticed that a file got corrupted.
> >>
> >> Is it observed on your laptop (i7-1185G7), yes? or some other arm64
> >> device?
> >
> > Yes, only on my laptop. The arm64 device seems fine.
> > The reason that it would not boot was that the host machine (my
> > laptop) was repacking the EROFS image wrongfully.
> >
> > The workflow is something like this:
> > Server-built EROFS AOSP image -> Image copied to laptop -> Laptop
> > mounts the EROFS image -> Copies the entire content to a scratch
> > directory (CORRUPT!) -> Changes some files -> mkfs.erofs
> >
> > So the device is not responsible for the corruption, the laptop is.
>
> Ok.
>
> >
> >>
> >>>
> >>> After adding a hash check during the build flow, I noticed that EROFS
> >>> would randomly read data wrong.
> >>>
> >>> I now have a reliable method of reproducing the issue, but here's the
> >>> funny/weird part: it's only happening on my laptop (i7-1185G7). This
> >>> is not happening with my 128 cores buildfarm machine (Threadripper
> >>> 3990X).>
> >>> I first suspected a hardware issue, but:
> >>> a. The laptop had its motherboard replaced recently (due to a failing
> >>> physical Type-C port).
> >>> b. The laptop passes memory test (memtest86).
> >>> c. This happens on all kernel versions from v5.4 to the latest v6.6
> >>> including my personal custom builds and Canonical's official Ubuntu
> >>> kernels.
> >>> d. This happens on different host SSDs and file-system combinations.
> >>> e. This only happens on LZ4. LZ4HC doesn't trigger the issue.
> >>> f. This only happens when mounting the image natively by the kernel.
> >>> Using fuse with erofsfuse is fine.
> >>
> >> I think it's a weird issue with inplace decompression because you said
> >> it depends on the hardware. In addition, with your dataset sadly I
> >> cannot reproduce on my local server (Xeon(R) CPU E5-2682 v4).
> >
> > As I feared. Bummer :(
> >
> >>
> >> What is the difference between these two machines? just different CPU or
> >> they have some other difference like different compliers?
> >
> > I fully and exclusively control both devices, and the setup is almost the same.
> > Same Ubuntu version, kernel/compiler version.
> >
> > But as I said, on my laptop, the issue happens on kernels that someone
> > else (Canonical) built, so I don't think it matters.
>
> The only thing I could say is that the kernel side has optimized
> inplace decompression compared to fuse so that it will reuse the
> same buffer for decompression but with a safe margin (according to
> the current lz4 decompression implementation). It shouldn't behave
> different just due to different CPUs. Let me find more clues
> later, also maybe we should introduce a way for users to turn off
> this if needed.

Cool :)

I'm comfortable changing and building my own custom kernel for this
specific laptop. Feel free to ask me to try out some patches.

Thanks.

>
> Thanks,
> Gao Xiang

2023-12-04 04:32:50

by Gao Xiang

Subject: Re: Weird EROFS data corruption



On 2023/12/4 01:32, Juhyung Park wrote:
> Hi Gao,

...

>>>
>>>>
>>>> What is the difference between these two machines? just different CPU or
>>>> they have some other difference like different compliers?
>>>
>>> I fully and exclusively control both devices, and the setup is almost the same.
>>> Same Ubuntu version, kernel/compiler version.
>>>
>>> But as I said, on my laptop, the issue happens on kernels that someone
>>> else (Canonical) built, so I don't think it matters.
>>
>> The only thing I could say is that the kernel side has optimized
>> inplace decompression compared to fuse so that it will reuse the
>> same buffer for decompression but with a safe margin (according to
>> the current lz4 decompression implementation). It shouldn't behave
>> different just due to different CPUs. Let me find more clues
>> later, also maybe we should introduce a way for users to turn off
>> this if needed.
>
> Cool :)
>
> I'm comfortable changing and building my own custom kernel for this
> specific laptop. Feel free to ask me to try out some patches.

Thanks, I need to narrow down this issue:

- First, could you apply the following diff to test whether it's still
reproducible?

diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 021be5feb1bc..40a306628e1a 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -131,7 +131,7 @@ static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,

if (rq->inplace_io) {
omargin = PAGE_ALIGN(ctx->oend) - ctx->oend;
- if (rq->partial_decoding || !may_inplace ||
+ if (1 || rq->partial_decoding || !may_inplace ||
omargin < LZ4_DECOMPRESS_INPLACE_MARGIN(rq->inputsize))
goto docopy;

- Could you share the full output of `lscpu`?

Thanks,
Gao Xiang

2023-12-04 04:32:57

by Juhyung Park

Subject: Re: Weird EROFS data corruption

Hi Gao,

On Mon, Dec 4, 2023 at 12:28 PM Gao Xiang <[email protected]> wrote:
>
>
>
> On 2023/12/4 01:32, Juhyung Park wrote:
> > Hi Gao,
>
> ...
>
> >>>
> >>>>
> >>>> What is the difference between these two machines? just different CPU or
> >>>> they have some other difference like different compliers?
> >>>
> >>> I fully and exclusively control both devices, and the setup is almost the same.
> >>> Same Ubuntu version, kernel/compiler version.
> >>>
> >>> But as I said, on my laptop, the issue happens on kernels that someone
> >>> else (Canonical) built, so I don't think it matters.
> >>
> >> The only thing I could say is that the kernel side has optimized
> >> inplace decompression compared to fuse so that it will reuse the
> >> same buffer for decompression but with a safe margin (according to
> >> the current lz4 decompression implementation). It shouldn't behave
> >> different just due to different CPUs. Let me find more clues
> >> later, also maybe we should introduce a way for users to turn off
> >> this if needed.
> >
> > Cool :)
> >
> > I'm comfortable changing and building my own custom kernel for this
> > specific laptop. Feel free to ask me to try out some patches.
>
> Thanks, I need to narrow down this issue:
>
> - First, could you apply the following diff to test if it's still
> reproducable?
>
> diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
> index 021be5feb1bc..40a306628e1a 100644
> --- a/fs/erofs/decompressor.c
> +++ b/fs/erofs/decompressor.c
> @@ -131,7 +131,7 @@ static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,
>
> if (rq->inplace_io) {
> omargin = PAGE_ALIGN(ctx->oend) - ctx->oend;
> - if (rq->partial_decoding || !may_inplace ||
> + if (1 || rq->partial_decoding || !may_inplace ||
> omargin < LZ4_DECOMPRESS_INPLACE_MARGIN(rq->inputsize))
> goto docopy;

Yup, that fixes it.

The hash output is the same for 50 runs.

>
> - Could you share the full message about the output of `lscpu`?

Sure:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU
@ 3.0GHz
BIOS CPU family: 198
CPU family: 6
Model: 140
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 1
CPU(s) scaling MHz: 60%
CPU max MHz: 4800.0000
CPU min MHz: 400.0000
BogoMIPS: 5990.40
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
arch_perfmon pebs bts rep_good nopl xtopology nonstop_
tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6
4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_dead
line_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp
refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb st
ibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_
ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
rdt_a avx512f avx512dq rdseed adx smap avx512ifma clfl
ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl
xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm
ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi
2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme av
x512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i
ntersect md_clear ibt flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 192 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 5 MiB (4 instances)
L3: 12 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerabilities:
Gather data sampling: Vulnerable
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Vulnerable
Spectre v1: Vulnerable: __user pointer sanitization and usercopy ba
rriers only; no swapgs barriers
Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBR
S: Vulnerable
Srbds: Not affected
Tsx async abort: Not affected


>
> Thanks,
> Gao Xiang

2023-12-05 08:39:39

by Gao Xiang

Subject: Re: Weird EROFS data corruption

Hi Juhyung,

On 2023/12/4 11:41, Juhyung Park wrote:

...
>
>>
>> - Could you share the full message about the output of `lscpu`?
>
> Sure:
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 39 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 8
> On-line CPU(s) list: 0-7
> Vendor ID: GenuineIntel
> BIOS Vendor ID: Intel(R) Corporation
> Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU
> @ 3.0GHz
> BIOS CPU family: 198
> CPU family: 6
> Model: 140
> Thread(s) per core: 2
> Core(s) per socket: 4
> Socket(s): 1
> Stepping: 1
> CPU(s) scaling MHz: 60%
> CPU max MHz: 4800.0000
> CPU min MHz: 400.0000
> BogoMIPS: 5990.40
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
> a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
> arch_perfmon pebs bts rep_good nopl xtopology nonstop_
> tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6
> 4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
> pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_dead
> line_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp
> refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb st
> ibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_
> ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
> rdt_a avx512f avx512dq rdseed adx smap avx512ifma clfl
> ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl
> xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm
> ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
> hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi
> 2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme av
> x512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i

Sigh, I've been thinking. FSRM is the most significant difference
between our environments here. Could you try only the following diff
(without the previous disable patch) to see if there's any difference?

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 1b60ae81ecd8..1b52a913233c 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
#define CHECK_LEN cmp $0x20, %rdx; jb 1f
#define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
.Lmemmove_begin_forward:
- ALTERNATIVE_2 __stringify(CHECK_LEN), \
- __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
- __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
+ CHECK_LEN

/*
* movsq instruction have many startup latency

Thanks,
Gao Xiang

2023-12-05 14:37:46

by Juhyung Park

Subject: Re: Weird EROFS data corruption

Hi Gao,

On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang <[email protected]> wrote:
>
> Hi Juhyung,
>
> On 2023/12/4 11:41, Juhyung Park wrote:
>
> ...
> >
> >>
> >> - Could you share the full message about the output of `lscpu`?
> >
> > Sure:
> >
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Address sizes: 39 bits physical, 48 bits virtual
> > Byte Order: Little Endian
> > CPU(s): 8
> > On-line CPU(s) list: 0-7
> > Vendor ID: GenuineIntel
> > BIOS Vendor ID: Intel(R) Corporation
> > Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> > BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU
> > @ 3.0GHz
> > BIOS CPU family: 198
> > CPU family: 6
> > Model: 140
> > Thread(s) per core: 2
> > Core(s) per socket: 4
> > Socket(s): 1
> > Stepping: 1
> > CPU(s) scaling MHz: 60%
> > CPU max MHz: 4800.0000
> > CPU min MHz: 400.0000
> > BogoMIPS: 5990.40
> > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
> > a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> > ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
> > arch_perfmon pebs bts rep_good nopl xtopology nonstop_
> > tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6
> > 4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
> > pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_dead
> > line_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp
> > refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb st
> > ibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_
> > ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
> > rdt_a avx512f avx512dq rdseed adx smap avx512ifma clfl
> > ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl
> > xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm
> > ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
> > hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi
> > 2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme av
> > x512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i
>
> Sigh, I've been thinking. Here FSRM is the most significant difference between
> our environments, could you only try the following diff to see if there's any
> difference anymore? (without the previous disable patch.)
>
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 1b60ae81ecd8..1b52a913233c 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
> #define CHECK_LEN cmp $0x20, %rdx; jb 1f
> #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
> .Lmemmove_begin_forward:
> - ALTERNATIVE_2 __stringify(CHECK_LEN), \
> - __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
> - __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
> + CHECK_LEN
>
> /*
> * movsq instruction have many startup latency

Yup, that also seems to fix it.
Are we looking at a potential memmove issue?

>
> Thanks,
> Gao Xiang

2023-12-05 14:37:49

by Gao Xiang

Subject: Re: Weird EROFS data corruption



On 2023/12/5 22:23, Juhyung Park wrote:
> Hi Gao,
>
> On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang <[email protected]> wrote:
>>
>> Hi Juhyung,
>>
>> On 2023/12/4 11:41, Juhyung Park wrote:
>>
>> ...
>>>
>>>>
>>>> - Could you share the full message about the output of `lscpu`?
>>>
>>> Sure:
>>>
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Address sizes: 39 bits physical, 48 bits virtual
>>> Byte Order: Little Endian
>>> CPU(s): 8
>>> On-line CPU(s) list: 0-7
>>> Vendor ID: GenuineIntel
>>> BIOS Vendor ID: Intel(R) Corporation
>>> Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
>>> BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU
>>> @ 3.0GHz
>>> BIOS CPU family: 198
>>> CPU family: 6
>>> Model: 140
>>> Thread(s) per core: 2
>>> Core(s) per socket: 4
>>> Socket(s): 1
>>> Stepping: 1
>>> CPU(s) scaling MHz: 60%
>>> CPU max MHz: 4800.0000
>>> CPU min MHz: 400.0000
>>> BogoMIPS: 5990.40
>>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
>>> a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
>>> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
>>> arch_perfmon pebs bts rep_good nopl xtopology nonstop_
>>> tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6
>>> 4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
>>> pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_dead
>>> line_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp
>>> refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb st
>>> ibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_
>>> ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
>>> rdt_a avx512f avx512dq rdseed adx smap avx512ifma clfl
>>> ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl
>>> xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm
>>> ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
>>> hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi
>>> 2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme av
>>> x512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i
>>
>> Sigh, I've been thinking. Here FSRM is the most significant difference between
>> our environments, could you only try the following diff to see if there's any
>> difference anymore? (without the previous disable patch.)
>>
>> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
>> index 1b60ae81ecd8..1b52a913233c 100644
>> --- a/arch/x86/lib/memmove_64.S
>> +++ b/arch/x86/lib/memmove_64.S
>> @@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
>> #define CHECK_LEN cmp $0x20, %rdx; jb 1f
>> #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
>> .Lmemmove_begin_forward:
>> - ALTERNATIVE_2 __stringify(CHECK_LEN), \
>> - __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
>> - __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
>> + CHECK_LEN
>>
>> /*
>> * movsq instruction have many startup latency
>
> Yup, that also seems to fix it.
> Are we looking at a potential memmove issue?

I'm still analyzing this behavior as well as the root cause, and I
will also try to get a recent cloud server with FSRM myself to find
more clues.

Thanks,
Gao Xiang

2023-12-05 16:42:40

by Juhyung Park

Subject: Re: Weird EROFS data corruption

On Tue, Dec 5, 2023 at 11:34 PM Gao Xiang <[email protected]> wrote:
>
>
>
> On 2023/12/5 22:23, Juhyung Park wrote:
> > Hi Gao,
> >
> > On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang <[email protected]> wrote:
> >>
> >> Hi Juhyung,
> >>
> >> On 2023/12/4 11:41, Juhyung Park wrote:
> >>
> >> ...
> >>>
> >>>>
> >>>> - Could you share the full message about the output of `lscpu`?
> >>>
> >>> Sure:
> >>>
> >>> Architecture: x86_64
> >>> CPU op-mode(s): 32-bit, 64-bit
> >>> Address sizes: 39 bits physical, 48 bits virtual
> >>> Byte Order: Little Endian
> >>> CPU(s): 8
> >>> On-line CPU(s) list: 0-7
> >>> Vendor ID: GenuineIntel
> >>> BIOS Vendor ID: Intel(R) Corporation
> >>> Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> >>> BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU
> >>> @ 3.0GHz
> >>> BIOS CPU family: 198
> >>> CPU family: 6
> >>> Model: 140
> >>> Thread(s) per core: 2
> >>> Core(s) per socket: 4
> >>> Socket(s): 1
> >>> Stepping: 1
> >>> CPU(s) scaling MHz: 60%
> >>> CPU max MHz: 4800.0000
> >>> CPU min MHz: 400.0000
> >>> BogoMIPS: 5990.40
> >>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
> >>> a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> >>> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
> >>> arch_perfmon pebs bts rep_good nopl xtopology nonstop_
> >>> tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6
> >>> 4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
> >>> pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_dead
> >>> line_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp
> >>> refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb st
> >>> ibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_
> >>> ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
> >>> rdt_a avx512f avx512dq rdseed adx smap avx512ifma clfl
> >>> ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl
> >>> xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm
> >>> ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
> >>> hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi
> >>> 2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme av
> >>> x512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i
> >>
> >> Sigh, I've been thinking. Here FSRM is the most significant difference between
> >> our environments, could you only try the following diff to see if there's any
> >> difference anymore? (without the previous disable patch.)
> >>
> >> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> >> index 1b60ae81ecd8..1b52a913233c 100644
> >> --- a/arch/x86/lib/memmove_64.S
> >> +++ b/arch/x86/lib/memmove_64.S
> >> @@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
> >> #define CHECK_LEN cmp $0x20, %rdx; jb 1f
> >> #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
> >> .Lmemmove_begin_forward:
> >> - ALTERNATIVE_2 __stringify(CHECK_LEN), \
> >> - __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
> >> - __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
> >> + CHECK_LEN
> >>
> >> /*
> >> * movsq instruction have many startup latency
> >
> > Yup, that also seems to fix it.
> > Are we looking at a potential memmove issue?
>
> I'm still analyzing this behavior as well as the root cause and
> I will also try to get a recent cloud server with FSRM myself
> to find more clues.

Down the rabbit hole we go...

Let me know if you have trouble getting an instance with FSRM. I'll
see what I can do.

>
> Thanks,
> Gao Xiang

2023-12-06 04:37:51

by Gao Xiang

Subject: Re: Weird EROFS data corruption

Hi Juhyung,

On 2023/12/5 22:43, Juhyung Park wrote:
> On Tue, Dec 5, 2023 at 11:34 PM Gao Xiang <[email protected]> wrote:
>>

...

>>
>> I'm still analyzing this behavior as well as the root cause and
>> I will also try to get a recent cloud server with FSRM myself
>> to find more clues.
>
> Down the rabbit hole we go...
>
> Let me know if you have trouble getting an instance with FSRM. I'll
> see what I can do.

I've sent out a fix to address this; please help check:
https://lore.kernel.org/r/[email protected]

Thanks,
Gao Xiang