Hi,
On 31. 07. 22, 23:43, Linus Torvalds wrote:
> So here we are, one week late, and 5.19 is tagged and pushed out.
>
> The full shortlog (just from rc8, obviously not all of 5.19) is below,
> but I can happily report that there is nothing really interesting in
> there. A lot of random small stuff.
Note: I originally reported this downstream for tracking at:
https://bugzilla.suse.com/show_bug.cgi?id=1202203
5.19 behaves pretty weirdly in openSUSE's openQA (as opposed to 5.18 or
5.18.15). It's all qemu-kvm "HW"¹⁾:
https://openqa.opensuse.org/tests/2502148
loop2: detected capacity change from 0 to 72264
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
to inode 57375 starting block 137216)
Buffer I/O error on device zram0, logical block 137216
Buffer I/O error on device zram0, logical block 137217
...
SQUASHFS error: xz decompression failed, data probably corrupt
SQUASHFS error: Failed to read block 0x2e41680: -5
SQUASHFS error: xz decompression failed, data probably corrupt
SQUASHFS error: Failed to read block 0x2e41680: -5
Bus error
https://openqa.opensuse.org/tests/2502145
FS-Cache: Loaded
begin 644 ldconfig.core.pid_2094.sig_7.time_1659859442
https://openqa.opensuse.org/tests/2502146
FS-Cache: Loaded
begin 644 Xorg.bin.core.pid_3733.sig_6.time_1659858784
https://openqa.opensuse.org/tests/2502148
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
to inode 57375 starting block 137216)
Buffer I/O error on device zram0, logical block 137216
Buffer I/O error on device zram0, logical block 137217
https://openqa.opensuse.org/tests/2502154
[ 13.158090][ T634] FS-Cache: Loaded
...
[ 525.627024][ C0] sysrq: Show State
Those are various failures -- crashes of ldconfig and Xorg; I/O failures on
zram; the last one is likely a lockup, something invoked sysrq after a
500 s stall.
Interestingly, I've also hit this twice locally:
> init[1]: segfault at 18 ip 00007fb6154b4c81 sp 00007ffc243ed600 error
6 in libc.so.6[7fb61543f000+185000]
> Code: 41 5f c3 66 0f 1f 44 00 00 42 f6 44 10 08 01 0f 84 04 01 00 00
48 83 e1 fe 48 89 48 08 49 8b 47 70 49 89 5f 70 66 48 0f 6e c0 <48> 89
58 18 0f 16 44 24 08 48 81 fd ff 03 00 00 76 08 66 0f ef c9
> *** signal 11 ***
> malloc(): unsorted double linked list corrupted
> traps: init[1] general protection fault ip:7fb61543f8b9
sp:7ffc243ebf40 error:0 in libc.so.6[7fb61543f000+185000]
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> CPU: 0 PID: 1 Comm: init Not tainted 5.19.0-1-default #1 openSUSE
Tumbleweed e1df13166a33f423514290c702e43cfbb2b5b575
KASAN is not helpful either, so it's unlikely to be a memory corruption
(unless it is "HW" related; should I try to turn on the IOMMU in qemu?):
> kasan: KernelAddressSanitizer initialized
> ...
> zram: module verification failed: signature and/or required key missing - tainting kernel
> zram: Added device: zram0
> zram0: detected capacity change from 0 to 2097152
> EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
> EXT4-fs (zram0): mounted filesystem without journal. Quota mode: none.
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
> Buffer I/O error on device zram0, logical block 159744
> Buffer I/O error on device zram0, logical block 159745
They all look to me like a zram failure. The installer apparently
creates an ext2 FS, and after it is mounted using the ext4 module, the
issue starts occurring.
Are there any tests I/you could run on 5.19 to exercise zram and ext2?
Otherwise I am unable to reproduce this easily, except by using the
openSUSE installer :/.
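Something along these lines (an untested sketch; the device name and sizes
are arbitrary, and it assumes zram isn't already in use) should exercise
that path:
  modprobe zram num_devices=1
  echo 1G > /sys/block/zram0/disksize
  mkfs.ext2 /dev/zram0
  mount /dev/zram0 /mnt
  dd if=/dev/urandom of=/mnt/stuff bs=1M
(urandom on purpose -- incompressible data forces zram to allocate roughly
as much as is written.)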
Any other ideas? Or is this known already?
¹⁾ the main ingredients are UEFI boot and virtio-blk (it likely happens
with virtio-scsi too). The cmdline _I_ use:
qemu-kvm -device intel-hda -device hda-duplex \
  -drive file=/tmp/pokus.qcow2,if=none,id=hd -device virtio-blk-pci,drive=hd \
  -drive if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/qemu/ovmf-x86_64-opensuse-code.bin \
  -drive if=pflash,format=raw,unit=1,file=/tmp/vars.bin \
  -cdrom /tmp/cd1.iso -m 1G -smp 1 -net user -net nic,model=virtio \
  -serial pty -device virtio-rng-pci -device qemu-xhci,p2=4,p3=4 -usbdevice tablet
thanks,
--
js
suse labs
On 09. 08. 22, 8:03, Jiri Slaby wrote:
> Hi,
>
> On 31. 07. 22, 23:43, Linus Torvalds wrote:
>> So here we are, one week late, and 5.19 is tagged and pushed out.
>>
>> The full shortlog (just from rc8, obviously not all of 5.19) is below,
>> but I can happily report that there is nothing really interesting in
>> there. A lot of random small stuff.
>
> Note: I originally reported this downstream for tracking at:
> https://bugzilla.suse.com/show_bug.cgi?id=1202203
>
> 5.19 behaves pretty weirdly in openSUSE's openQA (as opposed to 5.18 or
> 5.18.15). It's all qemu-kvm "HW"¹⁾:
> https://openqa.opensuse.org/tests/2502148
> loop2: detected capacity change from 0 to 72264
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
> to inode 57375 starting block 137216)
> Buffer I/O error on device zram0, logical block 137216
> Buffer I/O error on device zram0, logical block 137217
> ...
> SQUASHFS error: xz decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0x2e41680: -5
> SQUASHFS error: xz decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0x2e41680: -5
> Bus error
>
>
>
> https://openqa.opensuse.org/tests/2502145
> FS-Cache: Loaded
> begin 644 ldconfig.core.pid_2094.sig_7.time_1659859442
>
>
>
> https://openqa.opensuse.org/tests/2502146
> FS-Cache: Loaded
> begin 644 Xorg.bin.core.pid_3733.sig_6.time_1659858784
>
>
>
> https://openqa.opensuse.org/tests/2502148
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
> to inode 57375 starting block 137216)
> Buffer I/O error on device zram0, logical block 137216
> Buffer I/O error on device zram0, logical block 137217
>
>
>
> https://openqa.opensuse.org/tests/2502154
> [ 13.158090][ T634] FS-Cache: Loaded
> ...
> [ 525.627024][ C0] sysrq: Show State
>
>
>
> Those are various failures -- crashes of ldconfig, Xorg; I/O failures on
> zram; the last one is a lockup likely, something invoked sysrq after
> 500s stall.
>
> Interestingly, I've also hit this twice locally:
> > init[1]: segfault at 18 ip 00007fb6154b4c81 sp 00007ffc243ed600 error
> 6 in libc.so.6[7fb61543f000+185000]
> > Code: 41 5f c3 66 0f 1f 44 00 00 42 f6 44 10 08 01 0f 84 04 01 00 00
> 48 83 e1 fe 48 89 48 08 49 8b 47 70 49 89 5f 70 66 48 0f 6e c0 <48> 89
> 58 18 0f 16 44 24 08 48 81 fd ff 03 00 00 76 08 66 0f ef c9
> > *** signal 11 ***
> > malloc(): unsorted double linked list corrupted
> > traps: init[1] general protection fault ip:7fb61543f8b9
> sp:7ffc243ebf40 error:0 in libc.so.6[7fb61543f000+185000]
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > CPU: 0 PID: 1 Comm: init Not tainted 5.19.0-1-default #1 openSUSE
> Tumbleweed e1df13166a33f423514290c702e43cfbb2b5b575
>
> KASAN is not helpful either, so it's unlikely a memory corruption
> (unless it is "HW" related; should I try to turn on IOMMU in qemu?):
>> kasan: KernelAddressSanitizer initialized
>> ...
>> zram: module verification failed: signature and/or required key
>> missing - tainting kernel
>> zram: Added device: zram0
>> zram0: detected capacity change from 0 to 2097152
>> EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
>> EXT4-fs (zram0): mounted filesystem without journal. Quota mode: none.
>> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
>> to inode 16386 starting block 159744)
>> Buffer I/O error on device zram0, logical block 159744
>> Buffer I/O error on device zram0, logical block 159745
>
>
>
> They all look to me like a zram failure. The installer apparently
> creates an ext2 FS, and after it is mounted using the ext4 module, the
> issue starts occurring.
>
> Any tests I/you could run on 5.19 to exercise zram and ext2? Otherwise I
> am unable to reproduce easily, except using the openSUSE installer :/.
Ah, now I can. It's easy when one lowers the memory available to qemu;
-m 800M in this case:
echo $((1000*1024*1024)) > /sys/block/zram0/disksize
mkfs.ext2 /dev/zram0
mount /dev/zram0 /mnt/a/
dd if=/dev/urandom of=/mnt/a/stuff
[ 200.334277][ T8] EXT4-fs warning (device zram0): ext4_end_bio:343:
I/O error 10 writing to inode 12 starting block 8192)
[ 200.340198][ T8] Buffer I/O error on device zram0, logical block 8192
So currently, I blame:
commit e7be8d1dd983156bbdd22c0319b71119a8fbb697
Author: Alexey Romanov <[email protected]>
Date: Thu May 12 20:23:07 2022 -0700
zram: remove double compression logic
/me needs to confirm.
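To confirm, the plan is simply a revert on top of 5.19 plus a rebuild,
roughly the following (the exact build/install steps depend on the setup):
  git revert e7be8d1dd983
  make -j"$(nproc)" && make modules_install && make install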
> Any other ideas? Or is this known already?
>
> ¹⁾ main are uefi boot and virtio-blk (it likely happens with virtio-scsi
> too). The cmdline _I_ use: qemu-kvm -device intel-hda -device hda-duplex
> -drive file=/tmp/pokus.qcow2,if=none,id=hd -device
> virtio-blk-pci,drive=hd -drive
> if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/qemu/ovmf-x86_64-opensuse-code.bin -drive if=pflash,format=raw,unit=1,file=/tmp/vars.bin -cdrom /tmp/cd1.iso -m 1G -smp 1 -net user -net nic,model=virtio -serial pty -device virtio-rng-pci -device qemu-xhci,p2=4,p3=4 -usbdevice tablet
>
>
> thanks,
--
js
suse labs
On 09. 08. 22, 9:59, Jiri Slaby wrote:
> Ah, now I can. It's easy when one lowers memory available to qemu. -m
> 800M in this case:
> echo $((1000*1024*1024)) > /sys/block/zram0/disksize
> mkfs.ext2 /dev/zram0
> mount /dev/zram0 /mnt/a/
> dd if=/dev/urandom of=/mnt/a/stuff
> [ 200.334277][ T8] EXT4-fs warning (device zram0): ext4_end_bio:343:
> I/O error 10 writing to inode 12 starting block 8192)
> [ 200.340198][ T8] Buffer I/O error on device zram0, logical block 8192
>
>
> So currently, I blame:
> commit e7be8d1dd983156bbdd22c0319b71119a8fbb697
> Author: Alexey Romanov <[email protected]>
> Date: Thu May 12 20:23:07 2022 -0700
>
> zram: remove double compression logic
>
>
> /me needs to confirm.
With that commit reverted, I see no more I/O errors, only oom-killer
messages (which is OK IMO, provided I write 1G of urandom on a machine
w/ 800M of RAM):
[ 30.424603][ T728] dd invoked oom-killer:
gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Now let me submit it to openQA too...
thanks,
--
js
suse labs
On (22/08/09 10:12), Jiri Slaby wrote:
> > So currently, I blame:
> > commit e7be8d1dd983156bbdd22c0319b71119a8fbb697
> > Author: Alexey Romanov <[email protected]>
> > Date:   Thu May 12 20:23:07 2022 -0700
> >
> >     zram: remove double compression logic
> >
> >
> > /me needs to confirm.
>
> With that commit reverted, I see no more I/O errors, only oom-killer
> messages (which is OK IMO, provided I write 1G of urandom on a machine w/
> 800M of RAM):
Hmm... So handle allocation always succeeds in the slow path? (when we
try to allocate it second time)
On Tue, Aug 09, 2022 at 08:03:11AM +0200, Jiri Slaby wrote:
> Hi,
>
> On 31. 07. 22, 23:43, Linus Torvalds wrote:
> > So here we are, one week late, and 5.19 is tagged and pushed out.
> >
> > The full shortlog (just from rc8, obviously not all of 5.19) is below,
> > but I can happily report that there is nothing really interesting in
> > there. A lot of random small stuff.
>
> Note: I originally reported this downstream for tracking at:
> https://bugzilla.suse.com/show_bug.cgi?id=1202203
>
> 5.19 behaves pretty weirdly in openSUSE's openQA (as opposed to 5.18 or
> 5.18.15). It's all qemu-kvm "HW"¹⁾:
> https://openqa.opensuse.org/tests/2502148
> loop2: detected capacity change from 0 to 72264
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to
> inode 57375 starting block 137216)
> Buffer I/O error on device zram0, logical block 137216
> Buffer I/O error on device zram0, logical block 137217
> ...
> SQUASHFS error: xz decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0x2e41680: -5
> SQUASHFS error: xz decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0x2e41680: -5
> Bus error
>
>
>
> https://openqa.opensuse.org/tests/2502145
> FS-Cache: Loaded
> begin 644 ldconfig.core.pid_2094.sig_7.time_1659859442
>
>
>
> https://openqa.opensuse.org/tests/2502146
> FS-Cache: Loaded
> begin 644 Xorg.bin.core.pid_3733.sig_6.time_1659858784
>
>
>
> https://openqa.opensuse.org/tests/2502148
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to
> inode 57375 starting block 137216)
> Buffer I/O error on device zram0, logical block 137216
> Buffer I/O error on device zram0, logical block 137217
>
>
>
> https://openqa.opensuse.org/tests/2502154
> [ 13.158090][ T634] FS-Cache: Loaded
> ...
> [ 525.627024][ C0] sysrq: Show State
>
>
>
> Those are various failures -- crashes of ldconfig, Xorg; I/O failures on
> zram; the last one is a lockup likely, something invoked sysrq after 500s
> stall.
>
> Interestingly, I've also hit this twice locally:
> > init[1]: segfault at 18 ip 00007fb6154b4c81 sp 00007ffc243ed600 error 6 in
> libc.so.6[7fb61543f000+185000]
> > Code: 41 5f c3 66 0f 1f 44 00 00 42 f6 44 10 08 01 0f 84 04 01 00 00 48 83
> e1 fe 48 89 48 08 49 8b 47 70 49 89 5f 70 66 48 0f 6e c0 <48> 89 58 18 0f 16
> 44 24 08 48 81 fd ff 03 00 00 76 08 66 0f ef c9
> > *** signal 11 ***
> > malloc(): unsorted double linked list corrupted
> > traps: init[1] general protection fault ip:7fb61543f8b9 sp:7ffc243ebf40
> error:0 in libc.so.6[7fb61543f000+185000]
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > CPU: 0 PID: 1 Comm: init Not tainted 5.19.0-1-default #1 openSUSE
> Tumbleweed e1df13166a33f423514290c702e43cfbb2b5b575
>
> KASAN is not helpful either, so it's unlikely a memory corruption (unless it
> is "HW" related; should I try to turn on IOMMU in qemu?):
> > kasan: KernelAddressSanitizer initialized
> > ...
> > zram: module verification failed: signature and/or required key missing - tainting kernel
> > zram: Added device: zram0
> > zram0: detected capacity change from 0 to 2097152
> > EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
> > EXT4-fs (zram0): mounted filesystem without journal. Quota mode: none.
> > EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
> > Buffer I/O error on device zram0, logical block 159744
> > Buffer I/O error on device zram0, logical block 159745
>
>
>
> They all look to me like a zram failure. The installer apparently creates
> an ext2 FS, and after it is mounted using the ext4 module, the issue
> starts occurring.
>
> Any tests I/you could run on 5.19 to exercise zram and ext2? Otherwise I am
> unable to reproduce easily, except using the openSUSE installer :/.
Hi Jiri,
I've tried a quick xfstests run on ext2 on zram and I can't see any
issues like this so far. I will run a full test and report back in case
there is anything obvious.
-Lukas
>
> Any other ideas? Or is this known already?
>
> ¹⁾ main are uefi boot and virtio-blk (it likely happens with virtio-scsi
> too). The cmdline _I_ use: qemu-kvm -device intel-hda -device hda-duplex
> -drive file=/tmp/pokus.qcow2,if=none,id=hd -device virtio-blk-pci,drive=hd
> -drive if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/qemu/ovmf-x86_64-opensuse-code.bin
> -drive if=pflash,format=raw,unit=1,file=/tmp/vars.bin -cdrom /tmp/cd1.iso
> -m 1G -smp 1 -net user -net nic,model=virtio -serial pty -device
> virtio-rng-pci -device qemu-xhci,p2=4,p3=4 -usbdevice tablet
>
>
> thanks,
> --
> js
> suse labs
>
On (22/08/09 11:12), Lukas Czerner wrote:
> Hi Jiri,
>
> I've tried a quick xfstests run on ext2 on zram and I can't see any
> issues like this so far. I will run a full test and report back in case
> there is anything obvious.
AFAICT this should be visible only when we are under memory pressure,
so that direct reclaim from the zs_malloc() handle allocation call makes
a difference.
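If it helps, one crude way to add such pressure around an xfstests run is
something like this (just a sketch -- it assumes the default tmpfs on
/dev/shm capped at half of RAM, and even that may not be enough):
  dd if=/dev/zero of=/dev/shm/hog bs=1M   # eat memory via tmpfs until it fills up
  ./check -g auto                         # the xfstests run, now under pressure
  rm /dev/shm/hog
Booting the test VM with less RAM than the zram disksize, as in Jiri's
reproducer, is probably the more reliable way, though.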
On (22/08/09 17:43), Sergey Senozhatsky wrote:
> On (22/08/09 10:12), Jiri Slaby wrote:
> > > So currently, I blame:
> > > commit e7be8d1dd983156bbdd22c0319b71119a8fbb697
> > > Author: Alexey Romanov <[email protected]>
> > > Date:   Thu May 12 20:23:07 2022 -0700
> > >
> > >     zram: remove double compression logic
> > >
> > >
> > > /me needs to confirm.
> >
> > With that commit reverted, I see no more I/O errors, only oom-killer
> > messages (which is OK IMO, provided I write 1G of urandom on a machine w/
> > 800M of RAM):
>
> Hmm... So handle allocation always succeeds in the slow path? (when we
> try to allocate it second time)
Yeah I can see how handle re-allocation with direct reclaim can make it more
successful, but in exchange it oom-kills some user-space process, I suppose.
Is oom-kill really a good alternative though?
On (22/08/09 18:11), Sergey Senozhatsky wrote:
> > > > /me needs to confirm.
> > >
> > > With that commit reverted, I see no more I/O errors, only oom-killer
> > > messages (which is OK IMO, provided I write 1G of urandom on a machine w/
> > > 800M of RAM):
> >
> > Hmm... So handle allocation always succeeds in the slow path? (when we
> > try to allocate it second time)
>
> Yeah I can see how handle re-allocation with direct reclaim can make it more
> successful, but in exchange it oom-kills some user-space process, I suppose.
> Is oom-kill really a good alternative though?
We likely will need to revert e7be8d1dd983 given that it has some
user visible changes. But, honestly, failing zram write vs oom-kill
a user-space is a tough choice.
On Tue, Aug 09, 2022 at 06:15:37PM +0900, Sergey Senozhatsky wrote:
> On (22/08/09 11:12), Lukas Czerner wrote:
> > Hi Jiri,
> >
> > I've tried a quick xfstests run on ext2 on zram and I can't see any
> > issues like this so far. I will run a full test and report back in case
> > there is anything obvious.
>
> AFAICT this should be visible only when we are under memory pressure,
> so that direct reclaim from zs_malloc handle allocation call makes a
> difference.
>
True, I hadn't seen the other email from Jiri, sorry about that. I can
confirm that under memory pressure it is in fact reproducible with
xfstests, and I can also confirm that reverting
e7be8d1dd983156bbdd22c0319b71119a8fbb697 makes it go away.
But Jiri has a better repro already anyway.
Thanks!
-Lukas
Hello Sergey,
On Tue, Aug 09, 2022 at 06:20:04PM +0900, Sergey Senozhatsky wrote:
> On (22/08/09 18:11), Sergey Senozhatsky wrote:
> > > > > /me needs to confirm.
> > > >
> > > > With that commit reverted, I see no more I/O errors, only oom-killer
> > > > messages (which is OK IMO, provided I write 1G of urandom on a machine w/
> > > > 800M of RAM):
> > >
> > > Hmm... So handle allocation always succeeds in the slow path? (when we
> > > try to allocate it second time)
> >
> > Yeah I can see how handle re-allocation with direct reclaim can make it more
> > successful, but in exchange it oom-kills some user-space process, I suppose.
> > Is oom-kill really a good alternative though?
>
> We likely will need to revert e7be8d1dd983 given that it has some
> user visible changes. But, honestly, failing zram write vs oom-kill
> a user-space is a tough choice.
I think an oom-kill is an inevitable escape from a low-memory situation if
we don't solve the original problem of high memory consumption in the
user's setup. The reclaim-based zram slow path just delays the oom if the
root cause of the memory consumption is not resolved.
I totally agree with you that all patches with user-visible degradations
should be reverted, but maybe this is more of a problem with the user's
setup -- what do you think?
If you make the decision to revert slow path removal patch, I would
prefer to review the original patch with unneeded code removal again
if you don't mind:
https://lore.kernel.org/linux-block/[email protected]/
--
Thank you,
Dmitry
Hi,
On (22/08/09 10:20), Dmitry Rokosov wrote:
> I think an oom-kill is an inevitable escape from a low-memory situation if
> we don't solve the original problem of high memory consumption in the
> user's setup. The reclaim-based zram slow path just delays the oom if the
> root cause of the memory consumption is not resolved.
>
> I totally agree with you that all patches with user-visible degradations
> should be reverted, but maybe this is more of a problem with the user's
> setup -- what do you think?
I'd go with the revert.
Jiri, are you going to send the revert patch or shall I handle it?
> If you make the decision to revert slow path removal patch, I would
> prefer to review the original patch with unneeded code removal again
> if you don't mind:
> https://lore.kernel.org/linux-block/[email protected]/
Sure, we can return to it after the merge window.
On 09. 08. 22, 11:20, Sergey Senozhatsky wrote:
> On (22/08/09 18:11), Sergey Senozhatsky wrote:
>>>>> /me needs to confirm.
>>>>
>>>> With that commit reverted, I see no more I/O errors, only oom-killer
>>>> messages (which is OK IMO, provided I write 1G of urandom on a machine w/
>>>> 800M of RAM):
>>>
>>> Hmm... So handle allocation always succeeds in the slow path? (when we
>>> try to allocate it second time)
>>
>> Yeah I can see how handle re-allocation with direct reclaim can make it more
>> successful, but in exchange it oom-kills some user-space process, I suppose.
>> Is oom-kill really a good alternative though?
>
> We likely will need to revert e7be8d1dd983 given that it has some
> user visible changes. But, honestly, failing zram write vs oom-kill
> a user-space is a tough choice.
Note that it OOMs only in my use case -- it's obviously a too-large zram
on a machine with too little memory.
But the installer is different. It just creates memory pressure, yet,
reclaim works well and is able to find memory and go on. I would say
atomic vs non-atomic retry in the original (pre-5.19) approach makes the
difference.
And yes, we should likely increase the memory in openQA to avoid too
many reclaims...
PS: the kernel has finished building and the images are being built now,
hence the new openQA run hasn't started yet. I will send the revert once
it's complete and all green.
thanks,
--
js
suse labs
On 09. 08. 22, 14:35, Jiri Slaby wrote:
> But the installer is different. It just creates memory pressure, yet,
> reclaim works well and is able to find memory and go on. I would say
> atomic vs non-atomic retry in the original (pre-5.19) approach makes the
> difference.
Sorry, I meant no-direct-reclaim (5.19) vs direct-reclaim (pre-5.19).
--
js
suse labs
On (22/08/09 14:45), Jiri Slaby wrote:
> On 09. 08. 22, 14:35, Jiri Slaby wrote:
> > But the installer is different. It just creates memory pressure, yet,
> > reclaim works well and is able to find memory and go on. I would say
> > atomic vs non-atomic retry in the original (pre-5.19) approach makes the
> > difference.
>
> Sorry, I meant no-direct-reclaim (5.19) vs direct-reclaim (pre-5.19).
Sure, I understood.
This somehow makes me scratch my head and ask if we really want to
continue using per-CPU streams. We previously (many years ago) had
a list of idle compression streams, which would do compression in
preemptible context and we would have only one zs_malloc handle
allocation path, which would do direct reclaim (when needed)
On (22/08/09 21:57), Sergey Senozhatsky wrote:
> This somehow makes me scratch my head and ask if we really want to
> continue using per-CPU streams. We previously (many years ago) had
> a list of idle compression streams, which would do compression in
> preemptible context and we would have only one zs_malloc handle
> allocation path, which would do direct reclaim (when needed)
Scratch that, I take it back. Sorry for the noise.
Hi Sergey,
On Tue, Aug 09, 2022 at 08:53:36PM +0900, Sergey Senozhatsky wrote:
> > If you make the decision to revert slow path removal patch, I would
> > prefer to review the original patch with unneeded code removal again
> > if you don't mind:
> > https://lore.kernel.org/linux-block/[email protected]/
>
> Sure, we can return to it after the merge window.
In that case, do I need to send my original patch again?
--
Thank you,
Alexey
On (22/08/09 13:15), Aleksey Romanov wrote:
> On Tue, Aug 09, 2022 at 08:53:36PM +0900, Sergey Senozhatsky wrote:
> > > If you make the decision to revert slow path removal patch, I would
> > > prefer to review the original patch with unneeded code removal again
> > > if you don't mind:
> > > https://lore.kernel.org/linux-block/[email protected]/
> >
> > Sure, we can return to it after the merge window.
>
> In that case, do I need to send my original patch again?
That would be nice, since the patch needs rebasing (due to the zsmalloc PTR_ERR changes).
This reverts commit e7be8d1dd983156bbdd22c0319b71119a8fbb697 as it
causes zram failures. It does not revert cleanly because PTR_ERR handling
was introduced in the meantime; this is handled by the appropriate IS_ERR
checks.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled
(GFP_NOIO). After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is
tried.
So when the failure occurs under memory pressure, the overlaying
filesystem such as ext2 (mounted by ext4 module in this case) can emit
failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and the allocation
eventually succeeds. In the worst case, the oom killer is invoked, which
is a proper outcome if the user sets up a zram device that is too large
(compared to the available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Cc: [email protected] # 5.19
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Alexey Romanov <[email protected]>
Cc: Dmitry Rokosov <[email protected]>
Cc: Lukas Czerner <[email protected]>
Cc: Ext4 Developers List <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/block/zram/zram_drv.c | 42 ++++++++++++++++++++++++++---------
drivers/block/zram/zram_drv.h | 1 +
2 files changed, 33 insertions(+), 10 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 92cb929a45b7..226ea76cc819 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1146,14 +1146,15 @@ static ssize_t bd_stat_show(struct device *dev,
static ssize_t debug_stat_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
- int version = 2;
+ int version = 1;
struct zram *zram = dev_to_zram(dev);
ssize_t ret;
down_read(&zram->init_lock);
ret = scnprintf(buf, PAGE_SIZE,
- "version: %d\n%8llu\n",
+ "version: %d\n%8llu %8llu\n",
version,
+ (u64)atomic64_read(&zram->stats.writestall),
(u64)atomic64_read(&zram->stats.miss_free));
up_read(&zram->init_lock);
@@ -1351,7 +1352,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
{
int ret = 0;
unsigned long alloced_pages;
- unsigned long handle = 0;
+ unsigned long handle = -ENOMEM;
unsigned int comp_len = 0;
void *src, *dst, *mem;
struct zcomp_strm *zstrm;
@@ -1369,6 +1370,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
}
kunmap_atomic(mem);
+compress_again:
zstrm = zcomp_stream_get(zram->comp);
src = kmap_atomic(page);
ret = zcomp_compress(zstrm, src, &comp_len);
@@ -1377,20 +1379,39 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
if (unlikely(ret)) {
zcomp_stream_put(zram->comp);
pr_err("Compression failed! err=%d\n", ret);
+ zs_free(zram->mem_pool, handle);
return ret;
}
if (comp_len >= huge_class_size)
comp_len = PAGE_SIZE;
-
- handle = zs_malloc(zram->mem_pool, comp_len,
- __GFP_KSWAPD_RECLAIM |
- __GFP_NOWARN |
- __GFP_HIGHMEM |
- __GFP_MOVABLE);
-
+ /*
+ * handle allocation has 2 paths:
+ * a) fast path is executed with preemption disabled (for
+ * per-cpu streams) and has __GFP_DIRECT_RECLAIM bit clear,
+ * since we can't sleep;
+ * b) slow path enables preemption and attempts to allocate
+ * the page with __GFP_DIRECT_RECLAIM bit set. we have to
+ * put per-cpu compression stream and, thus, to re-do
+ * the compression once handle is allocated.
+ *
+ * if we have a 'non-null' handle here then we are coming
+ * from the slow path and handle has already been allocated.
+ */
+ if (IS_ERR((void *)handle))
+ handle = zs_malloc(zram->mem_pool, comp_len,
+ __GFP_KSWAPD_RECLAIM |
+ __GFP_NOWARN |
+ __GFP_HIGHMEM |
+ __GFP_MOVABLE);
if (IS_ERR((void *)handle)) {
zcomp_stream_put(zram->comp);
+ atomic64_inc(&zram->stats.writestall);
+ handle = zs_malloc(zram->mem_pool, comp_len,
+ GFP_NOIO | __GFP_HIGHMEM |
+ __GFP_MOVABLE);
+ if (!IS_ERR((void *)handle))
+ goto compress_again;
return PTR_ERR((void *)handle);
}
@@ -1948,6 +1969,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
ret = device_add_disk(NULL, zram->disk, zram_disk_groups);
if (ret)
goto out_cleanup_disk;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 158c91e54850..80c3b43b4828 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -81,6 +81,7 @@ struct zram_stats {
atomic64_t huge_pages_since; /* no. of huge pages since zram set up */
atomic64_t pages_stored; /* no. of pages currently stored */
atomic_long_t max_used_pages; /* no. of maximum pages stored */
+ atomic64_t writestall; /* no. of write slow paths */
atomic64_t miss_free; /* no. of missed free */
#ifdef CONFIG_ZRAM_WRITEBACK
atomic64_t bd_count; /* no. of pages in backing device */
--
2.37.1
On (22/08/10 09:06), Jiri Slaby wrote:
> This reverts commit e7be8d1dd983156bbdd22c0319b71119a8fbb697 as it
> causes zram failures. It does not revert cleanly because PTR_ERR handling
> was introduced in the meantime; this is handled by the appropriate IS_ERR
> checks.
>
> When under memory pressure, zs_malloc() can fail. Before the above
> commit, the allocation was retried with direct reclaim enabled
> (GFP_NOIO). After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is
> tried.
>
> So when the failure occurs under memory pressure, the overlaying
> filesystem such as ext2 (mounted by ext4 module in this case) can emit
> failures, making the (file)system unusable:
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
> Buffer I/O error on device zram0, logical block 159744
>
> With direct reclaim, memory is really reclaimed and the allocation
> eventually succeeds. In the worst case, the oom killer is invoked, which
> is a proper outcome if the user sets up a zram device that is too large
> (compared to the available RAM).
>
> This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
> above). Use revert of e7be8d1dd983 directly.
>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
> Fixes: e7be8d1dd983 ("zram: remove double compression logic")
> Cc: [email protected] # 5.19
> Cc: Minchan Kim <[email protected]>
> Cc: Nitin Gupta <[email protected]>
> Cc: Sergey Senozhatsky <[email protected]>
> Cc: Alexey Romanov <[email protected]>
> Cc: Dmitry Rokosov <[email protected]>
> Cc: Lukas Czerner <[email protected]>
> Cc: Ext4 Developers List <[email protected]>
> Signed-off-by: Jiri Slaby <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
On 30. 08. 22, 23:46, [email protected] wrote:
> Hi, I think I bumped into the same issue on version 5.19.2 with ext4 on zram mounted on /tmp
Only 5.19.6 contains the fix.
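One way to double-check (a rough heuristic based on the reverted code, not
a guaranteed indicator): with the fix, zram's debug_stat goes back to
version 1 and reports two counters (writestall and miss_free) instead of
one, so e.g.:
  cat /sys/block/zram1/debug_stat
should print "version: 1" followed by two numbers on a fixed kernel, and
"version: 2" with a single number on 5.19..5.19.5.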
> ```
> # sudo dmesg -T | grep ext4
>
> [Tue Aug 30 21:41:45 2022] EXT4-fs error (device zram1): ext4_check_bdev_write_error:218: comm kworker/u8:3: Error while
> [Tue Aug 30 21:41:45 2022] EXT4-fs warning (device zram1): ext4_end_bio:347: I/O error 10 writing to inode 76 starting b
> [Tue Aug 30 21:41:45 2022] EXT4-fs warning (device zram1): ext4_end_bio:347: I/O error 10 writing to inode 76 starting b
> [Tue Aug 30 21:41:45 2022] EXT4-fs warning (device zram1): ext4_end_bio:347: I/O error 10 writing to inode 66 starting b
> [Tue Aug 30 22:07:02 2022] EXT4-fs error (device zram1): ext4_journal_check_start:83: comm ThreadPoolForeg: Detected abo
> [Tue Aug 30 22:07:02 2022] EXT4-fs (zram1): Remounting filesystem read-only
>
> ```
> Not sure what caused it, I was just updating my Arch system.
>
--
js