2016-07-12 07:14:38

by Zhangfei Gao

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

An update:

With ext2, there is no problem using iblock.
With ext4, ext4_mb_generate_buddy reports an error when removing
files after a reboot.


root@(none)$ rm test
[ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18, block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters
[ 21.008249] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
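
The two counts in the ext4_mb_generate_buddy message can be extracted mechanically to see how large the mismatch is. What follows is only a minimal Python sketch, not anything taken from the kernel or e2fsprogs: `parse_mismatch` and `MISMATCH_RE` are names made up for illustration, and the pattern simply assumes the message has the shape shown in the log above.

```python
import re

# Assumed shape of the ext4_mb_generate_buddy message, taken from the log above.
MISMATCH_RE = re.compile(
    r"group (?P<group>\d+)\s*, block bitmap and bg descriptor inconsistent: "
    r"(?P<first>\d+) vs (?P<second>\d+) free clusters"
)


def parse_mismatch(text):
    """Return the group number, both free-cluster counts and their difference."""
    # Serial console output is often wrapped at 80 columns, so join lines first.
    joined = text.replace("\n", "")
    match = MISMATCH_RE.search(joined)
    if match is None:
        return None
    first = int(match.group("first"))
    second = int(match.group("second"))
    return {
        "group": int(match.group("group")),
        "first": first,
        "second": second,
        "delta": first - second,
    }


log = (
    "[ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18\n"
    ", block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters"
)
print(parse_mismatch(log))
```

For the log above, the two counts differ by 864 clusters in group 18.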

Is there anything special to be aware of when using ext4 in qemu?

Thanks


On Mon, Jul 11, 2016 at 12:05 PM, Zhangfei Gao <[email protected]> wrote:
> Hi
>
> Does the qemu process need to flush data before closing?
>
> When testing virtio_scsi & vhost_scsi, the first reads and writes
> to the mounted disk work without problems.
> But after a reboot and remounting the disk, an error happens immediately
> when removing the files created the first time.
>
> For example:
> # targetcli
> /> cd backstores/iblock
> /backstores/iblock> create name=block_backend dev=/dev/sda3
> /backstores/iblock> cd /vhost
> /vhost> create wwn=naa.60014053c5cc00ac
> /vhost> ls
> o- vhost ............................................................ [1 Target]
> o- naa.60014053c5cc00ac .............................................. [1 TPG]
> o- tpg1 ............................................. [naa.6001405830beacfa]
> o- luns ......................................................... [0 LUNs]
> /vhost> cd naa.60014053c5cc00ac/tpg1/luns
> /vhost/naa.60...0ac/tpg1/luns> create /backstores/iblock/block_backend
>
> qemu.git/aarch64-softmmu/qemu-system-aarch64 \
> -enable-kvm -nographic -kernel Image \
> -device vhost-scsi-pci,wwpn=naa.60014053c5cc00ac \
> -m 512 -M virt -cpu host \
> -append "earlyprintk console=ttyAMA0 mem=512M"
>
> in qemu system:
> mount /dev/sda /mnt;
>
> sync; date; dd if=/dev/zero of=/mnt/test bs=1M count=100; sync; date;
>
> No problem, even after repeating this several times.
>
> Reboot
> targetcli config -> start qemu again.
> in qemu:
>
> mount /dev/sda /mnt;
>
> root@(none)$ rm test
> [ 12.900540] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 3s
> [ 12.908844] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 3s
> [ 12.911154] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). T.
>
> The error happens immediately when removing the files that were
> created the first time.
>
> Thanks
>
>
> On Sun, Jun 12, 2016 at 11:23 AM, Zhangfei Gao <[email protected]> wrote:
>> Here is a question about testing virtio-scsi & vhost-scsi.
>> I hit ext4 errors using either fileio or iblock.
>> After the error, the filesystem cannot be mounted again in the
>> guest OS unless I run mkfs.ext4 again.
>>
>> Any suggestions?
>> Thanks in advance.
>>
>>
>> Basic steps.
>> fileio:
>> mount /dev/sda3 /mnt
>> dd if=/dev/zero of=test bs=1M count=1024
>>
>>
>> # targetcli
>> (targetcli) /> cd backstores/fileio
>> (targetcli) /> create name=file_backend file_or_dev=/mnt/test size=1G
>> (targetcli) /> cd /vhost
>> (targetcli) /> create wwn=naa.60014052cc816bf4
>> (targetcli) /> cd naa.60014052cc816bf4/tpgt1/luns
>> (targetcli) /> create /backstores/fileio/file_backend
>> (targetcli) /> cd /
>> (targetcli) /> saveconfig
>> (targetcli) /> exit
>>
>> qemu.git/aarch64-softmmu/qemu-system-aarch64 \
>> -enable-kvm -nographic -kernel Image \
>> -device vhost-scsi-pci,wwpn=naa.60014052cc816bf4 \
>> -m 512 -M virt -cpu host \
>> -append "earlyprintk console=ttyAMA0 mem=512M rw"
>>
>> After the guest kernel has booted:
>> mkfs.ext4 /dev/sda
>> mount /dev/sda /mnt
>>
>> sync; date; dd if=/dev/zero of=test bs=1M count=100; sync; date;
>>
>> Ext4 error:
>> And it cannot be mounted next time.
>>
>> [ 762.387457] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 762.395622] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 762.403915] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 762.412263] EXT4-fs error (device sda) in ext4_ext_truncate:4661: Corrupt filesystem
>> [ 762.420613] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 762.428913] EXT4-fs error (device sda) in ext4_orphan_del:2896: Corrupt filesystem
>> [ 762.437262] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 762.445614] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 762.454516] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 762.462283] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 767.370571] jbd2_journal_bmap: journal block not found at offset 13 on sda-8
>> [ 767.371458] Aborting journal on device sda-8.
>> [ 767.395583] EXT4-fs error: 564 callbacks suppressed
>> [ 767.396173] EXT4-fs error (device sda) in ext4_da_write_end:2841: IO failure
>> [ 767.412221] EXT4-fs error (device sda): ext4_journal_check_start:56: Detected aborted journal
>> [ 767.413325] EXT4-fs (sda): Remounting filesystem read-only
>> dd: writing '/mnt/test.bin': Read-only file system
>>
>>
>> blockio:
>> # targetcli
>> /> cd backstores/iblock
>> /backstores/iblock> create name=block_backend dev=/dev/sda4
>> /backstores/iblock> cd /vhost
>> /vhost> create
>> /vhost> ls
>> o- vhost ............................................................ [1 Target]
>> o- naa.60014053c5cc00ac .............................................. [1 TPG]
>> o- tpg1 ............................................. [naa.6001405830beacfa]
>> o- luns ......................................................... [0 LUNs]
>> /vhost> cd naa.60014053c5cc00ac/tpg1/luns
>> /vhost/naa.60...0ac/tpg1/luns> create /backstores/iblock/block_backend
>> /vhost/naa.60...0ac/tpg1/luns> cd /
>> /> saveconfig
>>
>> qemu.git/aarch64-softmmu/qemu-system-aarch64 \
>> -enable-kvm -nographic -kernel Image \
>> -device vhost-scsi-pci,wwpn=naa.60014053c5cc00ac \
>> -m 512 -M virt -cpu host \
>> -append "earlyprintk console=ttyAMA0 mem=512M"
>>
>> mount /dev/sda /mnt
>> sync; date; dd if=/dev/zero of=/mnt/test bs=1M count=100; sync; date;
>>
>> sync; date; sync; date; dd if=/dev/zero of=/mnt/test bs=1M count=100;
>> Thu Jan 1 00:01:16 UTC 1970
>> [ 77.044879] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 77.067334] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 77.075623] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 77.083970] EXT4-fs error (device sda) in ext4_ext_truncate:4661: Corrupt filesystem
>> [ 77.092322] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 77.100619] EXT4-fs error (device sda) in ext4_orphan_del:2896: Corrupt filesystem
>> [ 77.108971] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 77.117321] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 77.126204] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>> [ 77.133989] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
>>
>> [ 82.025630] jbd2_journal_bmap: journal block not found at offset 10 on sda-8
>> [ 82.026522] Aborting journal on device sda-8.
>> [ 82.050642] EXT4-fs error: 563 callbacks suppressed
>> [ 82.051278] EXT4-fs error (device sda) in ext4_da_write_end:2841: IO failure
>> [ 82.067283] EXT4-fs error (device sda): ext4_journal_check_start:56: Detected aborted journal
>> [ 82.068372] EXT4-fs (sda): Remounting filesystem read-only
>> dd: writing '/mnt/test': Read-only file system


2016-07-12 16:43:24

by Theodore Ts'o

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

On Tue, Jul 12, 2016 at 03:14:38PM +0800, Zhangfei Gao wrote:
> Some update:
>
> If test with ext2, no problem in iblock.
> If test with ext4, ext4_mb_generate_buddy reported error in the
> removing files after reboot.
>
>
> root@(none)$ rm test
> [ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18, block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters
> [ 21.008249] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
>
> Any special notes of using ext4 in qemu?

Ext4 has more runtime consistency checking than ext2. So the fact that
ext2 doesn't complain doesn't mean that there isn't a problem with the
file system; it just means that ext4 is more likely to notice before you
lose user data.

So if you test with ext2, try running e2fsck afterwards, to make sure
the file system is consistent.
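
As an illustration of that check, here is a minimal Python sketch that runs e2fsck in forced, read-only mode and decodes its exit status. The helper names (`describe_e2fsck_status`, `check_filesystem`) are made up for this example; the `-f` and `-n` options and the exit status bits are the ones listed in the e2fsck(8) manual page. It assumes the device is not mounted when the check runs.

```python
import subprocess

# Exit status bits as documented in the e2fsck(8) manual page.
E2FSCK_STATUS_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def describe_e2fsck_status(status):
    """Translate an e2fsck exit status into a list of human-readable findings."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_STATUS_BITS.items()) if status & bit]


def check_filesystem(device):
    """Run a forced, read-only check on an unmounted device (hypothetical helper)."""
    # -f forces a check even if the file system is marked clean;
    # -n opens it read-only and answers "no" to every question.
    result = subprocess.run(
        ["e2fsck", "-f", "-n", device], capture_output=True, text=True
    )
    return describe_e2fsck_status(result.returncode)


print(describe_e2fsck_status(4))
```

Calling `check_filesystem("/dev/sda")` after the ext2 run would then show whether the file system is actually clean or merely silent about its damage.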

Given that I'm regularly testing ext4 using kvm, and I haven't seen
anything like this in a very long time, I suspect the problem is with
your SCSI code, and not with ext4.

- Ted

2016-07-12 23:03:56

by Dave Chinner

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

On Tue, Jul 12, 2016 at 12:43:24PM -0400, Theodore Ts'o wrote:
> On Tue, Jul 12, 2016 at 03:14:38PM +0800, Zhangfei Gao wrote:
> > Some update:
> >
> > If test with ext2, no problem in iblock.
> > If test with ext4, ext4_mb_generate_buddy reported error in the
> > removing files after reboot.
> >
> >
> > root@(none)$ rm test
> > [ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18, block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters
> > [ 21.008249] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
> >
> > Any special notes of using ext4 in qemu?
>
> Ext4 has more runtime consistency checking than ext2. So just because
> ext4 complains doesn't mean that there isn't a problem with the file
> system; it just means that ext4 is more likely to notice before you
> lose user data.
>
> So if you test with ext2, try running e2fsck afterwards, to make sure
> the file system is consistent.
>
> Given that I'm regularly testing ext4 using kvm, and I haven't seen
> anything like this in a very long time, I suspect the problem is with
> your SCSI code, and not with ext4.

It's the same error I reported yesterday for ext3 on 4.7-rc6 when
rebooting a VM after it hung.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2016-07-13 07:25:09

by Zhangfei Gao

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

Dear Ted

On Wed, Jul 13, 2016 at 12:43 AM, Theodore Ts'o <[email protected]> wrote:
> On Tue, Jul 12, 2016 at 03:14:38PM +0800, Zhangfei Gao wrote:
>> Some update:
>>
>> If test with ext2, no problem in iblock.
>> If test with ext4, ext4_mb_generate_buddy reported error in the
>> removing files after reboot.
>>
>>
>> root@(none)$ rm test
>> [ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18, block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters
>> [ 21.008249] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
>>
>> Any special notes of using ext4 in qemu?
>
> Ext4 has more runtime consistency checking than ext2. So just because
> ext4 complains doesn't mean that there isn't a problem with the file
> system; it just means that ext4 is more likely to notice before you
> lose user data.
>
> So if you test with ext2, try running e2fsck afterwards, to make sure
> the file system is consistent.
>
> Given that I'm regularly testing ext4 using kvm, and I haven't seen
> anything like this in a very long time, I suspect the problem is with
> your SCSI code, and not with ext4.
>

Instead of the SAS disk, I am now trying a USB disk (u-disk) as the backstore, via iblock.

# targetcli
/backstores/iblock> create name=block_backend dev=/dev/sdb
/backstores/iblock> cd /vhost
/vhost> create wwn=naa.60014053c5cc00ac
/vhost> ls
o- vhost ............................................................ [1 Target]
o- naa.60014053c5cc00ac .............................................. [1 TPG]
o- tpg1 ............................................. [naa.6001405830beacfa]
o- luns ......................................................... [0 LUNs]
/vhost> cd naa.60014053c5cc00ac/tpg1/luns
/vhost/naa.60...0ac/tpg1/luns> create /backstores/iblock/block_backend

/work/qemu.git/aarch64-softmmu/qemu-system-aarch64 \
-enable-kvm -nographic -kernel Image \
-device vhost-scsi-pci,wwpn=naa.60014053c5cc00ac \
-m 512 -M virt -cpu host \
-append "earlyprintk console=ttyAMA0 mem=512M"



in qemu:

Just testing with dd gives the following error.

#sync; date; dd if=/dev/zero of=test bs=1M count=100; sync;
Thu Jan 1 00:00:45 UTC 1970
[ 45.150514] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
[ 45.153319] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
[ 45.156054] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
[ 45.160806] EXT4-fs error (device sda) in ext4_ext_truncate:4661: Corrupt filesystem
[ 45.165431] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
[ 45.169177] EXT4-fs error (device sda) in ext4_orphan_del:2896: Corrupt filesystem
[ 45.172676] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
[ 45.176427] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
[ 45.180800] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem
[ 45.183571] EXT4-fs error (device sda) in ext4_reserve_inode_write:5172: Corrupt filesystem

[ 50.122300] jbd2_journal_bmap: journal block not found at offset 26 on sda-8
[ 50.123181] Aborting journal on device sda-8.
[ 50.138046] EXT4-fs error (device sda): ext4_journal_check_start:56: Detected aborted journal
[ 50.139117] EXT4-fs (sda): Remounting filesystem read-only
dd: writing 'test': Read-only file system
6+0 records in
4+1 records out
Thu Jan 1 00:00:50 UTC 1970


I also get an error like this after a reboot, though it does not always happen.
root@(none)$ rm test
root@(none)$ ls
lost+found
root@(none)$ sync; date; dd if=/dev/zero of=test bs=1M count=100; sync; date;
Thu Jan 1 00:00:29 UTC 1970
[ 29.909074] EXT4-fs error (device sda): ext4_init_inode_bitmap:79: comm dd: Checksum bad for group 3
[ 29.910205] BUG: scheduling while atomic: dd/1091/0x00000002
[ 29.910928] Modules linked in:
[ 29.911340] CPU: 0 PID: 1091 Comm: dd Not tainted 4.5.0-rc1+ #70
[ 29.912066] Hardware name: linux,dummy-virt (DT)
[ 29.912639] Call trace:
[ 29.912957] [<ffffffc000089830>] dump_backtrace+0x0/0x180
[ 29.913623] [<ffffffc0000899c4>] show_stack+0x14/0x20
[ 29.914249] [<ffffffc00033fc88>] dump_stack+0x90/0xc8
[ 29.914893] [<ffffffc0000d68d4>] __schedule_bug+0x44/0x58
[ 29.915573] [<ffffffc0006cc46c>] __schedule+0x4f4/0x5a0
[ 29.916200] [<ffffffc0006cc554>] schedule+0x3c/0xa8
[ 29.916786] [<ffffffc0006cf07c>] schedule_timeout+0x15c/0x1b0
[ 29.917471] [<ffffffc0006cbf08>] io_schedule_timeout+0xa0/0x110
[ 29.918177] [<ffffffc0006cceb8>] bit_wait_io+0x18/0x68
[ 29.918827] [<ffffffc0006ccd5c>] __wait_on_bit_lock+0x7c/0xf0
[ 29.919552] [<ffffffc0006cce30>] out_of_line_wait_on_bit_lock+0x60/0x68
[ 29.920352] [<ffffffc0001eb110>] __lock_buffer+0x38/0x48
[ 29.920991] [<ffffffc0001efb4c>] __sync_dirty_buffer+0xf4/0xf8
[ 29.921692] [<ffffffc00024c6d4>] ext4_commit_super+0x18c/0x268
[ 29.922389] [<ffffffc00024ca38>] __ext4_error+0x60/0xd0
[ 29.923055] [<ffffffc000236b6c>] ext4_read_inode_bitmap+0x3a4/0x7b0
[ 29.923826] [<ffffffc000237d5c>] __ext4_new_inode+0x344/0x1468
[ 29.924532] [<ffffffc000248084>] ext4_create+0xac/0x1c0
[ 29.925163] [<ffffffc0001c9ab4>] vfs_create+0xf4/0x158
[ 29.925780] [<ffffffc0001ca498>] path_openat+0x980/0xf00
[ 29.926419] [<ffffffc0001cbe0c>] do_filp_open+0x64/0xe0
[ 29.927086] [<ffffffc0001bafc8>] do_sys_open+0x148/0x218
[ 29.927746] [<ffffffc0001bb0d0>] SyS_openat+0x10/0x18
[ 29.928359] [<ffffffc000085d30>] el0_svc_naked+0x24/0x28


Again, the ext2 test shows no problem.


qemu is the master branch:
qemu.git$ git log --oneline
860a3b3 Update version for v2.6.0-rc5 release
53db932 Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160509-1' into staging
975eb6a Update version for v2.6.0-rc4 release
1beb99f Revert "acpi: mark PMTIMER as unlocked"

The kernel is 4.5-rc1, with the latest vhost and target patches cherry-picked.

Thanks

2016-07-15 07:55:20

by Zhangfei Gao

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

Dear Dave

On Wed, Jul 13, 2016 at 7:03 AM, Dave Chinner <[email protected]> wrote:
> On Tue, Jul 12, 2016 at 12:43:24PM -0400, Theodore Ts'o wrote:
>> On Tue, Jul 12, 2016 at 03:14:38PM +0800, Zhangfei Gao wrote:
>> > Some update:
>> >
>> > If test with ext2, no problem in iblock.
>> > If test with ext4, ext4_mb_generate_buddy reported error in the
>> > removing files after reboot.
>> >
>> >
>> > root@(none)$ rm test
>> > [ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18, block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters
>> > [ 21.008249] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
>> >
>> > Any special notes of using ext4 in qemu?
>>
>> Ext4 has more runtime consistency checking than ext2. So just because
>> ext4 complains doesn't mean that there isn't a problem with the file
>> system; it just means that ext4 is more likely to notice before you
>> lose user data.
>>
>> So if you test with ext2, try running e2fsck afterwards, to make sure
>> the file system is consistent.
>>
>> Given that I'm regularly testing ext4 using kvm, and I haven't seen
>> anything like this in a very long time, I suspect the problem is with
>> your SCSI code, and not with ext4.
>
> It's the same error I reported yesterday for ext3 on 4.7-rc6 when
> rebooting a VM after it hung.


Do you have a link to that report?

I still cannot conclude which part causes this error.

1. No problem
a. Using virtio-scsi with an image file that holds an ext4 filesystem: no problem.

/work/qemu.git/aarch64-softmmu/qemu-system-aarch64 \
-enable-kvm -nographic -kernel Image \
-global virtio-blk-device.scsi=on -device virtio-scsi-device,id=scsi \
-drive file=ext4_oe64.img,id=coreimg,cache=none,if=none,format=raw \
-device scsi-hd,drive=coreimg \
-m 512 -M virt -cpu host \
-append "earlyprintk console=ttyAMA0 mem=512M"

ext4_oe64.img contains an ext4 file system.

b. Using vhost-scsi & target with a ramdisk as the backstore and an
ext4 filesystem: also no problem.


2. Has problem
a. Using vhost-scsi & target with iblock, and either a SAS disk or a
u-disk as the backstore, with ext4: both have the issue.
This only proves that the issue is not in the disk drivers (SAS & u-disk) themselves.


It looks like the issue is in vhost-scsi & target.
I am still working out how to narrow it down.

Any suggestions?


Thanks

2016-07-18 01:53:41

by Dave Chinner

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

On Fri, Jul 15, 2016 at 03:55:20PM +0800, Zhangfei Gao wrote:
> Dear Dave
>
> On Wed, Jul 13, 2016 at 7:03 AM, Dave Chinner <[email protected]> wrote:
> > On Tue, Jul 12, 2016 at 12:43:24PM -0400, Theodore Ts'o wrote:
> >> On Tue, Jul 12, 2016 at 03:14:38PM +0800, Zhangfei Gao wrote:
> >> > Some update:
> >> >
> >> > If test with ext2, no problem in iblock.
> >> > If test with ext4, ext4_mb_generate_buddy reported error in the
> >> > removing files after reboot.
> >> >
> >> >
> >> > root@(none)$ rm test
> >> > [ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18, block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters
> >> > [ 21.008249] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
> >> >
> >> > Any special notes of using ext4 in qemu?
> >>
> >> Ext4 has more runtime consistency checking than ext2. So just because
> >> ext4 complains doesn't mean that there isn't a problem with the file
> >> system; it just means that ext4 is more likely to notice before you
> >> lose user data.
> >>
> >> So if you test with ext2, try running e2fsck afterwards, to make sure
> >> the file system is consistent.
> >>
> >> Given that I'm regularly testing ext4 using kvm, and I haven't seen
> >> anything like this in a very long time, I suspect the problem is with
> >> your SCSI code, and not with ext4.
> >
> > It's the same error I reported yesterday for ext3 on 4.7-rc6 when
> > rebooting a VM after it hung.
>
>
> Any link of this error?

http://article.gmane.org/gmane.comp.file-systems.ext4/53792

Cheers,

Dave.
--
Dave Chinner
[email protected]

2016-07-19 07:56:55

by Zhangfei Gao

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

Dear Ted

On Wed, Jul 13, 2016 at 12:43 AM, Theodore Ts'o <[email protected]> wrote:
> On Tue, Jul 12, 2016 at 03:14:38PM +0800, Zhangfei Gao wrote:
>> Some update:
>>
>> If test with ext2, no problem in iblock.
>> If test with ext4, ext4_mb_generate_buddy reported error in the
>> removing files after reboot.
>>
>>
>> root@(none)$ rm test
>> [ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18, block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters
>> [ 21.008249] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
>>
>> Any special notes of using ext4 in qemu?
>
> Ext4 has more runtime consistency checking than ext2. So just because
> ext4 complains doesn't mean that there isn't a problem with the file
> system; it just means that ext4 is more likely to notice before you
> lose user data.
>
> So if you test with ext2, try running e2fsck afterwards, to make sure
> the file system is consistent.
>
> Given that I'm regularly testing ext4 using kvm, and I haven't seen
> anything like this in a very long time, I suspect the problem is with
> your SCSI code, and not with ext4.
>

Do you know what could cause this error?

I have tried 4.7-rc2, and the same issue exists.
It can be reproduced with both fileio and iblock as the backstore.
It is easier to trigger in qemu with this sequence:
qemu -> mount -> dd xx -> umount -> mount -> rm xx; the error may then
happen, with no need to reboot.

The ramdisk backstore cannot cause the error simply because it only does
malloc and memcpy, without going through the block layer.

I also tried creating a file in /tmp and using it as the fileio backstore; that reproduces it too.
So no real device is involved.

For example:
cd /tmp
dd if=/dev/zero of=test bs=1M count=1024; sync;
# targetcli
(targetcli) /> cd backstores/fileio
(targetcli) /> create name=file_backend file_or_dev=/tmp/test size=1G
(targetcli) /> cd /vhost
(targetcli) /> create wwn=naa.60014052cc816bf4
(targetcli) /> cd naa.60014052cc816bf4/tpgt1/luns
(targetcli) /> create /backstores/fileio/file_backend
(targetcli) /> cd /
(targetcli) /> saveconfig
(targetcli) /> exit

/work/qemu.git/aarch64-softmmu/qemu-system-aarch64 \
-enable-kvm -nographic -kernel Image \
-device vhost-scsi-pci,wwpn=naa.60014052cc816bf4 \
-m 512 -M virt -cpu host \
-append "earlyprintk console=ttyAMA0 mem=512M"

in qemu:
mkfs.ext4 /dev/sda
mount /dev/sda /mnt/
sync; date; dd if=/dev/zero of=/mnt/test bs=1M count=100; sync; date;

Using the dd test, some errors then happen.
The log looks like:
root@(none)$ sync; date; dd if=/dev/zero of=test bs=1M count=100; sync; date;
[ 1789.917963] sbc_parse_cdb cdb[0]=0x35
[ 1789.922000] fd_execute_sync_cache immed=0
Tue Jul 19 07:26:12 UTC 2016
[ 200.712879] EXT4-fs error (device sda) in ext4_reserve_inode_write:5362: Corrupt filesystem
[ 1790.191770] sbc_parse_cdb cdb[0]=0x2a
[ 1790.198382] fd_execute_rw
[ 200.729001] EXT4-fs error (device sda) in ext4_reserve_inode_write:5362: Corrupt filesystem
[ 1790.207843] sbc_parse_cdb cdb[0]=0x2a
[ 1790.214495] fd_execute_rw

It looks like the error usually happens after SYNCHRONIZE CACHE, but I am
not sure that it always happens after a sync cache.
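
To make the debug prints easier to read, the first CDB byte can be mapped to its SCSI command name. The sketch below is only an illustration: `SBC_OPCODES` is a small hand-picked subset of standard SCSI operation codes, `decode_cdb_lines` is a made-up helper, and the pattern assumes the `cdb[0]=0x..` format of the debug prints shown above.

```python
import re

# A small subset of standard SCSI operation codes (first CDB byte).
SBC_OPCODES = {
    0x00: "TEST UNIT READY",
    0x12: "INQUIRY",
    0x25: "READ CAPACITY(10)",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x35: "SYNCHRONIZE CACHE(10)",
    0x88: "READ(16)",
    0x8A: "WRITE(16)",
    0x91: "SYNCHRONIZE CACHE(16)",
}

# Matches the ad-hoc debug print format used in the log above (assumed).
CDB_RE = re.compile(r"cdb\[0\]=0x([0-9a-fA-F]{1,2})")


def decode_cdb_lines(lines):
    """Return the command name for every log line that carries a cdb[0] value."""
    names = []
    for line in lines:
        match = CDB_RE.search(line)
        if match is None:
            continue  # not a CDB debug line
        opcode = int(match.group(1), 16)
        names.append(SBC_OPCODES.get(opcode, "unknown opcode 0x%02x" % opcode))
    return names


sample = [
    "[ 1789.917963] sbc_parse_cdb cdb[0]=0x35",
    "[ 1789.922000] fd_execute_sync_cache immed=0",
    "[ 1790.191770] sbc_parse_cdb cdb[0]=0x2a",
]
print(decode_cdb_lines(sample))
```

In the log above, 0x35 is the SYNCHRONIZE CACHE(10) issued by the sync, and 0x2a is the WRITE(10) that follows it.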

Thanks

2016-07-19 08:21:43

by Zhangfei Gao

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

On Tue, Jul 19, 2016 at 3:56 PM, Zhangfei Gao <[email protected]> wrote:
> It looks like the error usually happens after SYNCHRONIZE CACHE, but I am
> not sure that it always happens after a sync cache.
>
It does not always happen after SYNCHRONIZE CACHE.

I just tried this in qemu: mount -> dd xx -> umount -> mount -> rm xx,
with a RAM-based backstore (/tmp/test) and no reboot.

root@(none)$ cd /mnt
root@(none)$ ls
[ 301.444966] sbc_parse_cdb cdb[0]=0x28
[ 301.449003] fd_execute_rw
lost+found test
root@(none)$ rm test
[ 304.281920] sbc_parse_cdb cdb[0]=0x28
[ 304.285955] fd_execute_rw
[ 118.002338] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 3, block bitmap and bg descriptor inconsistent: 21504 vs 21439 free clusters
[ 304.290685] gzf sbc_parse_cdb cdb[0]=0x28
[ 304.296737] gzf fd_execute_rw
[ 304.304099] sbc_parse_cdb cdb[0]=0x28
[ 304.309322] fd_execute_rw
[ 118.015903] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
root@(none)$

Thanks

2016-07-27 07:58:55

by Zhangfei Gao

Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

Hi, Michael

I have hit ext4 errors when using vhost_scsi on an arm64 platform, and
I suspect it is a vhost_scsi issue.

Ext4 error when testing virtio_scsi & vhost_scsi


No issue:
1. virtio_scsi, ext4
2. vhost_scsi & virtio_scsi, ext2
3. Instead of vhost, I also tried loopback, with no problem.
With loopback, the host itself uses the new block device, whereas with
vhost the device is used by the guest (qemu).
http://www.linux-iscsi.org/wiki/Tcm_loop
Testing directly on the host, I did not find any ext4 error.



Has issue:
1. vhost_scsi & virtio_scsi, ext4
a. iblock
b. fileio, with the file located in /tmp (RAM), so no real device is involved.

2. I have tried 4.7-rc2 and 4.5-rc1 on the D02 board; both have the issue.
Since I need KVM-specific patches for the D02, I cannot freely switch
to an older version.

3. I also tested ext4 with the journal disabled:
mkfs.ext4 -O ^has_journal /dev/sda

Do you have any suggestions?

Thanks

On Tue, Jul 19, 2016 at 4:21 PM, Zhangfei Gao <[email protected]> wrote:
> On Tue, Jul 19, 2016 at 3:56 PM, Zhangfei Gao <[email protected]> wrote:
>> Dear Ted
>>
>> On Wed, Jul 13, 2016 at 12:43 AM, Theodore Ts'o <[email protected]> wrote:
>>> On Tue, Jul 12, 2016 at 03:14:38PM +0800, Zhangfei Gao wrote:
>>>> Some update:
>>>>
>>>> If test with ext2, no problem in iblock.
>>>> If test with ext4, ext4_mb_generate_buddy reported error in the
>>>> removing files after reboot.
>>>>
>>>>
>>>> root@(none)$ rm test
>>>> [ 21.006549] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 18
>>>> , block bitmap and bg descriptor inconsistent: 26464 vs 25600 free clusters
>>>> [ 21.008249] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). Th
>>>> ere's a risk of filesystem corruption in case of system crash.
>>>>
>>>> Any special notes of using ext4 in qemu?
>>>
>>> Ext4 has more runtime consistency checking than ext2. So just because
>>> ext4 complains doesn't mean that there isn't a problem with the file
>>> system; it just means that ext4 is more likely to notice before you
>>> lose user data.
>>>
>>> So if you test with ext2, try running e2fsck afterwards, to make sure
>>> the file system is consistent.
>>>
>>> Given that I'm reguarly testing ext4 using kvm, and I haven't seen
>>> anything like this in a very long time, I suspect the problemb is with
>>> your SCSI code, and not with ext4.
>>>
>>
>> Do you know what's the possible reason of this error.
>>
>> Have tried 4.7-rc2, same issue exist.
>> It can be reproduced by fileio and iblock as backstore.
>> It is easier to happen in qemu like this process:
>> qemu-> mount-> dd xx -> umout -> mount -> rm xx, then the error may
>> happen, no need to reboot.
>>
>> ramdisk can not cause error just because it just malloc and memcpy,
>> while not going to blk layer.
>>
>> Also tried creating one file in /tmp, used as fileio, also can reproduce.
>> So no real device is based.
>>
>> like:
>> cd /tmp
>> dd if=/dev/zero of=test bs=1M count=1024; sync;
>> targetcli
>> #targetcli
>> (targetcli) /> cd backstores/fileio
>> (targetcli) /> create name=file_backend file_or_dev=/tmp/test size=1G
>> (targetcli) /> cd /vhost
>> (targetcli) /> create wwn=naa.60014052cc816bf4
>> (targetcli) /> cd naa.60014052cc816bf4/tpgt1/luns
>> (targetcli) /> create /backstores/fileio/file_backend
>> (targetcli) /> cd /
>> (targetcli) /> saveconfig
>> (targetcli) /> exit
>>
>> /work/qemu.git/aarch64-softmmu/qemu-system-aarch64 \
>> -enable-kvm -nographic -kernel Image \
>> -device vhost-scsi-pci,wwpn=naa.60014052cc816bf4 \
>> -m 512 -M virt -cpu host \
>> -append "earlyprintk console=ttyAMA0 mem=512M"
>>
>> in qemu:
>> mkfs.ext4 /dev/sda
>> mount /dev/sda /mnt/
>> sync; date; dd if=/dev/zero of=/mnt/test bs=1M count=100; sync; date;
>>
>> Running the dd test then produces errors.
>> The log looks like this:
>> root@(none)$ sync; date; dd if=/dev/zero of=test bs=1M count=100; sync; date;
>> [ 1789.917963] sbc_parse_cdb cdb[0]=0x35
>> [ 1789.922000] fd_execute_sync_cache immed=0
>> Tue Jul 19 07:26:12 UTC 2016
>> [ 200.712879] EXT4-fs error (device sda) in ext4_reserve_inode_write:5362: Corrupt filesystem
>> [ 1790.191770] sbc_parse_cdb cdb[0]=0x2a
>> [ 1790.198382] fd_execute_rw
>> [ 200.729001] EXT4-fs error (device sda) in ext4_reserve_inode_write:5362: Corrupt filesystem
>> [ 1790.207843] sbc_parse_cdb cdb[0]=0x2a
>> [ 1790.214495] fd_execute_rw
>>
>> It looks like the error usually happens after SYNCHRONIZE CACHE, but I am
>> not sure it always happens after it.
>>
> It does not always happen after SYNCHRONIZE CACHE.
>
> Just tried in qemu: mount-> dd xx -> umount -> mount -> rm xx
> ram based, (/tmp/test), no reboot.
>
> root@(none)$ cd /mnt
> root@(none)$ ls
> [ 301.444966] sbc_parse_cdb cdb[0]=0x28
> [ 301.449003] fd_execute_rw
> lost+found test
> root@(none)$ rm test
> [ 304.281920] sbc_parse_cdb cdb[0]=0x28
> [ 304.285955] fd_execute_rw
> [ 118.002338] EXT4-fs error (device sda): ext4_mb_generate_buddy:758: group 3, block bitmap and bg descriptor inconsistent: 21504 vs 21439 free clusters
> [ 304.290685] gzf sbc_parse_cdb cdb[0]=0x28
> [ 304.296737] gzf fd_execute_rw
> [ 304.304099] sbc_parse_cdb cdb[0]=0x28
> [ 304.309322] fd_execute_rw
> [ 118.015903] JBD2: Spotted dirty metadata buffer (dev = sda, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
> root@(none)$
>
> Thanks
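
For reading the target-side lines in the logs above: cdb[0] is the SCSI
opcode, the first byte of the command descriptor block. 0x28 is READ(10),
0x2a is WRITE(10) and 0x35 is SYNCHRONIZE CACHE(10). The sbc_parse_cdb and
fd_execute_* lines look like locally added debug prints, so their format is
taken from the log as pasted. A minimal lookup sketch follows; the helper
name scsi_opcode_name is made up for illustration and only covers the three
opcodes that appear in this thread:

```shell
# Map the first CDB byte printed in the debug lines to a command name.
# Hypothetical helper; only the opcodes seen in this thread are listed.
scsi_opcode_name() {
    case "$1" in
        0x28) echo "READ(10)" ;;
        0x2a) echo "WRITE(10)" ;;
        0x35) echo "SYNCHRONIZE CACHE(10)" ;;
        *)    echo "unknown ($1)" ;;
    esac
}

for op in 0x35 0x2a 0x28; do
    printf '%s -> %s\n' "$op" "$(scsi_opcode_name "$op")"
done
```

Read this way, the rm case above only shows reads (0x28) on the target side
around the time the guest reports the error.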

2016-07-27 15:56:13

by Jan Kara

[permalink] [raw]
Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

Hi!

On Wed 27-07-16 15:58:55, Zhangfei Gao wrote:
> Hi, Michael
>
> I have met ext4 error when using vhost_scsi on arm64 platform, and
> suspect it is vhost_scsi issue.
>
> Ext4 error when testing virtio_scsi & vhost_scsi
>
>
> No issue:
> 1. virtio_scsi, ext4
> 2. vhost_scsi & virtio_scsi, ext2
> 3. Instead of vhost, I also tried loopback, with no problem.
> With loopback, the host itself uses the new block device, while with
> vhost it is used by the guest (qemu).
> http://www.linux-iscsi.org/wiki/Tcm_loop
> Testing directly on the host, I did not find the ext4 error.
>
>
>
> Have issue:
> 1. vhost_scsi & virtio_scsi, ext4
> a. iblock
> b. fileio, file located in /tmp (ram), not backed by a real device.
>
> 2. Have tried 4.7-rc2 and 4.5-rc1 on the D02 board; both have the issue.
> Since I need a kvm-specific patch for D02, I cannot freely switch
> to an older version.
>
> 3. Also tested with ext4 with the journal disabled:
> mkfs.ext4 -O ^has_journal /dev/sda
>
>
> Do you have any suggestion?

So can you mount the filesystem with errors=remount-ro to avoid clobbering
the fs after the problem happens? And then run e2fsck on the problematic
filesystem and send the output here?
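
The sequence could look roughly like the sketch below. The device and mount
point are assumptions taken from the commands used elsewhere in this thread,
the `run` and `check_fs` helpers are made up for illustration, and the
`-f -n` flags are one possible choice: `-n` only reports and leaves the
filesystem untouched, so drop it if you want e2fsck to replay the journal
and repair. With DRY_RUN=1, the default here, the commands are only printed.

```shell
# Sketch of the requested procedure; DEV and MNT are assumptions.
DEV=${DEV:-/dev/sda}
MNT=${MNT:-/mnt}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_fs() {
    run mount -o errors=remount-ro "$DEV" "$MNT"
    run rm "$MNT/test"        # the step that triggered the error so far
    run umount "$MNT"         # e2fsck will not check a mounted device
    run e2fsck -f -n "$DEV"   # -f: force a full check, -n: report only
}

check_fs
```

Running it with DRY_RUN=0 inside the guest executes the four steps for real.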

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-07-28 01:29:35

by Zhangfei Gao

[permalink] [raw]
Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

Hi, Jan

On Wed, Jul 27, 2016 at 11:56 PM, Jan Kara <[email protected]> wrote:
> Hi!
>
> On Wed 27-07-16 15:58:55, Zhangfei Gao wrote:
>> Hi, Michael
>>
>> I have met ext4 error when using vhost_scsi on arm64 platform, and
>> suspect it is vhost_scsi issue.
>>
>> Ext4 error when testing virtio_scsi & vhost_scsi
>>
>>
>> No issue:
>> 1. virtio_scsi, ext4
>> 2. vhost_scsi & virtio_scsi, ext2
>> 3. Instead of vhost, I also tried loopback, with no problem.
>> With loopback, the host itself uses the new block device, while with
>> vhost it is used by the guest (qemu).
>> http://www.linux-iscsi.org/wiki/Tcm_loop
>> Testing directly on the host, I did not find the ext4 error.
>>
>>
>>
>> Have issue:
>> 1. vhost_scsi & virtio_scsi, ext4
>> a. iblock
>> b. fileio, file located in /tmp (ram), not backed by a real device.
>>
>> 2. Have tried 4.7-rc2 and 4.5-rc1 on the D02 board; both have the issue.
>> Since I need a kvm-specific patch for D02, I cannot freely switch
>> to an older version.
>>
>> 3. Also tested with ext4 with the journal disabled:
>> mkfs.ext4 -O ^has_journal /dev/sda
>>
>>
>> Do you have any suggestion?
>
> So can you mount the filesystem with errors=remount-ro to avoid clobbering
> the fs after the problem happens? And then run e2fsck on the problematic
> filesystem and send the output here?
>

Tested twice; the logs are pasted below.
Both runs used fileio, with the backing file in host ramfs /tmp.
/dev/sda was unmounted before running e2fsck.

1.
root@(none)$ mount -o errors=remount-ro /dev/sda /mnt
[ 22.812053] EXT4-fs (sda): mounted filesystem with ordered data
mode. Opts: errors=remount-ro
$ rm /mnt/test
[ 108.388905] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
[ 108.406930] Aborting journal on device sda-8.
[ 108.414120] EXT4-fs (sda): Remounting filesystem read-only
[ 108.414847] EXT4-fs error (device sda) in ext4_dirty_inode:5487: IO failure
[ 108.423571] EXT4-fs error (device sda) in ext4_free_blocks:4904:
Journal has aborted
[ 108.431919] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
[ 108.440269] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
[ 108.448568] EXT4-fs error (device sda) in
ext4_ext_remove_space:3058: IO failure
[ 108.456917] EXT4-fs error (device sda) in ext4_ext_truncate:4657:
Corrupt filesystem
[ 108.465267] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
[ 108.473567] EXT4-fs error (device sda) in ext4_truncate:4150: IO failure
[ 108.481917] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
root@(none)$ e2fsck /dev/sda
e2fsck 1.42.9 (28-Dec-2013)
/dev/sda is mounted.
e2fsck: Cannot continue, aborting.


root@(none)$ umount /mnt
[ 260.756250] EXT4-fs error (device sda): ext4_put_super:837:
Couldn't clean up the journal
root@(none)$ e2fsck /dev/sda
e2fsck 1.42.9 (28-Dec-2013)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
/dev/sda: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #1 (32703, counted=8127).
Fix<y>? yes
Free blocks count wrong for group #2 (32768, counted=31744).
Fix<y>? yes
Free blocks count wrong (249509, counted=223909).
Fix<y>? yes
Free inodes count wrong for group #0 (8181, counted=8180).
Fix<y>? yes
Free inodes count wrong (65525, counted=65524).
Fix<y>? yes

/dev/sda: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda: 12/65536 files (8.3% non-contiguous), 38235/262144 blocks
root@(none)$

2.

root@(none)$ rm /mnt/test
[ 71.021484] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
[ 71.044959] Aborting journal on device sda-8.
[ 71.052152] EXT4-fs (sda): Remounting filesystem read-only
[ 71.052833] EXT4-fs error (device sda) in ext4_dirty_inode:5487: IO failure
[ 71.061600] EXT4-fs error (device sda) in ext4_free_blocks:4904:
Journal has aborted
[ 71.069948] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
[ 71.078296] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
[ 71.086597] EXT4-fs error (device sda) in
ext4_ext_remove_space:3058: IO failure
[ 71.094946] EXT4-fs error (device sda) in ext4_ext_truncate:4657:
Corrupt filesystem
[ 71.103296] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
[ 71.111595] EXT4-fs error (device sda) in ext4_truncate:4150: IO failure
[ 71.119946] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5362: Corrupt filesystem
root@(none)$ e2fsck /dev/sda
e2fsck 1.42.9 (28-Dec-2013)
/dev/sda is mounted.
e2fsck: Cannot continue, aborting.


root@(none)$ umount /mnt/
[ 92.103221] EXT4-fs error (device sda): ext4_put_super:837:
Couldn't clean up the journal
root@(none)$ e2fsck /dev/sda
e2fsck 1.42.9 (28-Dec-2013)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
/dev/sda: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #1 (32703, counted=8127).
Fix<y>? yes
Free blocks count wrong for group #2 (32768, counted=31744).
Fix<y>? yes
Free blocks count wrong (249509, counted=223909).
Fix<y>? yes
Free inodes count wrong for group #0 (8181, counted=8180).
Fix<y>? yes
Free inodes count wrong (65525, counted=65524).
Fix<y>? yes

/dev/sda: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda: 12/65536 files (8.3% non-contiguous), 38235/262144 blocks
root@(none)$


Thanks

2016-08-01 02:40:33

by Zhangfei Gao

[permalink] [raw]
Subject: Re: ext4 error when testing virtio-scsi & vhost-scsi

Hi, Jan

On Thu, Jul 28, 2016 at 9:29 AM, Zhangfei Gao <[email protected]> wrote:
> Hi, Jan
>
> On Wed, Jul 27, 2016 at 11:56 PM, Jan Kara <[email protected]> wrote:
>> Hi!
>>
>> On Wed 27-07-16 15:58:55, Zhangfei Gao wrote:
>>> Hi, Michael
>>>
>>> I have met ext4 error when using vhost_scsi on arm64 platform, and
>>> suspect it is vhost_scsi issue.
>>>
>>> Ext4 error when testing virtio_scsi & vhost_scsi
>>>
>>>
>>> No issue:
>>> 1. virtio_scsi, ext4
>>> 2. vhost_scsi & virtio_scsi, ext2
>>> 3. Instead of vhost, I also tried loopback, with no problem.
>>> With loopback, the host itself uses the new block device, while with
>>> vhost it is used by the guest (qemu).
>>> http://www.linux-iscsi.org/wiki/Tcm_loop
>>> Testing directly on the host, I did not find the ext4 error.
>>>
>>>
>>>
>>> Have issue:
>>> 1. vhost_scsi & virtio_scsi, ext4
>>> a. iblock
>>> b. fileio, file located in /tmp (ram), not backed by a real device.
>>>
>>> 2. Have tried 4.7-rc2 and 4.5-rc1 on the D02 board; both have the issue.
>>> Since I need a kvm-specific patch for D02, I cannot freely switch
>>> to an older version.
>>>
>>> 3. Also tested with ext4 with the journal disabled:
>>> mkfs.ext4 -O ^has_journal /dev/sda
>>>
>>>
>>> Do you have any suggestion?
>>
>> So can you mount the filesystem with errors=remount-ro to avoid clobbering
>> the fs after the problem happens? And then run e2fsck on the problematic
>> filesystem and send the output here?
>>
>
> Tested twice; the logs are pasted below.
> Both runs used fileio, with the backing file in host ramfs /tmp.
> /dev/sda was unmounted before running e2fsck.
>
> 1.
> root@(none)$ mount -o errors=remount-ro /dev/sda /mnt
> [ 22.812053] EXT4-fs (sda): mounted filesystem with ordered data
> mode. Opts: errors=remount-ro
> $ rm /mnt/test
> [ 108.388905] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> [ 108.406930] Aborting journal on device sda-8.
> [ 108.414120] EXT4-fs (sda): Remounting filesystem read-only
> [ 108.414847] EXT4-fs error (device sda) in ext4_dirty_inode:5487: IO failure
> [ 108.423571] EXT4-fs error (device sda) in ext4_free_blocks:4904:
> Journal has aborted
> [ 108.431919] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> [ 108.440269] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> [ 108.448568] EXT4-fs error (device sda) in
> ext4_ext_remove_space:3058: IO failure
> [ 108.456917] EXT4-fs error (device sda) in ext4_ext_truncate:4657:
> Corrupt filesystem
> [ 108.465267] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> [ 108.473567] EXT4-fs error (device sda) in ext4_truncate:4150: IO failure
> [ 108.481917] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> root@(none)$ e2fsck /dev/sda
> e2fsck 1.42.9 (28-Dec-2013)
> /dev/sda is mounted.
> e2fsck: Cannot continue, aborting.
>
>
> root@(none)$ umount /mnt
> [ 260.756250] EXT4-fs error (device sda): ext4_put_super:837:
> Couldn't clean up the journal
> root@(none)$ e2fsck /dev/sda
> e2fsck 1.42.9 (28-Dec-2013)
> ext2fs_open2: Bad magic number in super-block
> e2fsck: Superblock invalid, trying backup blocks...
> Superblock needs_recovery flag is clear, but journal has data.
> Recovery flag not set in backup superblock, so running journal anyway.
> /dev/sda: recovering journal
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong for group #1 (32703, counted=8127).
> Fix<y>? yes
> Free blocks count wrong for group #2 (32768, counted=31744).
> Fix<y>? yes
> Free blocks count wrong (249509, counted=223909).
> Fix<y>? yes
> Free inodes count wrong for group #0 (8181, counted=8180).
> Fix<y>? yes
> Free inodes count wrong (65525, counted=65524).
> Fix<y>? yes
>
> /dev/sda: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/sda: 12/65536 files (8.3% non-contiguous), 38235/262144 blocks
> root@(none)$
>
> 2.
>
> root@(none)$ rm /mnt/test
> [ 71.021484] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> [ 71.044959] Aborting journal on device sda-8.
> [ 71.052152] EXT4-fs (sda): Remounting filesystem read-only
> [ 71.052833] EXT4-fs error (device sda) in ext4_dirty_inode:5487: IO failure
> [ 71.061600] EXT4-fs error (device sda) in ext4_free_blocks:4904:
> Journal has aborted
> [ 71.069948] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> [ 71.078296] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> [ 71.086597] EXT4-fs error (device sda) in
> ext4_ext_remove_space:3058: IO failure
> [ 71.094946] EXT4-fs error (device sda) in ext4_ext_truncate:4657:
> Corrupt filesystem
> [ 71.103296] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> [ 71.111595] EXT4-fs error (device sda) in ext4_truncate:4150: IO failure
> [ 71.119946] EXT4-fs error (device sda) in
> ext4_reserve_inode_write:5362: Corrupt filesystem
> root@(none)$ e2fsck /dev/sda
> e2fsck 1.42.9 (28-Dec-2013)
> /dev/sda is mounted.
> e2fsck: Cannot continue, aborting.
>
>
> root@(none)$ umount /mnt/
> [ 92.103221] EXT4-fs error (device sda): ext4_put_super:837:
> Couldn't clean up the journal
> root@(none)$ e2fsck /dev/sda
> e2fsck 1.42.9 (28-Dec-2013)
> ext2fs_open2: Bad magic number in super-block
> e2fsck: Superblock invalid, trying backup blocks...
> Superblock needs_recovery flag is clear, but journal has data.
> Recovery flag not set in backup superblock, so running journal anyway.
> /dev/sda: recovering journal
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong for group #1 (32703, counted=8127).
> Fix<y>? yes
> Free blocks count wrong for group #2 (32768, counted=31744).
> Fix<y>? yes
> Free blocks count wrong (249509, counted=223909).
> Fix<y>? yes
> Free inodes count wrong for group #0 (8181, counted=8180).
> Fix<y>? yes
> Free inodes count wrong (65525, counted=65524).
> Fix<y>? yes
>
> /dev/sda: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/sda: 12/65536 files (8.3% non-contiguous), 38235/262144 blocks
> root@(none)$
>

One more test, this time on a different arm64 machine (apm-mustang):

root@(none)$ dd if=/dev/zero of=/mnt/test bs=1M count=100; sync;
[ 117.556265] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5172: Corrupt filesystem
[ 117.570231] Aborting journal on device sda-8.
[ 117.582769] EXT4-fs (sda): Remounting filesystem read-only
[ 117.583739] EXT4-fs error (device sda) in ext4_dirty_inode:5297: IO failure
[ 117.596578] EXT4-fs error (device sda) in ext4_free_blocks:4897:
Journal has aborted
[ 117.609122] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5172: Corrupt filesystem
[ 117.622970] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5172: Corrupt filesystem
[ 117.635486] EXT4-fs error (device sda) in
ext4_ext_remove_space:3044: IO failure
[ 117.649351] EXT4-fs error (device sda) in ext4_ext_truncate:4661:
Corrupt filesystem
[ 117.661875] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5172: Corrupt filesystem
[ 117.675717] EXT4-fs error (device sda) in ext4_orphan_del:2896:
Corrupt filesystem
[ 117.688235] EXT4-fs error (device sda) in
ext4_reserve_inode_write:5172: Corrupt filesystem
dd: writing '/mnt/test': Read-only file system
1+0 records in
0+0 records out
root@(none)$ umount /mnt
[ 126.637862] EXT4-fs error (device sda): ext4_put_super:838:
Couldn't clean up the journal
root@(none)$ e2fsck /dev/sda
e2fsck 1.42.9 (28-Dec-2013)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
/dev/sda: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(86016--87039)
Fix<y>? yes
Free blocks count wrong for group #1 (32703, counted=7103).
Fix<y>? yes
Free blocks count wrong (249509, counted=223909).
Fix<y>? yes
Free inodes count wrong for group #0 (8181, counted=8180).
Fix<y>? yes
Free inodes count wrong (65525, counted=65524).
Fix<y>? yes

/dev/sda: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda: 12/65536 files (8.3% non-contiguous), 38235/262144 blocks
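
As a rough cross-check of the e2fsck numbers, the free-block differences can
be compared with the size of the dd test file. This is only a sketch: the
4 KiB block size is an assumption (the 1G backing file shows up as 262144
blocks), and all other numbers are copied from the e2fsck output above.

```shell
# Block size is an assumption: 1 GiB / 262144 blocks = 4096 bytes.
block_size=4096
file_blocks=$(( 100 * 1024 * 1024 / block_size ))   # dd bs=1M count=100

total_diff=$(( 249509 - 223909 ))      # "Free blocks count wrong" total
first_g1=$(( 32703 - 8127 ))           # earlier runs, group #1
first_g2=$(( 32768 - 31744 ))          # earlier runs, group #2
mustang_g1=$(( 32703 - 7103 ))         # apm-mustang run, group #1
bitmap_range=$(( 87039 - 86016 + 1 ))  # "Block bitmap differences" range

echo "blocks in a 100 MiB file:      $file_blocks"
echo "total free-count difference:   $total_diff"
echo "earlier runs, group #1 + #2:   $(( first_g1 + first_g2 ))"
echo "apm-mustang, group #1:         $mustang_g1"
echo "bitmap difference range size:  $bitmap_range"
```

Under that assumption the total difference in every run equals the number of
blocks in the 100 MiB test file, and the 1024-block bitmap range matches the
group #2 difference from the earlier runs. This only shows that the
discrepancy tracks the test file's blocks; it does not say where it comes
from.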



Do you know what could be causing this error?


I found these comments of yours in another mail:
"
Hum, interesting. So 'Free blocks count wrong' and 'Free inodes count
wrong' messages are harmless - those entries are updated only
opportunistically and on mount and generally do not have to match on live
filesystem. The other three errors regarding inode and directory count are
a fallout from aborted inode deletion. Most importantly there is *no
problem* whatsoever with block bitmaps. So it was either some memory glitch
(bitflip in the counter or the bitmap) or there is some race and bb_free
can get out of sync with the bitmap and I don't see how that could happen
especially so early after mount... Strange.
"

But in my test there is such an error:
Block bitmap differences: -(86016--87039)

Thanks