2022-10-20 01:12:18

by Dmitrii Tcvetkov

Subject: [bisected] QEMU guest boot failure since 6.0 on x86_64 host

Hello,

After upgrading the host kernel to 6.0, QEMU 7.0.0 guests can't boot if
their underlying disk image has 4096-byte sectors and the cache property
is set to "none".

Host kernel:
Linux version 6.0.2 ([email protected]) (gcc (Gentoo 11.3.0 p4)
11.3.0, GNU ld (Gentoo 2.38 p4) 2.38) #1 SMP

Bisect led me to commit b1a000d3b8ec5 ("block: relax direct io memory
alignment"). I was unable to resolve the conflicts when I tried to
revert it, as I lack the necessary understanding of the block subsystem.

This fails to boot on 6.0+ host:
# losetup -b 4096 -f image.raw
# qemu-system-x86_64 -enable-kvm -drive file=/dev/loop0,format=raw,cache=none

These boot fine on 6.0+ host:
# losetup -b 4096 -f image.raw
# qemu-system-x86_64 -enable-kvm -drive file=/dev/loop0,format=raw

# losetup -f image.raw
# qemu-system-x86_64 -enable-kvm -drive file=/dev/loop0,format=raw,cache=none

On 5.19 and older kernels the guest boots in all of the cases above. The
problem also reproduces on 6.1-rc1.

What other info can I provide to help find the root cause?

Bisect log:
git bisect start
# status: waiting for both good and bad commits
# bad: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect bad 4fe89d07dcc2804c8b562f6c7896a45643d34b2f
# status: waiting for good commit(s), bad commit known
# good: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
git bisect good 3d7cb6b04c3f3115719235cc6866b10326de34cd
# bad: [78acd4ca433425e6dd4032cfc2156c60e34931f2] usb: cdns3: Don't use priv_dev uninitialized in cdns3_gadget_ep_enable()
git bisect bad 78acd4ca433425e6dd4032cfc2156c60e34931f2
# bad: [526942b8134cc34d25d27f95dfff98b8ce2f6fcd] Merge tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
git bisect bad 526942b8134cc34d25d27f95dfff98b8ce2f6fcd
# good: [2e7a95156d64667a8ded606829d57c6fc92e41df] Merge tag 'regmap-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
git bisect good 2e7a95156d64667a8ded606829d57c6fc92e41df
# bad: [c013d0af81f60cc7dbe357c4e2a925fb6738dbfe] Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block
git bisect bad c013d0af81f60cc7dbe357c4e2a925fb6738dbfe
# good: [efb2883060afc79638bb1eb19e2c30e7f6c5a178] Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
git bisect good efb2883060afc79638bb1eb19e2c30e7f6c5a178
# good: [cb309ae49da7a7c28f0051deea13970291134fac] io_uring/net: improve io_get_notif_slot types
git bisect good cb309ae49da7a7c28f0051deea13970291134fac
# bad: [22c80aac882f712897b88b7ea8f5a74ea19019df] blktrace: Trace remapped requests correctly
git bisect bad 22c80aac882f712897b88b7ea8f5a74ea19019df
# bad: [22d0c4080fe49299640d9d6c43154c49794c2825] block: simplify disk_set_independent_access_ranges
git bisect bad 22d0c4080fe49299640d9d6c43154c49794c2825
# bad: [3c8f9da41ed90294d8ca42b3ad8a13c5379bd549] blk-mq: Don't disable preemption around __blk_mq_run_hw_queue().
git bisect bad 3c8f9da41ed90294d8ca42b3ad8a13c5379bd549
# bad: [798f2a6f734de87633351c3ab13b17b07397cf68] block: Directly use ida_alloc()/free()
git bisect bad 798f2a6f734de87633351c3ab13b17b07397cf68
# good: [67927d22015060967122facc8cfeaad8012e8808] block/merge: count bytes instead of sectors
git bisect good 67927d22015060967122facc8cfeaad8012e8808
# good: [5debd9691c3ac64c3acd6867c264ad38bbe48cdc] block: introduce bdev_iter_is_aligned helper
git bisect good 5debd9691c3ac64c3acd6867c264ad38bbe48cdc
# bad: [bf8d08532bc19a14cfb54ae61099dccadefca446] iomap: add support for dma aligned direct-io
git bisect bad bf8d08532bc19a14cfb54ae61099dccadefca446
# bad: [b1a000d3b8ec582da64bb644be633e5a0beffcbf] block: relax direct io memory alignment
git bisect bad b1a000d3b8ec582da64bb644be633e5a0beffcbf
# first bad commit: [b1a000d3b8ec582da64bb644be633e5a0beffcbf] block: relax direct io memory alignment


2022-10-20 02:43:51

by Keith Busch

Subject: Re: [bisected] QEMU guest boot failure since 6.0 on x86_64 host

On Thu, Oct 20, 2022 at 03:17:25AM +0300, Dmitrii Tcvetkov wrote:
>
> Bisect led me to commit b1a000d3b8ec5 ("block: relax direct io memory
> alignment"). I was unable to resolve revert conflicts when
> tried to revert b1a000d3b8ec5 ("block: relax direct io memory
> alignment") as I lack necessary understanding of block subsystem.

Background info: when your virtual block device's logical block size is
smaller than that of the host block device backing it, qemu needs to
bounce unaligned buffers when using direct-io.
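
For reference, the limits mentioned above can be read from sysfs on the
host. The following is only a minimal sketch and not part of the original
exchange: the helper name read_queue_limits and its sysfs_root parameter
are hypothetical, loop0 is taken from the commands in this thread, and the
dma_alignment attribute is assumed to exist only on newer kernels (the
helper returns None for any attribute it cannot find).

```python
from pathlib import Path


def read_queue_limits(disk, sysfs_root="/sys/block"):
    """Return selected queue limits for a block device as a dict.

    Attributes that are missing (for example dma_alignment on kernels
    that do not export it) are reported as None.
    """
    queue = Path(sysfs_root) / disk / "queue"
    limits = {}
    for name in ("logical_block_size", "physical_block_size", "dma_alignment"):
        attr = queue / name
        limits[name] = int(attr.read_text()) if attr.exists() else None
    return limits


if __name__ == "__main__":
    # loop0 is the device used in the reproduction commands.
    print(read_queue_limits("loop0"))
```

Comparing logical_block_size on the host device with the block size the
guest sees is a quick way to tell whether the mismatch described here
applies to a given setup.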

Historically for direct-io, the logical block size happened to also be
the memory page offset alignment. QEMU did this the other way around: it
used the memory offset alignment as the block size, which was not
intended:

https://lore.kernel.org/lkml/[email protected]/

The kernel patch you bisected to detangled memory alignment from logical
block size, so now older qemu versions have the wrong idea of the
minimum vector size. That is fixed in the qemu repository here:

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=25474d90aa50bd32e0de395a33d8de42dd6f2aef
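
To illustrate the difference, here is a simplified model of the vector
alignment check. It is a sketch under stated assumptions, not the actual
QEMU or kernel code: the function name qiov_is_aligned, the (address,
length) tuple representation, and the values 512 and 4096 are all
illustrative. The "old" view uses the memory alignment for segment
lengths as well, while the "fixed" view checks lengths against the
logical block size.

```python
def qiov_is_aligned(iov, mem_align, len_align):
    """Return True if every segment can be submitted without bouncing.

    iov       -- list of (buffer_address, length) tuples
    mem_align -- required alignment of each buffer address
    len_align -- required granularity of each segment length
    """
    return all(
        addr % mem_align == 0 and length % len_align == 0
        for addr, length in iov
    )


# Eight 512-byte segments at 512-byte aligned addresses (made-up values).
iov = [(0x10000 + i * 512, 512) for i in range(8)]

# Old view: memory alignment doubles as the length granularity.
print(qiov_is_aligned(iov, mem_align=512, len_align=512))

# Fixed view: lengths must be multiples of the 4096-byte logical block.
print(qiov_is_aligned(iov, mem_align=512, len_align=4096))
```

In this model the old view accepts the vector while the fixed view
rejects it, which is the case where the request would have to be bounced
into a suitably sized buffer first.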
>
> This fails to boot on 6.0+ host:
> # losetup -b 4096 -f image.raw
> # qemu-system-x86_64 -enable-kvm -drive
> file=/dev/loop0,format=raw,cache=none

In the above, your backing storage uses 4k blocks and the default
virtual device block size is 512 bytes, so qemu needs to bounce that,
but older versions might not do that as intended.

It should work if you add logical_block_size=4096 to the -drive
parameters.

> These boot fine on 6.0+ host:
> # losetup -b 4096 -f image.raw
> # qemu-system-x86_64 -enable-kvm -drive
> file=/dev/loop0,format=raw

The above uses the host page cache, which has no alignment or size
constraints, so it works with any size.

> # losetup -f image.raw
> # qemu-system-x86_64 -enable-kvm -drive
> file=/dev/loop0,format=raw,cache=none

The above uses a backing store formatted with 512-byte blocks and a
512-byte emulated drive, so the sizes match and qemu never needs to
bounce.
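
The three configurations from the report can be summarized with a small
decision function. This is only a sketch of the rule stated in the
background paragraph, with a hypothetical name (may_need_bounce) and
hypothetical parameters; it says when bouncing can become necessary, not
how QEMU implements it.

```python
def may_need_bounce(direct_io, host_block_size, guest_block_size):
    """Return True if guest requests may need a bounce buffer.

    direct_io        -- True for cache=none (O_DIRECT on the host)
    host_block_size  -- logical block size of the backing device, in bytes
    guest_block_size -- logical block size of the emulated drive, in bytes
    """
    # Buffered I/O goes through the host page cache: no constraints.
    if not direct_io:
        return False
    # With direct I/O, a guest block smaller than the host block means
    # guest requests can be smaller than what the host device accepts.
    return guest_block_size < host_block_size


cases = {
    "losetup -b 4096, cache=none": (True, 4096, 512),
    "losetup -b 4096, default cache": (False, 4096, 512),
    "losetup (512), cache=none": (True, 512, 512),
    "losetup -b 4096, cache=none, logical_block_size=4096": (True, 4096, 4096),
}
for name, args in cases.items():
    print(name, "->", may_need_bounce(*args))
```

Only the first case depends on the bounce path, which matches the one
configuration that fails to boot.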

2022-10-20 11:38:06

by Dmitrii Tcvetkov

Subject: Re: [bisected] QEMU guest boot failure since 6.0 on x86_64 host

On Wed, 19 Oct 2022 19:28:17 -0600
Keith Busch <[email protected]> wrote:

> On Thu, Oct 20, 2022 at 03:17:25AM +0300, Dmitrii Tcvetkov wrote:
> >
> > Bisect led me to commit b1a000d3b8ec5 ("block: relax direct io
> > memory alignment"). I was unable to resolve revert conflicts when
> > tried to revert b1a000d3b8ec5 ("block: relax direct io memory
> > alignment") as I lack necessary understanding of block subsystem.
>
> Background info: when your virtual block device's logical block size
> is smaller than the host's block device backing it, qemu needs to
> bounce unaligned buffers when using direct-io.
>
> Historically for direct-io, the logical block size happened to also be
> the memory page offset alignment. QEMU did this the other way around:
> it used the memory offset as the block size, and that was not
> intended:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> The kernel patch you bisected to detangled memory alignment from
> logical block size, so now older qemu versions have the wrong idea of
> the minimum vector size. That is fixed in the qemu repository here:
>
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=25474d90aa50bd32e0de395a33d8de42dd6f2aef
> >
> > This fails to boot on 6.0+ host:
> > # losetup -b 4096 -f image.raw
> > # qemu-system-x86_64 -enable-kvm -drive
> > file=/dev/loop0,format=raw,cache=none
>
> In the above, your backing storage is 4k, and the default virtual
> device block size is 512b, so qemu needs to bounce that, but older
> versions might not do that as intended.
>
> It should work if you include logical_block_size=4096 to the -drive
> parameters.
>
> > These boot fine on 6.0+ host:
> > # losetup -b 4096 -f image.raw
> > # qemu-system-x86_64 -enable-kvm -drive
> > file=/dev/loop0,format=raw
>
> The above is using cache, which doesn't have any alignment and size
> constraints, so works with anything sizes.
>
> > # losetup -f image.raw
> > # qemu-system-x86_64 -enable-kvm -drive
> > file=/dev/loop0,format=raw,cache=none
>
> The above is using a 512b formated backing store to a 512b emulated
> drive, so the matching means qemu never needs to bounce.

Thanks! Specifying logical_block_size=4096 indeed helps. The guest still
doesn't boot, but only because its partition table assumes 512-byte
sectors. After reinstalling with logical_block_size=4096 specified, it
boots.
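
The partition table issue comes down to simple arithmetic: partition
tables record positions in sectors (LBAs), so the same table points at
different byte offsets once the sector size changes. A small sketch, with
a hypothetical helper name and an assumed partition start at LBA 2048
(a common 1 MiB alignment, not a value taken from this report):

```python
def lba_to_byte_offset(lba, sector_size):
    """Convert a logical block address to a byte offset on the disk."""
    return lba * sector_size


START_LBA = 2048  # assumed example value

# Where the partition really starts (table written with 512-byte sectors).
print(lba_to_byte_offset(START_LBA, 512))   # 1048576

# Where a guest using 4096-byte sectors looks for it.
print(lba_to_byte_offset(START_LBA, 4096))  # 8388608
```

The same applies to the table itself: a GPT header lives at LBA 1, which
is byte 512 with 512-byte sectors but byte 4096 with 4096-byte sectors.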