2023-11-29 15:30:54

by Fiona Ebner

[permalink] [raw]
Subject: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

Hi,
it seems that hot-plugging SCSI disks for QEMU virtual machines booting
with UEFI and with guest kernels >= 6.5 might be broken. It's not
consistently broken, hinting there might be a race somewhere.

Reverting the following two commits seems to make it work reliably again:

cc22522fd55e2 ("PCI: acpiphp: Use
pci_assign_unassigned_bridge_resources() only for non-root bus")
40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary"

Of course, they might only expose some pre-existing issue, but this is
my best lead. See below for some logs and details about an affected
virtual machine. Happy to provide more information and to debug/test
further.

Best Regards,
Fiona

Host kernel: 6.5.11-4-pve which is based on the one from Ubuntu
Guest kernel: 6.7.0-rc3 and 6.7.0-rc3 with above commits reverted
QEMU version: v8.1.0 built from source
EDK2 version: submodule in the QEMU v8.1 repository: edk2-stable202302

QEMU command line:

> ./qemu-system-x86_64 \
> -accel 'kvm' \
> -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server=on,wait=off' \
> -mon 'chardev=qmp,mode=control' \
> -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
> -mon 'chardev=qmp-event,mode=control' \
> -pidfile /var/run/qemu-server/104.pid \
> -drive 'if=pflash,unit=0,format=raw,readonly=on,file=./pc-bios/edk2-x86_64-code.fd' \
> -drive 'if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/u2nvme/vm-104-disk-0,size=540672' \
> -smp '4,sockets=1,cores=4,maxcpus=4' \
> -nodefaults \
> -vnc 'unix:/var/run/qemu-server/104.vnc,password=on' \
> -m 4096 \
> -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
> -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
> -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' \
> -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
> -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1' \
> -drive 'file=/dev/u2nvme/vm-104-disk-1,if=none,id=drive-scsi0,format=raw' \
> -blockdev 'raw,file.driver=file,file.filename=/home/febner/plug.raw,node-name=drive-scsi1' \
> -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
> -netdev 'type=tap,id=net0,ifname=tap104i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
> -device 'virtio-net-pci,mac=BC:24:11:89:6A:E6,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=102' \
> -bios './pc-bios/edk2-x86_64-code.fd'

Script to issue hotplug command via QEMU monitor protocol (QMP):

> #!/bin/sh
>
> ID=$1
> CMD=$2
>
> if [ -z "$ID" ]; then
> echo "need to specify ID";
> exit 1;
> fi
>
> if [ -z "$CMD" ]; then
> echo "need to specify command (plug or unplug)";
> exit 1;
> fi
>
>
> if [ "$CMD" = "plug" ]; then
> socat - /var/run/qemu-server/"$ID".qmp << END
> {"execute": "qmp_capabilities"}
> {"arguments":{"driver":"virtio-scsi-pci","bus":"pci.3","addr":"0x2","id":"virtioscsi1"},"execute":"device_add"}
> {"arguments":{"bus":"virtioscsi1.0","channel":"0","driver":"scsi-hd","id":"scsi1","drive":"drive-scsi1","scsi-id":"0","lun":"1"},"execute":"device_add"}
> END
> elif [ "$CMD" = "unplug" ]; then
> socat - /var/run/qemu-server/"$ID".qmp << END
> {"execute": "qmp_capabilities"}
> {"arguments":{"id":"scsi1"},"execute":"device_del"}
> {"arguments":{"id":"virtioscsi1"},"execute":"device_del"}
> END
> fi

I've also tired and added 10 second sleep between the two device_add
commands just to be sure, but that didn't make a difference. (Our
management stack does query via QMP and wait for the device to show up
and is also affected, I was just too lazy to do that for the reproducer
here).

I've attached some logs for guest using kernel 6.7.0-rc3 where hotplug
works rarely and guest using kernel 6.7.0-rc3 with the previously
mentioned commits reverted where hotplug works reliably:

6.7.0-rc3:

> Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> Nov 29 15:12:02 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> Nov 29 15:12:02 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> Nov 29 15:12:02 hotplug kernel: scsi host3: Virtio SCSI HBA
> Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> Nov 29 15:12:02 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]

Reboot

> Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> Nov 29 15:12:52 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> Nov 29 15:12:52 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> Nov 29 15:12:52 hotplug kernel: scsi host3: Virtio SCSI HBA
> Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> Nov 29 15:12:52 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]

RebootThe one time it did work. Note that the line with "QEMU HARDDISK"
comes after all lines with "bridge window":

> Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> Nov 29 15:13:51 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> Nov 29 15:13:51 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> Nov 29 15:13:51 hotplug kernel: scsi host3: Virtio SCSI HBA
> Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> Nov 29 15:13:51 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> Nov 29 15:14:08 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.

6.7.0-rc3 with the following reverted:
cc22522fd55e2 ("PCI: acpiphp: Use
pci_assign_unassigned_bridge_resources() only for non-root bus")
40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")

> Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> Nov 29 15:15:37 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> Nov 29 15:15:37 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> Nov 29 15:15:37 hotplug kernel: scsi host3: Virtio SCSI HBA
> Nov 29 15:15:37 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> Nov 29 15:15:38 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.


2023-11-30 23:18:20

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Wed, Nov 29, 2023 at 04:22:41PM +0100, Fiona Ebner wrote:
> Hi,
> it seems that hot-plugging SCSI disks for QEMU virtual machines booting
> with UEFI and with guest kernels >= 6.5 might be broken. It's not
> consistently broken, hinting there might be a race somewhere.
>
> Reverting the following two commits seems to make it work reliably again:
>
> cc22522fd55e2 ("PCI: acpiphp: Use
> pci_assign_unassigned_bridge_resources() only for non-root bus")
> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary"
>
> Of course, they might only expose some pre-existing issue, but this is
> my best lead. See below for some logs and details about an affected
> virtual machine. Happy to provide more information and to debug/test
> further.

Shoot. Thanks very much for the report and your debugging. I'm
hoping Igor will chime in with some ideas.

Both of those commits appeard in v6.5 and fixed legit issues, so I
hate to revert them, but this does appear to be a regression.

#regzbot introduced: cc22522fd55e2 ^
#regzbot introduced: 40613da52b13f ^

> Host kernel: 6.5.11-4-pve which is based on the one from Ubuntu
> Guest kernel: 6.7.0-rc3 and 6.7.0-rc3 with above commits reverted
> QEMU version: v8.1.0 built from source
> EDK2 version: submodule in the QEMU v8.1 repository: edk2-stable202302
>
> QEMU command line:
>
> > ./qemu-system-x86_64 \
> > -accel 'kvm' \
> > -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server=on,wait=off' \
> > -mon 'chardev=qmp,mode=control' \
> > -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
> > -mon 'chardev=qmp-event,mode=control' \
> > -pidfile /var/run/qemu-server/104.pid \
> > -drive 'if=pflash,unit=0,format=raw,readonly=on,file=./pc-bios/edk2-x86_64-code.fd' \
> > -drive 'if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/u2nvme/vm-104-disk-0,size=540672' \
> > -smp '4,sockets=1,cores=4,maxcpus=4' \
> > -nodefaults \
> > -vnc 'unix:/var/run/qemu-server/104.vnc,password=on' \
> > -m 4096 \
> > -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
> > -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
> > -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' \
> > -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
> > -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1' \
> > -drive 'file=/dev/u2nvme/vm-104-disk-1,if=none,id=drive-scsi0,format=raw' \
> > -blockdev 'raw,file.driver=file,file.filename=/home/febner/plug.raw,node-name=drive-scsi1' \
> > -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
> > -netdev 'type=tap,id=net0,ifname=tap104i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
> > -device 'virtio-net-pci,mac=BC:24:11:89:6A:E6,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=102' \
> > -bios './pc-bios/edk2-x86_64-code.fd'
>
> Script to issue hotplug command via QEMU monitor protocol (QMP):
>
> > #!/bin/sh
> >
> > ID=$1
> > CMD=$2
> >
> > if [ -z "$ID" ]; then
> > echo "need to specify ID";
> > exit 1;
> > fi
> >
> > if [ -z "$CMD" ]; then
> > echo "need to specify command (plug or unplug)";
> > exit 1;
> > fi
> >
> >
> > if [ "$CMD" = "plug" ]; then
> > socat - /var/run/qemu-server/"$ID".qmp << END
> > {"execute": "qmp_capabilities"}
> > {"arguments":{"driver":"virtio-scsi-pci","bus":"pci.3","addr":"0x2","id":"virtioscsi1"},"execute":"device_add"}
> > {"arguments":{"bus":"virtioscsi1.0","channel":"0","driver":"scsi-hd","id":"scsi1","drive":"drive-scsi1","scsi-id":"0","lun":"1"},"execute":"device_add"}
> > END
> > elif [ "$CMD" = "unplug" ]; then
> > socat - /var/run/qemu-server/"$ID".qmp << END
> > {"execute": "qmp_capabilities"}
> > {"arguments":{"id":"scsi1"},"execute":"device_del"}
> > {"arguments":{"id":"virtioscsi1"},"execute":"device_del"}
> > END
> > fi
>
> I've also tired and added 10 second sleep between the two device_add
> commands just to be sure, but that didn't make a difference. (Our
> management stack does query via QMP and wait for the device to show up
> and is also affected, I was just too lazy to do that for the reproducer
> here).
>
> I've attached some logs for guest using kernel 6.7.0-rc3 where hotplug
> works rarely and guest using kernel 6.7.0-rc3 with the previously
> mentioned commits reverted where hotplug works reliably:
>
> 6.7.0-rc3:
>
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > Nov 29 15:12:02 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Nov 29 15:12:02 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Nov 29 15:12:02 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:12:02 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
>
> Reboot
>
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > Nov 29 15:12:52 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Nov 29 15:12:52 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Nov 29 15:12:52 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:12:52 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
>
> RebootThe one time it did work. Note that the line with "QEMU HARDDISK"
> comes after all lines with "bridge window":
>
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > Nov 29 15:13:51 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Nov 29 15:13:51 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Nov 29 15:13:51 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > Nov 29 15:13:51 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> > Nov 29 15:14:08 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.
>
> 6.7.0-rc3 with the following reverted:
> cc22522fd55e2 ("PCI: acpiphp: Use
> pci_assign_unassigned_bridge_resources() only for non-root bus")
> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")
>
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Nov 29 15:15:37 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Nov 29 15:15:37 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Nov 29 15:15:37 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Nov 29 15:15:37 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> > Nov 29 15:15:38 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.
>

2023-12-01 09:25:10

by Fiona Ebner

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

Am 01.12.23 um 00:18 schrieb Bjorn Helgaas:
> On Wed, Nov 29, 2023 at 04:22:41PM +0100, Fiona Ebner wrote:
>> Hi,
>> it seems that hot-plugging SCSI disks for QEMU virtual machines booting
>> with UEFI and with guest kernels >= 6.5 might be broken. It's not
>> consistently broken, hinting there might be a race somewhere.
>>
>> Reverting the following two commits seems to make it work reliably again:
>>
>> cc22522fd55e2 ("PCI: acpiphp: Use
>> pci_assign_unassigned_bridge_resources() only for non-root bus")
>> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary"
>>
>> Of course, they might only expose some pre-existing issue, but this is
>> my best lead. See below for some logs and details about an affected
>> virtual machine. Happy to provide more information and to debug/test
>> further.
>
> Shoot. Thanks very much for the report and your debugging. I'm
> hoping Igor will chime in with some ideas.
>
> Both of those commits appeard in v6.5 and fixed legit issues, so I
> hate to revert them, but this does appear to be a regression.
>
> #regzbot introduced: cc22522fd55e2 ^
> #regzbot introduced: 40613da52b13f ^
>
>> Host kernel: 6.5.11-4-pve which is based on the one from Ubuntu
>> Guest kernel: 6.7.0-rc3 and 6.7.0-rc3 with above commits reverted
>> QEMU version: v8.1.0 built from source
>> EDK2 version: submodule in the QEMU v8.1 repository: edk2-stable202302
>>

I should mention that I haven't run into the issue when booting the VM
with SeaBIOS yet.

Log for 6.7.0-rc3 + SeaBIOS (bundled with QEMU 8.1):

> Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xfd404000-0xfd407fff 64bit pref]
> Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xfe801000-0xfe801fff]
> Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff]
> Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xfd400000-0xfd5fffff 64bit pref]
> Dec 01 10:08:08 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> Dec 01 10:08:08 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> Dec 01 10:08:08 hotplug kernel: scsi host3: Virtio SCSI HBA
> Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff]
> Dec 01 10:08:08 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xfd400000-0xfd5fffff 64bit pref]
> Dec 01 10:08:08 hotplug kernel: scsi 3:0:0:1: Attached scsi generic sg1 type 0
> Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk

Interestingly, the line with "QEMU HARDDISK" does not come after all
lines with "bridge window" like was the case for the one time it did
work with UEFI. So maybe that was just a red herring.

Best Regards,
Fiona

2023-12-07 14:56:19

by Igor Mammedov

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Fri, 1 Dec 2023 10:24:41 +0100
Fiona Ebner <[email protected]> wrote:

> Am 01.12.23 um 00:18 schrieb Bjorn Helgaas:
> > On Wed, Nov 29, 2023 at 04:22:41PM +0100, Fiona Ebner wrote:
> >> Hi,
> >> it seems that hot-plugging SCSI disks for QEMU virtual machines booting
> >> with UEFI and with guest kernels >= 6.5 might be broken. It's not
> >> consistently broken, hinting there might be a race somewhere.
> >>
> >> Reverting the following two commits seems to make it work reliably again:
> >>
> >> cc22522fd55e2 ("PCI: acpiphp: Use
> >> pci_assign_unassigned_bridge_resources() only for non-root bus")
> >> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary"
> >>
> >> Of course, they might only expose some pre-existing issue, but this is
> >> my best lead. See below for some logs and details about an affected
> >> virtual machine. Happy to provide more information and to debug/test
> >> further.
> >
> > Shoot. Thanks very much for the report and your debugging. I'm
> > hoping Igor will chime in with some ideas.
> >
> > Both of those commits appeard in v6.5 and fixed legit issues, so I
> > hate to revert them, but this does appear to be a regression.
> >
> > #regzbot introduced: cc22522fd55e2 ^
> > #regzbot introduced: 40613da52b13f ^
> >
> >> Host kernel: 6.5.11-4-pve which is based on the one from Ubuntu
> >> Guest kernel: 6.7.0-rc3 and 6.7.0-rc3 with above commits reverted
> >> QEMU version: v8.1.0 built from source
> >> EDK2 version: submodule in the QEMU v8.1 repository: edk2-stable202302
> >>
>
> I should mention that I haven't run into the issue when booting the VM
> with SeaBIOS yet.
>
> Log for 6.7.0-rc3 + SeaBIOS (bundled with QEMU 8.1):
>
> > Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xfd404000-0xfd407fff 64bit pref]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xfe801000-0xfe801fff]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xfd400000-0xfd5fffff 64bit pref]
> > Dec 01 10:08:08 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Dec 01 10:08:08 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Dec 01 10:08:08 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff]
> > Dec 01 10:08:08 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Dec 01 10:08:08 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xfd400000-0xfd5fffff 64bit pref]
> > Dec 01 10:08:08 hotplug kernel: scsi 3:0:0:1: Attached scsi generic sg1 type 0
> > Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > Dec 01 10:08:08 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
>
> Interestingly, the line with "QEMU HARDDISK" does not come after all
> lines with "bridge window" like was the case for the one time it did
> work with UEFI. So maybe that was just a red herring.

I've just seen this,
let me try to reproduce and see what can be done with it.

>
> Best Regards,
> Fiona
>

2023-12-07 23:28:47

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Wed, Nov 29, 2023 at 04:22:41PM +0100, Fiona Ebner wrote:
> Hi,
> it seems that hot-plugging SCSI disks for QEMU virtual machines booting
> with UEFI and with guest kernels >= 6.5 might be broken. It's not
> consistently broken, hinting there might be a race somewhere.
>
> Reverting the following two commits seems to make it work reliably again:
>
> cc22522fd55e2 ("PCI: acpiphp: Use
> pci_assign_unassigned_bridge_resources() only for non-root bus")
> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary"
>
> Of course, they might only expose some pre-existing issue, but this is
> my best lead. See below for some logs and details about an affected
> virtual machine. Happy to provide more information and to debug/test
> further.
> ...

> I've attached some logs for guest using kernel 6.7.0-rc3 where hotplug
> works rarely and guest using kernel 6.7.0-rc3 with the previously
> mentioned commits reverted where hotplug works reliably:
>
> 6.7.0-rc3:
>
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > Nov 29 15:12:02 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Nov 29 15:12:02 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Nov 29 15:12:02 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:12:02 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]

What's the actual symptom that this is broken? All these log
fragments show the exact same assignments for BARs 0, 1, 4 and for the
bridge windows.

I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a
bridge leading to it?

Can you put the complete dmesg logs somewhere? There's a lot of
context missing here.

Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
work reliably? If we have to revert something, reverting one would be
better than reverting both.

Bjorn

> Reboot
>
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > Nov 29 15:12:52 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Nov 29 15:12:52 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Nov 29 15:12:52 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:12:52 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
>
> RebootThe one time it did work. Note that the line with "QEMU HARDDISK"
> comes after all lines with "bridge window":
>
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > Nov 29 15:13:51 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Nov 29 15:13:51 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Nov 29 15:13:51 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > Nov 29 15:13:51 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> > Nov 29 15:14:08 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.
>
> 6.7.0-rc3 with the following reverted:
> cc22522fd55e2 ("PCI: acpiphp: Use
> pci_assign_unassigned_bridge_resources() only for non-root bus")
> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")
>
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > Nov 29 15:15:37 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > Nov 29 15:15:37 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > Nov 29 15:15:37 hotplug kernel: scsi host3: Virtio SCSI HBA
> > Nov 29 15:15:37 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> > Nov 29 15:15:38 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.
>

2023-12-08 15:47:45

by Igor Mammedov

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Thu, 7 Dec 2023 17:28:15 -0600
Bjorn Helgaas <[email protected]> wrote:

> On Wed, Nov 29, 2023 at 04:22:41PM +0100, Fiona Ebner wrote:
> > Hi,
> > it seems that hot-plugging SCSI disks for QEMU virtual machines booting
> > with UEFI and with guest kernels >= 6.5 might be broken. It's not
> > consistently broken, hinting there might be a race somewhere.
> >
> > Reverting the following two commits seems to make it work reliably again:
> >
> > cc22522fd55e2 ("PCI: acpiphp: Use
> > pci_assign_unassigned_bridge_resources() only for non-root bus")
> > 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary"
> >
> > Of course, they might only expose some pre-existing issue, but this is
> > my best lead. See below for some logs and details about an affected
> > virtual machine. Happy to provide more information and to debug/test
> > further.
> > ...
>
> > I've attached some logs for guest using kernel 6.7.0-rc3 where hotplug
> > works rarely and guest using kernel 6.7.0-rc3 with the previously
> > mentioned commits reverted where hotplug works reliably:
> >
> > 6.7.0-rc3:
> >
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > > Nov 29 15:12:02 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > > Nov 29 15:12:02 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > > Nov 29 15:12:02 hotplug kernel: scsi host3: Virtio SCSI HBA
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > Nov 29 15:12:02 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
>
> What's the actual symptom that this is broken? All these log
> fragments show the exact same assignments for BARs 0, 1, 4 and for the
> bridge windows.
>
> I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a
> bridge leading to it?
>
> Can you put the complete dmesg logs somewhere? There's a lot of
> context missing here.
>
> Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
> work reliably? If we have to revert something, reverting one would be
> better than reverting both.

here is simplified reproducer:
./qemu-system-x86_64 -enable-kvm -m 4G -smp 4 -cpu host \
/dev/lvmpool/fedora-rawhide \
-device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 \
-device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1 \
-blockdev raw,file.driver=file,file.filename=Fedora-Server-dvd-x86_64-Rawhide-20231127.n.0.iso,node-name=drive-scsi1 \
-monitor stdio -serial file:/tmp/console_log

then once booted at monitor prompt:

(qemu) device_add virtio-scsi-pci,bus=pci.3,addr=2,id=virtioscsi1
(qemu) device_add scsi-hd,id=scsi1,drive=drive-scsi1,bus=virtioscsi1.0

with distro shipped 6.7.0-0.rc2.20231125git0f5cc96c367f.26.fc40.x86_64 kernel
hotplugged HBA is visible but a disk hotplugged into it is not (like Fiona has reported).

Problem happens when hotpluged virtio-scsi-pci is the 2nd HBA on the same bridge,
an attempt to rescan HBA (any on the bridge) causes guest hang.


However with the same 0f5cc96c367f commit, upstream kernel (without initrd and some minimal config):

-kernel ./linux-2.6/arch/x86_64/boot/bzImage -append 'root=/dev/sda3 console=ttyS0 console=tty0'

works as expected (aka disk is visible after hotplug)

[ 75.636170] pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
[ 75.636178] pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
[ 75.637193] pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 75.638441] pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[ 75.647035] pci 0000:01:02.0: BAR 4: assigned [mem 0x380800004000-0x380800007fff 64bit pref]
[ 75.649461] pci 0000:01:02.0: BAR 1: assigned [mem 0xfe801000-0xfe801fff]
[ 75.650793] pci 0000:01:02.0: BAR 0: assigned [io 0xc040-0xc07f]
[ 75.652109] pci 0000:00:05.0: PCI bridge to [bus 01]
[ 75.653181] pci 0000:00:05.0: bridge window [io 0xc000-0xcfff]
[ 75.656971] pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff]
[ 75.659970] pci 0000:00:05.0: bridge window [mem 0x380800000000-0x380fffffffff 64bit pref]
[ 75.664990] virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
[ 75.695505] scsi host3: Virtio SCSI HBA
[ 75.698099] pci 0000:00:05.0: PCI bridge to [bus 01]
[ 75.735840] pci 0000:00:05.0: bridge window [io 0xc000-0xcfff]
[ 75.740361] pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff]
[ 75.744088] pci 0000:00:05.0: bridge window [mem 0x380800000000-0x380fffffffff 64bit pref]


So, I'm still looking at where/why it goes wrong


> Bjorn
>
> > Reboot
> >
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > > Nov 29 15:12:52 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > > Nov 29 15:12:52 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > > Nov 29 15:12:52 hotplug kernel: scsi host3: Virtio SCSI HBA
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > Nov 29 15:12:52 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> >
> > RebootThe one time it did work. Note that the line with "QEMU HARDDISK"
> > comes after all lines with "bridge window":
> >
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > > Nov 29 15:13:51 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > > Nov 29 15:13:51 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > > Nov 29 15:13:51 hotplug kernel: scsi host3: Virtio SCSI HBA
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > > Nov 29 15:13:51 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> > > Nov 29 15:14:08 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.
> >
> > 6.7.0-rc3 with the following reverted:
> > cc22522fd55e2 ("PCI: acpiphp: Use
> > pci_assign_unassigned_bridge_resources() only for non-root bus")
> > 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")
> >
> > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > > Nov 29 15:15:37 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > > Nov 29 15:15:37 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > > Nov 29 15:15:37 hotplug kernel: scsi host3: Virtio SCSI HBA
> > > Nov 29 15:15:37 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> > > Nov 29 15:15:38 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.
> >
>

2023-12-11 07:46:30

by Igor Mammedov

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Fri, 8 Dec 2023 16:47:23 +0100
Igor Mammedov <[email protected]> wrote:

> On Thu, 7 Dec 2023 17:28:15 -0600
> Bjorn Helgaas <[email protected]> wrote:
>
> > On Wed, Nov 29, 2023 at 04:22:41PM +0100, Fiona Ebner wrote:
> > > Hi,
> > > it seems that hot-plugging SCSI disks for QEMU virtual machines booting
> > > with UEFI and with guest kernels >= 6.5 might be broken. It's not
> > > consistently broken, hinting there might be a race somewhere.
> > >
> > > Reverting the following two commits seems to make it work reliably again:
> > >
> > > cc22522fd55e2 ("PCI: acpiphp: Use
> > > pci_assign_unassigned_bridge_resources() only for non-root bus")
> > > 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary"
> > >
> > > Of course, they might only expose some pre-existing issue, but this is
> > > my best lead. See below for some logs and details about an affected
> > > virtual machine. Happy to provide more information and to debug/test
> > > further.
> > > ...
> >
> > > I've attached some logs for guest using kernel 6.7.0-rc3 where hotplug
> > > works rarely and guest using kernel 6.7.0-rc3 with the previously
> > > mentioned commits reverted where hotplug works reliably:
> > >
> > > 6.7.0-rc3:
> > >
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > > > Nov 29 15:12:02 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > > > Nov 29 15:12:02 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > > > Nov 29 15:12:02 hotplug kernel: scsi host3: Virtio SCSI HBA
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > > Nov 29 15:12:02 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> >
> > What's the actual symptom that this is broken? All these log
> > fragments show the exact same assignments for BARs 0, 1, 4 and for the
> > bridge windows.
> >
> > I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a
> > bridge leading to it?
> >
> > Can you put the complete dmesg logs somewhere? There's a lot of
> > context missing here.
> >
> > Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
> > work reliably? If we have to revert something, reverting one would be
> > better than reverting both.
>
> here is simplified reproducer:
> ./qemu-system-x86_64 -enable-kvm -m 4G -smp 4 -cpu host \
> /dev/lvmpool/fedora-rawhide \
> -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 \
> -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1 \
> -blockdev raw,file.driver=file,file.filename=Fedora-Server-dvd-x86_64-Rawhide-20231127.n.0.iso,node-name=drive-scsi1 \
> -monitor stdio -serial file:/tmp/console_log
>
> then once booted at monitor prompt:
>
> (qemu) device_add virtio-scsi-pci,bus=pci.3,addr=2,id=virtioscsi1
> (qemu) device_add scsi-hd,id=scsi1,drive=drive-scsi1,bus=virtioscsi1.0
>
> with distro shipped 6.7.0-0.rc2.20231125git0f5cc96c367f.26.fc40.x86_64 kernel
> hotplugged HBA is visible but a disk hotplugged into it is not (like Fiona has reported).
>
> Problem happens when hotpluged virtio-scsi-pci is the 2nd HBA on the same bridge,
> an attempt to rescan HBA (any on the bridge) causes guest hang.
>
>
> However with the same 0f5cc96c367f commit, upstream kernel (without initrd and some minimal config):
>
> -kernel ./linux-2.6/arch/x86_64/boot/bzImage -append 'root=/dev/sda3 console=ttyS0 console=tty0'
>
> works as expected (aka disk is visible after hotplug)
>
> [ 75.636170] pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> [ 75.636178] pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> [ 75.637193] pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> [ 75.638441] pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> [ 75.647035] pci 0000:01:02.0: BAR 4: assigned [mem 0x380800004000-0x380800007fff 64bit pref]
> [ 75.649461] pci 0000:01:02.0: BAR 1: assigned [mem 0xfe801000-0xfe801fff]
> [ 75.650793] pci 0000:01:02.0: BAR 0: assigned [io 0xc040-0xc07f]
> [ 75.652109] pci 0000:00:05.0: PCI bridge to [bus 01]
> [ 75.653181] pci 0000:00:05.0: bridge window [io 0xc000-0xcfff]
> [ 75.656971] pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff]
> [ 75.659970] pci 0000:00:05.0: bridge window [mem 0x380800000000-0x380fffffffff 64bit pref]
> [ 75.664990] virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> [ 75.695505] scsi host3: Virtio SCSI HBA
> [ 75.698099] pci 0000:00:05.0: PCI bridge to [bus 01]
> [ 75.735840] pci 0000:00:05.0: bridge window [io 0xc000-0xcfff]
> [ 75.740361] pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff]
> [ 75.744088] pci 0000:00:05.0: bridge window [mem 0x380800000000-0x380fffffffff 64bit pref]
>
>
> So, I'm still looking at where/why it goes wrong

I've narrowed it down to ACPI PCI hotpug, (i.e. it works fine with SHPC and Fedora's kernel).
Though I still can't reproduce with upstream kernel (even with Fedora config (mod=yes)).
So far I was testing Seabios variant, next thing to try is UEFI setup (perhaps upstream will fail there)

Fiona,

Does it help if you use q35 machine with '-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off' option?

>
> > Bjorn
> >
> > > Reboot
> > >
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > > > Nov 29 15:12:52 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > > > Nov 29 15:12:52 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > > > Nov 29 15:12:52 hotplug kernel: scsi host3: Virtio SCSI HBA
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > > Nov 29 15:12:52 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > >
> > > RebootThe one time it did work. Note that the line with "QEMU HARDDISK"
> > > comes after all lines with "bridge window":
> > >
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > > > Nov 29 15:13:51 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > > > Nov 29 15:13:51 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > > > Nov 29 15:13:51 hotplug kernel: scsi host3: Virtio SCSI HBA
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref]
> > > > Nov 29 15:13:51 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> > > > Nov 29 15:14:08 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.
> > >
> > > 6.7.0-rc3 with the following reverted:
> > > cc22522fd55e2 ("PCI: acpiphp: Use
> > > pci_assign_unassigned_bridge_resources() only for non-root bus")
> > > 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")
> > >
> > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
> > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
> > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref]
> > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f]
> > > > Nov 29 15:15:37 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
> > > > Nov 29 15:15:37 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > > > Nov 29 15:15:37 hotplug kernel: scsi host3: Virtio SCSI HBA
> > > > Nov 29 15:15:37 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0
> > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred
> > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off
> > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
> > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
> > > > Nov 29 15:15:38 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully.
> > >
> >
>

2023-12-11 13:53:56

by Fiona Ebner

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

Am 11.12.23 um 08:46 schrieb Igor Mammedov:
> On Fri, 8 Dec 2023 16:47:23 +0100
> Igor Mammedov <[email protected]> wrote:
>
>> On Thu, 7 Dec 2023 17:28:15 -0600
>> Bjorn Helgaas <[email protected]> wrote:
>>
>>>
>>> What's the actual symptom that this is broken? All these log
>>> fragments show the exact same assignments for BARs 0, 1, 4 and for the
>>> bridge windows.
>>>

The disk never shows up in /dev

>>> I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a
>>> bridge leading to it?
>>>
>>> Can you put the complete dmesg logs somewhere? There's a lot of
>>> context missing here.
>>>

Is this still necessary with Igor being able to reproduce the issue?

>>> Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
>>> work reliably? If we have to revert something, reverting one would be
>>> better than reverting both.
>>

Just reverting cc22522fd55e2 is not enough (and cc22522fd55e2 fixes
40613da52b13f so I can't revert just 40613da52b13f).

>
> Fiona,
>
> Does it help if you use q35 machine with '-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off' option?
>

Yes, it does :)

I added the following to my QEMU commandline (first line, because there
wouldn't be a "pci.0" otherwise):

> -device 'pci-bridge,id=pci.0,chassis_nr=4' \
> -machine 'q35' \
> -global 'ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off' \

and while it takes a few seconds, the disk does show up successfully:

> Dec 11 13:07:32 hotplug kernel: shpchp 0000:01:05.0: Latch close on Slot(2-1)
> Dec 11 13:07:32 hotplug kernel: shpchp 0000:01:05.0: Button pressed on Slot(2-1)
> Dec 11 13:07:32 hotplug kernel: shpchp 0000:01:05.0: Card present on Slot(2-1)
> Dec 11 13:07:32 hotplug kernel: shpchp 0000:01:05.0: PCI slot #2-1 - powering on due to button press
> Dec 11 13:07:38 hotplug kernel: pci 0000:02:02.0: [1af4:1004] type 00 class 0x010000
> Dec 11 13:07:38 hotplug kernel: pci 0000:02:02.0: reg 0x10: [io 0x0000-0x003f]
> Dec 11 13:07:38 hotplug kernel: pci 0000:02:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
> Dec 11 13:07:38 hotplug kernel: pci 0000:02:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> Dec 11 13:07:38 hotplug kernel: pci 0000:02:02.0: BAR 4: assigned [mem 0xc040004000-0xc040007fff 64bit pref]
> Dec 11 13:07:38 hotplug kernel: pci 0000:02:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff]
> Dec 11 13:07:38 hotplug kernel: pci 0000:02:02.0: BAR 0: assigned [io 0x8040-0x807f]
> Dec 11 13:07:38 hotplug kernel: shpchp 0000:01:05.0: PCI bridge to [bus 02]
> Dec 11 13:07:38 hotplug kernel: shpchp 0000:01:05.0: bridge window [io 0x8000-0x8fff]
> Dec 11 13:07:38 hotplug kernel: shpchp 0000:01:05.0: bridge window [mem 0xc1400000-0xc15fffff]
> Dec 11 13:07:38 hotplug kernel: shpchp 0000:01:05.0: bridge window [mem 0xc040000000-0xc05fffffff 64bit pref]
> Dec 11 13:07:38 hotplug kernel: virtio-pci 0000:02:02.0: enabling device (0000 -> 0003)
> Dec 11 13:07:38 hotplug kernel: scsi host7: Virtio SCSI HBA
> Dec 11 13:07:38 hotplug kernel: scsi 7:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
> Dec 11 13:07:38 hotplug kernel: sd 7:0:0:1: Attached scsi generic sg1 type 0
> Dec 11 13:07:38 hotplug kernel: sd 7:0:0:1: Power-on or device reset occurred
> Dec 11 13:07:38 hotplug kernel: sd 7:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB)
> Dec 11 13:07:38 hotplug kernel: sd 7:0:0:1: [sdb] Write Protect is off
> Dec 11 13:07:38 hotplug kernel: sd 7:0:0:1: [sdb] Mode Sense: 63 00 00 08
> Dec 11 13:07:38 hotplug kernel: sd 7:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 11 13:07:38 hotplug kernel: sd 7:0:0:1: [sdb] Attached SCSI disk

Best Regards,
Fiona

2023-12-11 14:18:17

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Mon, Dec 11, 2023 at 02:52:42PM +0100, Fiona Ebner wrote:
> Am 11.12.23 um 08:46 schrieb Igor Mammedov:
> > On Fri, 8 Dec 2023 16:47:23 +0100
> > Igor Mammedov <[email protected]> wrote:
> >> On Thu, 7 Dec 2023 17:28:15 -0600
> >> Bjorn Helgaas <[email protected]> wrote:
> >>
> >>> What's the actual symptom that this is broken? All these log
> >>> fragments show the exact same assignments for BARs 0, 1, 4 and for the
> >>> bridge windows.
>
> The disk never shows up in /dev
>
> >>> I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a
> >>> bridge leading to it?
> >>>
> >>> Can you put the complete dmesg logs somewhere? There's a lot of
> >>> context missing here.
>
> Is this still necessary with Igor being able to reproduce the issue?

Only if you want help resolving the problem. I don't have time to
guess what interesting things might be missing and go back and forth
to get them.

> >>> Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
> >>> work reliably? If we have to revert something, reverting one would be
> >>> better than reverting both.
>
> Just reverting cc22522fd55e2 is not enough (and cc22522fd55e2 fixes
> 40613da52b13f so I can't revert just 40613da52b13f).

OK, thanks. So it sounds like our options are to fix it or to revert
both. I'm going to be on vacation Dec 16-25, so we have about 3-4
days to figure something out.

Bjorn

2023-12-11 14:54:05

by Fiona Ebner

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

Am 11.12.23 um 15:12 schrieb Bjorn Helgaas:
> On Mon, Dec 11, 2023 at 02:52:42PM +0100, Fiona Ebner wrote:
>> Am 11.12.23 um 08:46 schrieb Igor Mammedov:
>>> On Fri, 8 Dec 2023 16:47:23 +0100
>>> Igor Mammedov <[email protected]> wrote:
>>>> On Thu, 7 Dec 2023 17:28:15 -0600
>>>> Bjorn Helgaas <[email protected]> wrote:
>>>>
>>>>> What's the actual symptom that this is broken? All these log
>>>>> fragments show the exact same assignments for BARs 0, 1, 4 and for the
>>>>> bridge windows.
>>
>> The disk never shows up in /dev
>>
>>>>> I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a
>>>>> bridge leading to it?
>>>>>
>>>>> Can you put the complete dmesg logs somewhere? There's a lot of
>>>>> context missing here.
>>
>> Is this still necessary with Igor being able to reproduce the issue?
>
> Only if you want help resolving the problem. I don't have time to
> guess what interesting things might be missing and go back and forth
> to get them.
>

I put the output of dmesg with my original QEMU commandline and kernel
6.7.0-rc3 here:

https://pastebin.com/UvpGAYf2

Best Regards,
Fiona

2023-12-11 16:18:56

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Mon, Dec 11, 2023 at 03:53:46PM +0100, Fiona Ebner wrote:
> Am 11.12.23 um 15:12 schrieb Bjorn Helgaas:
> > On Mon, Dec 11, 2023 at 02:52:42PM +0100, Fiona Ebner wrote:
> >> Am 11.12.23 um 08:46 schrieb Igor Mammedov:
> >>> On Fri, 8 Dec 2023 16:47:23 +0100
> >>> Igor Mammedov <[email protected]> wrote:
> >>>> On Thu, 7 Dec 2023 17:28:15 -0600
> >>>> Bjorn Helgaas <[email protected]> wrote:
> >>>>
> >>>>> What's the actual symptom that this is broken? All these log
> >>>>> fragments show the exact same assignments for BARs 0, 1, 4 and for the
> >>>>> bridge windows.
> >>
> >> The disk never shows up in /dev
> >>
> >>>>> I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a
> >>>>> bridge leading to it?
> >>>>>
> >>>>> Can you put the complete dmesg logs somewhere? There's a lot of
> >>>>> context missing here.
> >>
> >> Is this still necessary with Igor being able to reproduce the issue?
> >
> > Only if you want help resolving the problem. I don't have time to
> > guess what interesting things might be missing and go back and forth
> > to get them.
> >
>
> I put the output of dmesg with my original QEMU commandline and kernel
> 6.7.0-rc3 here:
>
> https://pastebin.com/UvpGAYf2

Thanks, I opened https://bugzilla.kernel.org/show_bug.cgi?id=218255
and attached this dmesg log there so this doesn't get lost or expired
from pastebin.com.

IIUC this is a log from when it fails. Can you attach the
corresponding log when it works correctly?

Bjorn

2023-12-12 09:41:47

by Fiona Ebner

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

Am 11.12.23 um 17:18 schrieb Bjorn Helgaas:
>
> Thanks, I opened https://bugzilla.kernel.org/show_bug.cgi?id=218255
> and attached this dmesg log there so this doesn't get lost or expired
> from pastebin.com.
>

Thank you!

> IIUC this is a log from when it fails. Can you attach the
> corresponding log when it works correctly?
>

Yes, it is. I attached the dmesg log for both, a run with the reverts
and a run without the reverts where the issue did not trigger by chance.

Best Regards,
Fiona

2023-12-12 11:26:59

by Igor Mammedov

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Mon, 11 Dec 2023 14:52:42 +0100
Fiona Ebner <[email protected]> wrote:

> Am 11.12.23 um 08:46 schrieb Igor Mammedov:
> > On Fri, 8 Dec 2023 16:47:23 +0100
> > Igor Mammedov <[email protected]> wrote:
> >
> >> On Thu, 7 Dec 2023 17:28:15 -0600
> >> Bjorn Helgaas <[email protected]> wrote:
> >>
> >>>
> >>> What's the actual symptom that this is broken? All these log
> >>> fragments show the exact same assignments for BARs 0, 1, 4 and for the
> >>> bridge windows.
> >>>
>
> The disk never shows up in /dev
>
> >>> I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a
> >>> bridge leading to it?
> >>>
> >>> Can you put the complete dmesg logs somewhere? There's a lot of
> >>> context missing here.
> >>>
>
> Is this still necessary with Igor being able to reproduce the issue?

it's not necessary, but it would help to find out what's going wrong faster.
Otherwise we would need to fallback to debugging over email.
Are you willing to help with testing/providing debug logs to track down
the cause?.

Though debug over email would be slow, so our best option is to revert
offending patches until the cause if found/fixed.

> >>> Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
> >>> work reliably? If we have to revert something, reverting one would be
> >>> better than reverting both.
> >>
>
> Just reverting cc22522fd55e2 is not enough (and cc22522fd55e2 fixes
> 40613da52b13f so I can't revert just 40613da52b13f).

With UEFI setup, it still works for me fine with current master.

Kernel 6.7.0-rc5-00014-g26aff849438c on an x86_64 (ttyS0)

ibm-p8-kvm-03-guest-02 login: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000
pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f]
pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:01:02.0: BAR 4: assigned [mem 0x380000004000-0x380000007fff 64bit pref]
pci 0000:01:02.0: BAR 1: assigned [mem 0xc1001000-0xc1001fff]
pci 0000:01:02.0: BAR 0: assigned [io 0xc040-0xc07f]
pci 0000:00:05.0: PCI bridge to [bus 01]
pci 0000:00:05.0: bridge window [io 0xc000-0xcfff]
pci 0000:00:05.0: bridge window [mem 0xc1000000-0xc11fffff]
pci 0000:00:05.0: bridge window [mem 0x380000000000-0x3807ffffffff 64bit pref]
virtio-pci 0000:01:02.0: enabling device (0000 -> 0003)
scsi host3: Virtio SCSI HBA
pci 0000:00:05.0: PCI bridge to [bus 01]
pci 0000:00:05.0: bridge window [io 0xc000-0xcfff]
pci 0000:00:05.0: bridge window [mem 0xc1000000-0xc11fffff]
pci 0000:00:05.0: bridge window [mem 0x380000000000-0x3807ffffffff 64bit pref]
scsi 3:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
sd 3:0:0:0: Power-on or device reset occurred
sd 3:0:0:0: Attached scsi generic sg2 type 0
sd 3:0:0:0: LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
sd 3:0:0:0: [sdb] 5190784 512-byte logical blocks: (2.66 GB/2.47 GiB)
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 63 00 00 08
sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
GPT:Primary header thinks Alt. header is not at the end of the disk.
GPT:5190099 != 5190783
GPT:Alternate GPT header not at the end of the disk.
GPT:5190099 != 5190783
GPT: Use GNU Parted to correct GPT errors.
sdb: sdb1 sdb2
sd 3:0:0:0: [sdb] Attached SCSI disk

it still doesn't work with Fedora's 6.7.0-0.rc2.20231125git0f5cc96c367f.26.fc40.x86_64 kernel.
However it's necessary to have -smp 4 for it to break,
with -smp 1 it works fine as well.


> > Fiona,
> >
> > Does it help if you use q35 machine with '-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off' option?
> >
>
> Yes, it does :)
>
> I added the following to my QEMU commandline (first line, because there
> wouldn't be a "pci.0" otherwise):
>
> > -device 'pci-bridge,id=pci.0,chassis_nr=4' \
> > -machine 'q35' \
> > -global 'ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off' \
>
> and while it takes a few seconds, the disk does show up successfully:

delay is normal for SHPC

>
> Best Regards,
> Fiona
>

2023-12-12 12:50:41

by Fiona Ebner

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

Am 12.12.23 um 12:26 schrieb Igor Mammedov:
>
> it's not necessary, but it would help to find out what's going wrong faster.
> Otherwise we would need to fallback to debugging over email.
> Are you willing to help with testing/providing debug logs to track down
> the cause?.
>

I submitted the dmesg logs in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=218255

> Though debug over email would be slow, so our best option is to revert
> offending patches until the cause if found/fixed.
>
>>>>> Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
>>>>> work reliably? If we have to revert something, reverting one would be
>>>>> better than reverting both.
>>>>
>>
>> Just reverting cc22522fd55e2 is not enough (and cc22522fd55e2 fixes
>> 40613da52b13f so I can't revert just 40613da52b13f).
>
> With UEFI setup, it still works for me fine with current master.
>
> Kernel 6.7.0-rc5-00014-g26aff849438c on an x86_64 (ttyS0)
>

I also built from current master (still 26aff849438c) to verify and it's
still broken for me.

>
> it still doesn't work with Fedora's 6.7.0-0.rc2.20231125git0f5cc96c367f.26.fc40.x86_64 kernel.
> However it's necessary to have -smp 4 for it to break,
> with -smp 1 it works fine as well.
>

For me it's (always with build from current master):

-smp 1 -> it worked 5 times out of 5
-smp 2 -> it worked 3 times out of 5
-smp 4 -> it worked 0 times out of 5
-smp 8 -> it worked 0 times out of 5

Best Regards,
Fiona

2023-12-12 15:53:05

by Igor Mammedov

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Tue, 12 Dec 2023 13:50:20 +0100
Fiona Ebner <[email protected]> wrote:

> Am 12.12.23 um 12:26 schrieb Igor Mammedov:
> >
> > it's not necessary, but it would help to find out what's going wrong faster.
> > Otherwise we would need to fallback to debugging over email.
> > Are you willing to help with testing/providing debug logs to track down
> > the cause?.
> >
>
> I submitted the dmesg logs in bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=218255
>
> > Though debug over email would be slow, so our best option is to revert
> > offending patches until the cause if found/fixed.
> >
> >>>>> Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
> >>>>> work reliably? If we have to revert something, reverting one would be
> >>>>> better than reverting both.
> >>>>
> >>
> >> Just reverting cc22522fd55e2 is not enough (and cc22522fd55e2 fixes
> >> 40613da52b13f so I can't revert just 40613da52b13f).
> >
> > With UEFI setup, it still works for me fine with current master.
> >
> > Kernel 6.7.0-rc5-00014-g26aff849438c on an x86_64 (ttyS0)
> >
>
> I also built from current master (still 26aff849438c) to verify and it's
> still broken for me.
>
> >
> > it still doesn't work with Fedora's 6.7.0-0.rc2.20231125git0f5cc96c367f.26.fc40.x86_64 kernel.
> > However it's necessary to have -smp 4 for it to break,
> > with -smp 1 it works fine as well.
> >
>
> For me it's (always with build from current master):
>
> -smp 1 -> it worked 5 times out of 5
> -smp 2 -> it worked 3 times out of 5
> -smp 4 -> it worked 0 times out of 5
> -smp 8 -> it worked 0 times out of 5


I managed to reproduce it with upstream using fedora 40 config as is
(without converting it to mod2yesconfig).
So give me a couple of days to debug it before reverting.

>
> Best Regards,
> Fiona
>

2023-12-12 20:03:17

by Igor Mammedov

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

On Tue, 12 Dec 2023 16:25:29 +0100
Igor Mammedov <[email protected]> wrote:

> On Tue, 12 Dec 2023 13:50:20 +0100
> Fiona Ebner <[email protected]> wrote:
>
> > Am 12.12.23 um 12:26 schrieb Igor Mammedov:
> > >
> > > it's not necessary, but it would help to find out what's going wrong faster.
> > > Otherwise we would need to fallback to debugging over email.
> > > Are you willing to help with testing/providing debug logs to track down
> > > the cause?.
> > >
> >
> > I submitted the dmesg logs in bugzilla:
> > https://bugzilla.kernel.org/show_bug.cgi?id=218255
> >
> > > Though debug over email would be slow, so our best option is to revert
> > > offending patches until the cause if found/fixed.
> > >
> > >>>>> Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it
> > >>>>> work reliably? If we have to revert something, reverting one would be
> > >>>>> better than reverting both.
> > >>>>
> > >>
> > >> Just reverting cc22522fd55e2 is not enough (and cc22522fd55e2 fixes
> > >> 40613da52b13f so I can't revert just 40613da52b13f).
> > >
> > > With UEFI setup, it still works for me fine with current master.
> > >
> > > Kernel 6.7.0-rc5-00014-g26aff849438c on an x86_64 (ttyS0)
> > >
> >
> > I also built from current master (still 26aff849438c) to verify and it's
> > still broken for me.
> >
> > >
> > > it still doesn't work with Fedora's 6.7.0-0.rc2.20231125git0f5cc96c367f.26.fc40.x86_64 kernel.
> > > However it's necessary to have -smp 4 for it to break,
> > > with -smp 1 it works fine as well.
> > >
> >
> > For me it's (always with build from current master):
> >
> > -smp 1 -> it worked 5 times out of 5
> > -smp 2 -> it worked 3 times out of 5
> > -smp 4 -> it worked 0 times out of 5
> > -smp 8 -> it worked 0 times out of 5
>
>
> I managed to reproduce it with upstream using fedora 40 config as is
> (without converting it to mod2yesconfig).
> So give me a couple of days to debug it before reverting.

Actually here is another report + analysis explaining where the race is happening:
https://www.spinics.net/lists/kernel/msg5033061.html

That's the reason why my minimal config worked
(based on defconfig where CONFIG_SCSI_SCAN_ASYNC in not enabled by default for x86)

While distros (at least some) do enable it.

>
> >
> > Best Regards,
> > Fiona
> >
>

2023-12-15 07:53:53

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: SCSI hotplug issues with UEFI VM with guest kernel >= 6.5

[TLDR: This mail in primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 29.11.23 16:22, Fiona Ebner wrote:
> it seems that hot-plugging SCSI disks for QEMU virtual machines booting
> with UEFI and with guest kernels >= 6.5 might be broken. It's not
> consistently broken, hinting there might be a race somewhere.
>
> Reverting the following two commits seems to make it work reliably again:
>
> cc22522fd55e2 ("PCI: acpiphp: Use
> pci_assign_unassigned_bridge_resources() only for non-root bus")
> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary"

#regzbot fix: Revert "PCI: acpiphp: Reassign resources on bridge if
necessary"
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.