2022-08-12 16:03:26

by Michael S. Tsirkin

Subject: [GIT PULL] virtio: features, fixes

The following changes since commit 3d7cb6b04c3f3115719235cc6866b10326de34cd:

Linux 5.19 (2022-07-31 14:03:01 -0700)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 93e530d2a1c4c0fcce45e01ae6c5c6287a08d3e3:

vdpa/mlx5: Fix possible uninitialized return value (2022-08-11 10:00:36 -0400)

----------------------------------------------------------------
virtio: features, fixes

A huge patchset supporting vq resize using the
new vq reset capability.
Features, fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <[email protected]>

----------------------------------------------------------------
Alvaro Karsz (1):
net: virtio_net: notifications coalescing support

Bo Liu (3):
virtio: Check dev_set_name() return value
vhost-vdpa: Call ida_simple_remove() when failed
virtio_vdpa: support the arg sizes of find_vqs()

Colin Ian King (1):
vDPA/ifcvf: remove duplicated assignment to pointer cfg

David Hildenbrand (1):
drivers/virtio: Clarify CONFIG_VIRTIO_MEM for unsupported architectures

Eli Cohen (3):
vdpa/mlx5: Implement susupend virtqueue callback
vdpa/mlx5: Support different address spaces for control and data
vdpa/mlx5: Fix possible uninitialized return value

Eugenio Pérez (4):
vdpa: Add suspend operation
vhost-vdpa: introduce SUSPEND backend feature bit
vhost-vdpa: uAPI to suspend the device
vdpa_sim: Implement suspend vdpa op

Jason Wang (2):
virtio_pmem: initialize provider_data through nd_region_desc
virtio_pmem: set device ready in probe()

Michael S. Tsirkin (1):
virtio: VIRTIO_HARDEN_NOTIFICATION is broken

Mike Christie (2):
vhost-scsi: Fix max number of virtqueues
vhost scsi: Allow user to control num virtqueues

Minghao Xue (2):
dt-bindings: virtio: mmio: add optional wakeup-source property
virtio_mmio: add support to set IRQ of a virtio device as wakeup source

Robin Murphy (1):
vdpa: Use device_iommu_capable()

Shigeru Yoshida (1):
virtio-blk: Avoid use-after-free on suspend/resume

Stefano Garzarella (11):
vringh: iterate on iotlb_translate to handle large translations
vdpa_sim_blk: use dev_dbg() to print errors
vdpa_sim_blk: limit the number of request handled per batch
vdpa_sim_blk: call vringh_complete_iotlb() also in the error path
vdpa_sim_blk: set number of address spaces and virtqueue groups
vdpa_sim: use max_iotlb_entries as a limit in vhost_iotlb_init
tools/virtio: fix build
vdpa_sim_blk: check if sector is 0 for commands other than read or write
vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests
vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH
vdpa_sim_blk: add support for discard and write-zeroes

Xie Yongji (5):
vduse: Remove unnecessary spin lock protection
vduse: Use memcpy_{to,from}_page() in do_bounce()
vduse: Support using userspace pages as bounce buffer
vduse: Support registering userspace memory for IOVA regions
vduse: Support querying information of IOVA regions

Xu Qiang (1):
vdpa/mlx5: Use eth_broadcast_addr() to assign broadcast address

Xuan Zhuo (44):
remoteproc: rename len of rpoc_vring to num
virtio_ring: remove the arg vq of vring_alloc_desc_extra()
virtio: record the maximum queue num supported by the device.
virtio: struct virtio_config_ops add callbacks for queue_reset
virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset
virtio_ring: extract the logic of freeing vring
virtio_ring: split vring_virtqueue
virtio_ring: introduce virtqueue_init()
virtio_ring: split: stop __vring_new_virtqueue as export symbol
virtio_ring: split: __vring_new_virtqueue() accept struct vring_virtqueue_split
virtio_ring: split: introduce vring_free_split()
virtio_ring: split: extract the logic of alloc queue
virtio_ring: split: extract the logic of alloc state and extra
virtio_ring: split: extract the logic of vring init
virtio_ring: split: extract the logic of attach vring
virtio_ring: split: introduce virtqueue_reinit_split()
virtio_ring: split: reserve vring_align, may_reduce_num
virtio_ring: split: introduce virtqueue_resize_split()
virtio_ring: packed: introduce vring_free_packed
virtio_ring: packed: extract the logic of alloc queue
virtio_ring: packed: extract the logic of alloc state and extra
virtio_ring: packed: extract the logic of vring init
virtio_ring: packed: extract the logic of attach vring
virtio_ring: packed: introduce virtqueue_reinit_packed()
virtio_ring: packed: introduce virtqueue_resize_packed()
virtio_ring: introduce virtqueue_resize()
virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
virtio: allow to unbreak/break virtqueue individually
virtio: queue_reset: add VIRTIO_F_RING_RESET
virtio_ring: struct virtqueue introduce reset
virtio_pci: struct virtio_pci_common_cfg add queue_reset
virtio_pci: introduce helper to get/set queue reset
virtio_pci: extract the logic of active vq for modern pci
virtio_pci: support VIRTIO_F_RING_RESET
virtio: find_vqs() add arg sizes
virtio_pci: support the arg sizes of find_vqs()
virtio_mmio: support the arg sizes of find_vqs()
virtio: add helper virtio_find_vqs_ctx_size()
virtio_net: set the default max ring size by find_vqs()
virtio_net: get ringparam by virtqueue_get_vring_max_size()
virtio_net: split free_unused_bufs()
virtio_net: support rx queue resize
virtio_net: support tx queue resize
virtio_net: support set_ringparam

Zhang Jiaming (1):
vdpa: ifcvf: Fix spelling mistake in comments

Zhu Lingshan (4):
vDPA/ifcvf: get_config_size should return a value no greater than dev implementation
vDPA/ifcvf: support userspace to query features and MQ of a management device
vDPA: !FEATURES_OK should not block querying device config space
vDPA: fix 'cast to restricted le16' warnings in vdpa.c

Documentation/devicetree/bindings/virtio/mmio.yaml | 4 +
arch/um/drivers/virtio_uml.c | 3 +-
drivers/block/virtio_blk.c | 24 +-
drivers/net/virtio_net.c | 325 +++++++-
drivers/nvdimm/virtio_pmem.c | 9 +-
drivers/platform/mellanox/mlxbf-tmfifo.c | 3 +
drivers/remoteproc/remoteproc_core.c | 4 +-
drivers/remoteproc/remoteproc_virtio.c | 13 +-
drivers/s390/virtio/virtio_ccw.c | 4 +
drivers/vdpa/ifcvf/ifcvf_base.c | 14 +-
drivers/vdpa/ifcvf/ifcvf_base.h | 2 +
drivers/vdpa/ifcvf/ifcvf_main.c | 144 ++--
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 11 +
drivers/vdpa/mlx5/net/mlx5_vnet.c | 175 ++++-
drivers/vdpa/vdpa.c | 14 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 18 +-
drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 176 ++++-
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 3 +
drivers/vdpa/vdpa_user/iova_domain.c | 102 ++-
drivers/vdpa/vdpa_user/iova_domain.h | 8 +
drivers/vdpa/vdpa_user/vduse_dev.c | 180 +++++
drivers/vhost/scsi.c | 85 ++-
drivers/vhost/vdpa.c | 38 +-
drivers/vhost/vringh.c | 78 +-
drivers/virtio/Kconfig | 11 +-
drivers/virtio/virtio.c | 4 +-
drivers/virtio/virtio_mmio.c | 14 +-
drivers/virtio/virtio_pci_common.c | 32 +-
drivers/virtio/virtio_pci_common.h | 3 +-
drivers/virtio/virtio_pci_legacy.c | 8 +-
drivers/virtio/virtio_pci_modern.c | 153 +++-
drivers/virtio/virtio_pci_modern_dev.c | 39 +
drivers/virtio/virtio_ring.c | 814 +++++++++++++++------
drivers/virtio/virtio_vdpa.c | 18 +-
include/linux/mlx5/mlx5_ifc_vdpa.h | 8 +
include/linux/remoteproc.h | 4 +-
include/linux/vdpa.h | 4 +
include/linux/virtio.h | 10 +
include/linux/virtio_config.h | 40 +-
include/linux/virtio_pci_modern.h | 9 +
include/linux/virtio_ring.h | 10 -
include/uapi/linux/vduse.h | 47 ++
include/uapi/linux/vhost.h | 9 +
include/uapi/linux/vhost_types.h | 2 +
include/uapi/linux/virtio_config.h | 7 +-
include/uapi/linux/virtio_net.h | 34 +-
include/uapi/linux/virtio_pci.h | 2 +
tools/virtio/linux/kernel.h | 2 +-
tools/virtio/linux/vringh.h | 1 +
tools/virtio/virtio_test.c | 4 +-
51 files changed, 2171 insertions(+), 556 deletions(-)


2022-08-12 17:01:10

by pr-tracker-bot

Subject: Re: [GIT PULL] virtio: features, fixes

The pull request you sent on Fri, 12 Aug 2022 11:42:50 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/7a53e17accce9d310d2e522dfc701d8da7ccfa65

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2022-08-14 00:54:07

by Andres Freund

Subject: Re: [GIT PULL] virtio: features, fixes

Hi,

On 2022-08-12 11:42:50 -0400, Michael S. Tsirkin wrote:
> The following changes since commit 3d7cb6b04c3f3115719235cc6866b10326de34cd:
>
> Linux 5.19 (2022-07-31 14:03:01 -0700)
>
> are available in the Git repository at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
>
> for you to fetch changes up to 93e530d2a1c4c0fcce45e01ae6c5c6287a08d3e3:
>
> vdpa/mlx5: Fix possible uninitialized return value (2022-08-11 10:00:36 -0400)
> ----------------------------------------------------------------
> virtio: features, fixes
>
> A huge patchset supporting vq resize using the
> new vq reset capability.
> Features, fixes, cleanups all over the place.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
>
> ----------------------------------------------------------------

I have a script [1] that builds Google Cloud VM images daily with a fresh vanilla
kernel for Postgres CI testing. The last successful image creation was at
7ebfc85e2cd7b08f518b526173e9a33b56b3913b
and the first failing one was at
69dac8e431af26173ca0a1ebc87054e01c585bcc

Since then, the newly built kernel boots, but the network does not come up.

Looking at the merges between those commits makes me suspect this merge:

69dac8e431af Merge tag 'riscv-for-linus-5.20-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
6c833c0581f1 Merge tag 'devicetree-fixes-for-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
3d076fec5a0c Merge tag 'rtc-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
4a9350597aff Merge tag 'sound-fix-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
7a53e17accce Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
999324f58c41 Merge tag 'loongarch-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
f7cdaeeab8ca Merge tag 'for-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
d16b418fac3d Merge tag 'vfio-v6.0-rc1pt2' of https://github.com/awilliam/linux-vfio
9801002f76c6 perf: riscv_pmu{,_sbi}: Miscallenous improvement & fixes
c3adefb5baf3 Merge tag 'for-6.0/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
7ce2aa6d7fe1 Merge tag 'drm-next-2022-08-12-1' of git://anongit.freedesktop.org/drm/drm
7ab52f75a9cf RISC-V: Add Sstc extension support
36fa1cb56ac5 Merge tag 'drm-misc-next-fixes-2022-08-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
da06cc5bb600 RISC-V: fixups to work with crash tool
6de9eb21cd36 Merge 'irq/loongarch', 'pci/ctrl/loongson' and 'pci/header-cleanup-immutable'
3aefb2ee5bdd riscv: implement Zicbom-based CMO instructions + the t-head variant
8f2f74b4b6e6 RISC-V: Canaan devicetree fixes
f94ba7039fb4 Merge tag 'at91-reset-sama7g5-signed' into psy-next

All the drivers/net changes in that commit range were part of this pull
request.
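
As a rough illustration of how the commit list above can be narrowed down, the following sketch filters `git log --oneline`-style lines to merge commits whose tag or source tree mentions a keyword. The function name `suspect_merges`, the regular expression, and the keyword matching are assumptions made for this example; they are not part of the script referenced in [1].

```python
import re

# Matches merge lines such as:
#   7a53e17accce Merge tag 'for_linus' of git://.../mst/vhost
#   f94ba7039fb4 Merge tag 'at91-reset-sama7g5-signed' into psy-next
MERGE_RE = re.compile(
    r"^(?P<sha>[0-9a-f]{7,40}) Merge tag '(?P<tag>[^']+)' (?:of|into) (?P<src>\S+)"
)


def suspect_merges(oneline_log, keyword):
    """Return (sha, tag, source) for merge lines whose tag or source contains keyword."""
    hits = []
    for line in oneline_log.splitlines():
        m = MERGE_RE.match(line.strip())
        if m and (keyword in m.group("src") or keyword in m.group("tag")):
            hits.append((m.group("sha"), m.group("tag"), m.group("src")))
    return hits


# A few lines copied from the list above.
LOG = """\
4a9350597aff Merge tag 'sound-fix-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
7a53e17accce Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
7ab52f75a9cf RISC-V: Add Sstc extension support
"""

print(suspect_merges(LOG, "vhost"))
```

This only matches on names, so it can point at a candidate merge but cannot confirm it; building and booting the commits on either side of the merge would still be needed.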


Excerpt from the serial log for the Debian sid kernel (sorry for the interspersed logs):

Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 cloud-ifupdown-helper: Generated configuration for ens4
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kern[ OK ] Finished Raise network interfaces.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Found device Virtio network device.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Commit a transient machine-id on disk was skipped because of a failed condition check (ConditionPathIsMountPoint[ OK ] Reached target Network.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Started ifup for ens4.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.354044] x86: [ OK ] Reached target Network is Online.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Internet Systems Consortium DHCP Client 4.4.3
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Internet Systems Consortium DHCP Client 4.4.3
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Copyright 2004-2022 Internet Systems Consortium.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: For info, please visit https://www.isc.org/software/dhcp/
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Copyright 2004-2022 Internet Systems Consortium.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: For info, please visit https://www.isc.org/software/dhcp/
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Starting Raise network interfaces...
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 ifup[356]: ifup: waiting for lock on /run/network/ifstate.ens4
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Listening on LPF/ens4/42:01:0a:a8:00:07
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Listening on LPF/ens4/42:01:0a:a8:00:07
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Sending on LPF/ens4/42:01:0a:a8:00:07
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Sending on LPF/ens4/42:01:0a:a8:00:07
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 7
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.400657] NET: Registered PF_NETLINK/PF_ROUTE protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 7
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.408289] audit: initializing netlink subsys (disabled)
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPOFFER of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPOFFER of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPREQUEST for 10.168.0.7 on ens4 to 255.255.255.255 port 67
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPREQUEST for 10.168.0.7 on ens4 to 255.255.255.255 port 67
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPACK of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPACK of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.549954] NetLabel: Initializing
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.550736] NetLabel: domain hash size = 128
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.551480] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.552303] NetLabel: unlabeled traffic allowed by default
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.570445] NET: Registered PF_INET protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.586842] NET: Registered PF_UNIX/PF_LOCAL protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.587916] NET: Registered PF_XDP protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.865585] NET: Registered PF_INET6 protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [ 0.872235] NET: Registered PF_PACKET protocol family
rnel: [ 1.153962] virtio_net virtio1 ens4: renamed from eth0
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[474]: ens4=ens4
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Finished Raise network interfaces.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Reached target Network.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Reached target Network is Online.

Rebooting into the new kernel:

[ 0.475837] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 0.476558] audit: initializing netlink subsys (disabled)
[ 0.630598] NetLabel: Initializing
[ 0.631503] NetLabel: domain hash size = 128
[ 0.632409] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 0.632515] NetLabel: unlabeled traffic allowed by default
[ 0.654654] NET: Registered PF_INET protocol family
[ 0.672514] NET: Registered PF_UNIX/PF_LOCAL protocol family
[ 0.871362] Initializing XFRM netlink socket
[ 0.872171] NET: Registered PF_INET6 protocol family
[ 0.875791] NET: Registered PF_PACKET protocol family
[ 0.876932] 9pnet: Installing 9P2000 support
[ 0.887570] printk: console [netcon0] enabled
[ 0.888339] netconsole: network logging started
[ 0.943112] virtio_net virtio1 enp0s4: renamed from eth0
Starting Raise network interfaces...
[ OK ] Found device Virtio network device.
[ 1.876517] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s4: link becomes ready
Aug 13 22:51:16 debian systemd[1]: Starting Raise network interfaces...
Aug 13 22:51:16 debian dhclient[349]: Internet Systems Consortium DHCP Client 4.4.3
Aug 13 22:51:16 debian ifup[349]: Internet Systems Consortium DHCP Client 4.4.3
Aug 13 22:51:16 debian ifup[349]: Copyright 2004-2022 Internet Systems Consortium.
Aug 13 22:51:16 debian ifup[349]: For info, please visit https://www.isc.org/software/dhcp/
Aug 13 22:51:16 debian dhclient[349]: Copyright 2004-2022 Internet Systems Consortium.
Aug 13 22:51:16 debian dhclient[349]: For info, please visit https://www.isc.org/software/dhcp/
Aug 13 22:51:16 debian kernel: [ 0.475837] NET: Registered PF_NETLINK/PF_ROUTE protocol family
Aug 13 22:51:16 debian kernel: [ 0.476558] audit: initializing netlink subsys (disabled)
Aug 13 22:51:16 debian systemd[1]: Found device Virtio network device.
Aug 13 22:51:16 debian ifup[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 6
Aug 13 22:51:16 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 6
Aug 13 22:51:16 debian sh[356]: ifup: waiting for lock on /run/network/ifstate.enp0s4
Aug 13 22:51:16 debian kernel: [ 0.630598] NetLabel: Initializing
Aug 13 22:51:16 debian kernel: [ 0.631503] NetLabel: domain hash size = 128
Aug 13 22:51:16 debian kernel: [ 0.632409] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
Aug 13 22:51:16 debian kernel: [ 0.632515] NetLabel: unlabeled traffic allowed by default
Aug 13 22:51:16 debian kernel: [ 0.654654] NET: Registered PF_INET protocol family
Aug 13 22:51:16 debian kernel: [ 0.672514] NET: Registered PF_UNIX/PF_LOCAL protocol family
Aug 13 22:51:16 debian kernel: [ 0.871362] Initializing XFRM netlink socket
Aug 13 22:51:16 debian kernel: [ 0.872171] NET: Registered PF_INET6 protocol family
Aug 13 22:51:16 debian kernel: [ 0.875791] NET: Registered PF_PACKET protocol family
Aug 13 22:51:16 debian kernel: [ 0.876932] 9pnet: Installing 9P2000 support
Aug 13 22:51:16 debian kernel: [ 0.887570] printk: console [netcon0] enabled
Aug 13 22:51:16 debian kernel: [ 0.888339] netconsole: network logging started
Aug 13 22:51:16 debian kernel: [ 0.943112] virtio_net virtio1 enp0s4: renamed from eth0
Aug 13 22:51:16 debian kernel: [ 1.876517] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s4: link becomes ready
[ *** ] A start job is running for Raise network interfaces (6s / 5min)
Aug 13 22:51:22 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 13
[*** ] A start job is running for Raise network interfaces (19s / 5min)
Aug 13 22:51:35 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 14
[*** ] A start job is running for Raise network interfaces (33s / 5min)
...
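
The difference between the two logs above is that the first reaches DHCPACK while the second only repeats DHCPDISCOVER. A minimal sketch of checking this mechanically, assuming dhclient's syslog line format as it appears in the excerpts (the helper names `dhcp_progress` and `lease_acquired` are hypothetical):

```python
import re

# dhclient message types in the order a successful lease negotiation logs them.
HANDSHAKE = ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]
MSG_RE = re.compile(r"dhclient\[\d+\]: (DHCP[A-Z]+)\b")


def dhcp_progress(log_text):
    """Return the handshake stages reached, in order, based on dhclient lines."""
    reached = []
    for line in log_text.splitlines():
        m = MSG_RE.search(line)
        if not m or len(reached) == len(HANDSHAKE):
            continue
        # Only advance when the next expected stage shows up; repeats are ignored.
        if m.group(1) == HANDSHAKE[len(reached)]:
            reached.append(m.group(1))
    return reached


def lease_acquired(log_text):
    """True if all four stages were seen in order."""
    return dhcp_progress(log_text) == HANDSHAKE


# dhclient lines copied from the two excerpts above.
OLD_KERNEL = """\
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 7
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPOFFER of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPREQUEST for 10.168.0.7 on ens4 to 255.255.255.255 port 67
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPACK of 10.168.0.7 from 169.254.169.254
"""
NEW_KERNEL = """\
Aug 13 22:51:16 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 6
Aug 13 22:51:22 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 13
"""

print(lease_acquired(OLD_KERNEL), dhcp_progress(NEW_KERNEL))
```

A result that stops at DHCPDISCOVER only shows that no offer was received; it does not say whether the request was never sent or the reply was lost.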


Greetings,

Andres Freund


[1] https://github.com/anarazel/pg-vm-images/blob/main/packer/linux_debian.pkr.hcl#L225

2022-08-14 01:54:40

by Xuan Zhuo

Subject: Re: [GIT PULL] virtio: features, fixes

On Sat, 13 Aug 2022 17:45:22 -0700, Andres Freund <[email protected]> wrote:
> Hi,
> [...]


Hi,

Sorry, I didn't get any useful information from the logs. Can you tell me how
to get such an image, or how your script [1] is executed?

Thanks.


>
> Greetings,
>
> Andres Freund
>
>
> [1] https://github.com/anarazel/pg-vm-images/blob/main/packer/linux_debian.pkr.hcl#L225

2022-08-14 05:32:34

by Andres Freund

Subject: Re: [GIT PULL] virtio: fatures, fixes

Hi,

On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> Is there specific information you'd like from the VM? I just recreated the
> problem and can extract.

Actually, after reproducing I seem to now hit a likely different issue. I
guess I should have checked exactly the revision I had a problem with earlier,
rather than doing a git pull (up to aea23e7c464b):

[ 0.727199] scsi host0: Virtio SCSI HBA
[ 0.732257] scsi 0:0:1:0: Direct-Access Google PersistentDisk 1 PQ: 0 ANSI: 6
[ 0.736259] Freeing initrd memory: 7236K
[ 0.741743] sd 0:0:1:0: Attached scsi generic sg0 type 0
[ 0.742569] sd 0:0:1:0: [sda] 52428800 512-byte logical blocks: (26.8 GB/25.0 GiB)
[ 0.742628] tun: Universal TUN/TAP device driver, 1.6
[ 0.743730] sd 0:0:1:0: [sda] 4096-byte physical blocks
[ 0.748026] sd 0:0:1:0: [sda] Write Protect is off
[ 0.750684] sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 0.795519] BUG: unable to handle page fault for address: ffffa3107bd80008
[ 0.795753] sky2: driver version 1.30
[ 0.796500] #PF: supervisor read access in kernel mode
[ 0.797252] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 0.796500] #PF: error_code(0x0000) - not-present page
[ 0.796500] PGD 100001067 P4D 100001067 PUD 0
[ 0.796500] Oops: 0000 [#1] PREEMPT SMP PTI
[ 0.796500] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.19.0-origin-14013-gaea23e7c464b #2
[ 0.798728] ehci-pci: EHCI PCI platform driver
[ 0.796500] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 0.800112] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 0.796500] RIP: 0010:kmem_cache_free+0x155/0x3e0
[ 0.801875] ohci-pci: OHCI PCI platform driver
[ 0.796500] Code: 02 00 00 65 48 ff 08 e8 e9 cd e6 ff 66 90 8b 45 28 48 c7 04 03 00 00 00 00 48 85 db 74 38 48 8b 45 00 65 48 03 05 fb 13 34 6d <48> 8b 50 08 4c 39 60 10 0f 85 da 01 00 00 8b 4d 28 48 8b 00 48 89
[ 0.803798] uhci_hcd: USB Universal Host Controller Interface driver
[ 0.796500] RSP: 0000:ffffa29cc0134e80 EFLAGS: 00010286
[ 0.805319] RAX: ffffa3107bd80000 RBX: ffff998840b253c0 RCX: ffff029c00000000
[ 0.805319] RDX: 0000000000000000 RSI: ffffc8f280000000 RDI: ffff998840ab2300
[ 0.805319] RBP: ffff998840ab2300 R08: fffffffffff0bddf R09: 0000000000000008
[ 0.805319] R10: ffffffff93e060c0 R11: ffffa29cc0134ff8 R12: ffffc8f28402c940
[ 0.805319] R13: ffffffff92f17edd R14: 0000000000001000 R15: 0000000000001000
[ 0.805319] FS: 0000000000000000(0000) GS:ffff99887bd80000(0000) knlGS:0000000000000000
[ 0.805319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.805319] CR2: ffffa3107bd80008 CR3: 000000002720c001 CR4: 00000000003706e0
[ 0.805319] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.805319] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.805319] Call Trace:
[ 0.805319] <IRQ>
[ 0.805319] blk_update_request+0xfd/0x3d0
[ 0.805319] ? detach_buf_split+0x6a/0x150
[ 0.805319] scsi_end_request+0x22/0x1b0
[ 0.805319] scsi_io_completion+0x3c/0x750
[ 0.805319] blk_complete_reqs+0x38/0x50
[ 0.805319] __do_softirq+0xe1/0x2ed
[ 0.805319] ? handle_edge_irq+0x9a/0x230
[ 0.805319] __irq_exit_rcu+0xa6/0x100
[ 0.805319] common_interrupt+0xa5/0xc0
[ 0.805319] </IRQ>
[ 0.805319] <TASK>
[ 0.805319] asm_common_interrupt+0x22/0x40
[ 0.805319] RIP: 0010:acpi_idle_do_entry+0x46/0x60
[ 0.805319] Code: 75 08 48 8b 15 2f 1a 19 01 ed c3 cc cc cc cc 65 48 8b 04 25 00 ad 01 00 48 8b 00 a8 08 75 eb 66 90 0f 00 2d 9c 0d 5b 00 fb f4 <fa> c3 cc cc cc cc e9 2f fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
[ 0.805319] RSP: 0000:ffffa29cc00a7e68 EFLAGS: 00000246
[ 0.805319] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000098d
[ 0.805319] RDX: ffff99887bd80000 RSI: ffff998840b2c000 RDI: ffff998840b2c064
[ 0.805319] RBP: ffff998841a2a400 R08: fffffffffff0be0e R09: 0000000157c1aaba
[ 0.805319] R10: 0000000000000018 R11: 0000000000000c27 R12: ffffffff93fc46a0
[ 0.805319] R13: ffff998840b2c064 R14: 0000000000000001 R15: 0000000000000000
[ 0.805319] acpi_idle_enter+0x9f/0x100
[ 0.805319] cpuidle_enter_state+0x84/0x400
[ 0.805319] cpuidle_enter+0x24/0x40
[ 0.805319] do_idle+0x1df/0x260
[ 0.805319] cpu_startup_entry+0x14/0x20
[ 0.805319] start_secondary+0xe8/0xf0
[ 0.805319] secondary_startup_64_no_verify+0xe0/0xeb
[ 0.805319] </TASK>
[ 0.805319] Modules linked in:
[ 0.805319] CR2: ffffa3107bd80008
[ 0.805319] ---[ end trace 0000000000000000 ]---

Regards,

Andres

2022-08-14 05:32:38

by Andres Freund

Subject: Re: [GIT PULL] virtio: fatures, fixes

Hi,

On 2022-08-14 09:50:35 +0800, Xuan Zhuo wrote:
> Sorry, I didn't get any valuable information from the logs, can you tell me how
> to get such an image? Or how your [1] script is executed.

Is there specific information you'd like from the VM? I just recreated the
problem and can extract.


The last image that was built successfully is publicly available, so you
could create a gcp VM for that, go to /usr/src/linux, git pull, make & install
the new kernel and reproduce the problem that way. The git pull will take a
bit because it's a shallow clone...

gcloud compute instances create myvm --preemptible --project your-gcp-project --image-project pg-ci-images --image pg-ci-sid-newkernel-2022-08-12t06-52 --zone us-west1-a --custom-cpu=4 --custom-memory=4 --metadata=serial-port-enable=true

If you want to log in via serial console, you'd have to set a password before
rebooting.

gcloud compute connect-to-serial-port --zone us-west1-a --project=pg-ci-images-dev myvm


Executing the script requires a gcp key with the right to create instances and
images. Here's how to invoke it:

PACKER_LOG=1 GOOGLE_APPLICATION_CREDENTIALS=~/[email protected] \
packer build \
-var gcp_project=pg-ci-images-dev \
-var "image_date=$(date --utc +'%Y-%m-%dt%H-%M')" \
-var "task_name=sid-newkernel" \
-only 'linux.googlecompute.sid-newkernel' \
-on-error=ask \
packer/linux_debian.pkr.hcl

Of course you'd need to change the gcp_project= variable to point to a
project you have access to, and GOOGLE_APPLICATION_CREDENTIALS to point to your
gcp key.

Initially (package upgrades, kernel builds) the VM would be SSH
accessible. After building the kernel it's only accessible via serial console.


I can probably also get you the image in some other form that you prefer,
although I don't know if the problem will reproduce outside gcp. If helpful I
could upload a "broken" gcp image that you could use to


> > [1] https://github.com/anarazel/pg-vm-images/blob/main/packer/linux_debian.pkr.hcl#L225

Greetings,

Andres Freund

2022-08-14 09:25:34

by Michael S. Tsirkin

Subject: Re: [GIT PULL] virtio: fatures, fixes

On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > Is there specific information you'd like from the VM? I just recreated the
> > problem and can extract.
>
> Actually, after reproducing I seem to now hit a likely different issue. I
> guess I should have checked exactly the revision I had a problem with earlier,
> rather than doing a git pull (up to aea23e7c464b)

Looks like there's a generic memory corruption so it crashes
in random places. Would bisect be possible for you?

--
MST

2022-08-14 20:33:36

by Andres Freund

Subject: Re: [GIT PULL] virtio: fatures, fixes

Hi,

On 2022-08-14 04:59:48 -0400, Michael S. Tsirkin wrote:
> On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > > Is there specific information you'd like from the VM? I just recreated the
> > > problem and can extract.
> >
> > Actually, after reproducing I seem to now hit a likely different issue. I
> > guess I should have checked exactly the revision I had a problem with earlier,
> > rather than doing a git pull (up to aea23e7c464b)
>
> Looks like there's a generic memory corruption so it crashes
> in random places.

Either a generic memory corruption, or something wrong with IO.

> Would bisect be possible for you?

I'll give it a go.

Greetings,

Andres Freund

2022-08-15 07:25:14

by Andres Freund

Subject: Re: [GIT PULL] virtio: fatures, fixes

Hi,

On 2022-08-14 12:40:31 -0700, Andres Freund wrote:
> On 2022-08-14 04:59:48 -0400, Michael S. Tsirkin wrote:
> > On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > > > Is there specific information you'd like from the VM? I just recreated the
> > > > problem and can extract.
> > >
> > > Actually, after reproducing I seem to now hit a likely different issue. I
> > > guess I should have checked exactly the revision I had a problem with earlier,
> > > rather than doing a git pull (up to aea23e7c464b)
> >
> > Looks like there's a generic memory corruption so it crashes
> > in random places.
>
> Either a generic memory corruption, or something wrong with IO.
>
> > Would bisect be possible for you?
>
> I'll give it a go.

Bisect points to

commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7 (refs/bisect/bad)
Author: Xuan Zhuo <[email protected]>
Date: Mon Aug 1 14:38:57 2022 +0800

virtio_net: set the default max ring size by find_vqs()

Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
rx at the same time.

                         | rx/tx ring size
-------------------------------------------
speed == UNKNOWN or < 10G| 1024
speed < 40G              | 4096
speed >= 40G             | 8192

Call virtnet_update_settings() once before calling init_vqs() to update
speed.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
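
For anyone reading along, the table in that commit message amounts to a
threshold function on link speed. Below is a minimal sketch in plain userspace
C, not the driver's actual code: the function name, the constant names, and
the assumption that speed is expressed in Mb/s with -1 meaning "unknown" are
all illustrative.

```c
#include <stdint.h>

/* Illustrative constants; speed is assumed to be in Mb/s. */
#define SKETCH_SPEED_UNKNOWN (-1)
#define SKETCH_SPEED_10G     10000
#define SKETCH_SPEED_40G     40000

/*
 * Hypothetical helper: return the default rx/tx ring size that the
 * commit message's table assigns to a given link speed.
 */
uint32_t sketch_default_ring_size(int32_t speed_mbps)
{
    /* Unknown speed, or anything below 10G, gets the smallest ring. */
    if (speed_mbps == SKETCH_SPEED_UNKNOWN || speed_mbps < SKETCH_SPEED_10G)
        return 1024;

    /* 10G up to, but not including, 40G. */
    if (speed_mbps < SKETCH_SPEED_40G)
        return 4096;

    /* 40G and above. */
    return 8192;
}
```

Under these assumptions, a device that reports no speed at all would ask for
1024-entry rings, while one that reports 10G or more would ask for
larger-than-before rings on both rx and tx.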


I'm not 100% confident yet, because the likelihood of encountering problems
was not uniform across the versions, with one of them showing the problem only
in 1/3 boots, whereas some of the others showed it 100% of the time. But I've
rebooted enough times to be fairly confident.

With 762faee5a267 I reliably see network not connecting, with
762faee5a267^=fe3dc04e31aa I haven't seen a problem yet.


I did see some other types of crashes in commits nearby, so this might not be
the only problematic bit. See also the discussion around
https://lore.kernel.org/all/CAHk-=wikzU4402P-FpJRK_QwfVOS+t-3p1Wx5awGHTvr-s_0Ew@mail.gmail.com/

Greetings,

Andres Freund