2024-04-15 14:42:07

by Greg Kroah-Hartman

Subject: [PATCH 6.6 000/122] 6.6.28-rc1 review

This is the start of the stable review cycle for the 6.6.28 release.
There are 122 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 6.6.28-rc1

Fudongwang <[email protected]>
drm/amd/display: fix disable otg wa logic in DCN316

Harry Wentland <[email protected]>
drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST

Harry Wentland <[email protected]>
drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4

Tim Huang <[email protected]>
drm/amdgpu: fix incorrect number of active RBs for gfx11

Alex Deucher <[email protected]>
drm/amdgpu: always force full reset for SOC21

Lijo Lazar <[email protected]>
drm/amdgpu: Reset dGPU if suspend got aborted

Ville Syrjälä <[email protected]>
drm/i915: Disable port sync when bigjoiner is used

Ville Syrjälä <[email protected]>
drm/i915/cdclk: Fix CDCLK programming order when pipes are active

Josh Poimboeuf <[email protected]>
x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

Josh Poimboeuf <[email protected]>
x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto

Josh Poimboeuf <[email protected]>
x86/bugs: Clarify that syscall hardening isn't a BHI mitigation

Josh Poimboeuf <[email protected]>
x86/bugs: Fix BHI handling of RRSBA

Ingo Molnar <[email protected]>
x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'

Josh Poimboeuf <[email protected]>
x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES

Josh Poimboeuf <[email protected]>
x86/bugs: Fix BHI documentation

Daniel Sneddon <[email protected]>
x86/bugs: Fix return type of spectre_bhi_state()

Arnd Bergmann <[email protected]>
irqflags: Explicitly ignore lockdep_hrtimer_exit() argument

Adam Dunlap <[email protected]>
x86/apic: Force native_apic_mem_read() to use the MOV instruction

John Stultz <[email protected]>
selftests: timers: Fix abs() warning in posix_timers test

Sean Christopherson <[email protected]>
x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n

Namhyung Kim <[email protected]>
perf/x86: Fix out of range data

Gavin Shan <[email protected]>
vhost: Add smp_rmb() in vhost_enable_notify()

Gavin Shan <[email protected]>
vhost: Add smp_rmb() in vhost_vq_avail_empty()

Frank Li <[email protected]>
arm64: dts: imx8-ss-dma: fix spi lpcg indices

Frank Li <[email protected]>
arm64: dts: imx8-ss-lsio: fix pwm lpcg indices

Frank Li <[email protected]>
arm64: dts: imx8-ss-conn: fix usb lpcg indices

Frank Li <[email protected]>
arm64: dts: imx8-ss-dma: fix adc lpcg indices

Frank Li <[email protected]>
arm64: dts: imx8-ss-dma: fix can lpcg indices

Frank Li <[email protected]>
arm64: dts: imx8qm-ss-dma: fix can lpcg indices

Ville Syrjälä <[email protected]>
drm/client: Fully protect modes[] with dev->mode_config.mutex

Boris Brezillon <[email protected]>
drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()

Jammy Huang <[email protected]>
drm/ast: Fix soft lockup

Harish Kasiviswanathan <[email protected]>
drm/amdkfd: Reset GPU on queue preemption failure

Ville Syrjälä <[email protected]>
drm/i915/vrr: Disable VRR when using bigjoiner

Zack Rusin <[email protected]>
drm/vmwgfx: Enable DMA mappings with SEV

Jacek Lawrynowicz <[email protected]>
accel/ivpu: Fix deadlock in context_xa

Alexander Wetzel <[email protected]>
scsi: sg: Avoid race in error handling & drop bogus warn

Alexander Wetzel <[email protected]>
scsi: sg: Avoid sg device teardown race

Zheng Yejian <[email protected]>
kprobes: Fix possible use-after-free issue on kprobe registration

Pavel Begunkov <[email protected]>
io_uring/net: restore msg_control on sendzc retry

Boris Burkov <[email protected]>
btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans

Boris Burkov <[email protected]>
btrfs: record delayed inode root in transaction

Boris Burkov <[email protected]>
btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations

Boris Burkov <[email protected]>
btrfs: qgroup: correctly model root qgroup rsv in convert

Geliang Tang <[email protected]>
selftests: mptcp: use += operator to append strings

Jacob Pan <[email protected]>
iommu/vt-d: Allocate local memory for page request queue

Xuchun Shang <[email protected]>
iommu/vt-d: Fix wrong use of pasid config

Arnd Bergmann <[email protected]>
tracing: hide unused ftrace_event_id_fops

David Arinzon <[email protected]>
net: ena: Set tx_info->xdpf value to NULL

David Arinzon <[email protected]>
net: ena: Use tx_ring instead of xdp_ring for XDP channel TX

David Arinzon <[email protected]>
net: ena: Pass ena_adapter instead of net_device to ena_xmit_common()

David Arinzon <[email protected]>
net: ena: Move XDP code to its new files

David Arinzon <[email protected]>
net: ena: Fix incorrect descriptor free behavior

David Arinzon <[email protected]>
net: ena: Wrong missing IO completions check order

David Arinzon <[email protected]>
net: ena: Fix potential sign extension issue

Michal Luczaj <[email protected]>
af_unix: Fix garbage collector racing against connect()

Kuniyuki Iwashima <[email protected]>
af_unix: Do not use atomic ops for unix_sk(sk)->inflight.

Arınç ÜNAL <[email protected]>
net: dsa: mt7530: trap link-local frames regardless of ST Port State

Gerd Bayer <[email protected]>
Revert "s390/ism: fix receive message buffer allocation"

Daniel Machon <[email protected]>
net: sparx5: fix wrong config being used when reconfiguring PCS

Rahul Rameshbabu <[email protected]>
net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit

Carolina Jubran <[email protected]>
net/mlx5e: HTB, Fix inconsistencies with QoS SQs number

Carolina Jubran <[email protected]>
net/mlx5e: Fix mlx5e_priv_init() cleanup flow

Cosmin Ratiu <[email protected]>
net/mlx5: Correctly compare pkt reformat ids

Cosmin Ratiu <[email protected]>
net/mlx5: Properly link new fs rules into the tree

Michael Liang <[email protected]>
net/mlx5: offset comp irq index in name by one

Shay Drory <[email protected]>
net/mlx5: Register devlink first under devlink lock

Moshe Shemesh <[email protected]>
net/mlx5: SF, Stop waiting for FW as teardown was called

Eric Dumazet <[email protected]>
netfilter: complete validation of user input

Archie Pusaka <[email protected]>
Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit

Luiz Augusto von Dentz <[email protected]>
Bluetooth: SCO: Fix not validating setsockopt user input

Luiz Augusto von Dentz <[email protected]>
Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY

Luiz Augusto von Dentz <[email protected]>
Bluetooth: hci_sync: Use QoS to determine which PHY to scan

Luiz Augusto von Dentz <[email protected]>
Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset

Luiz Augusto von Dentz <[email protected]>
Bluetooth: ISO: Align broadcast sync_timeout with connection timeout

Jiri Benc <[email protected]>
ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr

Arnd Bergmann <[email protected]>
ipv4/route: avoid unused-but-set-variable warning

Arnd Bergmann <[email protected]>
ipv6: fib: hide unused 'pn' variable

Geetha sowjanya <[email protected]>
octeontx2-af: Fix NIX SQ mode and BP config

Kuniyuki Iwashima <[email protected]>
af_unix: Clear stale u->oob_skb.

Marek Vasut <[email protected]>
net: ks8851: Handle softirqs at the end of IRQ thread to fix hang

Marek Vasut <[email protected]>
net: ks8851: Inline ks8851_rx_skb()

Pavan Chebbi <[email protected]>
bnxt_en: Reset PTP tx_avail after possible firmware reset

Vikas Gupta <[email protected]>
bnxt_en: Fix error recovery for RoCE ulp client

Vikas Gupta <[email protected]>
bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()

Gerd Bayer <[email protected]>
s390/ism: fix receive message buffer allocation

Eric Dumazet <[email protected]>
geneve: fix header validation in geneve[6]_xmit_skb

Ming Lei <[email protected]>
block: fix q->blkg_list corruption during disk rebind

Hariprasad Kelam <[email protected]>
octeontx2-pf: Fix transmit scheduler resource leak

Eric Dumazet <[email protected]>
xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING

Petr Tesarik <[email protected]>
u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file

Ilya Maximets <[email protected]>
net: openvswitch: fix unwanted error log on timeout policy probing

Dan Carpenter <[email protected]>
scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()

Xiang Chen <[email protected]>
scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()

Arnd Bergmann <[email protected]>
nouveau: fix function cast warning

Alex Constantino <[email protected]>
Revert "drm/qxl: simplify qxl_fence_wait"

Kwangjin Ko <[email protected]>
cxl/core: Fix initialization of mbox_cmd.size_out in get event

Frank Li <[email protected]>
arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order

Dmitry Baryshkov <[email protected]>
drm/msm/dpu: don't allow overriding data from catalog

Dave Jiang <[email protected]>
cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned

Yuquan Wang <[email protected]>
cxl/mem: Fix for the index of Clear Event Record Handle

Cristian Marussi <[email protected]>
firmware: arm_scmi: Make raw debugfs entries non-seekable

Aaro Koskinen <[email protected]>
ARM: OMAP2+: fix USB regression on Nokia N8x0

Aaro Koskinen <[email protected]>
mmc: omap: restore original power up/down steps

Aaro Koskinen <[email protected]>
mmc: omap: fix deferred probe

Aaro Koskinen <[email protected]>
mmc: omap: fix broken slot switch lookup

Aaro Koskinen <[email protected]>
ARM: OMAP2+: fix N810 MMC gpiod table

Aaro Koskinen <[email protected]>
ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0

Nini Song <[email protected]>
media: cec: core: remove length check of Timer Status

Anna-Maria Behnsen <[email protected]>
PM: s2idle: Make sure CPUs will wakeup directly on resume

Hans de Goede <[email protected]>
ACPI: scan: Do not increase dep_unmet for already met dependencies

Noah Loomans <[email protected]>
platform/chrome: cros_ec_uart: properly fix race condition

Tim Huang <[email protected]>
drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11

Dmitry Antipov <[email protected]>
Bluetooth: Fix memory leak in hci_req_sync_complete()

Steven Rostedt (Google) <[email protected]>
ring-buffer: Only update pages_touched when a new page is touched

Yu Kuai <[email protected]>
raid1: fix use-after-free for original bio in raid1_write_request()

Fabio Estevam <[email protected]>
ARM: dts: imx7s-warp: Pass OV2680 link-frequencies

Gavin Shan <[email protected]>
arm64: tlb: Fix TLBI RANGE operand

Sven Eckelmann <[email protected]>
batman-adv: Avoid infinite loop trying to resize local TT

Damien Le Moal <[email protected]>
ata: libata-scsi: Fix ata_scsi_dev_rescan() error path

Igor Pylypiv <[email protected]>
ata: libata-core: Allow command duration limits detection for ACS-4 drives

Steve French <[email protected]>
smb3: fix Open files on server counter going negative


-------------

Diffstat:

Documentation/admin-guide/hw-vuln/spectre.rst | 22 +-
Documentation/admin-guide/kernel-parameters.txt | 12 +-
.../device_drivers/ethernet/amazon/ena.rst | 1 +
Makefile | 4 +-
arch/arm/boot/dts/nxp/imx/imx7s-warp.dts | 1 +
arch/arm/mach-omap2/board-n8x0.c | 23 +-
arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 16 +-
arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi | 36 +-
arch/arm64/boot/dts/freescale/imx8-ss-lsio.dtsi | 16 +-
arch/arm64/boot/dts/freescale/imx8qm-ss-dma.dtsi | 8 +-
arch/arm64/include/asm/tlbflush.h | 20 +-
arch/x86/Kconfig | 21 +-
arch/x86/events/core.c | 1 +
arch/x86/include/asm/apic.h | 3 +-
arch/x86/kernel/apic/apic.c | 6 +-
arch/x86/kernel/cpu/bugs.c | 82 ++-
arch/x86/kernel/cpu/common.c | 48 +-
block/blk-cgroup.c | 9 +-
block/blk-cgroup.h | 2 +
block/blk-core.c | 2 +
drivers/accel/ivpu/ivpu_drv.c | 2 +-
drivers/acpi/scan.c | 3 +-
drivers/ata/libata-core.c | 2 +-
drivers/ata/libata-scsi.c | 9 +-
drivers/cxl/core/mbox.c | 5 +-
drivers/cxl/core/regs.c | 5 +-
drivers/firmware/arm_scmi/raw_mode.c | 7 +-
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/soc21.c | 27 +-
.../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +-
.../amd/display/dc/clk_mgr/dcn316/dcn316_clk_mgr.c | 19 +-
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 12 +-
drivers/gpu/drm/ast/ast_dp.c | 3 +
drivers/gpu/drm/drm_client_modeset.c | 3 +-
drivers/gpu/drm/i915/display/intel_cdclk.c | 7 +-
drivers/gpu/drm/i915/display/intel_cdclk.h | 3 +
drivers/gpu/drm/i915/display/intel_ddi.c | 5 +
drivers/gpu/drm/i915/display/intel_vrr.c | 7 +
drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 10 +-
.../gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c | 7 +-
drivers/gpu/drm/panfrost/panfrost_mmu.c | 13 +-
drivers/gpu/drm/qxl/qxl_release.c | 50 +-
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 11 +-
drivers/iommu/intel/perfmon.c | 2 +-
drivers/iommu/intel/svm.c | 2 +-
drivers/md/raid1.c | 2 +-
drivers/media/cec/core/cec-adap.c | 14 -
drivers/mmc/host/omap.c | 48 +-
drivers/net/dsa/mt7530.c | 229 ++++++-
drivers/net/dsa/mt7530.h | 5 +
drivers/net/ethernet/amazon/ena/Makefile | 2 +-
drivers/net/ethernet/amazon/ena/ena_com.c | 2 +-
drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 +
drivers/net/ethernet/amazon/ena/ena_netdev.c | 688 ++-------------------
drivers/net/ethernet/amazon/ena/ena_netdev.h | 83 +--
drivers/net/ethernet/amazon/ena/ena_xdp.c | 466 ++++++++++++++
drivers/net/ethernet/amazon/ena/ena_xdp.h | 152 +++++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 6 +-
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 22 +-
drivers/net/ethernet/marvell/octeontx2/nic/qos.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h | 8 +-
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 33 +-
drivers/net/ethernet/mellanox/mlx5/core/en/selq.c | 2 +
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 -
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 7 +-
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 17 +-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 37 +-
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 4 +-
.../ethernet/mellanox/mlx5/core/sf/dev/driver.c | 22 +-
drivers/net/ethernet/micrel/ks8851.h | 3 -
drivers/net/ethernet/micrel/ks8851_common.c | 16 +-
drivers/net/ethernet/micrel/ks8851_par.c | 11 -
drivers/net/ethernet/micrel/ks8851_spi.c | 11 -
.../net/ethernet/microchip/sparx5/sparx5_port.c | 4 +-
drivers/net/geneve.c | 4 +-
drivers/platform/chrome/cros_ec_uart.c | 28 +-
drivers/scsi/hisi_sas/hisi_sas_main.c | 2 +-
drivers/scsi/qla2xxx/qla_edif.c | 2 +-
drivers/scsi/sg.c | 20 +-
drivers/vhost/vhost.c | 28 +-
fs/btrfs/delayed-inode.c | 3 +
fs/btrfs/inode.c | 13 +-
fs/btrfs/ioctl.c | 37 +-
fs/btrfs/qgroup.c | 2 +
fs/btrfs/root-tree.c | 10 -
fs/btrfs/root-tree.h | 2 -
fs/btrfs/transaction.c | 17 +-
fs/smb/client/cached_dir.c | 4 +-
include/linux/dma-fence.h | 7 +
include/linux/irqflags.h | 2 +-
include/linux/u64_stats_sync.h | 9 +-
include/net/addrconf.h | 4 +
include/net/af_unix.h | 2 +-
include/net/bluetooth/bluetooth.h | 11 +
include/net/ip_tunnels.h | 33 +
io_uring/net.c | 1 +
kernel/cpu.c | 3 +-
kernel/kprobes.c | 18 +-
kernel/power/suspend.c | 6 +
kernel/trace/ring_buffer.c | 6 +-
kernel/trace/trace_events.c | 4 +
net/batman-adv/translation-table.c | 2 +-
net/bluetooth/hci_request.c | 4 +-
net/bluetooth/hci_sync.c | 66 +-
net/bluetooth/iso.c | 14 +-
net/bluetooth/l2cap_core.c | 3 +-
net/bluetooth/sco.c | 23 +-
net/ipv4/netfilter/arp_tables.c | 4 +
net/ipv4/netfilter/ip_tables.c | 4 +
net/ipv4/route.c | 4 +-
net/ipv6/addrconf.c | 7 +-
net/ipv6/ip6_fib.c | 7 +-
net/ipv6/netfilter/ip6_tables.c | 4 +
net/openvswitch/conntrack.c | 5 +-
net/unix/af_unix.c | 8 +-
net/unix/garbage.c | 35 +-
net/unix/scm.c | 8 +-
net/xdp/xsk.c | 2 +
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 53 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 30 +-
tools/testing/selftests/timers/posix_timers.c | 2 +-
123 files changed, 1765 insertions(+), 1263 deletions(-)




2024-04-15 18:12:50

by Florian Fainelli

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review



On 4/15/2024 7:19 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on
BMIPS_GENERIC:

Tested-by: Florian Fainelli <[email protected]>
--
Florian

2024-04-15 23:53:12

by Kelsey Steele

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
> Anything received after that time might be too late.
>
No regressions found on WSL (x86 and arm64).

Built, booted, and reviewed dmesg.

Thank you. :)

Tested-by: Kelsey Steele <[email protected]>

2024-04-16 00:05:43

by Mark Brown

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.

I'm seeing boot breakage with this one on the Arm fast models, a bisect
is running now, for slow values of run but should be done by the time I
get back tonight. It only seems to be affecting 6.6, the boot grinds to
a halt shortly after getting to userspace apparently with some
PCI/virtio issues:

[ 1.606075] VFS: Mounted root (ext4 filesystem) on device 254:1.
[ 1.608751] devtmpfs: mounted
[ 1.627412] Freeing unused kernel memory: 9152K
[ 1.627894] Run /sbin/init as init process
[ 1.627957] with arguments:
[ 1.628009] /sbin/init
[ 1.628064] Image
[ 1.628117] with environment:
[ 1.628169] HOME=/
[ 1.628222] TERM=linux
[ 1.628275] user_debug=31
[ 11.764055] pci 0000:00:01.0: deferred probe pending
[ 11.764141] pci 0000:00:02.0: deferred probe pending
[ 11.764227] pci 0000:00:03.0: deferred probe pending
[ 11.764313] pci 0000:00:04.0: deferred probe pending
[ 11.764399] pci 0000:03:00.0: deferred probe pending
[ 11.764485] pci 0000:04:00.0: deferred probe pending
[ 11.764571] pci 0000:04:01.0: deferred probe pending
[ 11.764657] pci 0000:04:02.0: deferred probe pending
[ 11.764743] pci 0000:00:1f.0: deferred probe pending
[ 11.764829] pci 0000:01:00.0: deferred probe pending
[ 11.764915] pci 0000:05:00.0: deferred probe pending

(no probe deferral happens for working boots.)
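
For anyone comparing a failing boot log against a working one, the deferred-probe lines above can be pulled out mechanically. The following is only a minimal sketch: the function name `deferred_devices` and the sample text are illustrative assumptions, and the regular expression matches only the line shape shown in this log.

```python
import re

# Matches kernel log lines shaped like the ones in the report above, e.g.
#   [ 11.764055] pci 0000:00:01.0: deferred probe pending
DEFERRED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<bus>\S+)\s+(?P<dev>\S+): deferred probe pending"
)


def deferred_devices(log_text):
    """Return (timestamp, bus, device) tuples for every deferred-probe line."""
    found = []
    for line in log_text.splitlines():
        m = DEFERRED_RE.search(line)
        if m:
            found.append((float(m.group("ts")), m.group("bus"), m.group("dev")))
    return found


# Hypothetical excerpt, copied from the log lines quoted in this thread.
SAMPLE_LOG = """\
[ 1.627894] Run /sbin/init as init process
[ 11.764055] pci 0000:00:01.0: deferred probe pending
[ 11.764141] pci 0000:00:02.0: deferred probe pending
"""

if __name__ == "__main__":
    for ts, bus, dev in deferred_devices(SAMPLE_LOG):
        print(f"{ts:10.6f}  {bus} {dev}")
```

An empty result for a working boot and a non-empty one for the failing boot would match the observation that no probe deferral happens on working boots.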


2024-04-16 06:27:05

by Ron Economos

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On 4/15/24 7:19 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Built and booted successfully on RISC-V RV64 (HiFive Unmatched).

Tested-by: Ron Economos <[email protected]>


2024-04-16 07:45:56

by Harshit Mogalapalli

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

Hi Greg,

On 15/04/24 19:49, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
> Anything received after that time might be too late.
>
No problems seen on x86_64 and aarch64 with our testing.

Tested-by: Harshit Mogalapalli <[email protected]>

Thanks,
Harshit

> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
> ---------

2024-04-16 09:17:25

by Naresh Kamboju

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Tue, 16 Apr 2024 at 05:35, Mark Brown <[email protected]> wrote:
>
> On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 6.6.28 release.
> > There are 122 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
>
> I'm seeing boot breakage with this one on the Arm fast models, a bisect
> is running now, for slow values of run but should be done by the time I
> get back tonight. It only seems to be affecting 6.6, the boot grinds to
> a halt shortly after getting to userspace apparently with some
> PCI/virtio issues:

LKFT also noticed the problem that Mark Brown reported.

>
> [ 1.606075] VFS: Mounted root (ext4 filesystem) on device 254:1.
> [ 1.608751] devtmpfs: mounted
> [ 1.627412] Freeing unused kernel memory: 9152K
> [ 1.627894] Run /sbin/init as init process
> [ 1.627957] with arguments:
> [ 1.628009] /sbin/init
> [ 1.628064] Image
> [ 1.628117] with environment:
> [ 1.628169] HOME=/
> [ 1.628222] TERM=linux
> [ 1.628275] user_debug=31
> [ 11.764055] pci 0000:00:01.0: deferred probe pending
> [ 11.764141] pci 0000:00:02.0: deferred probe pending
> [ 11.764227] pci 0000:00:03.0: deferred probe pending
> [ 11.764313] pci 0000:00:04.0: deferred probe pending
> [ 11.764399] pci 0000:03:00.0: deferred probe pending
> [ 11.764485] pci 0000:04:00.0: deferred probe pending
> [ 11.764571] pci 0000:04:01.0: deferred probe pending
> [ 11.764657] pci 0000:04:02.0: deferred probe pending
> [ 11.764743] pci 0000:00:1f.0: deferred probe pending
> [ 11.764829] pci 0000:01:00.0: deferred probe pending
> [ 11.764915] pci 0000:05:00.0: deferred probe pending
>
> (no probe deferral happens for working boots.)


--
Linaro LKFT
https://lkft.linaro.org

2024-04-16 10:13:51

by Takeshi Ogasawara

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

Hi Greg

On Mon, Apr 15, 2024 at 11:35 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

6.6.28-rc1 tested.

Build successfully completed.
Boot successfully completed.
No dmesg regressions.
Video output normal.
Sound output normal.

Lenovo ThinkPad X1 Carbon Gen10 (Intel i7-1260P, x86_64, Arch Linux)

[ 0.000000] Linux version 6.6.28-rc1rv
(takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU
Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Tue Apr 16 18:32:52 JST 2024

Thanks

Tested-by: Takeshi Ogasawara <[email protected]>

2024-04-16 10:34:25

by Mark Brown

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.

The bisect of the boot issue that's affecting the FVP in v6.6 (only)
landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
the -rc for v6.8 but that seems fine. I've done no investigation beyond
the bisect and looking at the commit log to pull out people to CC and
note that the fix was explicitly targeted at v6.6.

Bisect log:

# bad: [a4e5ff3532873150dc32d20f5c214ec59f98bcd2] Linux 6.6.28-rc1
# good: [5e828009c8b380739e13da92be847f10461c38b1] Linux 6.6.27
git bisect start 'a4e5ff3532873150dc32d20f5c214ec59f98bcd2' '5e828009c8b380739e13da92be847f10461c38b1'
# bad: [a4e5ff3532873150dc32d20f5c214ec59f98bcd2] Linux 6.6.28-rc1
git bisect bad a4e5ff3532873150dc32d20f5c214ec59f98bcd2
# bad: [f95afc8867d1f2e18e0c6abd16ca76c99a2839be] net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
git bisect bad f95afc8867d1f2e18e0c6abd16ca76c99a2839be
# bad: [06e82fe83cc671df58a956cd0cf8ba64c15a6d0d] scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
git bisect bad 06e82fe83cc671df58a956cd0cf8ba64c15a6d0d
# bad: [d2b5692676e7a204487546699cd5511baad5e9b6] ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0
git bisect bad d2b5692676e7a204487546699cd5511baad5e9b6
# bad: [a438d050bf7ba5e3462dd61d90897569e7892c80] raid1: fix use-after-free for original bio in raid1_write_request()
git bisect bad a438d050bf7ba5e3462dd61d90897569e7892c80
# good: [6e869ee886dead911b2411c7cba816be52dffb19] ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
git bisect good 6e869ee886dead911b2411c7cba816be52dffb19
# bad: [c9ad150ed8dd988d1cefc1a8e19df53d46990e76] arm64: tlb: Fix TLBI RANGE operand
git bisect bad c9ad150ed8dd988d1cefc1a8e19df53d46990e76
# good: [56a6896c1f107d519c0045dd6575648745bcba21] batman-adv: Avoid infinite loop trying to resize local TT
git bisect good 56a6896c1f107d519c0045dd6575648745bcba21
# first bad commit: [c9ad150ed8dd988d1cefc1a8e19df53d46990e76] arm64: tlb: Fix TLBI RANGE operand
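
A log in this format can be summarised with a few lines of Python. This is a rough sketch that assumes the `git bisect log` layout shown above; the names `summarize_bisect` and `SAMPLE` are made up for illustration.

```python
import re

# "git bisect good <sha>" / "git bisect bad <sha>" lines record each verdict.
STEP_RE = re.compile(r"^git bisect (good|bad) ([0-9a-f]{7,40})$")
# The final comment line names the culprit once the bisect has converged.
FIRST_BAD_RE = re.compile(r"^# first bad commit: \[([0-9a-f]{7,40})\] (.*)$")


def summarize_bisect(log_text):
    """Return (steps, first_bad) parsed from a `git bisect log` style text.

    steps is a list of (verdict, sha) tuples in log order;
    first_bad is (sha, subject), or None if the log has not converged.
    """
    steps = []
    first_bad = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m:
            steps.append((m.group(1), m.group(2)))
            continue
        m = FIRST_BAD_RE.match(line)
        if m:
            first_bad = (m.group(1), m.group(2))
    return steps, first_bad


# Tail of the bisect log above, used here as sample input.
SAMPLE = """\
# bad: [c9ad150ed8dd988d1cefc1a8e19df53d46990e76] arm64: tlb: Fix TLBI RANGE operand
git bisect bad c9ad150ed8dd988d1cefc1a8e19df53d46990e76
# good: [56a6896c1f107d519c0045dd6575648745bcba21] batman-adv: Avoid infinite loop trying to resize local TT
git bisect good 56a6896c1f107d519c0045dd6575648745bcba21
# first bad commit: [c9ad150ed8dd988d1cefc1a8e19df53d46990e76] arm64: tlb: Fix TLBI RANGE operand
"""

if __name__ == "__main__":
    steps, first_bad = summarize_bisect(SAMPLE)
    print(len(steps), "steps; first bad:", first_bad)
```

To re-run the same bisect locally rather than just read it, the log can be saved to a file and passed to `git bisect replay`.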



2024-04-16 10:38:24

by Jon Hunter

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Mon, 15 Apr 2024 16:19:25 +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

All tests passing for Tegra ...

Test results for stable-v6.6:
10 builds: 10 pass, 0 fail
26 boots: 26 pass, 0 fail
102 tests: 102 pass, 0 fail

Linux version: 6.6.28-rc1-ga4e5ff353287
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000,
tegra20-ventana, tegra210-p2371-2180,
tegra210-p3450-0000, tegra30-cardhu-a04

Tested-by: Jon Hunter <[email protected]>

Jon

2024-04-16 11:04:41

by Marc Zyngier

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Tue, 16 Apr 2024 11:34:14 +0100,
Mark Brown <[email protected]> wrote:
>
> On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 6.6.28 release.
> > There are 122 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
>
> The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> the -rc for v6.8 but that seems fine. I've done no investigation beyond
> the bisect and looking at the commit log to pull out people to CC and
> note that the fix was explicitly targeted at v6.6.

What are the configurations of the kernel and the FVP?

M.

--
Without deviation from the norm, progress is not possible.

2024-04-16 11:14:12

by Mark Brown

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Tue, Apr 16, 2024 at 12:04:29PM +0100, Marc Zyngier wrote:
> Mark Brown <[email protected]> wrote:

> > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > the bisect and looking at the commit log to pull out people to CC and
> > note that the fix was explicitly targeted at v6.6.

> What are the configurations of the kernel and the FVP?

The kernel is a defconfig, the FVP arguments can be seen in the log from
the job here:

https://lava.sirena.org.uk/scheduler/job/148281#L233

(sorry, should've included that in the earlier mail.)



2024-04-16 13:07:50

by Naresh Kamboju

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
>
> On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 6.6.28 release.
> > There are 122 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
>
> The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> the -rc for v6.8 but that seems fine. I've done no investigation beyond
> the bisect and looking at the commit log to pull out people to CC and
> note that the fix was explicitly targeted at v6.6.

Anders investigated the reported issue, bisected it, and also found
that the missing commit for stable-rc 6.6 is
e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")

--
Linaro LKFT
https://lkft.linaro.org

2024-04-16 13:22:20

by Marc Zyngier

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Tue, 16 Apr 2024 14:07:30 +0100,
Naresh Kamboju <[email protected]> wrote:
>
> On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> >
> > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 6.6.28 release.
> > > There are 122 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> >
> > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > the bisect and looking at the commit log to pull out people to CC and
> > note that the fix was explicitly targeted at v6.6.
>
> Anders investigated this reported issues and bisected and also found
> the missing commit for stable-rc 6.6 is
> e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")

Which is definitely *not* a stable candidate. We need to understand why
the invalidation goes south when the scale goes up instead of down.

M.

--
Without deviation from the norm, progress is not possible.

2024-04-16 14:19:48

by Pascal Ernster

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

[2024-04-15 16:19] Greg Kroah-Hartman:
> This is the start of the stable review cycle for the 6.6.28 release.
> There are 122 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Hi, 6.6.28-rc1 is running fine on both an x86_64 Haswell VM and a
Mikrotik SXTsq 5 ac (the SoC is a Qualcomm Atheros IPQ4018, which has 4
Cortex-A7 cores).

Regards
Pascal

2024-04-16 17:28:25

by Catalin Marinas

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
> On Tue, 16 Apr 2024 14:07:30 +0100,
> Naresh Kamboju <[email protected]> wrote:
> > On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> > > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > > This is the start of the stable review cycle for the 6.6.28 release.
> > > > There are 122 patches in this series, all will be posted as a response
> > > > to this one. If anyone has any issues with these being applied, please
> > > > let me know.
> > >
> > > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > > the bisect and looking at the commit log to pull out people to CC and
> > > note that the fix was explicitly targeted at v6.6.
> >
> > Anders investigated this reported issues and bisected and also found
> > the missing commit for stable-rc 6.6 is
> > e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
>
> Which is definitely *not* stable candidate. We need to understand why
> the invalidation goes south when the scale go up instead of down.

If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
which fixes 117940aa6e5f ("KVM: arm64: Define
kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19
("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like
"scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my
CBMC model, not on the actual kernel. It may be worth adding some
WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or
num greater than 31.
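
A minimal userspace sketch of the kind of check suggested above. The
names warn_on() and tlbi_range_operand_ok() are hypothetical stand-ins
for illustration, not kernel API; the bounds come from the 2-bit SCALE
and 5-bit NUM fields of the range TLBI operand.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for WARN_ON(): log and return the condition. */
static bool warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Returns true when scale and num fit the range TLBI operand fields. */
static bool tlbi_range_operand_ok(int scale, int num)
{
	bool bad_scale = warn_on(scale < 0 || scale > 3, "scale outside 0..3");
	bool bad_num = warn_on(num > 31, "num greater than 31");

	return !bad_scale && !bad_num;
}
```

A negative num is not flagged here, on the assumption that the caller
skips the range operation for that scale rather than encoding it.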

I haven't investigated properly (and I'm off tomorrow, back on Thu) but
it's likely the original code was not very friendly to the maximum
range, never tested. Anyway, if one figures out why it goes out of
range, I think the solution is to also backport e2768b798a19 to stable.

--
Catalin

2024-04-17 07:05:28

by Greg Kroah-Hartman

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Tue, Apr 16, 2024 at 06:28:10PM +0100, Catalin Marinas wrote:
> On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
> > On Tue, 16 Apr 2024 14:07:30 +0100,
> > Naresh Kamboju <[email protected]> wrote:
> > > On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> > > > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 6.6.28 release.
> > > > > There are 122 patches in this series, all will be posted as a response
> > > > > to this one. If anyone has any issues with these being applied, please
> > > > > let me know.
> > > >
> > > > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > > > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > > > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > > > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > > > the bisect and looking at the commit log to pull out people to CC and
> > > > note that the fix was explicitly targeted at v6.6.
> > >
> > > Anders investigated this reported issues and bisected and also found
> > > the missing commit for stable-rc 6.6 is
> > > e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
> >
> > Which is definitely *not* stable candidate. We need to understand why
> > the invalidation goes south when the scale go up instead of down.
>
> If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
> which fixes 117940aa6e5f ("KVM: arm64: Define
> kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19
> ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like
> "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my
> CBMC model, not on the actual kernel. It may be worth adding some
> WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or
> num greater than 31.
>
> I haven't investigated properly (and I'm off tomorrow, back on Thu) but
> it's likely the original code was not very friendly to the maximum
> range, never tested. Anyway, if one figures out why it goes out of
> range, I think the solution is to also backport e2768b798a19 to stable.

How about I drop the offending commit from stable and let you all figure
out what needs to be added before applying anything else :)

thanks,

greg k-h

2024-04-17 20:08:29

by Catalin Marinas

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Wed, Apr 17, 2024 at 09:05:12AM +0200, Greg Kroah-Hartman wrote:
> On Tue, Apr 16, 2024 at 06:28:10PM +0100, Catalin Marinas wrote:
> > On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
> > > On Tue, 16 Apr 2024 14:07:30 +0100,
> > > Naresh Kamboju <[email protected]> wrote:
> > > > On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> > > > > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > > > > This is the start of the stable review cycle for the 6.6.28 release.
> > > > > > There are 122 patches in this series, all will be posted as a response
> > > > > > to this one. If anyone has any issues with these being applied, please
> > > > > > let me know.
> > > > >
> > > > > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > > > > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > > > > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > > > > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > > > > the bisect and looking at the commit log to pull out people to CC and
> > > > > note that the fix was explicitly targeted at v6.6.
> > > >
> > > > Anders investigated this reported issues and bisected and also found
> > > > the missing commit for stable-rc 6.6 is
> > > > e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
> > >
> > > Which is definitely *not* stable candidate. We need to understand why
> > > the invalidation goes south when the scale go up instead of down.
> >
> > If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
> > which fixes 117940aa6e5f ("KVM: arm64: Define
> > kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19
> > ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like
> > "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my
> > CBMC model, not on the actual kernel. It may be worth adding some
> > WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or
> > num greater than 31.
> >
> > I haven't investigated properly (and I'm off tomorrow, back on Thu) but
> > it's likely the original code was not very friendly to the maximum
> > range, never tested. Anyway, if one figures out why it goes out of
> > range, I think the solution is to also backport e2768b798a19 to stable.
>
> How about I drop the offending commit from stable and let you all figure
> out what needs to be added before applying anything else :)

It makes sense ;). We'll send them to stable once sorted.

--
Catalin

2024-04-18 11:07:58

by Marc Zyngier

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Tue, 16 Apr 2024 18:28:10 +0100,
Catalin Marinas <[email protected]> wrote:
>
> On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
> > On Tue, 16 Apr 2024 14:07:30 +0100,
> > Naresh Kamboju <[email protected]> wrote:
> > > On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> > > > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 6.6.28 release.
> > > > > There are 122 patches in this series, all will be posted as a response
> > > > > to this one. If anyone has any issues with these being applied, please
> > > > > let me know.
> > > >
> > > > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > > > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > > > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > > > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > > > the bisect and looking at the commit log to pull out people to CC and
> > > > note that the fix was explicitly targeted at v6.6.
> > >
> > > Anders investigated this reported issues and bisected and also found
> > > the missing commit for stable-rc 6.6 is
> > > e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
> >
> > Which is definitely *not* stable candidate. We need to understand why
> > the invalidation goes south when the scale go up instead of down.
>
> If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
> which fixes 117940aa6e5f ("KVM: arm64: Define
> kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19
> ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like
> "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my
> CBMC model, not on the actual kernel. It may be worth adding some
> WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or
> num greater than 31.
>
> I haven't investigated properly (and I'm off tomorrow, back on Thu) but
> it's likely the original code was not very friendly to the maximum
> range, never tested. Anyway, if one figures out why it goes out of
> range, I think the solution is to also backport e2768b798a19 to stable.

I looked into this, and I came to the conclusion that this patch is
pretty much incompatible with the increasing scale (even if you cap
num to 30).

The number of pages to invalidate is a 20 bit quantity, a 5 bit slice
per scale. With the 6.6 approach (limit of num=30 and increasing
scale), we invalidate each 5 bit slice independently. After each
scale round, the corresponding slice is guaranteed to be 0.

With the 6.9 method, we invalidate the maximum possible for a given
scale. With a decreasing scale, we converge towards 0 or 1 on each
round. With an increasing scale, this breaks spectacularly, because
the strong guarantee that the remaining page count is "aligned" to
2^(5*scale+1) is not valid anymore (the low bits may not be 0).

As a result, we don't converge because we never consider these low
bits anymore, the page count doesn't decrease, scale goes past 3, and
everything catches fire.
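
The behaviour described above can be reproduced with a small userspace
model. This is a simplified sketch, not the kernel macro: it assumes
range TLBI support, a stride of one page, and at most 32 << 16 pages
per call. All function names are made up for illustration, and the two
num computations reflect my reading of the code before and after the
operand fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Pages covered by one range operation: (num + 1) * 2^(5*scale + 1). */
static unsigned long range_pages(int num, int scale)
{
	assert(scale >= 0 && scale <= 3 && num >= 0 && num <= 31);
	return (unsigned long)(num + 1) << (5 * scale + 1);
}

/* 6.6 style: use only the 5-bit slice of the page count for this scale. */
static int num_slice(unsigned long pages, int scale)
{
	return (int)((pages >> (5 * scale + 1)) & 0x1f) - 1;
}

/* After the operand fix: cover as many pages as this scale allows. */
static int num_max(unsigned long pages, int scale)
{
	unsigned long cap = range_pages(31, scale);
	unsigned long p = pages < cap ? pages : cap;

	return (int)(p >> (5 * scale + 1)) - 1;
}

/* Increasing scale (6.6 loop shape). Returns false if scale runs past 3. */
static bool flush_increasing(unsigned long pages,
			     int (*get_num)(unsigned long, int))
{
	int scale = 0;

	while (pages > 0) {
		if (pages % 2 == 1) {	/* odd count: one single-page op */
			pages -= 1;
			continue;
		}
		if (scale > 3)
			return false;	/* pages remain, no valid scale left */
		int num = get_num(pages, scale);
		if (num >= 0)
			pages -= range_pages(num, scale);
		scale++;
	}
	return true;
}

/* Decreasing scale (loop shape after e2768b798a19). */
static bool flush_decreasing(unsigned long pages,
			     int (*get_num)(unsigned long, int))
{
	int scale = 3;

	while (pages > 0) {
		if (pages == 1) {	/* last page: one single-page op */
			pages -= 1;
			continue;
		}
		if (scale < 0)
			return false;
		int num = get_num(pages, scale);
		if (num >= 0)
			pages -= range_pages(num, scale);
		scale--;
	}
	return true;
}
```

In this model, a count of 66 pages terminates with flush_increasing()
and num_slice(), and with flush_decreasing() and num_max(). With
flush_increasing() and num_max(), scale 0 consumes 64 pages and leaves
2, which no higher scale can cover, so scale walks past 3.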

So despite my earlier comment, it looks like picking e2768b798a19 is
the right thing to do *if* we're taking e3ba51ab24fd into 6.6-stable.

Otherwise, we need a separate fix, which is what Ryan initially
advocated for.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-04-18 11:21:32

by Catalin Marinas

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Thu, Apr 18, 2024 at 12:07:35PM +0100, Marc Zyngier wrote:
> On Tue, 16 Apr 2024 18:28:10 +0100,
> Catalin Marinas <[email protected]> wrote:
> > On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
> > > On Tue, 16 Apr 2024 14:07:30 +0100,
> > > Naresh Kamboju <[email protected]> wrote:
> > > > On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> > > > > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > > > > This is the start of the stable review cycle for the 6.6.28 release.
> > > > > > There are 122 patches in this series, all will be posted as a response
> > > > > > to this one. If anyone has any issues with these being applied, please
> > > > > > let me know.
> > > > >
> > > > > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > > > > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > > > > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > > > > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > > > > the bisect and looking at the commit log to pull out people to CC and
> > > > > note that the fix was explicitly targeted at v6.6.
> > > >
> > > > Anders investigated this reported issues and bisected and also found
> > > > the missing commit for stable-rc 6.6 is
> > > > e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
> > >
> > > Which is definitely *not* stable candidate. We need to understand why
> > > the invalidation goes south when the scale go up instead of down.
> >
> > If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
> > which fixes 117940aa6e5f ("KVM: arm64: Define
> > kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19
> > ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like
> > "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my
> > CBMC model, not on the actual kernel. It may be worth adding some
> > WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or
> > num greater than 31.
> >
> > I haven't investigated properly (and I'm off tomorrow, back on Thu) but
> > it's likely the original code was not very friendly to the maximum
> > range, never tested. Anyway, if one figures out why it goes out of
> > range, I think the solution is to also backport e2768b798a19 to stable.
>
> I looked into this, and I came to the conclusion that this patch is
> pretty much incompatible with the increasing scale (even if you cap
> num to 30).

Thanks Marc for digging into this.

> So despite my earlier comment, it looks like picking e2768b798a19 is
> the right thing to do *if* we're taking e3ba51ab24fd into 6.6-stable.
>
> Otherwise, we need a separate fix, which Ryan initially advocating for
> initially.

My preference would be to cherry-pick the two upstream commits rather
than come up with an alternative fix for 6.6.

--
Catalin

2024-04-19 10:40:47

by Greg Kroah-Hartman

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Thu, Apr 18, 2024 at 12:21:17PM +0100, Catalin Marinas wrote:
> On Thu, Apr 18, 2024 at 12:07:35PM +0100, Marc Zyngier wrote:
> > On Tue, 16 Apr 2024 18:28:10 +0100,
> > Catalin Marinas <[email protected]> wrote:
> > > On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
> > > > On Tue, 16 Apr 2024 14:07:30 +0100,
> > > > Naresh Kamboju <[email protected]> wrote:
> > > > > On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> > > > > > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > > > > > This is the start of the stable review cycle for the 6.6.28 release.
> > > > > > > There are 122 patches in this series, all will be posted as a response
> > > > > > > to this one. If anyone has any issues with these being applied, please
> > > > > > > let me know.
> > > > > >
> > > > > > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > > > > > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > > > > > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > > > > > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > > > > > the bisect and looking at the commit log to pull out people to CC and
> > > > > > note that the fix was explicitly targeted at v6.6.
> > > > >
> > > > > Anders investigated this reported issues and bisected and also found
> > > > > the missing commit for stable-rc 6.6 is
> > > > > e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
> > > >
> > > > Which is definitely *not* stable candidate. We need to understand why
> > > > the invalidation goes south when the scale go up instead of down.
> > >
> > > If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
> > > which fixes 117940aa6e5f ("KVM: arm64: Define
> > > kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19
> > > ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like
> > > "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my
> > > CBMC model, not on the actual kernel. It may be worth adding some
> > > WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or
> > > num greater than 31.
> > >
> > > I haven't investigated properly (and I'm off tomorrow, back on Thu) but
> > > it's likely the original code was not very friendly to the maximum
> > > range, never tested. Anyway, if one figures out why it goes out of
> > > range, I think the solution is to also backport e2768b798a19 to stable.
> >
> > I looked into this, and I came to the conclusion that this patch is
> > pretty much incompatible with the increasing scale (even if you cap
> > num to 30).
>
> Thanks Marc for digging into this.
>
> > So despite my earlier comment, it looks like picking e2768b798a19 is
> > the right thing to do *if* we're taking e3ba51ab24fd into 6.6-stable.
> >
> > Otherwise, we need a separate fix, which Ryan initially advocating for
> > initially.
>
> My preference would be to cherry-pick the two upstream commits than
> coming up with an alternative fix for 6.6.

To be specific, which 2 commits, and what order?

thanks,

greg k-h

2024-04-19 10:50:39

by Marc Zyngier

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Fri, 19 Apr 2024 11:40:33 +0100,
Greg Kroah-Hartman <[email protected]> wrote:
>
> On Thu, Apr 18, 2024 at 12:21:17PM +0100, Catalin Marinas wrote:
> > On Thu, Apr 18, 2024 at 12:07:35PM +0100, Marc Zyngier wrote:
> > > On Tue, 16 Apr 2024 18:28:10 +0100,
> > > Catalin Marinas <[email protected]> wrote:
> > > > On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
> > > > > On Tue, 16 Apr 2024 14:07:30 +0100,
> > > > > Naresh Kamboju <[email protected]> wrote:
> > > > > > On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> > > > > > > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > > > > > > This is the start of the stable review cycle for the 6.6.28 release.
> > > > > > > > There are 122 patches in this series, all will be posted as a response
> > > > > > > > to this one. If anyone has any issues with these being applied, please
> > > > > > > > let me know.
> > > > > > >
> > > > > > > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > > > > > > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > > > > > > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > > > > > > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > > > > > > the bisect and looking at the commit log to pull out people to CC and
> > > > > > > note that the fix was explicitly targeted at v6.6.
> > > > > >
> > > > > > Anders investigated this reported issues and bisected and also found
> > > > > > the missing commit for stable-rc 6.6 is
> > > > > > e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
> > > > >
> > > > > Which is definitely *not* stable candidate. We need to understand why
> > > > > the invalidation goes south when the scale go up instead of down.
> > > >
> > > > If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
> > > > which fixes 117940aa6e5f ("KVM: arm64: Define
> > > > kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19
> > > > ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like
> > > > "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my
> > > > CBMC model, not on the actual kernel. It may be worth adding some
> > > > WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or
> > > > num greater than 31.
> > > >
> > > > I haven't investigated properly (and I'm off tomorrow, back on Thu) but
> > > > it's likely the original code was not very friendly to the maximum
> > > > range, never tested. Anyway, if one figures out why it goes out of
> > > > range, I think the solution is to also backport e2768b798a19 to stable.
> > >
> > > I looked into this, and I came to the conclusion that this patch is
> > > pretty much incompatible with the increasing scale (even if you cap
> > > num to 30).
> >
> > Thanks Marc for digging into this.
> >
> > > So despite my earlier comment, it looks like picking e2768b798a19 is
> > > the right thing to do *if* we're taking e3ba51ab24fd into 6.6-stable.
> > >
> > > Otherwise, we need a separate fix, which Ryan initially advocating for
> > > initially.
> >
> > My preference would be to cherry-pick the two upstream commits than
> > coming up with an alternative fix for 6.6.
>
> To be specific, which 2 commits, and what order?

That'd be:

e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")

followed by:

e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-04-19 11:05:30

by Greg Kroah-Hartman

Subject: Re: [PATCH 6.6 000/122] 6.6.28-rc1 review

On Fri, Apr 19, 2024 at 11:50:14AM +0100, Marc Zyngier wrote:
> On Fri, 19 Apr 2024 11:40:33 +0100,
> Greg Kroah-Hartman <[email protected]> wrote:
> >
> > On Thu, Apr 18, 2024 at 12:21:17PM +0100, Catalin Marinas wrote:
> > > On Thu, Apr 18, 2024 at 12:07:35PM +0100, Marc Zyngier wrote:
> > > > On Tue, 16 Apr 2024 18:28:10 +0100,
> > > > Catalin Marinas <[email protected]> wrote:
> > > > > On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
> > > > > > On Tue, 16 Apr 2024 14:07:30 +0100,
> > > > > > Naresh Kamboju <[email protected]> wrote:
> > > > > > > On Tue, 16 Apr 2024 at 16:04, Mark Brown <[email protected]> wrote:
> > > > > > > > On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
> > > > > > > > > This is the start of the stable review cycle for the 6.6.28 release.
> > > > > > > > > There are 122 patches in this series, all will be posted as a response
> > > > > > > > > to this one. If anyone has any issues with these being applied, please
> > > > > > > > > let me know.
> > > > > > > >
> > > > > > > > The bisect of the boot issue that's affecting the FVP in v6.6 (only)
> > > > > > > > landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand),
> > > > > > > > e3ba51ab24fdd in mainline, as being the first bad commit - it's also in
> > > > > > > > the -rc for v6.8 but that seems fine. I've done no investigation beyond
> > > > > > > > the bisect and looking at the commit log to pull out people to CC and
> > > > > > > > note that the fix was explicitly targeted at v6.6.
> > > > > > >
> > > > > > > Anders investigated this reported issues and bisected and also found
> > > > > > > the missing commit for stable-rc 6.6 is
> > > > > > > e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
> > > > > >
> > > > > > Which is definitely *not* stable candidate. We need to understand why
> > > > > > the invalidation goes south when the scale go up instead of down.
> > > > >
> > > > > If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
> > > > > which fixes 117940aa6e5f ("KVM: arm64: Define
> > > > > kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19
> > > > > ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like
> > > > > "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my
> > > > > CBMC model, not on the actual kernel. It may be worth adding some
> > > > > WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or
> > > > > num greater than 31.
> > > > >
> > > > > I haven't investigated properly (and I'm off tomorrow, back on Thu) but
> > > > > it's likely the original code was not very friendly to the maximum
> > > > > range, never tested. Anyway, if one figures out why it goes out of
> > > > > range, I think the solution is to also backport e2768b798a19 to stable.
> > > >
> > > > I looked into this, and I came to the conclusion that this patch is
> > > > pretty much incompatible with the increasing scale (even if you cap
> > > > num to 30).
> > >
> > > Thanks Marc for digging into this.
> > >
> > > > So despite my earlier comment, it looks like picking e2768b798a19 is
> > > > the right thing to do *if* we're taking e3ba51ab24fd into 6.6-stable.
> > > >
> > > > Otherwise, we need a separate fix, which Ryan initially advocating for
> > > > initially.
> > >
> > > My preference would be to cherry-pick the two upstream commits than
> > > coming up with an alternative fix for 6.6.
> >
> > To be specific, which 2 commits, and what order?
>
> That'd be:
>
> e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
>
> followed by:
>
> e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")

Thanks, now queued up.

greg k-h