2022-08-29 12:07:47

by Greg Kroah-Hartman

Subject: [PATCH 5.19 000/158] 5.19.6-rc1 review

This is the start of the stable review cycle for the 5.19.6 release.
There are 158 patches in this series, all of which will be posted as
responses to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.19.6-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.19.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 5.19.6-rc1

Daniel Borkmann <[email protected]>
bpf: Don't use tnum_range on array range checking for poke descriptors

Conor Dooley <[email protected]>
riscv: dts: microchip: mpfs: remove pci axi address translation property

Conor Dooley <[email protected]>
riscv: dts: microchip: mpfs: remove bogus card-detect-delay

Conor Dooley <[email protected]>
riscv: dts: microchip: mpfs: remove ti,fifo-depth property

Conor Dooley <[email protected]>
riscv: dts: microchip: mpfs: fix incorrect pcie child node name

Mike Christie <[email protected]>
scsi: core: Fix passthrough retry counter handling

Saurabh Sengar <[email protected]>
scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq

Kiwoong Kim <[email protected]>
scsi: ufs: core: Enable link lost interrupt

Mark Brown <[email protected]>
arm64/sme: Don't flush SVE register state when handling SME traps

Mark Brown <[email protected]>
arm64/sme: Don't flush SVE register state when allocating SME storage

Mark Brown <[email protected]>
arm64/signal: Flush FPSIMD register state when disabling streaming mode

Mark Rutland <[email protected]>
arm64: fix rodata=full

Ian Rogers <[email protected]>
perf stat: Clear evsel->reset_group for each stat run

Stephane Eranian <[email protected]>
perf/x86/intel/ds: Fix precise store latency handling

Stephane Eranian <[email protected]>
perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU

James Clark <[email protected]>
perf python: Fix build when PYTHON_CONFIG is user supplied

Yu Kuai <[email protected]>
blk-mq: fix io hung due to missing commit_rqs

Salvatore Bonaccorso <[email protected]>
Documentation/ABI: Mention retbleed vulnerability info file for sysfs

Prike Liang <[email protected]>
drm/amdkfd: Fix isa version for the GC 10.3.7

Peter Zijlstra <[email protected]>
x86/nospec: Fix i386 RSB stuffing

Liam Howlett <[email protected]>
binder_alloc: add missing mmap_lock calls when using the VMA

Zenghui Yu <[email protected]>
arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76

Guoqing Jiang <[email protected]>
md: call __md_stop_writes in md_stop

Guoqing Jiang <[email protected]>
Revert "md-raid: destroy the bitmap after destroying the thread"

David Hildenbrand <[email protected]>
mm/hugetlb: fix hugetlb not supporting softdirty tracking

Jens Axboe <[email protected]>
io_uring: fix issue with io_write() not always undoing sb_start_write()

Jiri Slaby <[email protected]>
Revert "zram: remove double compression logic"

Heinrich Schuchardt <[email protected]>
riscv: dts: microchip: correct L2 cache interrupts

Conor Dooley <[email protected]>
riscv: traps: add missing prototype

Conor Dooley <[email protected]>
riscv: signal: fix missing prototype warning

Juergen Gross <[email protected]>
xen/privcmd: fix error exit of privcmd_ioctl_dm_op()

Heming Zhao <[email protected]>
ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown

David Howells <[email protected]>
smb3: missing inode locks in punch hole

Karol Herbst <[email protected]>
nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

Riwen Lu <[email protected]>
ACPI: processor: Remove freq Qos request for all CPUs

Matthew Wilcox (Oracle) <[email protected]>
shmem: update folio if shmem_replace_page() updates the page

Shakeel Butt <[email protected]>
Revert "memcg: cleanup racy sum avoidance code"

Shigeru Yoshida <[email protected]>
fbdev: fbcon: Properly revert changes when vc_resize() failed

Brian Foster <[email protected]>
s390: fix double free of GS and RI CBs on fork() failure

Paulo Alcantara <[email protected]>
cifs: skip extra NULL byte in filenames

Peter Xu <[email protected]>
mm/mprotect: only reference swap pfn page if type match

Miaohe Lin <[email protected]>
mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte

Liu Shixin <[email protected]>
bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem

Gerald Schaefer <[email protected]>
s390/mm: do not trigger write fault when vma does not allow VM_WRITE

Badari Pulavarty <[email protected]>
mm/damon/dbgfs: avoid duplicate context directory creation

Quanyang Wang <[email protected]>
asm-generic: sections: refactor memory_intersects

Richard Guy Briggs <[email protected]>
audit: move audit_return_fixup before the filters

Khazhismel Kumykov <[email protected]>
writeback: avoid use-after-free after removing device

Siddh Raman Pant <[email protected]>
loop: Check for overflow while configuring loop

Jan Beulich <[email protected]>
x86/PAT: Have pat_enabled() properly reflect state when running on Xen

Peter Zijlstra <[email protected]>
x86/nospec: Unwreck the RSB stuffing

Pawan Gupta <[email protected]>
x86/bugs: Add "unknown" reporting for MMIO Stale Data

Tom Lendacky <[email protected]>
x86/sev: Don't use cc_platform_has() for early SEV-SNP calls

Chen Zhongjin <[email protected]>
x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry

Juergen Gross <[email protected]>
x86/entry: Fix entry_INT80_compat for Xen PV guests

Kan Liang <[email protected]>
perf/x86/lbr: Enable the branch type for the Arch LBR by default

Kan Liang <[email protected]>
perf/x86/intel: Fix pebs event constraints for ADL

Michael Roth <[email protected]>
x86/boot: Don't propagate uninitialized boot_params->cc_blob_address

Filipe Manana <[email protected]>
btrfs: update generation of hole file extent item when merging holes

Zixuan Fu <[email protected]>
btrfs: fix possible memory leak in btrfs_get_dev_args_from_path()

Goldwyn Rodrigues <[email protected]>
btrfs: check if root is readonly while setting security xattr

Omar Sandoval <[email protected]>
btrfs: fix space cache corruption and potential double allocations

Anand Jain <[email protected]>
btrfs: add info when mount fails due to stale replace target

Anand Jain <[email protected]>
btrfs: replace: drop assert for suspended replace

Filipe Manana <[email protected]>
btrfs: fix silent failure when deleting root reference

Aleksander Jan Bajkowski <[email protected]>
net: lantiq_xrx200: restore buffer if memory allocation failed

Aleksander Jan Bajkowski <[email protected]>
net: lantiq_xrx200: fix lock under memory pressure

Aleksander Jan Bajkowski <[email protected]>
net: lantiq_xrx200: confirm skb is allocated before using

Heiner Kallweit <[email protected]>
net: stmmac: work around sporadic tx issue on link-up

R Mohamed Shah <[email protected]>
ionic: VF initial random MAC address if no assigned mac

Shannon Nelson <[email protected]>
ionic: fix up issues with handling EAGAIN on FW cmds

Shannon Nelson <[email protected]>
ionic: clear broken state on generation change

David Howells <[email protected]>
rxrpc: Fix locking in rxrpc's sendmsg

Lorenzo Bianconi <[email protected]>
net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2

Lorenzo Bianconi <[email protected]>
net: ethernet: mtk_eth_soc: enable rx cksum offload for MTK_NETSYS_V2

Sylwester Dziedziuch <[email protected]>
i40e: Fix incorrect address type for IPv6 flow rules

Jacob Keller <[email protected]>
ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around sysctl_somaxconn.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around netdev_unregister_timeout_secs.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around gro_normal_batch.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_devconf_inherit_init_net.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around netdev_budget_usecs.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_max_skb_frags.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around netdev_budget.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around sysctl_net_busy_read.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around sysctl_net_busy_poll.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around sysctl_tstamp_allow_data.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_optmem_max.

Kuniyuki Iwashima <[email protected]>
ratelimit: Fix data-races in ___ratelimit().

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around netdev_tstamp_prequeue.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around netdev_max_backlog.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around weight_p and dev_weight_[rt]x_bias.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_[rw]mem_(max|default).

Pablo Neira Ayuso <[email protected]>
netfilter: flowtable: fix stuck flows on cleanup due to pending work

Pablo Neira Ayuso <[email protected]>
netfilter: flowtable: add function to invoke garbage collection immediately

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: disallow binding to already bound chain

Pablo Neira Ayuso <[email protected]>
netfilter: nft_tunnel: restrict it to netdev family

Pablo Neira Ayuso <[email protected]>
netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: do not leave chain stats enabled on error

Pablo Neira Ayuso <[email protected]>
netfilter: nft_payload: do not truncate csum_offset and csum_type

Pablo Neira Ayuso <[email protected]>
netfilter: nft_payload: report ERANGE for too long offset and length

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: make table handle allocation per-netns friendly

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: disallow updates of implicit chain

Vikas Gupta <[email protected]>
bnxt_en: fix LRO/GRO_HW features in ndo_fix_features callback

Vikas Gupta <[email protected]>
bnxt_en: fix NQ resource accounting during vf creation on 57500 chips

Vikas Gupta <[email protected]>
bnxt_en: set missing reload flag in devlink features

Pavan Chebbi <[email protected]>
bnxt_en: Use PAGE_SIZE to init buffer when multi buffer XDP is not in use

Florian Westphal <[email protected]>
netfilter: nft_tproxy: restrict to prerouting hook

Florian Westphal <[email protected]>
netfilter: ebtables: reject blobs that don't provide all entry points

Maciej Żenczykowski <[email protected]>
net: ipvtap - add __init/__exit annotations to module init/exit funcs

Jonathan Toppins <[email protected]>
bonding: 802.3ad: fix no transmission of LACPDUs

Sergei Antonov <[email protected]>
net: moxa: get rid of asymmetry in DMA mapping/unmapping

Xiaolei Wang <[email protected]>
net: phy: Don't WARN for PHY_READY state in mdio_bus_phy_resume()

Alex Elder <[email protected]>
net: ipa: don't assume SMEM is page-aligned

Vladimir Oltean <[email protected]>
net: dsa: microchip: keep compatibility with device tree blobs with no phy-mode

Arun Ramadoss <[email protected]>
net: dsa: microchip: update the ksz_phylink_get_caps

Arun Ramadoss <[email protected]>
net: dsa: microchip: move the port mirror to ksz_common

Arun Ramadoss <[email protected]>
net: dsa: microchip: move vlan functionality to ksz_common

Arun Ramadoss <[email protected]>
net: dsa: microchip: move tag_protocol to ksz_common

Arun Ramadoss <[email protected]>
net: dsa: microchip: move switch chip_id detection to ksz_common

Arun Ramadoss <[email protected]>
net: dsa: microchip: ksz9477: cleanup the ksz9477_switch_detect

Maor Dickman <[email protected]>
net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off

Aya Levin <[email protected]>
net/mlx5e: Fix wrong application of the LRO state

Moshe Shemesh <[email protected]>
net/mlx5: Avoid false positive lockdep warning by adding lock_class_key

Roy Novich <[email protected]>
net/mlx5: Fix cmd error logging for manage pages cmd

Vlad Buslov <[email protected]>
net/mlx5: Disable irq when locking lag_lock

Eli Cohen <[email protected]>
net/mlx5: Eswitch, Fix forwarding decision to uplink

Eli Cohen <[email protected]>
net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY

Vlad Buslov <[email protected]>
net/mlx5e: Properly disable vlan strip on non-UL reps

Maciej Fijalkowski <[email protected]>
ice: xsk: use Rx ring's XDP ring when picking NAPI context

Maciej Fijalkowski <[email protected]>
ice: xsk: prohibit usage of non-balanced queue id

Duoming Zhou <[email protected]>
nfc: pn533: Fix use-after-free bugs caused by pn532_cmd_timeout

Hayes Wang <[email protected]>
r8152: fix the RX FIFO settings when suspending

Hayes Wang <[email protected]>
r8152: fix the units of some registers for RTL8156A

Bernard Pidoux <[email protected]>
rose: check NULL rose_loopback_neigh->loopback

Christian Brauner <[email protected]>
ntfs: fix acl handling

Peter Xu <[email protected]>
mm/smaps: don't access young/dirty bit if pte unpresent

Trond Myklebust <[email protected]>
SUNRPC: RPC level errors should set task->tk_rpc_status

Olga Kornievskaia <[email protected]>
NFSv4.2 fix problems with __nfs42_ssc_open

Sabrina Dubroca <[email protected]>
Revert "net: macsec: update SCI upon MAC address change."

Seth Forshee <[email protected]>
fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts

Nikolay Aleksandrov <[email protected]>
xfrm: policy: fix metadata dst->dev xmit null pointer dereference

Herbert Xu <[email protected]>
af_key: Do not call xfrm_probe_algs in parallel

Antony Antony <[email protected]>
xfrm: clone missing x->lastused in xfrm_do_migrate

Antony Antony <[email protected]>
Revert "xfrm: update SA curlft.use_time"

Xin Xiong <[email protected]>
xfrm: fix refcount leak in __xfrm_policy_check()

Deren Wu <[email protected]>
mt76: mt7921: fix command timeout in AP stop period

David Hildenbrand <[email protected]>
mm/hugetlb: support write-faults in shared mappings

Peter Xu <[email protected]>
mm/uffd: reset write protection when unregister with wp-mode

Kuniyuki Iwashima <[email protected]>
kprobes: don't call disarm_kprobe() for disabled kprobes

Randy Dunlap <[email protected]>
kernel/sys_ni: add compat entry for fadvise64_64

Helge Deller <[email protected]>
parisc: Fix exception handler for fldw and fstw instructions

Helge Deller <[email protected]>
parisc: Make CONFIG_64BIT available for ARCH=parisc64 only

Jing-Ting Wu <[email protected]>
cgroup: Fix race condition at rebind_subsystems()

Gaosheng Cui <[email protected]>
audit: fix potential double free on error path from fsnotify_add_inode_mark

Trond Myklebust <[email protected]>
NFS: Fix another fsync() issue after a server reboot

David Hildenbrand <[email protected]>
mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW


-------------

Diffstat:

Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
.../hw-vuln/processor_mmio_stale_data.rst | 14 ++
Documentation/admin-guide/kernel-parameters.txt | 2 +
Documentation/admin-guide/sysctl/net.rst | 2 +-
Makefile | 4 +-
arch/arm64/include/asm/fpsimd.h | 4 +-
arch/arm64/include/asm/setup.h | 17 ++
arch/arm64/kernel/cpu_errata.c | 2 +
arch/arm64/kernel/fpsimd.c | 21 +--
arch/arm64/kernel/ptrace.c | 6 +-
arch/arm64/kernel/signal.c | 12 +-
arch/arm64/mm/mmu.c | 18 ---
arch/parisc/Kconfig | 21 +--
arch/parisc/kernel/unaligned.c | 2 +-
arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts | 3 -
arch/riscv/boot/dts/microchip/mpfs-polarberry.dts | 3 -
arch/riscv/boot/dts/microchip/mpfs.dtsi | 5 +-
arch/riscv/include/asm/signal.h | 12 ++
arch/riscv/include/asm/thread_info.h | 2 +
arch/riscv/kernel/signal.c | 1 +
arch/riscv/kernel/traps.c | 3 +-
arch/s390/kernel/process.c | 22 ++-
arch/s390/mm/fault.c | 4 +-
arch/x86/boot/compressed/misc.h | 12 +-
arch/x86/boot/compressed/sev.c | 8 +
arch/x86/entry/entry_64_compat.S | 2 +-
arch/x86/events/intel/ds.c | 12 +-
arch/x86/events/intel/lbr.c | 8 +
arch/x86/events/intel/uncore_snb.c | 18 ++-
arch/x86/include/asm/cpufeatures.h | 5 +-
arch/x86/include/asm/nospec-branch.h | 92 ++++++-----
arch/x86/kernel/cpu/bugs.c | 14 +-
arch/x86/kernel/cpu/common.c | 42 +++--
arch/x86/kernel/sev.c | 16 +-
arch/x86/kernel/unwind_orc.c | 15 +-
arch/x86/mm/pat/memtype.c | 10 +-
block/blk-mq.c | 5 +-
drivers/acpi/processor_thermal.c | 2 +-
drivers/android/binder_alloc.c | 31 ++--
drivers/block/loop.c | 5 +
drivers/block/zram/zram_drv.c | 42 +++--
drivers/block/zram/zram_drv.h | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 +-
drivers/gpu/drm/nouveau/nouveau_bo.c | 9 ++
drivers/md/md.c | 3 +-
drivers/net/bonding/bond_3ad.c | 38 ++---
drivers/net/dsa/microchip/ksz8795.c | 102 +++---------
drivers/net/dsa/microchip/ksz8795_reg.h | 16 --
drivers/net/dsa/microchip/ksz9477.c | 89 +++--------
drivers/net/dsa/microchip/ksz9477_reg.h | 1 -
drivers/net/dsa/microchip/ksz_common.c | 173 ++++++++++++++++++++-
drivers/net/dsa/microchip/ksz_common.h | 47 +++++-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 10 +-
drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
drivers/net/ethernet/intel/ice/ice.h | 36 +++--
drivers/net/ethernet/intel/ice/ice_lib.c | 4 +-
drivers/net/ethernet/intel/ice/ice_main.c | 25 ++-
drivers/net/ethernet/intel/ice/ice_xsk.c | 18 ++-
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 59 +++++--
drivers/net/ethernet/lantiq_xrx200.c | 9 +-
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 29 ++--
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 5 +
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 3 +-
drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 57 ++++---
drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 +
.../net/ethernet/mellanox/mlx5/core/pagealloc.c | 9 +-
drivers/net/ethernet/moxa/moxart_ether.c | 11 +-
drivers/net/ethernet/pensando/ionic/ionic_lif.c | 95 ++++++++++-
drivers/net/ethernet/pensando/ionic/ionic_main.c | 4 +-
drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c | 8 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 +-
drivers/net/ipa/ipa_mem.c | 2 +-
drivers/net/ipvlan/ipvtap.c | 4 +-
drivers/net/macsec.c | 11 +-
drivers/net/phy/phy_device.c | 8 +-
drivers/net/usb/r8152.c | 27 ++--
.../net/wireless/mediatek/mt76/mt76_connac_mcu.c | 2 +
drivers/net/wireless/mediatek/mt76/mt7921/main.c | 47 ++++--
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c | 5 +-
drivers/net/wireless/mediatek/mt76/mt7921/mt7921.h | 2 +
drivers/nfc/pn533/uart.c | 1 +
drivers/scsi/scsi_lib.c | 3 +-
drivers/scsi/storvsc_drv.c | 2 +-
drivers/video/fbdev/core/fbcon.c | 27 +++-
drivers/xen/privcmd.c | 21 +--
fs/btrfs/block-group.c | 47 ++----
fs/btrfs/block-group.h | 4 +-
fs/btrfs/ctree.h | 1 -
fs/btrfs/dev-replace.c | 5 +-
fs/btrfs/extent-tree.c | 30 +---
fs/btrfs/file.c | 2 +
fs/btrfs/root-tree.c | 5 +-
fs/btrfs/volumes.c | 5 +-
fs/btrfs/xattr.c | 3 +
fs/cifs/smb2ops.c | 12 +-
fs/cifs/smb2pdu.c | 16 +-
fs/fs-writeback.c | 12 +-
fs/namespace.c | 7 +
fs/nfs/file.c | 15 +-
fs/nfs/inode.c | 1 +
fs/nfs/nfs4file.c | 6 +
fs/nfs/write.c | 6 +-
fs/ntfs3/xattr.c | 16 +-
fs/ocfs2/dlmglue.c | 8 +-
fs/ocfs2/super.c | 3 +-
fs/proc/task_mmu.c | 7 +-
fs/userfaultfd.c | 4 +
include/asm-generic/sections.h | 7 +-
include/linux/memcontrol.h | 15 +-
include/linux/mlx5/driver.h | 1 +
include/linux/mm.h | 1 -
include/linux/netdevice.h | 20 ++-
include/linux/netfilter_bridge/ebtables.h | 4 -
include/linux/nfs_fs.h | 1 +
include/linux/userfaultfd_k.h | 2 +
include/net/busy_poll.h | 2 +-
include/net/gro.h | 2 +-
include/net/netfilter/nf_flow_table.h | 3 +
include/net/netfilter/nf_tables.h | 1 +
include/ufs/ufshci.h | 6 +-
init/main.c | 18 ++-
io_uring/io_uring.c | 7 +-
kernel/audit_fsnotify.c | 1 +
kernel/auditsc.c | 4 +-
kernel/bpf/verifier.c | 10 +-
kernel/cgroup/cgroup.c | 1 +
kernel/kprobes.c | 9 +-
kernel/sys_ni.c | 1 +
lib/ratelimit.c | 12 +-
mm/backing-dev.c | 10 +-
mm/bootmem_info.c | 2 +
mm/damon/dbgfs.c | 3 +
mm/gup.c | 69 +++++---
mm/huge_memory.c | 65 +++++---
mm/hugetlb.c | 28 +++-
mm/mmap.c | 8 +-
mm/mprotect.c | 3 +-
mm/page-writeback.c | 6 +-
mm/shmem.c | 1 +
mm/userfaultfd.c | 29 ++--
net/bridge/netfilter/ebtable_broute.c | 8 -
net/bridge/netfilter/ebtable_filter.c | 8 -
net/bridge/netfilter/ebtable_nat.c | 8 -
net/bridge/netfilter/ebtables.c | 8 +-
net/core/bpf_sk_storage.c | 5 +-
net/core/dev.c | 20 +--
net/core/filter.c | 13 +-
net/core/gro_cells.c | 2 +-
net/core/skbuff.c | 2 +-
net/core/sock.c | 18 ++-
net/core/sysctl_net_core.c | 15 +-
net/ipv4/devinet.c | 16 +-
net/ipv4/ip_output.c | 2 +-
net/ipv4/ip_sockglue.c | 6 +-
net/ipv4/tcp.c | 4 +-
net/ipv4/tcp_output.c | 2 +-
net/ipv6/addrconf.c | 5 +-
net/ipv6/ipv6_sockglue.c | 4 +-
net/key/af_key.c | 3 +
net/mptcp/protocol.c | 2 +-
net/netfilter/ipvs/ip_vs_sync.c | 4 +-
net/netfilter/nf_flow_table_core.c | 15 +-
net/netfilter/nf_flow_table_offload.c | 8 +
net/netfilter/nf_tables_api.c | 14 +-
net/netfilter/nft_osf.c | 18 ++-
net/netfilter/nft_payload.c | 29 +++-
net/netfilter/nft_tproxy.c | 8 +
net/netfilter/nft_tunnel.c | 1 +
net/rose/rose_loopback.c | 3 +-
net/rxrpc/call_object.c | 4 +-
net/rxrpc/sendmsg.c | 92 ++++++-----
net/sched/sch_generic.c | 2 +-
net/socket.c | 2 +-
net/sunrpc/clnt.c | 2 +-
net/xfrm/espintcp.c | 2 +-
net/xfrm/xfrm_input.c | 3 +-
net/xfrm/xfrm_output.c | 1 -
net/xfrm/xfrm_policy.c | 3 +-
net/xfrm/xfrm_state.c | 1 +
tools/perf/Makefile.config | 2 +-
tools/perf/builtin-stat.c | 1 +
187 files changed, 1603 insertions(+), 917 deletions(-)



2022-08-29 12:07:56

by Greg Kroah-Hartman

Subject: [PATCH 5.19 147/158] arm64: fix rodata=full

From: Mark Rutland <[email protected]>

commit 2e8cff0a0eee87b27f0cf87ad8310eb41b5886ab upstream.

On arm64, "rodata=full" has been suppored (but not documented) since
commit:

c55191e96caa9d78 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")

As it's necessary to determine the rodata configuration early during
boot, arm64 has an early_param() handler for this, whereas init/main.c
has a __setup() handler which is run later.

Unfortunately, this split meant that since commit:

f9a40b0890658330 ("init/main.c: return 1 from handled __setup() functions")

... passing "rodata=full" would result in a spurious warning from the
__setup() handler (though RO permissions would be configured
appropriately).

Further, "rodata=full" has been broken since commit:

0d6ea3ac94ca77c5 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")

... which caused strtobool() to parse "full" as false (in addition to
many other values not documented for the "rodata=" kernel parameter).

This patch fixes this breakage by:

* Moving the core parameter parser to an early_param(), such that it
is available early.

* Adding an (optional) arch hook which arm64 can use to parse "full".

* Updating the documentation to mention that "full" is valid for arm64.

* Having the core parameter parser handle "on" and "off" explicitly,
such that any undocumented values (e.g. typos such as "ful") are
reported as errors rather than being silently accepted.

Note that __setup() and early_param() have opposite conventions for
their return values, where __setup() uses 1 to indicate a parameter was
handled and early_param() uses 0 to indicate a parameter was handled.
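
As a minimal sketch of that difference (hypothetical "foo"/"bar"
parameters, not part of this patch):

	#include <linux/init.h>

	/* __setup(): returning non-zero means the parameter was handled. */
	static int __init parse_foo(char *str)
	{
		/* ... parse str ... */
		return 1;
	}
	__setup("foo=", parse_foo);

	/* early_param(): returning 0 means the parameter was handled. */
	static int __init parse_bar(char *str)
	{
		/* ... parse str ... */
		return 0;
	}
	early_param("bar", parse_bar);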

Fixes: f9a40b089065 ("init/main.c: return 1 from handled __setup() functions")
Fixes: 0d6ea3ac94ca ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
Signed-off-by: Mark Rutland <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Jagdish Gediya <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Will Deacon <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 2 ++
arch/arm64/include/asm/setup.h | 17 +++++++++++++++++
arch/arm64/mm/mmu.c | 18 ------------------
init/main.c | 18 +++++++++++++++---
4 files changed, 34 insertions(+), 21 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5260,6 +5260,8 @@
rodata= [KNL]
on Mark read-only kernel memory as read-only (default).
off Leave read-only kernel memory writable for debugging.
+ full Mark read-only kernel memory and aliases as read-only
+ [arm64]

rockchip.usb_uart
Enable the uart passthrough on the designated usb port
--- a/arch/arm64/include/asm/setup.h
+++ b/arch/arm64/include/asm/setup.h
@@ -3,6 +3,8 @@
#ifndef __ARM64_ASM_SETUP_H
#define __ARM64_ASM_SETUP_H

+#include <linux/string.h>
+
#include <uapi/asm/setup.h>

void *get_early_fdt_ptr(void);
@@ -14,4 +16,19 @@ void early_fdt_map(u64 dt_phys);
extern phys_addr_t __fdt_pointer __initdata;
extern u64 __cacheline_aligned boot_args[4];

+static inline bool arch_parse_debug_rodata(char *arg)
+{
+ extern bool rodata_enabled;
+ extern bool rodata_full;
+
+ if (arg && !strcmp(arg, "full")) {
+ rodata_enabled = true;
+ rodata_full = true;
+ return true;
+ }
+
+ return false;
+}
+#define arch_parse_debug_rodata arch_parse_debug_rodata
+
#endif
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -625,24 +625,6 @@ static void __init map_kernel_segment(pg
vm_area_add_early(vma);
}

-static int __init parse_rodata(char *arg)
-{
- int ret = strtobool(arg, &rodata_enabled);
- if (!ret) {
- rodata_full = false;
- return 0;
- }
-
- /* permit 'full' in addition to boolean options */
- if (strcmp(arg, "full"))
- return -EINVAL;
-
- rodata_enabled = true;
- rodata_full = true;
- return 0;
-}
-early_param("rodata", parse_rodata);
-
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
static int __init map_entry_trampoline(void)
{
--- a/init/main.c
+++ b/init/main.c
@@ -1446,13 +1446,25 @@ static noinline void __init kernel_init_

#if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_STRICT_MODULE_RWX)
bool rodata_enabled __ro_after_init = true;
+
+#ifndef arch_parse_debug_rodata
+static inline bool arch_parse_debug_rodata(char *str) { return false; }
+#endif
+
static int __init set_debug_rodata(char *str)
{
- if (strtobool(str, &rodata_enabled))
+ if (arch_parse_debug_rodata(str))
+ return 0;
+
+ if (str && !strcmp(str, "on"))
+ rodata_enabled = true;
+ else if (str && !strcmp(str, "off"))
+ rodata_enabled = false;
+ else
pr_warn("Invalid option string for rodata: '%s'\n", str);
- return 1;
+ return 0;
}
-__setup("rodata=", set_debug_rodata);
+early_param("rodata", set_debug_rodata);
#endif

#ifdef CONFIG_STRICT_KERNEL_RWX


2022-08-29 12:08:03

by Greg Kroah-Hartman

Subject: [PATCH 5.19 018/158] Revert "net: macsec: update SCI upon MAC address change."

From: Sabrina Dubroca <[email protected]>

[ Upstream commit e82c649e851c9c25367fb7a2a6cf3479187de467 ]

This reverts commit 6fc498bc82929ee23aa2f35a828c6178dfd3f823.

Commit 6fc498bc8292 states:

SCI should be updated, because it contains MAC in its first 6
octets.

That's not entirely correct. The SCI can be based on the MAC address,
but doesn't have to be. We can also use any 64-bit number as the
SCI. When the SCI is based on the MAC address, it uses a 16-bit "port
number" provided by userspace, which commit 6fc498bc8292 overwrites
with 1.

In addition, changing the SCI after macsec has been setup can just
confuse the receiver. If we configure the RXSC on the peer based on
the original SCI, we should keep the same SCI on TX.

When the macsec device is being managed by a userspace key negotiation
daemon such as wpa_supplicant, commit 6fc498bc8292 would also
overwrite the SCI defined by userspace.
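
For reference, a MAC-address-based SCI is simply the 6-byte MAC address
followed by the 16-bit port number; a minimal sketch of that layout
(illustrative only, not the driver's make_sci()):

	#include <stdint.h>
	#include <string.h>

	typedef uint64_t sci_t;

	/* Compose an SCI from a 6-byte MAC address and a 16-bit port number
	 * (both in network byte order). Any other 64-bit value is an equally
	 * valid SCI. */
	static sci_t example_make_sci(const uint8_t mac[6], uint16_t port_be)
	{
		sci_t sci;

		memcpy(&sci, mac, 6);
		memcpy((uint8_t *)&sci + 6, &port_be, sizeof(port_be));
		return sci;
	}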

Fixes: 6fc498bc8292 ("net: macsec: update SCI upon MAC address change.")
Signed-off-by: Sabrina Dubroca <[email protected]>
Link: https://lore.kernel.org/r/9b1a9d28327e7eb54550a92eebda45d25e54dd0d.1660667033.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/macsec.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index f354fad05714a..5b0b23e55fa76 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -449,11 +449,6 @@ static struct macsec_eth_header *macsec_ethhdr(struct sk_buff *skb)
return (struct macsec_eth_header *)skb_mac_header(skb);
}

-static sci_t dev_to_sci(struct net_device *dev, __be16 port)
-{
- return make_sci(dev->dev_addr, port);
-}
-
static void __macsec_pn_wrapped(struct macsec_secy *secy,
struct macsec_tx_sa *tx_sa)
{
@@ -3622,7 +3617,6 @@ static int macsec_set_mac_address(struct net_device *dev, void *p)

out:
eth_hw_addr_set(dev, addr->sa_data);
- macsec->secy.sci = dev_to_sci(dev, MACSEC_PORT_ES);

/* If h/w offloading is available, propagate to the device */
if (macsec_is_offloaded(macsec)) {
@@ -3960,6 +3954,11 @@ static bool sci_exists(struct net_device *dev, sci_t sci)
return false;
}

+static sci_t dev_to_sci(struct net_device *dev, __be16 port)
+{
+ return make_sci(dev->dev_addr, port);
+}
+
static int macsec_add_dev(struct net_device *dev, sci_t sci, u8 icv_len)
{
struct macsec_dev *macsec = macsec_priv(dev);
--
2.35.1



2022-08-29 12:08:15

by Greg Kroah-Hartman

Subject: [PATCH 5.19 134/158] mm/hugetlb: fix hugetlb not supporting softdirty tracking

From: David Hildenbrand <[email protected]>

commit f96f7a40874d7c746680c0b9f57cef2262ae551f upstream.

Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.

I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found two cases where we could currently get such faults and would
erroneously map an anon page into a shared mapping.

Reproducers are part of the patches.

I propose to backport both fixes to stable trees. The first fix needs a
small adjustment.


This patch (of 2):

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get away
with that because e.g., mprotect() should always do the right thing and
map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>

#define HUGETLB_SIZE (2 * 1024 * 1024u)

static void clear_softdirty(void)
{
        int fd = open("/proc/self/clear_refs", O_WRONLY);
        const char *ctrl = "4";
        int ret;

        if (fd < 0) {
                fprintf(stderr, "open(clear_refs) failed\n");
                exit(1);
        }
        ret = write(fd, ctrl, strlen(ctrl));
        if (ret != strlen(ctrl)) {
                fprintf(stderr, "write(clear_refs) failed\n");
                exit(1);
        }
        close(fd);
}

int main(int argc, char **argv)
{
        char *map;
        int fd;

        fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
        if (!fd) {
                fprintf(stderr, "open() failed\n");
                return -errno;
        }
        if (ftruncate(fd, HUGETLB_SIZE)) {
                fprintf(stderr, "ftruncate() failed\n");
                return -errno;
        }

        map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
                fprintf(stderr, "mmap() failed\n");
                return -errno;
        }

        *map = 0;

        if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                fprintf(stderr, "mmprotect() failed\n");
                return -errno;
        }

        clear_softdirty();

        if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                fprintf(stderr, "mmprotect() failed\n");
                return -errno;
        }

        *map = 0;

        return 0;
}
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.
# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total: 2
HugePages_Free: 1
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0

The reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support softdirty tracking.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Peter Feiner <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Jamie Liu <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]> [3.18+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/mmap.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1693,8 +1693,12 @@ int vma_wants_writenotify(struct vm_area
pgprot_val(vm_pgprot_modify(vm_page_prot, vm_flags)))
return 0;

- /* Do we need to track softdirty? */
- if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY))
+ /*
+ * Do we need to track softdirty? hugetlb does not support softdirty
+ * tracking yet.
+ */
+ if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY) &&
+ !is_vm_hugetlb_page(vma))
return 1;

/* Specialty mapping? */


2022-08-29 12:08:31

by Greg Kroah-Hartman

Subject: [PATCH 5.19 127/158] ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown

From: Heming Zhao <[email protected]>

commit 550842cc60987b269e31b222283ade3e1b6c7fc8 upstream.

After commit 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job
before return error"), any failure after ocfs2_dlm_init() will trigger a
crash when calling ocfs2_dlm_shutdown().

For example, in local mount mode, no dlm resource is initialized. If
ocfs2_mount_volume() fails in ocfs2_find_slot(), the error handling will
call ocfs2_dlm_shutdown(), which then does the dlm resource cleanup and
triggers a kernel crash.

This solution should bypass uninitialized resources in
ocfs2_dlm_shutdown().

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error")
Signed-off-by: Heming Zhao <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/ocfs2/dlmglue.c | 8 +++++---
fs/ocfs2/super.c | 3 +--
2 files changed, 6 insertions(+), 5 deletions(-)

--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3403,10 +3403,12 @@ void ocfs2_dlm_shutdown(struct ocfs2_sup
ocfs2_lock_res_free(&osb->osb_nfs_sync_lockres);
ocfs2_lock_res_free(&osb->osb_orphan_scan.os_lockres);

- ocfs2_cluster_disconnect(osb->cconn, hangup_pending);
- osb->cconn = NULL;
+ if (osb->cconn) {
+ ocfs2_cluster_disconnect(osb->cconn, hangup_pending);
+ osb->cconn = NULL;

- ocfs2_dlm_shutdown_debug(osb);
+ ocfs2_dlm_shutdown_debug(osb);
+ }
}

static int ocfs2_drop_lock(struct ocfs2_super *osb,
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1914,8 +1914,7 @@ static void ocfs2_dismount_volume(struct
!ocfs2_is_hard_readonly(osb))
hangup_needed = 1;

- if (osb->cconn)
- ocfs2_dlm_shutdown(osb, hangup_needed);
+ ocfs2_dlm_shutdown(osb, hangup_needed);

ocfs2_blockcheck_stats_debugfs_remove(&osb->osb_ecc_stats);
debugfs_remove_recursive(osb->osb_debug_root);


2022-08-29 12:08:32

by Greg Kroah-Hartman

Subject: [PATCH 5.19 130/158] riscv: traps: add missing prototype

From: Conor Dooley <[email protected]>

commit d951b20b9def73dcc39a5379831525d0d2a537e9 upstream.

Sparse complains:
arch/riscv/kernel/traps.c:213:6: warning: symbol 'shadow_stack' was not declared. Should it be static?

The variable is used in entry.S, so declare shadow_stack in thread_info.h,
alongside SHADOW_OVERFLOW_STACK_SIZE.

Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
Signed-off-by: Conor Dooley <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/riscv/include/asm/thread_info.h | 2 ++
arch/riscv/kernel/traps.c | 3 ++-
2 files changed, 4 insertions(+), 1 deletion(-)

--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -42,6 +42,8 @@

#ifndef __ASSEMBLY__

+extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
+
#include <asm/processor.h>
#include <asm/csr.h>

--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -20,9 +20,10 @@

#include <asm/asm-prototypes.h>
#include <asm/bug.h>
+#include <asm/csr.h>
#include <asm/processor.h>
#include <asm/ptrace.h>
-#include <asm/csr.h>
+#include <asm/thread_info.h>

int show_unhandled_signals = 1;



2022-08-29 12:08:49

by Greg Kroah-Hartman

Subject: [PATCH 5.19 046/158] net: moxa: get rid of asymmetry in DMA mapping/unmapping

From: Sergei Antonov <[email protected]>

[ Upstream commit 0ee7828dfc56e97d71e51e6374dc7b4eb2b6e081 ]

Since priv->rx_mapping[i] is mapped in moxart_mac_open(), we
should unmap it in moxart_mac_stop(). This fixes two warnings.

1. During error unwinding in moxart_mac_probe(): "goto init_fail;",
then moxart_mac_free_memory() calls dma_unmap_single() with
priv->rx_mapping[i] pointers zeroed.

WARNING: CPU: 0 PID: 1 at kernel/dma/debug.c:963 check_unmap+0x704/0x980
DMA-API: moxart-ethernet 92000000.mac: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=1600 bytes]
CPU: 0 PID: 1 Comm: swapper Not tainted 5.19.0+ #60
Hardware name: Generic DT based system
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x34/0x44
dump_stack_lvl from __warn+0xbc/0x1f0
__warn from warn_slowpath_fmt+0x94/0xc8
warn_slowpath_fmt from check_unmap+0x704/0x980
check_unmap from debug_dma_unmap_page+0x8c/0x9c
debug_dma_unmap_page from moxart_mac_free_memory+0x3c/0xa8
moxart_mac_free_memory from moxart_mac_probe+0x190/0x218
moxart_mac_probe from platform_probe+0x48/0x88
platform_probe from really_probe+0xc0/0x2e4

2. After commands:
ip link set dev eth0 down
ip link set dev eth0 up

WARNING: CPU: 0 PID: 55 at kernel/dma/debug.c:570 add_dma_entry+0x204/0x2ec
DMA-API: moxart-ethernet 92000000.mac: cacheline tracking EEXIST, overlapping mappings aren't supported
CPU: 0 PID: 55 Comm: ip Not tainted 5.19.0+ #57
Hardware name: Generic DT based system
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x34/0x44
dump_stack_lvl from __warn+0xbc/0x1f0
__warn from warn_slowpath_fmt+0x94/0xc8
warn_slowpath_fmt from add_dma_entry+0x204/0x2ec
add_dma_entry from dma_map_page_attrs+0x110/0x328
dma_map_page_attrs from moxart_mac_open+0x134/0x320
moxart_mac_open from __dev_open+0x11c/0x1ec
__dev_open from __dev_change_flags+0x194/0x22c
__dev_change_flags from dev_change_flags+0x14/0x44
dev_change_flags from devinet_ioctl+0x6d4/0x93c
devinet_ioctl from inet_ioctl+0x1ac/0x25c

v1 -> v2:
Extraneous change removed.

Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver")
Signed-off-by: Sergei Antonov <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/moxa/moxart_ether.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
index f11f1cb92025f..3b6beb96ca856 100644
--- a/drivers/net/ethernet/moxa/moxart_ether.c
+++ b/drivers/net/ethernet/moxa/moxart_ether.c
@@ -74,11 +74,6 @@ static int moxart_set_mac_address(struct net_device *ndev, void *addr)
static void moxart_mac_free_memory(struct net_device *ndev)
{
struct moxart_mac_priv_t *priv = netdev_priv(ndev);
- int i;
-
- for (i = 0; i < RX_DESC_NUM; i++)
- dma_unmap_single(&priv->pdev->dev, priv->rx_mapping[i],
- priv->rx_buf_size, DMA_FROM_DEVICE);

if (priv->tx_desc_base)
dma_free_coherent(&priv->pdev->dev,
@@ -193,6 +188,7 @@ static int moxart_mac_open(struct net_device *ndev)
static int moxart_mac_stop(struct net_device *ndev)
{
struct moxart_mac_priv_t *priv = netdev_priv(ndev);
+ int i;

napi_disable(&priv->napi);

@@ -204,6 +200,11 @@ static int moxart_mac_stop(struct net_device *ndev)
/* disable all functions */
writel(0, priv->base + REG_MAC_CTRL);

+ /* unmap areas mapped in moxart_mac_setup_desc_ring() */
+ for (i = 0; i < RX_DESC_NUM; i++)
+ dma_unmap_single(&priv->pdev->dev, priv->rx_mapping[i],
+ priv->rx_buf_size, DMA_FROM_DEVICE);
+
return 0;
}

--
2.35.1



2022-08-29 12:09:08

by Greg Kroah-Hartman

Subject: [PATCH 5.19 116/158] bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem

From: Liu Shixin <[email protected]>

commit dd0ff4d12dd284c334f7e9b07f8f335af856ac78 upstream.

The vmemmap pages are marked by kmemleak when allocated from memblock.
Remove them from kmemleak when freeing the page. Otherwise, when we reuse
the page, kmemleak may report such an error and then stop working.

kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing)
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff98fb6be00000 (size 335544320):
kmemleak: comm "swapper", pid 0, jiffies 4294892296
kmemleak: min_count = 0
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/bootmem_info.c | 2 ++
1 file changed, 2 insertions(+)

--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -12,6 +12,7 @@
#include <linux/memblock.h>
#include <linux/bootmem_info.h>
#include <linux/memory_hotplug.h>
+#include <linux/kmemleak.h>

void get_page_bootmem(unsigned long info, struct page *page, unsigned long type)
{
@@ -33,6 +34,7 @@ void put_page_bootmem(struct page *page)
ClearPagePrivate(page);
set_page_private(page, 0);
INIT_LIST_HEAD(&page->lru);
+ kmemleak_free_part(page_to_virt(page), PAGE_SIZE);
free_reserved_page(page);
}
}


2022-08-29 12:09:42

by Greg Kroah-Hartman

Subject: [PATCH 5.19 017/158] fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts

From: Seth Forshee <[email protected]>

[ Upstream commit bf1ac16edf6770a92bc75cf2373f1f9feea398a4 ]

Idmapped mounts should not allow a user to map file ownership into a
range of ids which is not under the control of that user. However, we
currently don't check whether the mounter is privileged with respect to
the target user namespace.

Currently no FS_USERNS_MOUNT filesystems support idmapped mounts, thus
this is not a problem as only CAP_SYS_ADMIN in init_user_ns is allowed
to set up idmapped mounts. But this could change in the future, so add a
check to refuse to create idmapped mounts when the mounter does not have
CAP_SYS_ADMIN in the target user namespace.

Fixes: bd303368b776 ("fs: support mapped mounts of mapped filesystems")
Signed-off-by: Seth Forshee <[email protected]>
Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/namespace.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index e6a7e769d25dd..a59f8d645654a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4238,6 +4238,13 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
err = -EPERM;
goto out_fput;
}
+
+ /* We're not controlling the target namespace. */
+ if (!ns_capable(mnt_userns, CAP_SYS_ADMIN)) {
+ err = -EPERM;
+ goto out_fput;
+ }
+
kattr->mnt_userns = get_user_ns(mnt_userns);

out_fput:
--
2.35.1



2022-08-29 12:09:47

by Greg Kroah-Hartman

Subject: [PATCH 5.19 090/158] net: stmmac: work around sporadic tx issue on link-up

From: Heiner Kallweit <[email protected]>

[ Upstream commit a3a57bf07de23fe1ff779e0fdf710aa581c3ff73 ]

This is a follow-up to the discussion in [0]. It seems to me that
at least the IP version used on Amlogic SoCs sometimes has a problem
if register MAC_CTRL_REG is written whilst the chip is still processing
a previous write. But that's just a guess.
Adding a delay between two writes to this register helps, but we can
also simply omit the offending second write. This patch uses the second
approach and is based on a suggestion from Qi Duan. A benefit of this
approach is that we save a few register writes, also on chip versions
that are not affected.

[0] https://www.spinics.net/lists/netdev/msg831526.html

Fixes: bfab27a146ed ("stmmac: add the experimental PCI support")
Suggested-by: Qi Duan <[email protected]>
Suggested-by: Jerome Brunet <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c | 8 ++++++--
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 +++++----
2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
index caa4bfc4c1d62..9b6138b117766 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
@@ -258,14 +258,18 @@ EXPORT_SYMBOL_GPL(stmmac_set_mac_addr);
/* Enable disable MAC RX/TX */
void stmmac_set_mac(void __iomem *ioaddr, bool enable)
{
- u32 value = readl(ioaddr + MAC_CTRL_REG);
+ u32 old_val, value;
+
+ old_val = readl(ioaddr + MAC_CTRL_REG);
+ value = old_val;

if (enable)
value |= MAC_ENABLE_RX | MAC_ENABLE_TX;
else
value &= ~(MAC_ENABLE_TX | MAC_ENABLE_RX);

- writel(value, ioaddr + MAC_CTRL_REG);
+ if (value != old_val)
+ writel(value, ioaddr + MAC_CTRL_REG);
}

void stmmac_get_mac_addr(void __iomem *ioaddr, unsigned char *addr,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c5f33630e7718..78f11dabca056 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -983,10 +983,10 @@ static void stmmac_mac_link_up(struct phylink_config *config,
bool tx_pause, bool rx_pause)
{
struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev));
- u32 ctrl;
+ u32 old_ctrl, ctrl;

- ctrl = readl(priv->ioaddr + MAC_CTRL_REG);
- ctrl &= ~priv->hw->link.speed_mask;
+ old_ctrl = readl(priv->ioaddr + MAC_CTRL_REG);
+ ctrl = old_ctrl & ~priv->hw->link.speed_mask;

if (interface == PHY_INTERFACE_MODE_USXGMII) {
switch (speed) {
@@ -1061,7 +1061,8 @@ static void stmmac_mac_link_up(struct phylink_config *config,
if (tx_pause && rx_pause)
stmmac_mac_flow_ctrl(priv, duplex);

- writel(ctrl, priv->ioaddr + MAC_CTRL_REG);
+ if (ctrl != old_ctrl)
+ writel(ctrl, priv->ioaddr + MAC_CTRL_REG);

stmmac_mac_set(priv, priv->ioaddr, true);
if (phy && priv->dma_cap.eee) {
--
2.35.1



2022-08-29 12:10:00

by Greg Kroah-Hartman

Subject: [PATCH 5.19 044/158] net: ipa: don't assume SMEM is page-aligned

From: Alex Elder <[email protected]>

[ Upstream commit b8d4380365c515d8e0351f2f46d371738dd19be1 ]

In ipa_smem_init(), a Qualcomm SMEM region is allocated (if needed)
and then its virtual address is fetched using qcom_smem_get(). The
physical address associated with that region is also fetched.

The physical address is adjusted so that it is page-aligned, and an
attempt is made to update the size of the region to compensate for
any non-zero adjustment.

But that adjustment isn't done properly. The physical address is
aligned twice, and as a result the size is never actually adjusted.

Fix this by *not* aligning the "addr" local variable, and instead
making the "phys" local variable be the adjusted "addr" value.

Fixes: a0036bb413d5b ("net: ipa: define SMEM memory region for IPA")
Signed-off-by: Alex Elder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ipa/ipa_mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ipa/ipa_mem.c b/drivers/net/ipa/ipa_mem.c
index 1e9eae208e44f..53a1dbeaffa6d 100644
--- a/drivers/net/ipa/ipa_mem.c
+++ b/drivers/net/ipa/ipa_mem.c
@@ -568,7 +568,7 @@ static int ipa_smem_init(struct ipa *ipa, u32 item, size_t size)
}

/* Align the address down and the size up to a page boundary */
- addr = qcom_smem_virt_to_phys(virt) & PAGE_MASK;
+ addr = qcom_smem_virt_to_phys(virt);
phys = addr & PAGE_MASK;
size = PAGE_ALIGN(size + addr - phys);
iova = phys; /* We just want a direct mapping */
--
2.35.1



2022-08-29 12:10:12

by Greg Kroah-Hartman

Subject: [PATCH 5.19 040/158] net: dsa: microchip: move vlan functionality to ksz_common

From: Arun Ramadoss <[email protected]>

[ Upstream commit f0d997e31bb307c7aa046c4992c568547fd25195 ]

This patch moves the vlan dsa_switch_ops such as vlan_add, vlan_del and
vlan_filtering from the individual files ksz8795.c and ksz9477.c to
ksz_common.c.

Signed-off-by: Arun Ramadoss <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/dsa/microchip/ksz8795.c | 19 +++++++------
drivers/net/dsa/microchip/ksz9477.c | 19 +++++++------
drivers/net/dsa/microchip/ksz_common.c | 37 ++++++++++++++++++++++++++
drivers/net/dsa/microchip/ksz_common.h | 14 ++++++++++
4 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
index 041956e3c7b1a..16e946dbd9d42 100644
--- a/drivers/net/dsa/microchip/ksz8795.c
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -958,11 +958,9 @@ static void ksz8_flush_dyn_mac_table(struct ksz_device *dev, int port)
}
}

-static int ksz8_port_vlan_filtering(struct dsa_switch *ds, int port, bool flag,
+static int ksz8_port_vlan_filtering(struct ksz_device *dev, int port, bool flag,
struct netlink_ext_ack *extack)
{
- struct ksz_device *dev = ds->priv;
-
if (ksz_is_ksz88x3(dev))
return -ENOTSUPP;

@@ -987,12 +985,11 @@ static void ksz8_port_enable_pvid(struct ksz_device *dev, int port, bool state)
}
}

-static int ksz8_port_vlan_add(struct dsa_switch *ds, int port,
+static int ksz8_port_vlan_add(struct ksz_device *dev, int port,
const struct switchdev_obj_port_vlan *vlan,
struct netlink_ext_ack *extack)
{
bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
- struct ksz_device *dev = ds->priv;
struct ksz_port *p = &dev->ports[port];
u16 data, new_pvid = 0;
u8 fid, member, valid;
@@ -1060,10 +1057,9 @@ static int ksz8_port_vlan_add(struct dsa_switch *ds, int port,
return 0;
}

-static int ksz8_port_vlan_del(struct dsa_switch *ds, int port,
+static int ksz8_port_vlan_del(struct ksz_device *dev, int port,
const struct switchdev_obj_port_vlan *vlan)
{
- struct ksz_device *dev = ds->priv;
u16 data, pvid;
u8 fid, member, valid;

@@ -1398,9 +1394,9 @@ static const struct dsa_switch_ops ksz8_switch_ops = {
.port_bridge_leave = ksz_port_bridge_leave,
.port_stp_state_set = ksz8_port_stp_state_set,
.port_fast_age = ksz_port_fast_age,
- .port_vlan_filtering = ksz8_port_vlan_filtering,
- .port_vlan_add = ksz8_port_vlan_add,
- .port_vlan_del = ksz8_port_vlan_del,
+ .port_vlan_filtering = ksz_port_vlan_filtering,
+ .port_vlan_add = ksz_port_vlan_add,
+ .port_vlan_del = ksz_port_vlan_del,
.port_fdb_dump = ksz_port_fdb_dump,
.port_mdb_add = ksz_port_mdb_add,
.port_mdb_del = ksz_port_mdb_del,
@@ -1465,6 +1461,9 @@ static const struct ksz_dev_ops ksz8_dev_ops = {
.r_mib_pkt = ksz8_r_mib_pkt,
.freeze_mib = ksz8_freeze_mib,
.port_init_cnt = ksz8_port_init_cnt,
+ .vlan_filtering = ksz8_port_vlan_filtering,
+ .vlan_add = ksz8_port_vlan_add,
+ .vlan_del = ksz8_port_vlan_del,
.shutdown = ksz8_reset_switch,
.init = ksz8_switch_init,
.exit = ksz8_switch_exit,
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index 31be767027feb..1bb994a9109cd 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -377,12 +377,10 @@ static void ksz9477_flush_dyn_mac_table(struct ksz_device *dev, int port)
}
}

-static int ksz9477_port_vlan_filtering(struct dsa_switch *ds, int port,
+static int ksz9477_port_vlan_filtering(struct ksz_device *dev, int port,
bool flag,
struct netlink_ext_ack *extack)
{
- struct ksz_device *dev = ds->priv;
-
if (flag) {
ksz_port_cfg(dev, port, REG_PORT_LUE_CTRL,
PORT_VLAN_LOOKUP_VID_0, true);
@@ -396,11 +394,10 @@ static int ksz9477_port_vlan_filtering(struct dsa_switch *ds, int port,
return 0;
}

-static int ksz9477_port_vlan_add(struct dsa_switch *ds, int port,
+static int ksz9477_port_vlan_add(struct ksz_device *dev, int port,
const struct switchdev_obj_port_vlan *vlan,
struct netlink_ext_ack *extack)
{
- struct ksz_device *dev = ds->priv;
u32 vlan_table[3];
bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
int err;
@@ -433,10 +430,9 @@ static int ksz9477_port_vlan_add(struct dsa_switch *ds, int port,
return 0;
}

-static int ksz9477_port_vlan_del(struct dsa_switch *ds, int port,
+static int ksz9477_port_vlan_del(struct ksz_device *dev, int port,
const struct switchdev_obj_port_vlan *vlan)
{
- struct ksz_device *dev = ds->priv;
bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
u32 vlan_table[3];
u16 pvid;
@@ -1331,9 +1327,9 @@ static const struct dsa_switch_ops ksz9477_switch_ops = {
.port_bridge_leave = ksz_port_bridge_leave,
.port_stp_state_set = ksz9477_port_stp_state_set,
.port_fast_age = ksz_port_fast_age,
- .port_vlan_filtering = ksz9477_port_vlan_filtering,
- .port_vlan_add = ksz9477_port_vlan_add,
- .port_vlan_del = ksz9477_port_vlan_del,
+ .port_vlan_filtering = ksz_port_vlan_filtering,
+ .port_vlan_add = ksz_port_vlan_add,
+ .port_vlan_del = ksz_port_vlan_del,
.port_fdb_dump = ksz9477_port_fdb_dump,
.port_fdb_add = ksz9477_port_fdb_add,
.port_fdb_del = ksz9477_port_fdb_del,
@@ -1413,6 +1409,9 @@ static const struct ksz_dev_ops ksz9477_dev_ops = {
.r_mib_stat64 = ksz_r_mib_stats64,
.freeze_mib = ksz9477_freeze_mib,
.port_init_cnt = ksz9477_port_init_cnt,
+ .vlan_filtering = ksz9477_port_vlan_filtering,
+ .vlan_add = ksz9477_port_vlan_add,
+ .vlan_del = ksz9477_port_vlan_del,
.shutdown = ksz9477_reset_switch,
.init = ksz9477_switch_init,
.exit = ksz9477_switch_exit,
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 0713a40685fa9..5db2b55152885 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -954,6 +954,43 @@ enum dsa_tag_protocol ksz_get_tag_protocol(struct dsa_switch *ds,
}
EXPORT_SYMBOL_GPL(ksz_get_tag_protocol);

+int ksz_port_vlan_filtering(struct dsa_switch *ds, int port,
+ bool flag, struct netlink_ext_ack *extack)
+{
+ struct ksz_device *dev = ds->priv;
+
+ if (!dev->dev_ops->vlan_filtering)
+ return -EOPNOTSUPP;
+
+ return dev->dev_ops->vlan_filtering(dev, port, flag, extack);
+}
+EXPORT_SYMBOL_GPL(ksz_port_vlan_filtering);
+
+int ksz_port_vlan_add(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_port_vlan *vlan,
+ struct netlink_ext_ack *extack)
+{
+ struct ksz_device *dev = ds->priv;
+
+ if (!dev->dev_ops->vlan_add)
+ return -EOPNOTSUPP;
+
+ return dev->dev_ops->vlan_add(dev, port, vlan, extack);
+}
+EXPORT_SYMBOL_GPL(ksz_port_vlan_add);
+
+int ksz_port_vlan_del(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_port_vlan *vlan)
+{
+ struct ksz_device *dev = ds->priv;
+
+ if (!dev->dev_ops->vlan_del)
+ return -EOPNOTSUPP;
+
+ return dev->dev_ops->vlan_del(dev, port, vlan);
+}
+EXPORT_SYMBOL_GPL(ksz_port_vlan_del);
+
static int ksz_switch_detect(struct ksz_device *dev)
{
u8 id1, id2;
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index 21db6f79035fa..1baa270859aa2 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -180,6 +180,13 @@ struct ksz_dev_ops {
void (*r_mib_pkt)(struct ksz_device *dev, int port, u16 addr,
u64 *dropped, u64 *cnt);
void (*r_mib_stat64)(struct ksz_device *dev, int port);
+ int (*vlan_filtering)(struct ksz_device *dev, int port,
+ bool flag, struct netlink_ext_ack *extack);
+ int (*vlan_add)(struct ksz_device *dev, int port,
+ const struct switchdev_obj_port_vlan *vlan,
+ struct netlink_ext_ack *extack);
+ int (*vlan_del)(struct ksz_device *dev, int port,
+ const struct switchdev_obj_port_vlan *vlan);
void (*freeze_mib)(struct ksz_device *dev, int port, bool freeze);
void (*port_init_cnt)(struct ksz_device *dev, int port);
int (*shutdown)(struct ksz_device *dev);
@@ -233,6 +240,13 @@ void ksz_get_strings(struct dsa_switch *ds, int port,
u32 stringset, uint8_t *buf);
enum dsa_tag_protocol ksz_get_tag_protocol(struct dsa_switch *ds,
int port, enum dsa_tag_protocol mp);
+int ksz_port_vlan_filtering(struct dsa_switch *ds, int port,
+ bool flag, struct netlink_ext_ack *extack);
+int ksz_port_vlan_add(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_port_vlan *vlan,
+ struct netlink_ext_ack *extack);
+int ksz_port_vlan_del(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_port_vlan *vlan);

/* Common register access functions */

--
2.35.1



2022-08-29 12:10:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 129/158] riscv: signal: fix missing prototype warning

From: Conor Dooley <[email protected]>

commit b5c3aca86d2698c4850b6ee8b341938025d2780c upstream.

Fix the warning:
arch/riscv/kernel/signal.c:316:27: warning: no previous prototype for function 'do_notify_resume' [-Wmissing-prototypes]
asmlinkage __visible void do_notify_resume(struct pt_regs *regs,

All other functions in the file are static & none of the existing
headers stood out as an obvious location. Create signal.h to hold the
declaration.

Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API")
Signed-off-by: Conor Dooley <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/riscv/include/asm/signal.h | 12 ++++++++++++
arch/riscv/kernel/signal.c | 1 +
2 files changed, 13 insertions(+)
create mode 100644 arch/riscv/include/asm/signal.h

--- /dev/null
+++ b/arch/riscv/include/asm/signal.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_SIGNAL_H
+#define __ASM_SIGNAL_H
+
+#include <uapi/asm/signal.h>
+#include <uapi/asm/ptrace.h>
+
+asmlinkage __visible
+void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
+
+#endif
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -15,6 +15,7 @@

#include <asm/ucontext.h>
#include <asm/vdso.h>
+#include <asm/signal.h>
#include <asm/signal32.h>
#include <asm/switch_to.h>
#include <asm/csr.h>


2022-08-29 12:10:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 154/158] riscv: dts: microchip: mpfs: fix incorrect pcie child node name

From: Conor Dooley <[email protected]>

commit 3f67e69976035352db110443916bcce32c7f64ac upstream.

Recent versions of dt-schema complain about the PCIe controller's child
node name:
arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: pcie@2000000000: Unevaluated properties are not allowed ('clock-names', 'clocks', 'legacy-interrupt-controller', 'microchip,axi-m-atr0' were unexpected)
From schema: Documentation/devicetree/bindings/pci/microchip,pcie-host.yaml
Change the child node name in the dts to the one the binding expects.

Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Signed-off-by: Conor Dooley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/riscv/boot/dts/microchip/mpfs.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/riscv/boot/dts/microchip/mpfs.dtsi
+++ b/arch/riscv/boot/dts/microchip/mpfs.dtsi
@@ -448,7 +448,7 @@
msi-controller;
microchip,axi-m-atr0 = <0x10 0x0>;
status = "disabled";
- pcie_intc: legacy-interrupt-controller {
+ pcie_intc: interrupt-controller {
#address-cells = <0>;
#interrupt-cells = <1>;
interrupt-controller;


2022-08-29 12:10:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 002/158] NFS: Fix another fsync() issue after a server reboot

From: Trond Myklebust <[email protected]>

commit 67f4b5dc49913abcdb5cc736e73674e2f352f81d upstream.

Currently, when the writeback code detects a server reboot, it redirties
any pages that were not committed to disk, and it sets the flag
NFS_CONTEXT_RESEND_WRITES in the nfs_open_context of the file descriptor
that dirtied the file. While this allows the file descriptor in question
to redrive its own writes, it violates the fsync() requirement that we
should be synchronising all writes to disk.
While the problem is infrequent, we do see corner cases where an
untimely server reboot causes the fsync() call to abandon its attempt to
sync data to disk, causing data corruption issues due to missed error
conditions or similar.

In order to tighten up the client's ability to deal with this situation
without introducing livelocks, add a counter that records the number of
times pages are redirtied due to a server reboot-like condition, and use
that in fsync() to redrive the sync to disk.
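
As a rough illustration of that loop, here is a minimal userspace sketch,
assuming a hypothetical commit_range() helper and a plain C11 atomic counter
standing in for nfsi->redirtied_pages (this is not the NFS code itself):

#include <stdatomic.h>
#include <stdio.h>

/* stands in for nfsi->redirtied_pages */
static atomic_long redirtied_pages;

/* pretend commit: the first pass "hits a server reboot" and redirties pages */
static int commit_range(int *first_pass)
{
	if (*first_pass) {
		*first_pass = 0;
		atomic_fetch_add(&redirtied_pages, 3);
	}
	return 0;
}

static int fsync_sketch(void)
{
	long saved = atomic_load(&redirtied_pages);
	int first_pass = 1;
	int ret;

	for (;;) {
		long now;

		ret = commit_range(&first_pass);
		if (ret)
			break;
		now = atomic_load(&redirtied_pages);
		if (now == saved)
			break;		/* nothing was redirtied: done */
		saved = now;		/* redirtied: drive another sync pass */
	}
	return ret;
}

int main(void)
{
	printf("fsync_sketch() = %d\n", fsync_sketch());
	return 0;
}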

Fixes: 2197e9b06c22 ("NFS: Fix up fsync() when the server rebooted")
Cc: [email protected]
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/nfs/file.c | 15 ++++++---------
fs/nfs/inode.c | 1 +
fs/nfs/write.c | 6 ++++--
include/linux/nfs_fs.h | 1 +
4 files changed, 12 insertions(+), 11 deletions(-)

--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -221,8 +221,10 @@ nfs_file_fsync_commit(struct file *file,
int
nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct nfs_open_context *ctx = nfs_file_open_context(file);
struct inode *inode = file_inode(file);
+ struct nfs_inode *nfsi = NFS_I(inode);
+ long save_nredirtied = atomic_long_read(&nfsi->redirtied_pages);
+ long nredirtied;
int ret;

trace_nfs_fsync_enter(inode);
@@ -237,15 +239,10 @@ nfs_file_fsync(struct file *file, loff_t
ret = pnfs_sync_inode(inode, !!datasync);
if (ret != 0)
break;
- if (!test_and_clear_bit(NFS_CONTEXT_RESEND_WRITES, &ctx->flags))
+ nredirtied = atomic_long_read(&nfsi->redirtied_pages);
+ if (nredirtied == save_nredirtied)
break;
- /*
- * If nfs_file_fsync_commit detected a server reboot, then
- * resend all dirty pages that might have been covered by
- * the NFS_CONTEXT_RESEND_WRITES flag
- */
- start = 0;
- end = LLONG_MAX;
+ save_nredirtied = nredirtied;
}

trace_nfs_fsync_exit(inode, ret);
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -426,6 +426,7 @@ nfs_ilookup(struct super_block *sb, stru
static void nfs_inode_init_regular(struct nfs_inode *nfsi)
{
atomic_long_set(&nfsi->nrequests, 0);
+ atomic_long_set(&nfsi->redirtied_pages, 0);
INIT_LIST_HEAD(&nfsi->commit_info.list);
atomic_long_set(&nfsi->commit_info.ncommit, 0);
atomic_set(&nfsi->commit_info.rpcs_out, 0);
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1419,10 +1419,12 @@ static void nfs_initiate_write(struct nf
*/
static void nfs_redirty_request(struct nfs_page *req)
{
+ struct nfs_inode *nfsi = NFS_I(page_file_mapping(req->wb_page)->host);
+
/* Bump the transmission count */
req->wb_nio++;
nfs_mark_request_dirty(req);
- set_bit(NFS_CONTEXT_RESEND_WRITES, &nfs_req_openctx(req)->flags);
+ atomic_long_inc(&nfsi->redirtied_pages);
nfs_end_page_writeback(req);
nfs_release_request(req);
}
@@ -1892,7 +1894,7 @@ static void nfs_commit_release_pages(str
/* We have a mismatch. Write the page again */
dprintk_cont(" mismatch\n");
nfs_mark_request_dirty(req);
- set_bit(NFS_CONTEXT_RESEND_WRITES, &nfs_req_openctx(req)->flags);
+ atomic_long_inc(&NFS_I(data->inode)->redirtied_pages);
next:
nfs_unlock_and_release_request(req);
/* Latency breaker */
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -182,6 +182,7 @@ struct nfs_inode {
/* Regular file */
struct {
atomic_long_t nrequests;
+ atomic_long_t redirtied_pages;
struct nfs_mds_commit_info commit_info;
struct mutex commit_mutex;
};


2022-08-29 12:10:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 051/158] bnxt_en: Use PAGE_SIZE to init buffer when multi buffer XDP is not in use

From: Pavan Chebbi <[email protected]>

[ Upstream commit 7dd3de7cb1d657a918c6b2bc673c71e318aa0c05 ]

Using BNXT_PAGE_MODE_BUF_SIZE + offset as the buffer length is not
sufficient when running single buffer XDP programs doing redirect
operations. The stack will complain about missing skb tail room. Fix it
by using PAGE_SIZE when calling xdp_init_buff() for single buffer
programs.

Fixes: b231c3f3414c ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff")
Reviewed-by: Somnath Kotur <[email protected]>
Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 10 ++++++++--
2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 075c6206325ce..b1b17f9113006 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2130,6 +2130,7 @@ struct bnxt {
#define BNXT_DUMP_CRASH 1

struct bpf_prog *xdp_prog;
+ u8 xdp_has_frags;

struct bnxt_ptp_cfg *ptp_cfg;
u8 ptp_all_rx_tstamp;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index f53387ed0167b..c3065ec0a4798 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -181,6 +181,7 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
struct xdp_buff *xdp)
{
struct bnxt_sw_rx_bd *rx_buf;
+ u32 buflen = PAGE_SIZE;
struct pci_dev *pdev;
dma_addr_t mapping;
u32 offset;
@@ -192,7 +193,10 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
mapping = rx_buf->mapping - bp->rx_dma_offset;
dma_sync_single_for_cpu(&pdev->dev, mapping + offset, *len, bp->rx_dir);

- xdp_init_buff(xdp, BNXT_PAGE_MODE_BUF_SIZE + offset, &rxr->xdp_rxq);
+ if (bp->xdp_has_frags)
+ buflen = BNXT_PAGE_MODE_BUF_SIZE + offset;
+
+ xdp_init_buff(xdp, buflen, &rxr->xdp_rxq);
xdp_prepare_buff(xdp, *data_ptr - offset, offset, *len, false);
}

@@ -397,8 +401,10 @@ static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog *prog)
netdev_warn(dev, "ethtool rx/tx channels must be combined to support XDP.\n");
return -EOPNOTSUPP;
}
- if (prog)
+ if (prog) {
tx_xdp = bp->rx_nr_rings;
+ bp->xdp_has_frags = prog->aux->xdp_has_frags;
+ }

tc = netdev_get_num_tc(dev);
if (!tc)
--
2.35.1



2022-08-29 12:10:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 031/158] net/mlx5: Eswitch, Fix forwarding decision to uplink

From: Eli Cohen <[email protected]>

[ Upstream commit 942fca7e762be39204e5926e91a288a343a97c72 ]

Make sure to modify the rule for uplink forwarding only for the case
where destination vport number is MLX5_VPORT_UPLINK.

Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index eb79810199d3e..d04739cb793e5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -427,7 +427,8 @@ esw_setup_vport_dest(struct mlx5_flow_destination *dest, struct mlx5_flow_act *f
dest[dest_idx].vport.vhca_id =
MLX5_CAP_GEN(esw_attr->dests[attr_idx].mdev, vhca_id);
dest[dest_idx].vport.flags |= MLX5_FLOW_DEST_VPORT_VHCA_ID;
- if (mlx5_lag_mpesw_is_activated(esw->dev))
+ if (dest[dest_idx].vport.num == MLX5_VPORT_UPLINK &&
+ mlx5_lag_mpesw_is_activated(esw->dev))
dest[dest_idx].type = MLX5_FLOW_DESTINATION_TYPE_UPLINK;
}
if (esw_attr->dests[attr_idx].flags & MLX5_ESW_DEST_ENCAP) {
--
2.35.1



2022-08-29 12:10:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 119/158] cifs: skip extra NULL byte in filenames

From: Paulo Alcantara <[email protected]>

commit a1d2eb51f0a33c28f5399a1610e66b3fbd24e884 upstream.

Since commit:
cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty
the alloc_path_with_tree_prefix() function no longer includes the
trailing separator when @path is empty, although @out_len still
assumes a path separator, thus adding an extra byte to the final
filename.

This has caused mount issues in some Synology servers due to the extra
NULL byte in filenames when sending SMB2_CREATE requests with
SMB2_FLAGS_DFS_OPERATIONS set.

Fix this by checking if @path is not empty and only then adding the extra
byte for the separator. Also, do not include any trailing NULL bytes in the
filename, as MS-SMB2 requires it to be 8-byte aligned and not NULL terminated.
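
For illustration, a small userspace sketch of the resulting size computation,
using hypothetical lengths and a simple ROUNDUP macro (2 bytes per UTF-16
character; this is not the cifs code itself):

#include <stdio.h>

#define ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

static void path_sizes(unsigned int treename_len, unsigned int path_len)
{
	/* one UTF-16 separator only when the path is non-empty */
	unsigned int out_len = treename_len + (path_len ? 1 : 0) + path_len;
	/* wire name is 8-byte aligned; the trailing NUL is not counted in it */
	unsigned int out_size = ROUNDUP(out_len * 2, 8);

	printf("chars=%u wire-bytes=%u alloc-bytes=%u\n",
	       out_len, out_size, out_size + 2 /* room for the NUL */);
}

int main(void)
{
	path_sizes(10, 0);	/* empty path: no separator added */
	path_sizes(10, 5);	/* non-empty path: one separator added */
	return 0;
}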

Cc: [email protected]
Fixes: 7eacba3b00a3 ("cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty")
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/cifs/smb2pdu.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2571,19 +2571,15 @@ alloc_path_with_tree_prefix(__le16 **out

path_len = UniStrnlen((wchar_t *)path, PATH_MAX);

- /*
- * make room for one path separator between the treename and
- * path
- */
- *out_len = treename_len + 1 + path_len;
+ /* make room for one path separator only if @path isn't empty */
+ *out_len = treename_len + (path[0] ? 1 : 0) + path_len;

/*
- * final path needs to be null-terminated UTF16 with a
- * size aligned to 8
+ * final path needs to be 8-byte aligned as specified in
+ * MS-SMB2 2.2.13 SMB2 CREATE Request.
*/
-
- *out_size = roundup((*out_len+1)*2, 8);
- *out_path = kzalloc(*out_size, GFP_KERNEL);
+ *out_size = roundup(*out_len * sizeof(__le16), 8);
+ *out_path = kzalloc(*out_size + sizeof(__le16) /* null */, GFP_KERNEL);
if (!*out_path)
return -ENOMEM;



2022-08-29 12:10:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 073/158] net: Fix a data-race around sysctl_net_busy_read.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit e59ef36f0795696ab229569c153936bfd068d21c ]

While reading sysctl_net_busy_read, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
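
For readers unfamiliar with the pattern, here is a rough userspace analogue;
READ_ONCE_SKETCH below is a simplified stand-in for the kernel's READ_ONCE(),
not its actual implementation:

#include <stdio.h>

/* simplified stand-in: force exactly one load, which the compiler may
 * neither tear, duplicate, nor re-read later */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

static unsigned int sysctl_net_busy_read;	/* may change concurrently */

static void sock_init_data_sketch(unsigned int *sk_ll_usec)
{
	*sk_ll_usec = READ_ONCE_SKETCH(sysctl_net_busy_read);
}

int main(void)
{
	unsigned int sk_ll_usec;

	sysctl_net_busy_read = 50;
	sock_init_data_sketch(&sk_ll_usec);
	printf("sk_ll_usec=%u\n", sk_ll_usec);
	return 0;
}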

Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/sock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index d672e63a5c2d4..16ab5ef749c60 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3365,7 +3365,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)

#ifdef CONFIG_NET_RX_BUSY_POLL
sk->sk_napi_id = 0;
- sk->sk_ll_usec = sysctl_net_busy_read;
+ sk->sk_ll_usec = READ_ONCE(sysctl_net_busy_read);
#endif

sk->sk_max_pacing_rate = ~0UL;
--
2.35.1



2022-08-29 12:11:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 143/158] perf python: Fix build when PYTHON_CONFIG is user supplied

From: James Clark <[email protected]>

commit bc9e7fe313d5e56d4d5f34bcc04d1165f94f86fb upstream.

The previous change to Python autodetection had a small mistake where
the auto value was used to determine the Python binary, rather than the
user supplied value. The Python binary is only used for one part of the
build process, rather than the final linking, so it was producing
correct builds in most scenarios, especially when the auto detected
value matched what the user wanted, or the system only had a valid set
of Pythons.

Change it so that the Python binary path is derived from either the
PYTHON_CONFIG value or PYTHON value, depending on what is specified by
the user. This was the original intention.

This error was spotted in a build failure in an odd cross compilation
environment after commit 4c41cb46a732fe82 ("perf python: Prefer
python3") was merged.

Fixes: 630af16eee495f58 ("perf tools: Use Python devtools for version autodetection rather than runtime")
Signed-off-by: James Clark <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/perf/Makefile.config | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -265,7 +265,7 @@ endif
# defined. get-executable-or-default fails with an error if the first argument is supplied but
# doesn't exist.
override PYTHON_CONFIG := $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON_AUTO))
-override PYTHON := $(call get-executable-or-default,PYTHON,$(subst -config,,$(PYTHON_AUTO)))
+override PYTHON := $(call get-executable-or-default,PYTHON,$(subst -config,,$(PYTHON_CONFIG)))

grep-libs = $(filter -l%,$(1))
strip-libs = $(filter-out -l%,$(1))


2022-08-29 12:11:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 035/158] net/mlx5e: Fix wrong application of the LRO state

From: Aya Levin <[email protected]>

[ Upstream commit 7b3707fc79044871ab8f3d5fa5e9603155bb5577 ]

The driver caches the packet merge type in the mlx5e_params instance, which
must be kept in perfect sync with the corresponding netdev_features bit.
Prior to this patch, under certain conditions (*) the LRO state was set in
mlx5e_params while the netdev_features bit was off, causing LRO to
be applied on the RQs (HW level).

(*) This can happen only on profile init (mlx5e_build_nic_params()),
when RQ expect non-linear SKB and PCI is fast enough in comparison to
link width.

Solution: remove setting of packet merge type from
mlx5e_build_nic_params() as netdev features are not updated.

Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Reviewed-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 087952b84ccb0..62aab20025345 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4733,14 +4733,6 @@ void mlx5e_build_nic_params(struct mlx5e_priv *priv, struct mlx5e_xsk *xsk, u16
/* RQ */
mlx5e_build_rq_params(mdev, params);

- /* HW LRO */
- if (MLX5_CAP_ETH(mdev, lro_cap) &&
- params->rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) {
- /* No XSK params: checking the availability of striding RQ in general. */
- if (!mlx5e_rx_mpwqe_is_linear_skb(mdev, params, NULL))
- params->packet_merge.type = slow_pci_heuristic(mdev) ?
- MLX5E_PACKET_MERGE_NONE : MLX5E_PACKET_MERGE_LRO;
- }
params->packet_merge.timeout = mlx5e_choose_lro_timeout(mdev, MLX5E_DEFAULT_LRO_TIMEOUT);

/* CQ moderation params */
--
2.35.1



2022-08-29 12:11:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 084/158] net: ethernet: mtk_eth_soc: enable rx cksum offload for MTK_NETSYS_V2

From: Lorenzo Bianconi <[email protected]>

[ Upstream commit da6e113ff010815fdd21ee1e9af2e8d179a2680f ]

Enable rx checksum offload for mt7986 chipset.

Signed-off-by: Lorenzo Bianconi <[email protected]>
Link: https://lore.kernel.org/r/c8699805c18f7fd38315fcb8da2787676d83a32c.1654544585.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 59c9a10f83ba5..6beb3d4873a37 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1444,8 +1444,8 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
int done = 0, bytes = 0;

while (done < budget) {
+ unsigned int pktlen, *rxdcsum;
struct net_device *netdev;
- unsigned int pktlen;
dma_addr_t dma_addr;
u32 hash, reason;
int mac = 0;
@@ -1512,7 +1512,13 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
pktlen = RX_DMA_GET_PLEN0(trxd.rxd2);
skb->dev = netdev;
skb_put(skb, pktlen);
- if (trxd.rxd4 & eth->soc->txrx.rx_dma_l4_valid)
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ rxdcsum = &trxd.rxd3;
+ else
+ rxdcsum = &trxd.rxd4;
+
+ if (*rxdcsum & eth->soc->txrx.rx_dma_l4_valid)
skb->ip_summed = CHECKSUM_UNNECESSARY;
else
skb_checksum_none_assert(skb);
@@ -3761,6 +3767,7 @@ static const struct mtk_soc_data mt7986_data = {
.txd_size = sizeof(struct mtk_tx_dma_v2),
.rxd_size = sizeof(struct mtk_rx_dma_v2),
.rx_irq_done_mask = MTK_RX_DONE_INT_V2,
+ .rx_dma_l4_valid = RX_DMA_L4_VALID_V2,
.dma_max_len = MTK_TX_DMA_BUF_LEN_V2,
.dma_len_offset = 8,
},
--
2.35.1



2022-08-29 12:11:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 133/158] io_uring: fix issue with io_write() not always undoing sb_start_write()

From: Jens Axboe <[email protected]>

commit e053aaf4da56cbf0afb33a0fda4a62188e2c0637 upstream.

This is actually an older issue, but we never used to hit the -EAGAIN
path before having done sb_start_write(). Make sure that we always call
kiocb_end_write() if we need to retry the write, so that we keep the
calls to sb_start_write() etc balanced.
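
A minimal userspace sketch of that balancing rule, with begin_write() and
end_write() as made-up stand-ins for sb_start_write()/kiocb_end_write(); the
point is that the retry path undoes the begin before returning -EAGAIN:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int write_depth;			/* stands in for sb_start_write() nesting */

static void begin_write(void) { write_depth++; }
static void end_write(void)   { write_depth--; }

static int do_write(bool backend_ready)
{
	begin_write();
	if (!backend_ready) {
		end_write();		/* the fix: undo before asking for a retry */
		return -EAGAIN;
	}
	/* ... perform the write ... */
	end_write();
	return 0;
}

int main(void)
{
	do_write(false);		/* retried path */
	do_write(true);			/* completed path */
	printf("write_depth=%d (must be 0)\n", write_depth);
	return 0;
}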

Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
io_uring/io_uring.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -4331,7 +4331,12 @@ done:
copy_iov:
iov_iter_restore(&s->iter, &s->iter_state);
ret = io_setup_async_rw(req, iovec, s, false);
- return ret ?: -EAGAIN;
+ if (!ret) {
+ if (kiocb->ki_flags & IOCB_WRITE)
+ kiocb_end_write(req);
+ return -EAGAIN;
+ }
+ return ret;
}
out_free:
/* it's reportedly faster than delegating the null check to kfree() */


2022-08-29 12:11:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 021/158] mm/smaps: dont access young/dirty bit if pte unpresent

From: Peter Xu <[email protected]>

[ Upstream commit efd4149342db2df41b1bbe68972ead853b30e444 ]

These bits are only valid when the ptes are present. Introduce
two booleans for them and set them to false when !pte_present(), for both
the pte and pmd accounting.
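
As a toy illustration of the rule (plain userspace C with a made-up pte
structure, not the kernel accounting code):

#include <stdbool.h>
#include <stdio.h>

struct fake_pte {
	bool present;
	bool young;
	bool dirty;
};

static void smaps_account_sketch(const struct fake_pte *pte)
{
	bool young = false, dirty = false;

	if (pte->present) {	/* the bits only mean something here */
		young = pte->young;
		dirty = pte->dirty;
	}
	printf("young=%d dirty=%d\n", young, dirty);
}

int main(void)
{
	struct fake_pte swap_entry = { .present = false, .young = true, .dirty = true };
	struct fake_pte mapped = { .present = true, .young = true, .dirty = false };

	smaps_account_sketch(&swap_entry);	/* young=0 dirty=0 */
	smaps_account_sketch(&mapped);		/* young=1 dirty=0 */
	return 0;
}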

The bug was found during code reading and no real-world issue has been
reported, but logically such an error can cause incorrect readings for
quite a few fields in either the smaps or smaps_rollup output.

For example, it could cause over-estimate on values like Shared_Dirty,
Private_Dirty, Referenced. Or it could also cause under-estimate on
values like LazyFree, Shared_Clean, Private_Clean.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1d4d9e0cbd0 ("proc/smaps: carefully handle migration entries")
Fixes: c94b6923fa0a ("/proc/PID/smaps: Add PMD migration entry parsing")
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Huang Ying <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/proc/task_mmu.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2d04e3470d4cd..313788bc0c307 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -525,10 +525,12 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
struct vm_area_struct *vma = walk->vma;
bool locked = !!(vma->vm_flags & VM_LOCKED);
struct page *page = NULL;
- bool migration = false;
+ bool migration = false, young = false, dirty = false;

if (pte_present(*pte)) {
page = vm_normal_page(vma, addr, *pte);
+ young = pte_young(*pte);
+ dirty = pte_dirty(*pte);
} else if (is_swap_pte(*pte)) {
swp_entry_t swpent = pte_to_swp_entry(*pte);

@@ -558,8 +560,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
if (!page)
return;

- smaps_account(mss, page, false, pte_young(*pte), pte_dirty(*pte),
- locked, migration);
+ smaps_account(mss, page, false, young, dirty, locked, migration);
}

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
--
2.35.1



2022-08-29 12:11:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 030/158] net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY

From: Eli Cohen <[email protected]>

[ Upstream commit a6e675a66175869b7d87c0e1dd0ddf93e04f8098 ]

Only set MLX5_LAG_FLAG_NDEVS_READY if both netdevices are registered.
Doing so guarantees that both ldev->pf[MLX5_LAG_P0].dev and
ldev->pf[MLX5_LAG_P1].dev have valid pointers when
MLX5_LAG_FLAG_NDEVS_READY is set.

The core issue is asymmetry in setting MLX5_LAG_FLAG_NDEVS_READY and
clearing it. Setting it is done wrongly when both
ldev->pf[MLX5_LAG_P0].dev and ldev->pf[MLX5_LAG_P1].dev are set;
clearing it is done right when either of ldev->pf[i].netdev is cleared.

Consider the following scenario:
1. PF0 loads and sets ldev->pf[MLX5_LAG_P0].dev to a valid pointer
2. PF1 loads and sets both ldev->pf[MLX5_LAG_P1].dev and
ldev->pf[MLX5_LAG_P1].netdev with valid pointers. This results in
MLX5_LAG_FLAG_NDEVS_READY being set.
3. PF0 is unloaded before setting dev->pf[MLX5_LAG_P0].netdev.
MLX5_LAG_FLAG_NDEVS_READY remains set.

Further execution of mlx5_do_bond() will result in a null pointer
dereference when calling mlx5_lag_is_multipath().

This patch fixes the following call trace actually encountered:

[ 1293.475195] BUG: kernel NULL pointer dereference, address: 00000000000009a8
[ 1293.478756] #PF: supervisor read access in kernel mode
[ 1293.481320] #PF: error_code(0x0000) - not-present page
[ 1293.483686] PGD 0 P4D 0
[ 1293.484434] Oops: 0000 [#1] SMP PTI
[ 1293.485377] CPU: 1 PID: 23690 Comm: kworker/u16:2 Not tainted 5.18.0-rc5_for_upstream_min_debug_2022_05_05_10_13 #1
[ 1293.488039] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 1293.490836] Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
[ 1293.492448] RIP: 0010:mlx5_lag_is_multipath+0x5/0x50 [mlx5_core]
[ 1293.494044] Code: e8 70 40 ff e0 48 8b 14 24 48 83 05 5c 1a 1b 00 01 e9 19 ff ff ff 48 83 05 47 1a 1b 00 01 eb d7 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 87 a8 09 00 00 48 85 c0 74 26 48 83 05 a7 1b 1b 00 01 41 b8
[ 1293.498673] RSP: 0018:ffff88811b2fbe40 EFLAGS: 00010202
[ 1293.500152] RAX: ffff88818a94e1c0 RBX: ffff888165eca6c0 RCX: 0000000000000000
[ 1293.501841] RDX: 0000000000000001 RSI: ffff88818a94e1c0 RDI: 0000000000000000
[ 1293.503585] RBP: 0000000000000000 R08: ffff888119886740 R09: ffff888165eca73c
[ 1293.505286] R10: 0000000000000018 R11: 0000000000000018 R12: ffff88818a94e1c0
[ 1293.506979] R13: ffff888112729800 R14: 0000000000000000 R15: ffff888112729858
[ 1293.508753] FS: 0000000000000000(0000) GS:ffff88852cc40000(0000) knlGS:0000000000000000
[ 1293.510782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1293.512265] CR2: 00000000000009a8 CR3: 00000001032d4002 CR4: 0000000000370ea0
[ 1293.514001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1293.515806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 8a66e4585979 ("net/mlx5: Change ownership model for lag")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 5d41e19378e09..c520edb942ca5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1234,7 +1234,7 @@ void mlx5_lag_add_netdev(struct mlx5_core_dev *dev,
mlx5_ldev_add_netdev(ldev, dev, netdev);

for (i = 0; i < ldev->ports; i++)
- if (!ldev->pf[i].dev)
+ if (!ldev->pf[i].netdev)
break;

if (i >= ldev->ports)
--
2.35.1



2022-08-29 12:11:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 135/158] Revert "md-raid: destroy the bitmap after destroying the thread"

From: Guoqing Jiang <[email protected]>

commit 1d258758cf06a0734482989911d184dd5837ed4e upstream.

This reverts commit e151db8ecfb019b7da31d076130a794574c89f6f. It
obviously breaks clustered raid, as noticed by Neil, even though it fixed a
KASAN issue for dm-raid, so let's revert it and fix the KASAN issue in the
next commit.

[1]. https://lore.kernel.org/linux-raid/[email protected]/T/#m95ac225cab7409f66c295772483d091084a6d470

Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/md/md.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6244,11 +6244,11 @@ static void mddev_detach(struct mddev *m
static void __md_stop(struct mddev *mddev)
{
struct md_personality *pers = mddev->pers;
+ md_bitmap_destroy(mddev);
mddev_detach(mddev);
/* Ensure ->event_work is done */
if (mddev->event_work.func)
flush_workqueue(md_misc_wq);
- md_bitmap_destroy(mddev);
spin_lock(&mddev->lock);
mddev->pers = NULL;
spin_unlock(&mddev->lock);


2022-08-29 12:11:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 140/158] drm/amdkfd: Fix isa version for the GC 10.3.7

From: Prike Liang <[email protected]>

commit ee8086dbc1585d9f4020a19447388246a5cff5c8 upstream.

Correct the isa version used for handling the KFD test.

Fixes: 7c4f4f197e0c ("drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitions")
Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Aaron Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -377,12 +377,8 @@ struct kfd_dev *kgd2kfd_probe(struct amd
f2g = &gfx_v10_3_kfd2kgd;
break;
case IP_VERSION(10, 3, 6):
- gfx_target_version = 100306;
- if (!vf)
- f2g = &gfx_v10_3_kfd2kgd;
- break;
case IP_VERSION(10, 3, 7):
- gfx_target_version = 100307;
+ gfx_target_version = 100306;
if (!vf)
f2g = &gfx_v10_3_kfd2kgd;
break;


2022-08-29 12:11:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 102/158] perf/x86/intel: Fix pebs event constraints for ADL

From: Kan Liang <[email protected]>

commit cde643ff75bc20c538dfae787ca3b587bab16b50 upstream.

According to the latest event list, the LOAD_LATENCY PEBS event only
works on GP counters 0 and 1 for ADL and RPL.

Update the pebs event constraints table.

Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Ammy Yi <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/events/intel/ds.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -822,7 +822,7 @@ struct event_constraint intel_glm_pebs_e

struct event_constraint intel_grt_pebs_event_constraints[] = {
/* Allow all events as PEBS with no flags */
- INTEL_HYBRID_LAT_CONSTRAINT(0x5d0, 0xf),
+ INTEL_HYBRID_LAT_CONSTRAINT(0x5d0, 0x3),
INTEL_HYBRID_LAT_CONSTRAINT(0x6d0, 0xf),
EVENT_CONSTRAINT_END
};


2022-08-29 12:11:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 108/158] x86/nospec: Unwreck the RSB stuffing

From: Peter Zijlstra <[email protected]>

commit 4e3aa9238277597c6c7624f302d81a7b568b6f2d upstream.

Commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
made a right mess of the RSB stuffing, rewrite the whole thing to not
suck.

Thanks to Andrew for the enlightening comment about Post-Barrier RSB
things so we can make this code less magical.

Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 80 +++++++++++++++++------------------
1 file changed, 39 insertions(+), 41 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -35,33 +35,44 @@
#define RSB_CLEAR_LOOPS 32 /* To forcibly overwrite all entries */

/*
+ * Common helper for __FILL_RETURN_BUFFER and __FILL_ONE_RETURN.
+ */
+#define __FILL_RETURN_SLOT \
+ ANNOTATE_INTRA_FUNCTION_CALL; \
+ call 772f; \
+ int3; \
+772:
+
+/*
+ * Stuff the entire RSB.
+ *
* Google experimented with loop-unrolling and this turned out to be
* the optimal version - two calls, each with their own speculation
* trap should their return address end up getting used, in a loop.
*/
-#define __FILL_RETURN_BUFFER(reg, nr, sp) \
- mov $(nr/2), reg; \
-771: \
- ANNOTATE_INTRA_FUNCTION_CALL; \
- call 772f; \
-773: /* speculation trap */ \
- UNWIND_HINT_EMPTY; \
- pause; \
- lfence; \
- jmp 773b; \
-772: \
- ANNOTATE_INTRA_FUNCTION_CALL; \
- call 774f; \
-775: /* speculation trap */ \
- UNWIND_HINT_EMPTY; \
- pause; \
- lfence; \
- jmp 775b; \
-774: \
- add $(BITS_PER_LONG/8) * 2, sp; \
- dec reg; \
- jnz 771b; \
- /* barrier for jnz misprediction */ \
+#define __FILL_RETURN_BUFFER(reg, nr) \
+ mov $(nr/2), reg; \
+771: \
+ __FILL_RETURN_SLOT \
+ __FILL_RETURN_SLOT \
+ add $(BITS_PER_LONG/8) * 2, %_ASM_SP; \
+ dec reg; \
+ jnz 771b; \
+ /* barrier for jnz misprediction */ \
+ lfence;
+
+/*
+ * Stuff a single RSB slot.
+ *
+ * To mitigate Post-Barrier RSB speculation, one CALL instruction must be
+ * forced to retire before letting a RET instruction execute.
+ *
+ * On PBRSB-vulnerable CPUs, it is not safe for a RET to be executed
+ * before this point.
+ */
+#define __FILL_ONE_RETURN \
+ __FILL_RETURN_SLOT \
+ add $(BITS_PER_LONG/8), %_ASM_SP; \
lfence;

#ifdef __ASSEMBLY__
@@ -120,28 +131,15 @@
#endif
.endm

-.macro ISSUE_UNBALANCED_RET_GUARD
- ANNOTATE_INTRA_FUNCTION_CALL
- call .Lunbalanced_ret_guard_\@
- int3
-.Lunbalanced_ret_guard_\@:
- add $(BITS_PER_LONG/8), %_ASM_SP
- lfence
-.endm
-
/*
* A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
* monstrosity above, manually.
*/
-.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req ftr2
-.ifb \ftr2
- ALTERNATIVE "jmp .Lskip_rsb_\@", "", \ftr
-.else
- ALTERNATIVE_2 "jmp .Lskip_rsb_\@", "", \ftr, "jmp .Lunbalanced_\@", \ftr2
-.endif
- __FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)
-.Lunbalanced_\@:
- ISSUE_UNBALANCED_RET_GUARD
+.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req ftr2=ALT_NOT(X86_FEATURE_ALWAYS)
+ ALTERNATIVE_2 "jmp .Lskip_rsb_\@", \
+ __stringify(__FILL_RETURN_BUFFER(\reg,\nr)), \ftr, \
+ __stringify(__FILL_ONE_RETURN), \ftr2
+
.Lskip_rsb_\@:
.endm



2022-08-29 12:11:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 009/158] mm/uffd: reset write protection when unregister with wp-mode

From: Peter Xu <[email protected]>

commit f369b07c861435bd812a9d14493f71b34132ed6f upstream.

The motivation of this patch comes from a recent report and patchfix from
David Hildenbrand on hugetlb shared handling of wr-protected page [1].

With the reproducer provided in the commit message of [1], one can leverage
the uffd-wp lazy-reset of ptes to trigger a hugetlb issue which can affect
not only the attacker process, but also the whole system.

The lazy-reset mechanism of uffd-wp was used to make unregister faster;
meanwhile it assumed that any leftover pgtable entries would only affect
the process itself, so not only should the user be aware of anything it
does, but it also should not affect anything outside of the process.

But it seems that this is not true, and it can also be utilized to make
some exploit easier.

So far there's no clue showing that the lazy-reset is important to any
userfaultfd users because normally the unregister will only happen once
for a specific range of memory in the lifecycle of the process.

Considering all of the above, what this patch proposes is to do explicit pte
resets when unregistering an uffd region with wr-protect mode enabled.

It should be the same as calling ioctl(UFFDIO_WRITEPROTECT, wp=false)
right before ioctl(UFFDIO_UNREGISTER) for the user. So potentially it'll
make the unregister slower. From that pov it's a very slight abi change,
but hopefully nothing should break with this change either.
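
Below is a minimal userspace sketch of that equivalent sequence, assuming a
region registered with UFFDIO_REGISTER_MODE_WP; running it may require
CAP_SYS_PTRACE or vm.unprivileged_userfaultfd=1:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

int main(void)
{
	size_t len = 16 * sysconf(_SC_PAGESIZE);

	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0) { perror("userfaultfd"); return 1; }

	struct uffdio_api api = { .api = UFFD_API };
	if (ioctl(uffd, UFFDIO_API, &api)) { perror("UFFDIO_API"); return 1; }

	char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED) { perror("mmap"); return 1; }

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)addr, .len = len },
		.mode = UFFDIO_REGISTER_MODE_WP,
	};
	if (ioctl(uffd, UFFDIO_REGISTER, &reg)) { perror("UFFDIO_REGISTER"); return 1; }

	/* ... write-protect and handle faults on the region here ... */

	/* what userspace had to do explicitly: drop wr-protection first ... */
	struct uffdio_writeprotect wp = {
		.range = { .start = (unsigned long)addr, .len = len },
		.mode = 0,	/* 0 = clear UFFDIO_WRITEPROTECT_MODE_WP */
	};
	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp)) perror("UFFDIO_WRITEPROTECT");

	/* ... because unregister alone used to leave uffd-wp bits behind */
	struct uffdio_range range = { .start = (unsigned long)addr, .len = len };
	if (ioctl(uffd, UFFDIO_UNREGISTER, &range)) perror("UFFDIO_UNREGISTER");

	close(uffd);
	return 0;
}

With this patch applied, the explicit UFFDIO_WRITEPROTECT step before
UFFDIO_UNREGISTER becomes redundant, since unregister now performs an
equivalent reset itself.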

Regarding the change itself: the core of the uffd write [un]protect operation
is moved into a separate function (uffd_wp_range()) and reused in
the unregister code path.

Note that the new function will not check for anything, e.g. ranges or
memory types, because they should have been checked during the previous
UFFDIO_REGISTER, or it would have failed already. It also doesn't check
mmap_changing because we hold the mmap write lock anyway.

I added a Fixes tag for the commit introducing uffd-wp for shmem+hugetlbfs
because that's the only issue reported so far and that's the commit with
which David's reproducer starts working (v5.19+). But the whole idea actually
applies not only to file-backed memory but also to anonymous memory. It's
just that we don't need to fix anonymous memory prior to v5.19 because
there's no known way to exploit it.

IOW, this patch can also fix the issue reported in [1] as patch 2 does.

[1] https://lore.kernel.org/all/[email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: Peter Xu <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/userfaultfd.c | 4 ++++
include/linux/userfaultfd_k.h | 2 ++
mm/userfaultfd.c | 29 ++++++++++++++++++-----------
3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1c44bf75f916..175de70e3adf 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1601,6 +1601,10 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range);
}

+ /* Reset ptes for the whole vma range if wr-protected */
+ if (userfaultfd_wp(vma))
+ uffd_wp_range(mm, vma, start, vma_end - start, false);
+
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
prev = vma_merge(mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 732b522bacb7..e1b8a915e9e9 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -73,6 +73,8 @@ extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
extern int mwriteprotect_range(struct mm_struct *dst_mm,
unsigned long start, unsigned long len,
bool enable_wp, atomic_t *mmap_changing);
+extern void uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *vma,
+ unsigned long start, unsigned long len, bool enable_wp);

/* mm helpers */
static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 07d3befc80e4..7327b2573f7c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -703,14 +703,29 @@ ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
mmap_changing, 0);
}

+void uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+ unsigned long start, unsigned long len, bool enable_wp)
+{
+ struct mmu_gather tlb;
+ pgprot_t newprot;
+
+ if (enable_wp)
+ newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
+ else
+ newprot = vm_get_page_prot(dst_vma->vm_flags);
+
+ tlb_gather_mmu(&tlb, dst_mm);
+ change_protection(&tlb, dst_vma, start, start + len, newprot,
+ enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
+ tlb_finish_mmu(&tlb);
+}
+
int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
unsigned long len, bool enable_wp,
atomic_t *mmap_changing)
{
struct vm_area_struct *dst_vma;
unsigned long page_mask;
- struct mmu_gather tlb;
- pgprot_t newprot;
int err;

/*
@@ -750,15 +765,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
goto out_unlock;
}

- if (enable_wp)
- newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
- else
- newprot = vm_get_page_prot(dst_vma->vm_flags);
-
- tlb_gather_mmu(&tlb, dst_mm);
- change_protection(&tlb, dst_vma, start, start + len, newprot,
- enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
- tlb_finish_mmu(&tlb);
+ uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);

err = 0;
out_unlock:
--
2.37.2



2022-08-29 12:12:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 099/158] btrfs: fix possible memory leak in btrfs_get_dev_args_from_path()

From: Zixuan Fu <[email protected]>

commit 9ea0106a7a3d8116860712e3f17cd52ce99f6707 upstream.

In btrfs_get_dev_args_from_path(), btrfs_get_bdev_and_sb() can fail if
the path is invalid. In this case, btrfs_get_dev_args_from_path()
returns directly without freeing args->uuid and args->fsid allocated
before, which causes a memory leak.

To fix these possible leaks, when btrfs_get_bdev_and_sb() fails,
btrfs_put_dev_args_from_path() is called to clean up the memory.
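
The shape of the fix is the usual free-everything-on-error pattern; here is a
generic sketch with illustrative names and sizes (not the btrfs code):

#include <stdlib.h>
#include <string.h>

struct dev_args {
	char *uuid;
	char *fsid;
};

static void put_dev_args(struct dev_args *args)
{
	free(args->uuid);
	free(args->fsid);
	memset(args, 0, sizeof(*args));
}

static int get_dev_args(struct dev_args *args, int open_fails)
{
	args->uuid = malloc(16);	/* illustrative sizes */
	args->fsid = malloc(16);
	if (!args->uuid || !args->fsid) {
		put_dev_args(args);
		return -1;
	}

	if (open_fails) {
		/* the bug was returning here without releasing uuid/fsid */
		put_dev_args(args);
		return -1;
	}
	return 0;
}

int main(void)
{
	struct dev_args args = { 0 };

	get_dev_args(&args, 1);		/* the failure path now leaks nothing */
	return 0;
}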

Reported-by: TOTE Robot <[email protected]>
Fixes: faa775c41d655 ("btrfs: add a btrfs_get_dev_args_from_path helper")
CC: [email protected] # 5.16
Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Zixuan Fu <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/volumes.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2344,8 +2344,11 @@ int btrfs_get_dev_args_from_path(struct

ret = btrfs_get_bdev_and_sb(path, FMODE_READ, fs_info->bdev_holder, 0,
&bdev, &disk_super);
- if (ret)
+ if (ret) {
+ btrfs_put_dev_args_from_path(args);
return ret;
+ }
+
args->devid = btrfs_stack_device_id(&disk_super->dev_item);
memcpy(args->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE);
if (btrfs_fs_incompat(fs_info, METADATA_UUID))


2022-08-29 12:12:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 063/158] netfilter: flowtable: add function to invoke garbage collection immediately

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 759eebbcfafcefa23b59e912396306543764bd3c ]

Expose nf_flow_table_gc_run() to force a garbage collector run from the
offload infrastructure.

Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/netfilter/nf_flow_table.h | 1 +
net/netfilter/nf_flow_table_core.c | 12 +++++++++---
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 64daafd1fc41c..32c25122ab184 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -270,6 +270,7 @@ void flow_offload_refresh(struct nf_flowtable *flow_table,

struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable *flow_table,
struct flow_offload_tuple *tuple);
+void nf_flow_table_gc_run(struct nf_flowtable *flow_table);
void nf_flow_table_gc_cleanup(struct nf_flowtable *flowtable,
struct net_device *dev);
void nf_flow_table_cleanup(struct net_device *dev);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index f2def06d10709..18453fa25199c 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -442,12 +442,17 @@ static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table,
}
}

+void nf_flow_table_gc_run(struct nf_flowtable *flow_table)
+{
+ nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
+}
+
static void nf_flow_offload_work_gc(struct work_struct *work)
{
struct nf_flowtable *flow_table;

flow_table = container_of(work, struct nf_flowtable, gc_work.work);
- nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
+ nf_flow_table_gc_run(flow_table);
queue_delayed_work(system_power_efficient_wq, &flow_table->gc_work, HZ);
}

@@ -606,10 +611,11 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)

cancel_delayed_work_sync(&flow_table->gc_work);
nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
- nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
+ nf_flow_table_gc_run(flow_table);
nf_flow_table_offload_flush(flow_table);
if (nf_flowtable_hw_offload(flow_table))
- nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
+ nf_flow_table_gc_run(flow_table);
+
rhashtable_destroy(&flow_table->rhashtable);
}
EXPORT_SYMBOL_GPL(nf_flow_table_free);
--
2.35.1



2022-08-29 12:12:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 113/158] asm-generic: sections: refactor memory_intersects

From: Quanyang Wang <[email protected]>

commit 0c7d7cc2b4fe2e74ef8728f030f0f1674f9f6aee upstream.

There are two problems with the current code of memory_intersects:

First, it doesn't check whether the region (begin, end) falls inside the
region (virt, vend), that is (virt < begin && vend > end).

The second problem is that if vend is equal to begin, it will return true,
but this is wrong since vend (virt + size) is not the last address of the
memory region; (virt + size - 1) is. The wrong determination will
trigger misreporting when check_for_illegal_area() calls
memory_intersects() to check if the dma region intersects with the stext region.

The misreporting is as below (stext is at 0x80100000):
WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
Modules linked in:
CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
Hardware name: Xilinx Zynq Platform
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from __warn+0xb0/0x198
__warn from warn_slowpath_fmt+0x80/0xb4
warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
check_for_illegal_area from debug_dma_map_sg+0x94/0x368
debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
__dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
usb_stor_control_thread from kthread+0xf8/0x104
kthread from ret_from_fork+0x14/0x2c

Refactor memory_intersects to fix the two problems above.
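
A standalone sketch of the corrected check, showing the boundary case
(vend == begin) that the old expression got wrong; plain userspace C, not the
kernel header:

#include <stdbool.h>
#include <stdio.h>

static bool intersects_old(const char *begin, const char *end,
			   const char *virt, size_t size)
{
	const char *vend = virt + size;

	return (virt >= begin && virt < end) || (vend >= begin && vend < end);
}

static bool intersects_fixed(const char *begin, const char *end,
			     const char *virt, size_t size)
{
	const char *vend = virt + size;	/* one past the object's last byte */

	return virt < end && vend > begin;
}

int main(void)
{
	char mem[128];
	const char *begin = &mem[64], *end = &mem[96];	/* "text" region */
	const char *virt = &mem[48];			/* object just before it */
	size_t size = 16;				/* so vend == begin */

	printf("old:   %d  <- false positive when vend == begin\n",
	       intersects_old(begin, end, virt, size));
	printf("fixed: %d\n", intersects_fixed(begin, end, virt, size));
	return 0;
}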

Before commit 1d7db834a027e ("dma-debug: use memory_intersects()
directly"), memory_intersects() was called only by printk_late_init:

printk_late_init -> init_section_intersects -> memory_intersects

so memory_intersects() was called from only a few places.

When commit 1d7db834a027e ("dma-debug: use memory_intersects()
directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA
subsystem uses it to check for an illegal area and the calltrace above
is triggered.

[[email protected]: fix nearby comment typo]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 979559362516 ("asm/sections: add helpers to check for section data")
Signed-off-by: Quanyang Wang <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Thierry Reding <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/asm-generic/sections.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -97,7 +97,7 @@ static inline bool memory_contains(void
/**
* memory_intersects - checks if the region occupied by an object intersects
* with another memory region
- * @begin: virtual address of the beginning of the memory regien
+ * @begin: virtual address of the beginning of the memory region
* @end: virtual address of the end of the memory region
* @virt: virtual address of the memory object
* @size: size of the memory object
@@ -110,7 +110,10 @@ static inline bool memory_intersects(voi
{
void *vend = virt + size;

- return (virt >= begin && virt < end) || (vend >= begin && vend < end);
+ if (virt < end && vend > begin)
+ return true;
+
+ return false;
}

/**


2022-08-29 12:12:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 081/158] net: Fix a data-race around sysctl_somaxconn.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ]

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index 96300cdc06251..34102aa4ab0a6 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1801,7 +1801,7 @@ int __sys_listen(int fd, int backlog)

sock = sockfd_lookup_light(fd, &err, &fput_needed);
if (sock) {
- somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
+ somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn);
if ((unsigned int)backlog > somaxconn)
backlog = somaxconn;

--
2.35.1



2022-08-29 12:12:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 080/158] net: Fix a data-race around netdev_unregister_timeout_secs.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 05e49cfc89e4f325eebbc62d24dd122e55f94c23 ]

While reading netdev_unregister_timeout_secs, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout configurable")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 19baeaf65a646..a77a979a4bf75 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10265,7 +10265,7 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
return dev;

if (time_after(jiffies, warning_time +
- netdev_unregister_timeout_secs * HZ)) {
+ READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
list_for_each_entry(dev, list, todo_list) {
pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
dev->name, netdev_refcnt_read(dev));
--
2.35.1



2022-08-29 12:12:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 078/158] net: Fix data-races around sysctl_devconf_inherit_init_net.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit a5612ca10d1aa05624ebe72633e0c8c792970833 ]

While reading sysctl_devconf_inherit_init_net, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/linux/netdevice.h | 9 +++++++++
net/ipv4/devinet.c | 16 ++++++++++------
net/ipv6/addrconf.c | 5 ++---
3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 78dd63a5c7c80..db40bc62213bd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -650,6 +650,15 @@ static inline bool net_has_fallback_tunnels(const struct net *net)
#endif
}

+static inline int net_inherit_devconf(void)
+{
+#if IS_ENABLED(CONFIG_SYSCTL)
+ return READ_ONCE(sysctl_devconf_inherit_init_net);
+#else
+ return 0;
+#endif
+}
+
static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
{
#if defined(CONFIG_XPS) && defined(CONFIG_NUMA)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index b2366ad540e62..787a44e3222db 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2682,23 +2682,27 @@ static __net_init int devinet_init_net(struct net *net)
#endif

if (!net_eq(net, &init_net)) {
- if (IS_ENABLED(CONFIG_SYSCTL) &&
- sysctl_devconf_inherit_init_net == 3) {
+ switch (net_inherit_devconf()) {
+ case 3:
/* copy from the current netns */
memcpy(all, current->nsproxy->net_ns->ipv4.devconf_all,
sizeof(ipv4_devconf));
memcpy(dflt,
current->nsproxy->net_ns->ipv4.devconf_dflt,
sizeof(ipv4_devconf_dflt));
- } else if (!IS_ENABLED(CONFIG_SYSCTL) ||
- sysctl_devconf_inherit_init_net != 2) {
- /* inherit == 0 or 1: copy from init_net */
+ break;
+ case 0:
+ case 1:
+ /* copy from init_net */
memcpy(all, init_net.ipv4.devconf_all,
sizeof(ipv4_devconf));
memcpy(dflt, init_net.ipv4.devconf_dflt,
sizeof(ipv4_devconf_dflt));
+ break;
+ case 2:
+ /* use compiled values */
+ break;
}
- /* else inherit == 2: use compiled values */
}

#ifdef CONFIG_SYSCTL
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 49cc6587dd771..b738eb7e1cae8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -7158,9 +7158,8 @@ static int __net_init addrconf_init_net(struct net *net)
if (!dflt)
goto err_alloc_dflt;

- if (IS_ENABLED(CONFIG_SYSCTL) &&
- !net_eq(net, &init_net)) {
- switch (sysctl_devconf_inherit_init_net) {
+ if (!net_eq(net, &init_net)) {
+ switch (net_inherit_devconf()) {
case 1: /* copy from init_net */
memcpy(all, init_net.ipv6.devconf_all,
sizeof(ipv6_devconf));
--
2.35.1



2022-08-29 12:12:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 027/158] ice: xsk: prohibit usage of non-balanced queue id

From: Maciej Fijalkowski <[email protected]>

[ Upstream commit 5a42f112d367bb4700a8a41f5c12724fde6bfbb9 ]

Fix the following scenario:
1. ethtool -L $IFACE rx 8 tx 96
2. xdpsock -q 10 -t -z

The above refers to a case where a user would like to attach an XSK socket
in txonly mode at a queue id that does not have a corresponding Rx queue.
At the moment ice's XSK logic is tightly bound to acting on a "queue pair",
i.e. both the Tx and Rx queues at a given queue id are disabled/enabled and
both of them get an XSK pool assigned, which is broken for the presented
queue configuration. This results in the splat included at the bottom,
which is basically an OOB access to the Rx ring array.

To fix this, allow using queue ids only within the scope of the "combined"
queues reported by ethtool. The logic should be reworked later on to allow
such configurations, but that would amount to a complete rewrite of the
control path, so go with this temporary fix for now.

[420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082
[420160.566359] #PF: supervisor read access in kernel mode
[420160.572657] #PF: error_code(0x0000) - not-present page
[420160.579002] PGD 0 P4D 0
[420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI
[420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G OE 5.19.0-rc7+ #10
[420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420160.693211] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420160.703645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420160.740707] PKRU: 55555554
[420160.745960] Call Trace:
[420160.750962] <TASK>
[420160.755597] ? kmalloc_large_node+0x79/0x90
[420160.762703] ? __kmalloc_node+0x3f5/0x4b0
[420160.769341] xp_assign_dev+0xfd/0x210
[420160.775661] ? shmem_file_read_iter+0x29a/0x420
[420160.782896] xsk_bind+0x152/0x490
[420160.788943] __sys_bind+0xd0/0x100
[420160.795097] ? exit_to_user_mode_prepare+0x20/0x120
[420160.802801] __x64_sys_bind+0x16/0x20
[420160.809298] do_syscall_64+0x38/0x90
[420160.815741] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[420160.823731] RIP: 0033:0x7fa86a0dd2fb
[420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48
[420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb
[420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003
[420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000
[420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0
[420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000
[420160.919817] </TASK>
[420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice]
[420160.977576] CR2: 0000000000000082
[420160.985037] ---[ end trace 0000000000000000 ]---
[420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420161.201505] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420161.213628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420161.257052] PKRU: 55555554

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_xsk.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 49ba8bfdbf047..45f88e6ec25e8 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -329,6 +329,12 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
bool if_running, pool_present = !!pool;
int ret = 0, pool_failure = 0;

+ if (qid >= vsi->num_rxq || qid >= vsi->num_txq) {
+ netdev_err(vsi->netdev, "Please use queue id in scope of combined queues count\n");
+ pool_failure = -EINVAL;
+ goto failure;
+ }
+
if (!is_power_of_2(vsi->rx_rings[qid]->count) ||
!is_power_of_2(vsi->tx_rings[qid]->count)) {
netdev_err(vsi->netdev, "Please align ring sizes to power of 2\n");
--
2.35.1



2022-08-29 12:12:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 125/158] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

From: Karol Herbst <[email protected]>

commit 6b04ce966a738ecdd9294c9593e48513c0dc90aa upstream.

It is a bit unclear to us why this helps, but it does, and it unbreaks
suspend/resume on a lot of GPUs without any known drawbacks.

Cc: [email protected] # v5.15+
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -820,6 +820,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_o
if (ret == 0) {
ret = nouveau_fence_new(chan, false, &fence);
if (ret == 0) {
+ /* TODO: figure out a better solution here
+ *
+ * wait on the fence here explicitly as going through
+ * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+ *
+ * Without this the operation can timeout and we'll fallback to a
+ * software copy, which might take several minutes to finish.
+ */
+ nouveau_fence_wait(fence, false, false);
ret = ttm_bo_move_accel_cleanup(bo,
&fence->base,
evict, false,


2022-08-29 12:12:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 069/158] ratelimit: Fix data-races in ___ratelimit().

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 6bae8ceb90ba76cdba39496db936164fa672b9be ]

While reading rs->interval and rs->burst, they can be changed
concurrently via sysctl (e.g. net_ratelimit_state). Thus, we
need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
lib/ratelimit.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index e01a93f46f833..ce945c17980b9 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -26,10 +26,16 @@
*/
int ___ratelimit(struct ratelimit_state *rs, const char *func)
{
+ /* Paired with WRITE_ONCE() in .proc_handler().
+ * Changing two values seperately could be inconsistent
+ * and some message could be lost. (See: net_ratelimit_state).
+ */
+ int interval = READ_ONCE(rs->interval);
+ int burst = READ_ONCE(rs->burst);
unsigned long flags;
int ret;

- if (!rs->interval)
+ if (!interval)
return 1;

/*
@@ -44,7 +50,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
if (!rs->begin)
rs->begin = jiffies;

- if (time_is_before_jiffies(rs->begin + rs->interval)) {
+ if (time_is_before_jiffies(rs->begin + interval)) {
if (rs->missed) {
if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
printk_deferred(KERN_WARNING
@@ -56,7 +62,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
rs->begin = jiffies;
rs->printed = 0;
}
- if (rs->burst && rs->burst > rs->printed) {
+ if (burst && burst > rs->printed) {
rs->printed++;
ret = 1;
} else {
--
2.35.1



2022-08-29 12:12:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 144/158] perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU

From: Stephane Eranian <[email protected]>

commit 11745ecfe8fea4b4a4c322967a7605d2ecbd5080 upstream.

Existing code was generating bogus counts for the SNB IMC bandwidth counters:

$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000327813 1,024.03 MiB uncore_imc/data_reads/
1.000327813 20.73 MiB uncore_imc/data_writes/
2.000580153 261,120.00 MiB uncore_imc/data_reads/
2.000580153 23.28 MiB uncore_imc/data_writes/

The problem was introduced by commit:
07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")

where the read_counter callback was replaced to point to the generic
uncore_mmio_read_counter() function.

The SNB IMC counters are free-running 32-bit counters laid out contiguously
in MMIO. But uncore_mmio_read_counter() uses a readq() call and therefore
reads 64 bits from MMIO. Although this is okay for the
uncore_perf_event_update() function, because it shifts the value based on
the actual counter width to compute a delta, it is not okay for
uncore_pmu_event_start(), which simply reads the counter and thus primes
event->prev_count with a bogus value that is responsible for the bogus
deltas seen in the perf stat output above.
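
To make the width mismatch concrete, here is a small userspace sketch
(made-up counter values, plain memcpy standing in for the MMIO readq()
and readl(); an illustration only, not driver code):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
        /* Two free-running 32-bit counters laid out back to back. */
        uint32_t counters[2] = { 0x00001234, 0x0fffffff };
        uint64_t wide;

        /* A 64-bit load at the first counter's base also picks up its
         * neighbour in the upper half (on a little-endian machine),
         * which is the bogus value that primes event->prev_count. */
        memcpy(&wide, counters, sizeof(wide));

        printf("64-bit read: 0x%016llx\n", (unsigned long long)wide);
        printf("32-bit read: 0x%08x\n", counters[0]);
        return 0;
}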

The fix is to reintroduce the custom callback for read_counter for the SNB
IMC PMU and use readl() instead of readq(). With the change the output of
perf stat is back to normal:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000120987 296.94 MiB uncore_imc/data_reads/
1.000120987 138.42 MiB uncore_imc/data_writes/
2.000403144 175.91 MiB uncore_imc/data_reads/
2.000403144 68.50 MiB uncore_imc/data_writes/

Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/events/intel/uncore_snb.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -841,6 +841,22 @@ int snb_pci2phy_map_init(int devid)
return 0;
}

+static u64 snb_uncore_imc_read_counter(struct intel_uncore_box *box, struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ /*
+ * SNB IMC counters are 32-bit and are laid out back to back
+ * in MMIO space. Therefore we must use a 32-bit accessor function
+ * using readq() from uncore_mmio_read_counter() causes problems
+ * because it is reading 64-bit at a time. This is okay for the
+ * uncore_perf_event_update() function because it drops the upper
+ * 32-bits but not okay for plain uncore_read_counter() as invoked
+ * in uncore_pmu_event_start().
+ */
+ return (u64)readl(box->io_addr + hwc->event_base);
+}
+
static struct pmu snb_uncore_imc_pmu = {
.task_ctx_nr = perf_invalid_context,
.event_init = snb_uncore_imc_event_init,
@@ -860,7 +876,7 @@ static struct intel_uncore_ops snb_uncor
.disable_event = snb_uncore_imc_disable_event,
.enable_event = snb_uncore_imc_enable_event,
.hw_config = snb_uncore_imc_hw_config,
- .read_counter = uncore_mmio_read_counter,
+ .read_counter = snb_uncore_imc_read_counter,
};

static struct intel_uncore_type snb_uncore_imc = {


2022-08-29 12:12:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 155/158] riscv: dts: microchip: mpfs: remove ti,fifo-depth property

From: Conor Dooley <[email protected]>

commit 72a05748cbd285567d69f173f8694e3471b79f20 upstream.

Recent versions of dt-schema warn about a previously undetected,
undocumented property on the icicle & polarberry devicetrees:

arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: ethernet@20112000: ethernet-phy@8: Unevaluated properties are not allowed ('ti,fifo-depth' was unexpected)
From schema: Documentation/devicetree/bindings/net/cdns,macb.yaml

I know what you're thinking: the binding doesn't look to be the problem,
and I agree. I am not sure why a TI vendor property was ever actually
added since it has no meaning here... just get rid of it.

Fixes: bc47b2217f24 ("riscv: dts: microchip: add the sundance polarberry")
Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board")
Signed-off-by: Conor Dooley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts | 2 --
arch/riscv/boot/dts/microchip/mpfs-polarberry.dts | 2 --
2 files changed, 4 deletions(-)

--- a/arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts
+++ b/arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts
@@ -84,12 +84,10 @@

phy1: ethernet-phy@9 {
reg = <9>;
- ti,fifo-depth = <0x1>;
};

phy0: ethernet-phy@8 {
reg = <8>;
- ti,fifo-depth = <0x1>;
};
};

--- a/arch/riscv/boot/dts/microchip/mpfs-polarberry.dts
+++ b/arch/riscv/boot/dts/microchip/mpfs-polarberry.dts
@@ -54,12 +54,10 @@

phy1: ethernet-phy@5 {
reg = <5>;
- ti,fifo-depth = <0x01>;
};

phy0: ethernet-phy@4 {
reg = <4>;
- ti,fifo-depth = <0x01>;
};
};



2022-08-29 12:12:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 077/158] net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit af67508ea6cbf0e4ea27f8120056fa2efce127dd ]

While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/linux/netdevice.h | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2563d30736e9a..78dd63a5c7c80 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -640,9 +640,14 @@ extern int sysctl_devconf_inherit_init_net;
*/
static inline bool net_has_fallback_tunnels(const struct net *net)
{
- return !IS_ENABLED(CONFIG_SYSCTL) ||
- !sysctl_fb_tunnels_only_for_init_net ||
- (net == &init_net && sysctl_fb_tunnels_only_for_init_net == 1);
+#if IS_ENABLED(CONFIG_SYSCTL)
+ int fb_tunnels_only_for_init_net = READ_ONCE(sysctl_fb_tunnels_only_for_init_net);
+
+ return !fb_tunnels_only_for_init_net ||
+ (net_eq(net, &init_net) && fb_tunnels_only_for_init_net == 1);
+#else
+ return true;
+#endif
}

static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
--
2.35.1



2022-08-29 12:12:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 083/158] i40e: Fix incorrect address type for IPv6 flow rules

From: Sylwester Dziedziuch <[email protected]>

[ Upstream commit bcf3a156429306070afbfda5544f2b492d25e75b ]

It was not possible to create a 1-tuple flow director
rule for the IPv6 flow type. This was caused by incorrectly
checking the source IP address when validating the user-provided
destination IP address.

Fix this by checking the correct ip6dst address instead of ip6src
in the destination IP address validation for the IPv6 flow type.

Fixes: efca91e89b67 ("i40e: Add flow director support for IPv6")
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 19704f5c8291c..22a61802a4027 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -4395,7 +4395,7 @@ static int i40e_check_fdir_input_set(struct i40e_vsi *vsi,
(struct in6_addr *)&ipv6_full_mask))
new_mask |= I40E_L3_V6_DST_MASK;
else if (ipv6_addr_any((struct in6_addr *)
- &usr_ip6_spec->ip6src))
+ &usr_ip6_spec->ip6dst))
new_mask &= ~I40E_L3_V6_DST_MASK;
else
return -EOPNOTSUPP;
--
2.35.1



2022-08-29 12:13:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 067/158] net: Fix data-races around netdev_max_backlog.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 5dcd08cd19912892586c6082d56718333e2d19db ]

While reading netdev_max_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

While at it, we remove the unnecessary spaces in the doc.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
Documentation/admin-guide/sysctl/net.rst | 2 +-
net/core/dev.c | 4 ++--
net/core/gro_cells.c | 2 +-
net/xfrm/espintcp.c | 2 +-
net/xfrm/xfrm_input.c | 2 +-
5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index fcd650bdbc7e2..01d9858197832 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -271,7 +271,7 @@ poll cycle or the number of packets processed reaches netdev_budget.
netdev_max_backlog
------------------

-Maximum number of packets, queued on the INPUT side, when the interface
+Maximum number of packets, queued on the INPUT side, when the interface
receives packets faster than kernel can process them.

netdev_rss_key
diff --git a/net/core/dev.c b/net/core/dev.c
index 2a7d81cd9e2ea..e1496e626a532 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4623,7 +4623,7 @@ static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
struct softnet_data *sd;
unsigned int old_flow, new_flow;

- if (qlen < (netdev_max_backlog >> 1))
+ if (qlen < (READ_ONCE(netdev_max_backlog) >> 1))
return false;

sd = this_cpu_ptr(&softnet_data);
@@ -4671,7 +4671,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
if (!netif_running(skb->dev))
goto drop;
qlen = skb_queue_len(&sd->input_pkt_queue);
- if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) {
+ if (qlen <= READ_ONCE(netdev_max_backlog) && !skb_flow_limit(skb, qlen)) {
if (qlen) {
enqueue:
__skb_queue_tail(&sd->input_pkt_queue, skb);
diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
index 541c7a72a28a4..21619c70a82b7 100644
--- a/net/core/gro_cells.c
+++ b/net/core/gro_cells.c
@@ -26,7 +26,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)

cell = this_cpu_ptr(gcells->cells);

- if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
+ if (skb_queue_len(&cell->napi_skbs) > READ_ONCE(netdev_max_backlog)) {
drop:
dev_core_stats_rx_dropped_inc(dev);
kfree_skb(skb);
diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index 82d14eea1b5ad..974eb97b77d22 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -168,7 +168,7 @@ int espintcp_queue_out(struct sock *sk, struct sk_buff *skb)
{
struct espintcp_ctx *ctx = espintcp_getctx(sk);

- if (skb_queue_len(&ctx->out_queue) >= netdev_max_backlog)
+ if (skb_queue_len(&ctx->out_queue) >= READ_ONCE(netdev_max_backlog))
return -ENOBUFS;

__skb_queue_tail(&ctx->out_queue, skb);
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 70a8c36f0ba6e..b2f4ec9c537f0 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -782,7 +782,7 @@ int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,

trans = this_cpu_ptr(&xfrm_trans_tasklet);

- if (skb_queue_len(&trans->queue) >= netdev_max_backlog)
+ if (skb_queue_len(&trans->queue) >= READ_ONCE(netdev_max_backlog))
return -ENOBUFS;

BUILD_BUG_ON(sizeof(struct xfrm_trans_cb) > sizeof(skb->cb));
--
2.35.1



2022-08-29 12:13:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 057/158] netfilter: nft_payload: report ERANGE for too long offset and length

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 94254f990c07e9ddf1634e0b727fab821c3b5bf9 ]

Instead of silently truncating the offset and length to u8, report ERANGE.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_payload.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 2e7ac007cb30f..4fee67abfe2c5 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -833,6 +833,7 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
{
enum nft_payload_bases base;
unsigned int offset, len;
+ int err;

if (tb[NFTA_PAYLOAD_BASE] == NULL ||
tb[NFTA_PAYLOAD_OFFSET] == NULL ||
@@ -859,8 +860,13 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
if (tb[NFTA_PAYLOAD_DREG] == NULL)
return ERR_PTR(-EINVAL);

- offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
- len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
+ err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U8_MAX, &offset);
+ if (err < 0)
+ return ERR_PTR(err);
+
+ err = nft_parse_u32_check(tb[NFTA_PAYLOAD_LEN], U8_MAX, &len);
+ if (err < 0)
+ return ERR_PTR(err);

if (len <= 4 && is_power_of_2(len) && IS_ALIGNED(offset, len) &&
base != NFT_PAYLOAD_LL_HEADER && base != NFT_PAYLOAD_INNER_HEADER)
--
2.35.1



2022-08-29 12:13:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 071/158] net: Fix a data-race around sysctl_tstamp_allow_data.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit d2154b0afa73c0159b2856f875c6b4fe7cf6a95e ]

While reading sysctl_tstamp_allow_data, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/skbuff.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b3559cb1d827..bebf58464d667 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4772,7 +4772,7 @@ static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly)
{
bool ret;

- if (likely(sysctl_tstamp_allow_data || tsonly))
+ if (likely(READ_ONCE(sysctl_tstamp_allow_data) || tsonly))
return true;

read_lock_bh(&sk->sk_callback_lock);
--
2.35.1



2022-08-29 12:13:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 075/158] net: Fix data-races around sysctl_max_skb_frags.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 657b991afb89d25fe6c4783b1b75a8ad4563670d ]

While reading sysctl_max_skb_frags, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp.c | 4 ++--
net/mptcp/protocol.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3ae2ea0488838..3d446773ff2a5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1000,7 +1000,7 @@ static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,

i = skb_shinfo(skb)->nr_frags;
can_coalesce = skb_can_coalesce(skb, i, page, offset);
- if (!can_coalesce && i >= sysctl_max_skb_frags) {
+ if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) {
tcp_mark_push(tp, skb);
goto new_segment;
}
@@ -1348,7 +1348,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)

if (!skb_can_coalesce(skb, i, pfrag->page,
pfrag->offset)) {
- if (i >= sysctl_max_skb_frags) {
+ if (i >= READ_ONCE(sysctl_max_skb_frags)) {
tcp_mark_push(tp, skb);
goto new_segment;
}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3d90fa9653ef3..513f571a082ba 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1299,7 +1299,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,

i = skb_shinfo(skb)->nr_frags;
can_coalesce = skb_can_coalesce(skb, i, dfrag->page, offset);
- if (!can_coalesce && i >= sysctl_max_skb_frags) {
+ if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) {
tcp_mark_push(tcp_sk(ssk), skb);
goto alloc_skb;
}
--
2.35.1



2022-08-29 12:13:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 098/158] btrfs: check if root is readonly while setting security xattr

From: Goldwyn Rodrigues <[email protected]>

commit b51111271b0352aa596c5ae8faf06939e91b3b68 upstream.

For a filesystem which has the btrfs read-only property set to true, all
write operations, including xattr updates, should be denied. However, a
security xattr can still be changed even if the btrfs ro property is true.

This happens because xattr_permission() does not have any restrictions
on security.*, system.* and in some cases trusted.* from VFS and
the decision is left to the underlying filesystem. See comments in
xattr_permission() for more details.

This patch checks if the root is read-only before performing the set
xattr operation.

Testcase:

DEV=/dev/vdb
MNT=/mnt

mkfs.btrfs -f $DEV
mount $DEV $MNT
echo "file one" > $MNT/f1

setfattr -n "security.one" -v 2 $MNT/f1
btrfs property set /mnt ro true

setfattr -n "security.one" -v 1 $MNT/f1

umount $MNT

CC: [email protected] # 4.9+
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Goldwyn Rodrigues <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/xattr.c | 3 +++
1 file changed, 3 insertions(+)

--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -371,6 +371,9 @@ static int btrfs_xattr_handler_set(const
const char *name, const void *buffer,
size_t size, int flags)
{
+ if (btrfs_root_readonly(BTRFS_I(inode)->root))
+ return -EROFS;
+
name = xattr_full_name(handler, name);
return btrfs_setxattr_trans(inode, name, buffer, size, flags);
}


2022-08-29 12:13:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 036/158] net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off

From: Maor Dickman <[email protected]>

[ Upstream commit 550f96432e6f6770efdaee0e65239d61431062a1 ]

The cited commit reintroduced the ability to set hw-tc-offload
in switchdev mode by reusing the NIC mode calls without modifying them
to support both modes; this can cause an illegal memory access
when trying to turn hw-tc-offload off.

Fix this by using the right TC_FLAG when checking whether tc rules
are installed while disabling hw-tc-offload.

Fixes: d3cbd4254df8 ("net/mlx5e: Add ndo_set_feature for uplink representor")
Signed-off-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 62aab20025345..9e6db779b6efa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3678,7 +3678,9 @@ static int set_feature_hw_tc(struct net_device *netdev, bool enable)
struct mlx5e_priv *priv = netdev_priv(netdev);

#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
- if (!enable && mlx5e_tc_num_filters(priv, MLX5_TC_FLAG(NIC_OFFLOAD))) {
+ int tc_flag = mlx5e_is_uplink_rep(priv) ? MLX5_TC_FLAG(ESW_OFFLOAD) :
+ MLX5_TC_FLAG(NIC_OFFLOAD);
+ if (!enable && mlx5e_tc_num_filters(priv, tc_flag)) {
netdev_err(netdev,
"Active offloaded tc filters, can't turn hw_tc_offload off\n");
return -EINVAL;
--
2.35.1



2022-08-29 12:13:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 111/158] writeback: avoid use-after-free after removing device

From: Khazhismel Kumykov <[email protected]>

commit f87904c075515f3e1d8f4a7115869d3b914674fd upstream.

When a disk is removed, bdi_unregister gets called to stop further
writeback and wait for associated delayed work to complete. However,
wb_inode_writeback_end() may schedule bandwidth estimation dwork after
this has completed, which can result in the timer attempting to access the
just freed bdi_writeback.

Fix this by checking if the bdi_writeback is alive, similar to when
scheduling writeback work.

Since this requires wb->work_lock, and wb_inode_writeback_end() may get
called from interrupt, switch wb->work_lock to an irqsafe lock.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 45a2966fd641 ("writeback: fix bandwidth estimate for spiky workload")
Signed-off-by: Khazhismel Kumykov <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Michael Stapelberg <[email protected]>
Cc: Wu Fengguang <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/fs-writeback.c | 12 ++++++------
mm/backing-dev.c | 10 +++++-----
mm/page-writeback.c | 6 +++++-
3 files changed, 16 insertions(+), 12 deletions(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -134,10 +134,10 @@ static bool inode_io_list_move_locked(st

static void wb_wakeup(struct bdi_writeback *wb)
{
- spin_lock_bh(&wb->work_lock);
+ spin_lock_irq(&wb->work_lock);
if (test_bit(WB_registered, &wb->state))
mod_delayed_work(bdi_wq, &wb->dwork, 0);
- spin_unlock_bh(&wb->work_lock);
+ spin_unlock_irq(&wb->work_lock);
}

static void finish_writeback_work(struct bdi_writeback *wb,
@@ -164,7 +164,7 @@ static void wb_queue_work(struct bdi_wri
if (work->done)
atomic_inc(&work->done->cnt);

- spin_lock_bh(&wb->work_lock);
+ spin_lock_irq(&wb->work_lock);

if (test_bit(WB_registered, &wb->state)) {
list_add_tail(&work->list, &wb->work_list);
@@ -172,7 +172,7 @@ static void wb_queue_work(struct bdi_wri
} else
finish_writeback_work(wb, work);

- spin_unlock_bh(&wb->work_lock);
+ spin_unlock_irq(&wb->work_lock);
}

/**
@@ -2082,13 +2082,13 @@ static struct wb_writeback_work *get_nex
{
struct wb_writeback_work *work = NULL;

- spin_lock_bh(&wb->work_lock);
+ spin_lock_irq(&wb->work_lock);
if (!list_empty(&wb->work_list)) {
work = list_entry(wb->work_list.next,
struct wb_writeback_work, list);
list_del_init(&work->list);
}
- spin_unlock_bh(&wb->work_lock);
+ spin_unlock_irq(&wb->work_lock);
return work;
}

--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -260,10 +260,10 @@ void wb_wakeup_delayed(struct bdi_writeb
unsigned long timeout;

timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
- spin_lock_bh(&wb->work_lock);
+ spin_lock_irq(&wb->work_lock);
if (test_bit(WB_registered, &wb->state))
queue_delayed_work(bdi_wq, &wb->dwork, timeout);
- spin_unlock_bh(&wb->work_lock);
+ spin_unlock_irq(&wb->work_lock);
}

static void wb_update_bandwidth_workfn(struct work_struct *work)
@@ -334,12 +334,12 @@ static void cgwb_remove_from_bdi_list(st
static void wb_shutdown(struct bdi_writeback *wb)
{
/* Make sure nobody queues further work */
- spin_lock_bh(&wb->work_lock);
+ spin_lock_irq(&wb->work_lock);
if (!test_and_clear_bit(WB_registered, &wb->state)) {
- spin_unlock_bh(&wb->work_lock);
+ spin_unlock_irq(&wb->work_lock);
return;
}
- spin_unlock_bh(&wb->work_lock);
+ spin_unlock_irq(&wb->work_lock);

cgwb_remove_from_bdi_list(wb);
/*
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2867,6 +2867,7 @@ static void wb_inode_writeback_start(str

static void wb_inode_writeback_end(struct bdi_writeback *wb)
{
+ unsigned long flags;
atomic_dec(&wb->writeback_inodes);
/*
* Make sure estimate of writeback throughput gets updated after
@@ -2875,7 +2876,10 @@ static void wb_inode_writeback_end(struc
* that if multiple inodes end writeback at a similar time, they get
* batched into one bandwidth update.
*/
- queue_delayed_work(bdi_wq, &wb->bw_dwork, BANDWIDTH_INTERVAL);
+ spin_lock_irqsave(&wb->work_lock, flags);
+ if (test_bit(WB_registered, &wb->state))
+ queue_delayed_work(bdi_wq, &wb->bw_dwork, BANDWIDTH_INTERVAL);
+ spin_unlock_irqrestore(&wb->work_lock, flags);
}

bool __folio_end_writeback(struct folio *folio)


2022-08-29 12:13:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 097/158] btrfs: fix space cache corruption and potential double allocations

From: Omar Sandoval <[email protected]>

commit ced8ecf026fd8084cf175530ff85c76d6085d715 upstream.

When testing space_cache v2 on a large set of machines, we encountered a
few symptoms:

1. "unable to add free space :-17" (EEXIST) errors.
2. Missing free space info items, sometimes caught with a "missing free
space info for X" error.
3. Double-accounted space: ranges that were allocated in the extent tree
and also marked as free in the free space tree, ranges that were
marked as allocated twice in the extent tree, or ranges that were
marked as free twice in the free space tree. If the latter made it
onto disk, the next reboot would hit the BUG_ON() in
add_new_free_space().
4. On some hosts with no on-disk corruption or error messages, the
in-memory space cache (dumped with drgn) disagreed with the free
space tree.

All of these symptoms have the same underlying cause: a race between
caching the free space for a block group and returning free space to the
in-memory space cache for pinned extents causes us to double-add a free
range to the space cache. This race exists when free space is cached
from the free space tree (space_cache=v2) or the extent tree
(nospace_cache, or space_cache=v1 if the cache needs to be regenerated).
struct btrfs_block_group::last_byte_to_unpin and struct
btrfs_block_group::progress are supposed to protect against this race,
but commit d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when
waiting for a transaction commit") subtly broke this by allowing
multiple transactions to be unpinning extents at the same time.

Specifically, the race is as follows:

1. An extent is deleted from an uncached block group in transaction A.
2. btrfs_commit_transaction() is called for transaction A.
3. btrfs_run_delayed_refs() -> __btrfs_free_extent() runs the delayed
ref for the deleted extent.
4. __btrfs_free_extent() -> do_free_extent_accounting() ->
add_to_free_space_tree() adds the deleted extent back to the free
space tree.
5. do_free_extent_accounting() -> btrfs_update_block_group() ->
btrfs_cache_block_group() queues up the block group to get cached.
block_group->progress is set to block_group->start.
6. btrfs_commit_transaction() for transaction A calls
switch_commit_roots(). It sets block_group->last_byte_to_unpin to
block_group->progress, which is block_group->start because the block
group hasn't been cached yet.
7. The caching thread gets to our block group. Since the commit roots
were already switched, load_free_space_tree() sees the deleted extent
as free and adds it to the space cache. It finishes caching and sets
block_group->progress to U64_MAX.
8. btrfs_commit_transaction() advances transaction A to
TRANS_STATE_SUPER_COMMITTED.
9. fsync calls btrfs_commit_transaction() for transaction B. Since
transaction A is already in TRANS_STATE_SUPER_COMMITTED and the
commit is for fsync, it advances.
10. btrfs_commit_transaction() for transaction B calls
switch_commit_roots(). This time, the block group has already been
cached, so it sets block_group->last_byte_to_unpin to U64_MAX.
11. btrfs_commit_transaction() for transaction A calls
btrfs_finish_extent_commit(), which calls unpin_extent_range() for
the deleted extent. It sees last_byte_to_unpin set to U64_MAX (by
transaction B!), so it adds the deleted extent to the space cache
again!

This explains all of our symptoms above:

* If the sequence of events is exactly as described above, when the free
space is re-added in step 11, it will fail with EEXIST.
* If another thread reallocates the deleted extent in between steps 7
and 11, then step 11 will silently re-add that space to the space
cache as free even though it is actually allocated. Then, if that
space is allocated *again*, the free space tree will be corrupted
(namely, the wrong item will be deleted).
* If we don't catch this free space tree corruption, it will continue
to get worse as extents are deleted and reallocated.

The v1 space_cache is synchronously loaded when an extent is deleted
(btrfs_update_block_group() with alloc=0 calls btrfs_cache_block_group()
with load_cache_only=1), so it is not normally affected by this bug.
However, as noted above, if we fail to load the space cache, we will
fall back to caching from the extent tree and may hit this bug.

The easiest fix for this race is to also make caching from the free
space tree or extent tree synchronous. Josef tested this and found no
performance regressions.

A few extra changes fall out of this change. Namely, this fix does the
following, with step 2 being the crucial fix:

1. Factor btrfs_caching_ctl_wait_done() out of
btrfs_wait_block_group_cache_done() to allow waiting on a caching_ctl
that we already hold a reference to.
2. Change the call in btrfs_cache_block_group() of
btrfs_wait_space_cache_v1_finished() to
btrfs_caching_ctl_wait_done(), which makes us wait regardless of the
space_cache option.
3. Delete the now unused btrfs_wait_space_cache_v1_finished() and
space_cache_v1_done().
4. Change btrfs_cache_block_group()'s `int load_cache_only` parameter to
`bool wait` to more accurately describe its new meaning.
5. Change a few callers which had a separate call to
btrfs_wait_block_group_cache_done() to use wait = true instead.
6. Make btrfs_wait_block_group_cache_done() static now that it's not
used outside of block-group.c anymore.

Fixes: d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
CC: [email protected] # 5.12+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/block-group.c | 47 +++++++++++++++--------------------------------
fs/btrfs/block-group.h | 4 +---
fs/btrfs/ctree.h | 1 -
fs/btrfs/extent-tree.c | 30 ++++++------------------------
4 files changed, 22 insertions(+), 60 deletions(-)

--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -440,39 +440,26 @@ void btrfs_wait_block_group_cache_progre
btrfs_put_caching_control(caching_ctl);
}

-int btrfs_wait_block_group_cache_done(struct btrfs_block_group *cache)
+static int btrfs_caching_ctl_wait_done(struct btrfs_block_group *cache,
+ struct btrfs_caching_control *caching_ctl)
+{
+ wait_event(caching_ctl->wait, btrfs_block_group_done(cache));
+ return cache->cached == BTRFS_CACHE_ERROR ? -EIO : 0;
+}
+
+static int btrfs_wait_block_group_cache_done(struct btrfs_block_group *cache)
{
struct btrfs_caching_control *caching_ctl;
- int ret = 0;
+ int ret;

caching_ctl = btrfs_get_caching_control(cache);
if (!caching_ctl)
return (cache->cached == BTRFS_CACHE_ERROR) ? -EIO : 0;
-
- wait_event(caching_ctl->wait, btrfs_block_group_done(cache));
- if (cache->cached == BTRFS_CACHE_ERROR)
- ret = -EIO;
+ ret = btrfs_caching_ctl_wait_done(cache, caching_ctl);
btrfs_put_caching_control(caching_ctl);
return ret;
}

-static bool space_cache_v1_done(struct btrfs_block_group *cache)
-{
- bool ret;
-
- spin_lock(&cache->lock);
- ret = cache->cached != BTRFS_CACHE_FAST;
- spin_unlock(&cache->lock);
-
- return ret;
-}
-
-void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache,
- struct btrfs_caching_control *caching_ctl)
-{
- wait_event(caching_ctl->wait, space_cache_v1_done(cache));
-}
-
#ifdef CONFIG_BTRFS_DEBUG
static void fragment_free_space(struct btrfs_block_group *block_group)
{
@@ -750,9 +737,8 @@ done:
btrfs_put_block_group(block_group);
}

-int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only)
+int btrfs_cache_block_group(struct btrfs_block_group *cache, bool wait)
{
- DEFINE_WAIT(wait);
struct btrfs_fs_info *fs_info = cache->fs_info;
struct btrfs_caching_control *caching_ctl = NULL;
int ret = 0;
@@ -785,10 +771,7 @@ int btrfs_cache_block_group(struct btrfs
}
WARN_ON(cache->caching_ctl);
cache->caching_ctl = caching_ctl;
- if (btrfs_test_opt(fs_info, SPACE_CACHE))
- cache->cached = BTRFS_CACHE_FAST;
- else
- cache->cached = BTRFS_CACHE_STARTED;
+ cache->cached = BTRFS_CACHE_STARTED;
cache->has_caching_ctl = 1;
spin_unlock(&cache->lock);

@@ -801,8 +784,8 @@ int btrfs_cache_block_group(struct btrfs

btrfs_queue_work(fs_info->caching_workers, &caching_ctl->work);
out:
- if (load_cache_only && caching_ctl)
- btrfs_wait_space_cache_v1_finished(cache, caching_ctl);
+ if (wait && caching_ctl)
+ ret = btrfs_caching_ctl_wait_done(cache, caching_ctl);
if (caching_ctl)
btrfs_put_caching_control(caching_ctl);

@@ -3313,7 +3296,7 @@ int btrfs_update_block_group(struct btrf
* space back to the block group, otherwise we will leak space.
*/
if (!alloc && !btrfs_block_group_done(cache))
- btrfs_cache_block_group(cache, 1);
+ btrfs_cache_block_group(cache, true);

byte_in_group = bytenr - cache->start;
WARN_ON(byte_in_group > cache->length);
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -263,9 +263,7 @@ void btrfs_dec_nocow_writers(struct btrf
void btrfs_wait_nocow_writers(struct btrfs_block_group *bg);
void btrfs_wait_block_group_cache_progress(struct btrfs_block_group *cache,
u64 num_bytes);
-int btrfs_wait_block_group_cache_done(struct btrfs_block_group *cache);
-int btrfs_cache_block_group(struct btrfs_block_group *cache,
- int load_cache_only);
+int btrfs_cache_block_group(struct btrfs_block_group *cache, bool wait);
void btrfs_put_caching_control(struct btrfs_caching_control *ctl);
struct btrfs_caching_control *btrfs_get_caching_control(
struct btrfs_block_group *cache);
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -494,7 +494,6 @@ struct btrfs_free_cluster {
enum btrfs_caching_type {
BTRFS_CACHE_NO,
BTRFS_CACHE_STARTED,
- BTRFS_CACHE_FAST,
BTRFS_CACHE_FINISHED,
BTRFS_CACHE_ERROR,
};
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2567,17 +2567,10 @@ int btrfs_pin_extent_for_log_replay(stru
return -EINVAL;

/*
- * pull in the free space cache (if any) so that our pin
- * removes the free space from the cache. We have load_only set
- * to one because the slow code to read in the free extents does check
- * the pinned extents.
+ * Fully cache the free space first so that our pin removes the free space
+ * from the cache.
*/
- btrfs_cache_block_group(cache, 1);
- /*
- * Make sure we wait until the cache is completely built in case it is
- * missing or is invalid and therefore needs to be rebuilt.
- */
- ret = btrfs_wait_block_group_cache_done(cache);
+ ret = btrfs_cache_block_group(cache, true);
if (ret)
goto out;

@@ -2600,12 +2593,7 @@ static int __exclude_logged_extent(struc
if (!block_group)
return -EINVAL;

- btrfs_cache_block_group(block_group, 1);
- /*
- * Make sure we wait until the cache is completely built in case it is
- * missing or is invalid and therefore needs to be rebuilt.
- */
- ret = btrfs_wait_block_group_cache_done(block_group);
+ ret = btrfs_cache_block_group(block_group, true);
if (ret)
goto out;

@@ -4415,7 +4403,7 @@ have_block_group:
ffe_ctl->cached = btrfs_block_group_done(block_group);
if (unlikely(!ffe_ctl->cached)) {
ffe_ctl->have_caching_bg = true;
- ret = btrfs_cache_block_group(block_group, 0);
+ ret = btrfs_cache_block_group(block_group, false);

/*
* If we get ENOMEM here or something else we want to
@@ -6169,13 +6157,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *

if (end - start >= range->minlen) {
if (!btrfs_block_group_done(cache)) {
- ret = btrfs_cache_block_group(cache, 0);
- if (ret) {
- bg_failed++;
- bg_ret = ret;
- continue;
- }
- ret = btrfs_wait_block_group_cache_done(cache);
+ ret = btrfs_cache_block_group(cache, true);
if (ret) {
bg_failed++;
bg_ret = ret;


2022-08-29 12:13:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 029/158] net/mlx5e: Properly disable vlan strip on non-UL reps

From: Vlad Buslov <[email protected]>

[ Upstream commit f37044fd759b6bc40b6398a978e0b1acdf717372 ]

When querying the capabilities of mlx5 non-uplink representors with ethtool,
rx-vlan-offload is marked as "off [fixed]". However, it is actually always
enabled because mlx5e_params->vlan_strip_disable is 0 by default when a
struct mlx5e_params instance is initialized. Fix the issue by explicitly
setting vlan_strip_disable to 'true' for non-uplink representors.

Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index f797fd97d305b..7da3dc6261929 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -662,6 +662,8 @@ static void mlx5e_build_rep_params(struct net_device *netdev)

params->mqprio.num_tc = 1;
params->tunneled_offload_en = false;
+ if (rep->vport != MLX5_VPORT_UPLINK)
+ params->vlan_strip_disable = true;

mlx5_query_min_inline(mdev, &params->tx_min_inline_mode);
}
--
2.35.1



2022-08-29 12:13:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 148/158] arm64/signal: Flush FPSIMD register state when disabling streaming mode

From: Mark Brown <[email protected]>

commit ea64baacbc36a0d552aec0d87107182f40211131 upstream.

When handling a signal delivered to a context with streaming mode enabled
we disable streaming mode for the signal handler. When doing so we should
also flush the saved FPSIMD register state, just as exiting streaming mode
in hardware would, so that if that state is reloaded we get the same
behaviour. Without this we will reload whatever FPSIMD state was last
saved for the task.

Fixes: 40a8e87bb328 ("arm64/sme: Disable ZA and streaming mode when handling signals")
Signed-off-by: Mark Brown <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/kernel/signal.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -922,6 +922,16 @@ static void setup_return(struct pt_regs

/* Signal handlers are invoked with ZA and streaming mode disabled */
if (system_supports_sme()) {
+ /*
+ * If we were in streaming mode the saved register
+ * state was SVE but we will exit SM and use the
+ * FPSIMD register state - flush the saved FPSIMD
+ * register state in case it gets loaded.
+ */
+ if (current->thread.svcr & SVCR_SM_MASK)
+ memset(&current->thread.uw.fpsimd_state, 0,
+ sizeof(current->thread.uw.fpsimd_state));
+
current->thread.svcr &= ~(SVCR_ZA_MASK |
SVCR_SM_MASK);
sme_smstop();


2022-08-29 12:14:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 110/158] loop: Check for overflow while configuring loop

From: Siddh Raman Pant <[email protected]>

commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream.

Userspace can configure a loop device using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typedef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies the offset from info to the device as
follows (see line 980 of loop.c):
lo->lo_offset = info->lo_offset;

This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
WARN_ON_ONCE(iter->iomap.offset > iter->pos);

Thus, check for negative values during loop_set_status_from_info().
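
A minimal userspace sketch of the same width mismatch (plain C types
standing in for loff_t and __u64; the offset value is made up):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t info_offset = 0xffffffffffff0000ULL; /* loop_info64.lo_offset */
        long long lo_offset = (long long)info_offset; /* loop_device.lo_offset */

        /* On two's-complement targets the out-of-range value wraps to a
         * negative number, which is what the added check now rejects. */
        if (lo_offset < 0)
                printf("offset overflows loff_t -> reject with -EOVERFLOW\n");
        return 0;
}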

Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e

Reported-and-tested-by: [email protected]
Cc: [email protected]
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Siddh Raman Pant <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/loop.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -979,6 +979,11 @@ loop_set_status_from_info(struct loop_de

lo->lo_offset = info->lo_offset;
lo->lo_sizelimit = info->lo_sizelimit;
+
+ /* loff_t vars have been assigned __u64 */
+ if (lo->lo_offset < 0 || lo->lo_sizelimit < 0)
+ return -EOVERFLOW;
+
memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
lo->lo_file_name[LO_NAME_SIZE-1] = 0;
lo->lo_flags = info->lo_flags;


2022-08-29 12:14:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 149/158] arm64/sme: Dont flush SVE register state when allocating SME storage

From: Mark Brown <[email protected]>

commit 826a4fdd2ada9e5923c58bdd168f31a42e958ffc upstream.

Currently, when taking an SME access trap we allocate storage for the SVE
register state in order to be able to handle storage of streaming mode SVE.
Because it was originally used in a purely SVE context, the SVE register
state allocation also flushes the register state for SVE if storage was
already allocated, but in the SME context this is not desirable. For an SME
access trap to be taken the task must not be in streaming mode, so either
there is already SVE register state present for regular SVE mode, which
would be corrupted, or the task does not have TIF_SVE and the flush is
redundant.

Fix this by adding a flag to sve_alloc() indicating whether we are in an SVE
context and need to flush the state. Freshly allocated storage is always
zeroed either way.

Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Mark Brown <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/include/asm/fpsimd.h | 4 ++--
arch/arm64/kernel/fpsimd.c | 10 ++++++----
arch/arm64/kernel/ptrace.c | 6 +++---
arch/arm64/kernel/signal.c | 2 +-
4 files changed, 12 insertions(+), 10 deletions(-)

--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -153,7 +153,7 @@ struct vl_info {

#ifdef CONFIG_ARM64_SVE

-extern void sve_alloc(struct task_struct *task);
+extern void sve_alloc(struct task_struct *task, bool flush);
extern void fpsimd_release_task(struct task_struct *task);
extern void fpsimd_sync_to_sve(struct task_struct *task);
extern void fpsimd_force_sync_to_sve(struct task_struct *task);
@@ -256,7 +256,7 @@ size_t sve_state_size(struct task_struct

#else /* ! CONFIG_ARM64_SVE */

-static inline void sve_alloc(struct task_struct *task) { }
+static inline void sve_alloc(struct task_struct *task, bool flush) { }
static inline void fpsimd_release_task(struct task_struct *task) { }
static inline void sve_sync_to_fpsimd(struct task_struct *task) { }
static inline void sve_sync_from_fpsimd_zeropad(struct task_struct *task) { }
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -716,10 +716,12 @@ size_t sve_state_size(struct task_struct
* do_sve_acc() case, there is no ABI requirement to hide stale data
* written previously be task.
*/
-void sve_alloc(struct task_struct *task)
+void sve_alloc(struct task_struct *task, bool flush)
{
if (task->thread.sve_state) {
- memset(task->thread.sve_state, 0, sve_state_size(task));
+ if (flush)
+ memset(task->thread.sve_state, 0,
+ sve_state_size(task));
return;
}

@@ -1389,7 +1391,7 @@ void do_sve_acc(unsigned long esr, struc
return;
}

- sve_alloc(current);
+ sve_alloc(current, true);
if (!current->thread.sve_state) {
force_sig(SIGKILL);
return;
@@ -1440,7 +1442,7 @@ void do_sme_acc(unsigned long esr, struc
return;
}

- sve_alloc(current);
+ sve_alloc(current, false);
sme_alloc(current);
if (!current->thread.sve_state || !current->thread.za_state) {
force_sig(SIGKILL);
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -882,7 +882,7 @@ static int sve_set_common(struct task_st
* state and ensure there's storage.
*/
if (target->thread.svcr != old_svcr)
- sve_alloc(target);
+ sve_alloc(target, true);
}

/* Registers: FPSIMD-only case */
@@ -912,7 +912,7 @@ static int sve_set_common(struct task_st
goto out;
}

- sve_alloc(target);
+ sve_alloc(target, true);
if (!target->thread.sve_state) {
ret = -ENOMEM;
clear_tsk_thread_flag(target, TIF_SVE);
@@ -1082,7 +1082,7 @@ static int za_set(struct task_struct *ta

/* Ensure there is some SVE storage for streaming mode */
if (!target->thread.sve_state) {
- sve_alloc(target);
+ sve_alloc(target, false);
if (!target->thread.sve_state) {
clear_thread_flag(TIF_SME);
ret = -ENOMEM;
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -307,7 +307,7 @@ static int restore_sve_fpsimd_context(st
fpsimd_flush_task_state(current);
/* From now, fpsimd_thread_switch() won't touch thread.sve_state */

- sve_alloc(current);
+ sve_alloc(current, true);
if (!current->thread.sve_state) {
clear_thread_flag(TIF_SVE);
return -ENOMEM;


2022-08-29 12:14:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 158/158] bpf: Dont use tnum_range on array range checking for poke descriptors

From: Daniel Borkmann <[email protected]>

commit a657182a5c5150cdfacb6640aad1d2712571a409 upstream.

Hsin-Wei reported a KASAN splat triggered by their BPF runtime fuzzer which
is based on a customized syzkaller:

BUG: KASAN: slab-out-of-bounds in bpf_int_jit_compile+0x1257/0x13f0
Read of size 8 at addr ffff888004e90b58 by task syz-executor.0/1489
CPU: 1 PID: 1489 Comm: syz-executor.0 Not tainted 5.19.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x9c/0xc9
print_address_description.constprop.0+0x1f/0x1f0
? bpf_int_jit_compile+0x1257/0x13f0
kasan_report.cold+0xeb/0x197
? kvmalloc_node+0x170/0x200
? bpf_int_jit_compile+0x1257/0x13f0
bpf_int_jit_compile+0x1257/0x13f0
? arch_prepare_bpf_dispatcher+0xd0/0xd0
? rcu_read_lock_sched_held+0x43/0x70
bpf_prog_select_runtime+0x3e8/0x640
? bpf_obj_name_cpy+0x149/0x1b0
bpf_prog_load+0x102f/0x2220
? __bpf_prog_put.constprop.0+0x220/0x220
? find_held_lock+0x2c/0x110
? __might_fault+0xd6/0x180
? lock_downgrade+0x6e0/0x6e0
? lock_is_held_type+0xa6/0x120
? __might_fault+0x147/0x180
__sys_bpf+0x137b/0x6070
? bpf_perf_link_attach+0x530/0x530
? new_sync_read+0x600/0x600
? __fget_files+0x255/0x450
? lock_downgrade+0x6e0/0x6e0
? fput+0x30/0x1a0
? ksys_write+0x1a8/0x260
__x64_sys_bpf+0x7a/0xc0
? syscall_enter_from_user_mode+0x21/0x70
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f917c4e2c2d

The problem here is that a range of tnum_range(0, map->max_entries - 1) has
limited ability to represent the concrete tight range with the tnum: the
set of states resulting from value + mask can be a superset of the actual
intended range, and as such a tnum_in(range, reg->var_off) check may yield
true when it shouldn't. For example, tnum_range(0, 2) results in 00XX ->
v = 0000, m = 0011, such that the intended set of {0, 1, 2} is represented
by the less precise superset {0, 1, 2, 3}. As the register is a known
const scalar, really just use the concrete reg->var_off.value for the
upper index check.
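
For illustration, a minimal user-space sketch of the imprecision described
above; tnum_range0() and tnum_contains() are simplified stand-ins for the
kernel's tnum helpers, not the verifier code itself:

#include <stdint.h>
#include <stdio.h>

/* Simplified tristate number: bits set in mask are unknown. */
struct tnum { uint64_t value; uint64_t mask; };

/*
 * Rough stand-in for tnum_range(0, max): every bit position that can vary
 * across [0, max] becomes unknown, so the representable set can be a
 * superset of the intended range.
 */
static struct tnum tnum_range0(uint64_t max)
{
	uint64_t mask = max;

	mask |= mask >> 1;  mask |= mask >> 2;  mask |= mask >> 4;
	mask |= mask >> 8;  mask |= mask >> 16; mask |= mask >> 32;
	return (struct tnum){ .value = 0, .mask = mask };
}

/* Rough stand-in for tnum_in(): x fits if all known bits match value. */
static int tnum_contains(struct tnum t, uint64_t x)
{
	return (x & ~t.mask) == t.value;
}

int main(void)
{
	struct tnum r = tnum_range0(2);	/* intended index set {0, 1, 2} */
	uint64_t i;

	for (i = 0; i <= 4; i++)
		printf("%llu admitted: %d\n", (unsigned long long)i,
		       tnum_contains(r, i));
	/* 3 is also admitted: v = 0, m = 3 encodes {0, 1, 2, 3}. */
	return 0;
}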

Fixes: d2e4c1e6c294 ("bpf: Constant map key tracking for prog array pokes")
Reported-by: Hsin-Wei Hung <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Shung-Hsi Yu <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/r/984b37f9fdf7ac36831d2137415a4a915744c1b6.1661462653.git.daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/bpf/verifier.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6999,8 +6999,7 @@ record_func_key(struct bpf_verifier_env
struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];
struct bpf_reg_state *regs = cur_regs(env), *reg;
struct bpf_map *map = meta->map_ptr;
- struct tnum range;
- u64 val;
+ u64 val, max;
int err;

if (func_id != BPF_FUNC_tail_call)
@@ -7010,10 +7009,11 @@ record_func_key(struct bpf_verifier_env
return -EINVAL;
}

- range = tnum_range(0, map->max_entries - 1);
reg = &regs[BPF_REG_3];
+ val = reg->var_off.value;
+ max = map->max_entries;

- if (!register_is_const(reg) || !tnum_in(range, reg->var_off)) {
+ if (!(register_is_const(reg) && val < max)) {
bpf_map_key_store(aux, BPF_MAP_KEY_POISON);
return 0;
}
@@ -7021,8 +7021,6 @@ record_func_key(struct bpf_verifier_env
err = mark_chain_precision(env, BPF_REG_3);
if (err)
return err;
-
- val = reg->var_off.value;
if (bpf_map_key_unseen(aux))
bpf_map_key_store(aux, val);
else if (!bpf_map_key_poisoned(aux) &&


2022-08-29 12:14:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 023/158] rose: check NULL rose_loopback_neigh->loopback

From: Bernard Pidoux <[email protected]>

[ Upstream commit 3c53cd65dece47dd1f9d3a809f32e59d1d87b2b8 ]

Commit 3b3fd068c56e3fbea30090859216a368398e39bf added a NULL check for
`rose_loopback_neigh->dev` in rose_loopback_timer() but omitted to check
rose_loopback_neigh->loopback.

It thus prevents *all* ROSE connections.

The reason is that the special rose_neigh loopback has a NULL device.

/proc/net/rose_neigh illustrates this via the rose_neigh_show() function:
[...]
seq_printf(seq, "%05d %-9s %-4s %3d %3d %3s %3s %3lu %3lu",
rose_neigh->number,
(rose_neigh->loopback) ? "RSLOOP-0" : ax2asc(buf, &rose_neigh->callsign),
rose_neigh->dev ? rose_neigh->dev->name : "???",
rose_neigh->count,

/proc/net/rose_neigh displays the special rose_loopback_neigh entry as
callsign RSLOOP-0:

addr callsign dev count use mode restart t0 tf digipeaters
00001 RSLOOP-0 ??? 1 2 DCE yes 0 0

By also checking rose_loopback_neigh->loopback, rose_rx_call_request() is
called even when rose_loopback_neigh->dev is NULL. This repairs ROSE
connections.

Verification with rose client application FPAC:

FPAC-Node v 4.1.3 (built Aug 5 2022) for LINUX (help = h)
F6BVP-4 (Commands = ?) : u
Users - AX.25 Level 2 sessions :
Port Callsign Callsign AX.25 state ROSE state NetRom status
axudp F6BVP-5 -> F6BVP-9 Connected Connected ---------

Fixes: 3b3fd068c56e ("rose: Fix Null pointer dereference in rose_send_frame()")
Signed-off-by: Bernard Pidoux <[email protected]>
Suggested-by: Francois Romieu <[email protected]>
Cc: Thomas DL9SAU Osterried <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/rose/rose_loopback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/rose/rose_loopback.c b/net/rose/rose_loopback.c
index 11c45c8c6c164..036d92c0ad794 100644
--- a/net/rose/rose_loopback.c
+++ b/net/rose/rose_loopback.c
@@ -96,7 +96,8 @@ static void rose_loopback_timer(struct timer_list *unused)
}

if (frametype == ROSE_CALL_REQUEST) {
- if (!rose_loopback_neigh->dev) {
+ if (!rose_loopback_neigh->dev &&
+ !rose_loopback_neigh->loopback) {
kfree_skb(skb);
continue;
}
--
2.35.1



2022-08-29 12:14:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 146/158] perf stat: Clear evsel->reset_group for each stat run

From: Ian Rogers <[email protected]>

commit bf515f024e4c0ca46a1b08c4f31860c01781d8a5 upstream.

If a weak group is broken then the reset_group flag remains set for
the next run. Having reset_group set means the counter isn't created,
ultimately leading to a segfault.

A simple reproduction of this is:

# perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W'

which will be added as a test in the next patch.

Fixes: 4804e0111662d7d8 ("perf stat: Use affinity for opening events")
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Xing Zhengjun <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/perf/builtin-stat.c | 1 +
1 file changed, 1 insertion(+)

--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -826,6 +826,7 @@ static int __run_perf_stat(int argc, con
}

evlist__for_each_entry(evsel_list, counter) {
+ counter->reset_group = false;
if (bpf_counter__load(counter, &target))
return -1;
if (!evsel__is_bpf(counter))


2022-08-29 12:14:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 022/158] ntfs: fix acl handling

From: Christian Brauner <[email protected]>

[ Upstream commit 0c3bc7899e6dfb52df1c46118a5a670ae619645f ]

While looking at our current POSIX ACL handling in the context of some
overlayfs work I went through a range of other filesystems checking how they
handle them currently and encountered ntfs3.

The posix_acl_{from,to}_xattr() helpers always need to operate on the
filesystem idmapping. Since ntfs3 can only be mounted in the initial user
namespace the relevant idmapping is init_user_ns.

The posix_acl_{from,to}_xattr() helpers are concerned with translating between
the kernel internal struct posix_acl{_entry} and the uapi struct
posix_acl_xattr_{header,entry} and the kernel internal data structure is cached
filesystem wide.

Additional idmappings such as the caller's idmapping or the mount's idmapping
are handled higher up in the VFS. Individual filesystems usually do not need to
concern themselves with these.

The posix_acl_valid() helper is concerned with checking whether the values in
the kernel internal struct posix_acl can be represented in the filesystem's
idmapping. IOW, if they can be written to disk. So this helper too needs to
take the filesystem's idmapping.

Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations")
Cc: Konstantin Komarov <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/ntfs3/xattr.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/ntfs3/xattr.c b/fs/ntfs3/xattr.c
index 1b8c89dbf6684..3629049decac1 100644
--- a/fs/ntfs3/xattr.c
+++ b/fs/ntfs3/xattr.c
@@ -478,8 +478,7 @@ static noinline int ntfs_set_ea(struct inode *inode, const char *name,
}

#ifdef CONFIG_NTFS3_FS_POSIX_ACL
-static struct posix_acl *ntfs_get_acl_ex(struct user_namespace *mnt_userns,
- struct inode *inode, int type,
+static struct posix_acl *ntfs_get_acl_ex(struct inode *inode, int type,
int locked)
{
struct ntfs_inode *ni = ntfs_i(inode);
@@ -514,7 +513,7 @@ static struct posix_acl *ntfs_get_acl_ex(struct user_namespace *mnt_userns,

/* Translate extended attribute to acl. */
if (err >= 0) {
- acl = posix_acl_from_xattr(mnt_userns, buf, err);
+ acl = posix_acl_from_xattr(&init_user_ns, buf, err);
} else if (err == -ENODATA) {
acl = NULL;
} else {
@@ -537,8 +536,7 @@ struct posix_acl *ntfs_get_acl(struct inode *inode, int type, bool rcu)
if (rcu)
return ERR_PTR(-ECHILD);

- /* TODO: init_user_ns? */
- return ntfs_get_acl_ex(&init_user_ns, inode, type, 0);
+ return ntfs_get_acl_ex(inode, type, 0);
}

static noinline int ntfs_set_acl_ex(struct user_namespace *mnt_userns,
@@ -590,7 +588,7 @@ static noinline int ntfs_set_acl_ex(struct user_namespace *mnt_userns,
value = kmalloc(size, GFP_NOFS);
if (!value)
return -ENOMEM;
- err = posix_acl_to_xattr(mnt_userns, acl, value, size);
+ err = posix_acl_to_xattr(&init_user_ns, acl, value, size);
if (err < 0)
goto out;
flags = 0;
@@ -641,7 +639,7 @@ static int ntfs_xattr_get_acl(struct user_namespace *mnt_userns,
if (!acl)
return -ENODATA;

- err = posix_acl_to_xattr(mnt_userns, acl, buffer, size);
+ err = posix_acl_to_xattr(&init_user_ns, acl, buffer, size);
posix_acl_release(acl);

return err;
@@ -665,12 +663,12 @@ static int ntfs_xattr_set_acl(struct user_namespace *mnt_userns,
if (!value) {
acl = NULL;
} else {
- acl = posix_acl_from_xattr(mnt_userns, value, size);
+ acl = posix_acl_from_xattr(&init_user_ns, value, size);
if (IS_ERR(acl))
return PTR_ERR(acl);

if (acl) {
- err = posix_acl_valid(mnt_userns, acl);
+ err = posix_acl_valid(&init_user_ns, acl);
if (err)
goto release_and_out;
}
--
2.35.1



2022-08-29 12:14:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 131/158] riscv: dts: microchip: correct L2 cache interrupts

From: Heinrich Schuchardt <[email protected]>

commit 34fc9cc3aebe8b9e27d3bc821543dd482dc686ca upstream.

The "PolarFire SoC MSS Technical Reference Manual" documents the
following PLIC interrupts:

1 - L2 Cache Controller Signals when a metadata correction event occurs
2 - L2 Cache Controller Signals when an uncorrectable metadata event occurs
3 - L2 Cache Controller Signals when a data correction event occurs
4 - L2 Cache Controller Signals when an uncorrectable data event occurs

This differs from the SiFive FU540 which only has three L2 cache related
interrupts.

The sequence in the device tree is defined by an enum:

enum {
	DIR_CORR = 0,
	DATA_CORR,
	DATA_UNCORR,
	DIR_UNCORR,
};

So the correct sequence of the L2 cache interrupts is

interrupts = <1>, <3>, <4>, <2>;

[Conor]
This manifests as an unusable system if the l2-cache driver is enabled,
as the wrong interrupt gets cleared & the handler prints errors to the
console ad infinitum.

Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board")
CC: [email protected] # 5.15: e35b07a7df9b: riscv: dts: microchip: mpfs: Group tuples in interrupt properties
Signed-off-by: Heinrich Schuchardt <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/riscv/boot/dts/microchip/mpfs.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/riscv/boot/dts/microchip/mpfs.dtsi
+++ b/arch/riscv/boot/dts/microchip/mpfs.dtsi
@@ -169,7 +169,7 @@
cache-size = <2097152>;
cache-unified;
interrupt-parent = <&plic>;
- interrupts = <1>, <2>, <3>;
+ interrupts = <1>, <3>, <4>, <2>;
};

clint: clint@2000000 {


2022-08-29 12:14:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 117/158] mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte

From: Miaohe Lin <[email protected]>

commit ab74ef708dc51df7cf2b8a890b9c6990fac5c0c6 upstream.

In the MCOPY_ATOMIC_CONTINUE case with a non-shared VMA, pages in the page
cache are installed in the ptes. But hugepage_add_new_anon_rmap() is called
for them mistakenly because they're not vm_shared. This will corrupt the
page->mapping used by the page cache code.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/hugetlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6026,7 +6026,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
goto out_release_unlock;

- if (vm_shared) {
+ if (page_in_pagecache) {
page_dup_file_rmap(page, true);
} else {
ClearHPageRestoreReserve(page);


2022-08-29 12:14:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 152/158] scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq

From: Saurabh Sengar <[email protected]>

commit d957e7ffb2c72410bcc1a514153a46719255a5da upstream.

storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it
doesn't need to make forward progress under memory pressure. Marking this
workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a
non-WQ_MEM_RECLAIM workqueue. In the current state it causes the following
warning:

[ 14.506347] ------------[ cut here ]------------
[ 14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn
[ 14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130
[ 14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu
[ 14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
[ 14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun
[ 14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130
<-snip->
[ 14.506408] Call Trace:
[ 14.506412] __flush_work+0xf1/0x1c0
[ 14.506414] __cancel_work_timer+0x12f/0x1b0
[ 14.506417] ? kernfs_put+0xf0/0x190
[ 14.506418] cancel_delayed_work_sync+0x13/0x20
[ 14.506420] disk_block_events+0x78/0x80
[ 14.506421] del_gendisk+0x3d/0x2f0
[ 14.506423] sr_remove+0x28/0x70
[ 14.506427] device_release_driver_internal+0xef/0x1c0
[ 14.506428] device_release_driver+0x12/0x20
[ 14.506429] bus_remove_device+0xe1/0x150
[ 14.506431] device_del+0x167/0x380
[ 14.506432] __scsi_remove_device+0x11d/0x150
[ 14.506433] scsi_remove_device+0x26/0x40
[ 14.506434] storvsc_remove_lun+0x40/0x60
[ 14.506436] process_one_work+0x209/0x400
[ 14.506437] worker_thread+0x34/0x400
[ 14.506439] kthread+0x121/0x140
[ 14.506440] ? process_one_work+0x400/0x400
[ 14.506441] ? kthread_park+0x90/0x90
[ 14.506443] ret_from_fork+0x35/0x40
[ 14.506445] ---[ end trace 2d9633159fdc6ee7 ]---

Link: https://lore.kernel.org/r/[email protected]
Fixes: 436ad9413353 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun")
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Saurabh Sengar <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/storvsc_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -2012,7 +2012,7 @@ static int storvsc_probe(struct hv_devic
*/
host_dev->handle_error_wq =
alloc_ordered_workqueue("storvsc_error_wq_%d",
- WQ_MEM_RECLAIM,
+ 0,
host->host_no);
if (!host_dev->handle_error_wq) {
ret = -ENOMEM;


2022-08-29 12:14:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 068/158] net: Fix data-races around netdev_tstamp_prequeue.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 61adf447e38664447526698872e21c04623afb8e ]

While reading netdev_tstamp_prequeue, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/dev.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e1496e626a532..34282b93c3f60 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4927,7 +4927,7 @@ static int netif_rx_internal(struct sk_buff *skb)
{
int ret;

- net_timestamp_check(netdev_tstamp_prequeue, skb);
+ net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb);

trace_netif_rx(skb);

@@ -5280,7 +5280,7 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
int ret = NET_RX_DROP;
__be16 type;

- net_timestamp_check(!netdev_tstamp_prequeue, skb);
+ net_timestamp_check(!READ_ONCE(netdev_tstamp_prequeue), skb);

trace_netif_receive_skb(skb);

@@ -5663,7 +5663,7 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
{
int ret;

- net_timestamp_check(netdev_tstamp_prequeue, skb);
+ net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb);

if (skb_defer_rx_timestamp(skb))
return NET_RX_SUCCESS;
@@ -5693,7 +5693,7 @@ void netif_receive_skb_list_internal(struct list_head *head)

INIT_LIST_HEAD(&sublist);
list_for_each_entry_safe(skb, next, head, list) {
- net_timestamp_check(netdev_tstamp_prequeue, skb);
+ net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb);
skb_list_del_init(skb);
if (!skb_defer_rx_timestamp(skb))
list_add_tail(&skb->list, &sublist);
--
2.35.1



2022-08-29 12:14:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 053/158] bnxt_en: fix NQ resource accounting during vf creation on 57500 chips

From: Vikas Gupta <[email protected]>

[ Upstream commit 09a89cc59ad67794a11e1d3dd13c5b3172adcc51 ]

There are 2 issues:

1. We should decrement hw_resc->max_nqs instead of hw_resc->max_irqs
with the number of NQs assigned to the VFs. The IRQs are fixed
on each function and cannot be re-assigned. Only the NQs are being
assigned to the VFs.

2. vf_msix is the total number of NQs to be assigned to the VFs. So
we should decrement vf_msix from hw_resc->max_nqs.

Fixes: b16b68918674 ("bnxt_en: Add SR-IOV support for 57500 chips.")
Signed-off-by: Vikas Gupta <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index a1a2c7a64fd58..c9cf0569451a2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -623,7 +623,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs, bool reset)
hw_resc->max_stat_ctxs -= le16_to_cpu(req->min_stat_ctx) * n;
hw_resc->max_vnics -= le16_to_cpu(req->min_vnics) * n;
if (bp->flags & BNXT_FLAG_CHIP_P5)
- hw_resc->max_irqs -= vf_msix * n;
+ hw_resc->max_nqs -= vf_msix;

rc = pf->active_vfs;
}
--
2.35.1



2022-08-29 12:14:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 151/158] scsi: ufs: core: Enable link lost interrupt

From: Kiwoong Kim <[email protected]>

commit 6d17a112e9a63ff6a5edffd1676b99e0ffbcd269 upstream.

Link loss has been treated as a fatal error since commit c99b9b230149
("scsi: ufs: Treat link loss as fatal error"), but the event isn't
registered as an interrupt source. Enable it.

Link: https://lore.kernel.org/r/[email protected]
Fixes: c99b9b230149 ("scsi: ufs: Treat link loss as fatal error")
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Kiwoong Kim <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/ufs/ufshci.h | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

--- a/include/ufs/ufshci.h
+++ b/include/ufs/ufshci.h
@@ -135,11 +135,7 @@ static inline u32 ufshci_version(u32 maj

#define UFSHCD_UIC_MASK (UIC_COMMAND_COMPL | UFSHCD_UIC_PWR_MASK)

-#define UFSHCD_ERROR_MASK (UIC_ERROR |\
- DEVICE_FATAL_ERROR |\
- CONTROLLER_FATAL_ERROR |\
- SYSTEM_BUS_FATAL_ERROR |\
- CRYPTO_ENGINE_FATAL_ERROR)
+#define UFSHCD_ERROR_MASK (UIC_ERROR | INT_FATAL_ERRORS)

#define INT_FATAL_ERRORS (DEVICE_FATAL_ERROR |\
CONTROLLER_FATAL_ERROR |\


2022-08-29 12:14:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 136/158] md: call __md_stop_writes in md_stop

From: Guoqing Jiang <[email protected]>

commit 0dd84b319352bb8ba64752d4e45396d8b13e6018 upstream.

From the link [1], we can see raid1d was running even after the path
raid_dtr -> md_stop -> __md_stop.

Let's stop writes first in the destructor to align with normal md-raid and
fix the KASAN issue.

[1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZcec6hQFm8nhg@mail.gmail.com/T/#m7f12bf90481c02c6d2da68c64aeed4779b7df74a

Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop")
Reported-by: Mikulas Patocka <[email protected]>
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/md/md.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6266,6 +6266,7 @@ void md_stop(struct mddev *mddev)
/* stop the array and free an attached data structures.
* This is called from dm-raid
*/
+ __md_stop_writes(mddev);
__md_stop(mddev);
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);


2022-08-29 12:14:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 120/158] s390: fix double free of GS and RI CBs on fork() failure

From: Brian Foster <[email protected]>

commit 13cccafe0edcd03bf1c841de8ab8a1c8e34f77d9 upstream.

The pointers for guarded storage and runtime instrumentation control
blocks are stored in the thread_struct of the associated task. These
pointers are initially copied on fork() via arch_dup_task_struct()
and then cleared via copy_thread() before fork() returns. If fork()
happens to fail after the initial task dup and before copy_thread(),
the newly allocated task and associated thread_struct memory are
freed via free_task() -> arch_release_task_struct(). This results in
a double free of the guarded storage and runtime info structs
because the fields in the failed task still refer to memory
associated with the source task.

This problem can manifest as a BUG_ON() in set_freepointer() (with
CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled)
when running trinity syscall fuzz tests on s390x. To avoid this
problem, clear the associated pointer fields in
arch_dup_task_struct() immediately after the new task is copied.
Note that the RI flag is still cleared in copy_thread() because it
resides in thread stack memory and that is where stack info is
copied.

Signed-off-by: Brian Foster <[email protected]>
Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling")
Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling")
Cc: <[email protected]> # 4.15
Reviewed-by: Gerald Schaefer <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/process.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)

--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -91,6 +91,18 @@ int arch_dup_task_struct(struct task_str

memcpy(dst, src, arch_task_struct_size);
dst->thread.fpu.regs = dst->thread.fpu.fprs;
+
+ /*
+ * Don't transfer over the runtime instrumentation or the guarded
+ * storage control block pointers. These fields are cleared here instead
+ * of in copy_thread() to avoid premature freeing of associated memory
+ * on fork() failure. Wait to clear the RI flag because ->stack still
+ * refers to the source thread.
+ */
+ dst->thread.ri_cb = NULL;
+ dst->thread.gs_cb = NULL;
+ dst->thread.gs_bc_cb = NULL;
+
return 0;
}

@@ -150,13 +162,11 @@ int copy_thread(struct task_struct *p, c
frame->childregs.flags = 0;
if (new_stackp)
frame->childregs.gprs[15] = new_stackp;
-
- /* Don't copy runtime instrumentation info */
- p->thread.ri_cb = NULL;
+ /*
+ * Clear the runtime instrumentation flag after the above childregs
+ * copy. The CB pointer was already cleared in arch_dup_task_struct().
+ */
frame->childregs.psw.mask &= ~PSW_MASK_RI;
- /* Don't copy guarded storage control block */
- p->thread.gs_cb = NULL;
- p->thread.gs_bc_cb = NULL;

/* Set a new TLS ? */
if (clone_flags & CLONE_SETTLS) {


2022-08-29 12:15:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 016/158] xfrm: policy: fix metadata dst->dev xmit null pointer dereference

From: Nikolay Aleksandrov <[email protected]>

[ Upstream commit 17ecd4a4db4783392edd4944f5e8268205083f70 ]

When we try to transmit an skb with a metadata_dst attached (i.e. dst->dev
== NULL) through an xfrm interface, we can hit a null pointer dereference[1]
in xfrmi_xmit2() -> xfrm_lookup_with_ifid(): when there is no policy, the
check for a loopback skb device dereferences dst->dev unconditionally. Not
having dst->dev can be interpreted as it not being a loopback device, so
just add a check for a null dst_orig->dev.

With this fix xfrm interface's Tx error counters go up as usual.

[1] net-next calltrace captured via netconsole:
BUG: kernel NULL pointer dereference, address: 00000000000000c0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60
Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 <f6> 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02
RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80
RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000
R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880
R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000
FS: 00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0
Call Trace:
<TASK>
xfrmi_xmit+0xde/0x460
? tcf_bpf_act+0x13d/0x2a0
dev_hard_start_xmit+0x72/0x1e0
__dev_queue_xmit+0x251/0xd30
ip_finish_output2+0x140/0x550
ip_push_pending_frames+0x56/0x80
raw_sendmsg+0x663/0x10a0
? try_charge_memcg+0x3fd/0x7a0
? __mod_memcg_lruvec_state+0x93/0x110
? sock_sendmsg+0x30/0x40
sock_sendmsg+0x30/0x40
__sys_sendto+0xeb/0x130
? handle_mm_fault+0xae/0x280
? do_user_addr_fault+0x1e7/0x680
? kvm_read_and_reset_apf_flags+0x3b/0x50
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7ff35cac1366
Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366
RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003
RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001
</TASK>
Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole]
CR2: 00000000000000c0

CC: Steffen Klassert <[email protected]>
CC: Daniel Borkmann <[email protected]>
Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/xfrm/xfrm_policy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4f8bbb825abcb..cc6ab79609e29 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -3162,7 +3162,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
return dst;

nopol:
- if (!(dst_orig->dev->flags & IFF_LOOPBACK) &&
+ if ((!dst_orig->dev || !(dst_orig->dev->flags & IFF_LOOPBACK)) &&
net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
err = -EPERM;
goto error;
--
2.35.1



2022-08-29 12:15:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 003/158] audit: fix potential double free on error path from fsnotify_add_inode_mark

From: Gaosheng Cui <[email protected]>

commit ad982c3be4e60c7d39c03f782733503cbd88fd2a upstream.

Audit_alloc_mark() assign pathname to audit_mark->path, on error path
from fsnotify_add_inode_mark(), fsnotify_put_mark will free memory
of audit_mark->path, but the caller of audit_alloc_mark will free
the pathname again, so there will be double free problem.

Fix this by resetting audit_mark->path to NULL pointer on error path
from fsnotify_add_inode_mark().

Cc: [email protected]
Fixes: 7b1293234084d ("fsnotify: Add group pointer in fsnotify_init_mark()")
Signed-off-by: Gaosheng Cui <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/audit_fsnotify.c | 1 +
1 file changed, 1 insertion(+)

--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -102,6 +102,7 @@ struct audit_fsnotify_mark *audit_alloc_

ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0);
if (ret < 0) {
+ audit_mark->path = NULL;
fsnotify_put_mark(&audit_mark->mark);
audit_mark = ERR_PTR(ret);
}


2022-08-29 12:15:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 112/158] audit: move audit_return_fixup before the filters

From: Richard Guy Briggs <[email protected]>

commit d4fefa4801a1c2f9c0c7a48fbb0fdf384e89a4ab upstream.

The success and return_code are needed by the filters. Move
audit_return_fixup() before the filters. This was causing syscall
auditing events to be missed.

Link: https://github.com/linux-audit/audit-kernel/issues/138
Cc: [email protected]
Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
Signed-off-by: Richard Guy Briggs <[email protected]>
[PM: manual merge required]
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/auditsc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1965,6 +1965,7 @@ void __audit_uring_exit(int success, lon
goto out;
}

+ audit_return_fixup(ctx, success, code);
if (ctx->context == AUDIT_CTX_SYSCALL) {
/*
* NOTE: See the note in __audit_uring_entry() about the case
@@ -2006,7 +2007,6 @@ void __audit_uring_exit(int success, lon
audit_filter_inodes(current, ctx);
if (ctx->current_state != AUDIT_STATE_RECORD)
goto out;
- audit_return_fixup(ctx, success, code);
audit_log_exit();

out:
@@ -2090,13 +2090,13 @@ void __audit_syscall_exit(int success, l
if (!list_empty(&context->killed_trees))
audit_kill_trees(context);

+ audit_return_fixup(context, success, return_code);
/* run through both filters to ensure we set the filterkey properly */
audit_filter_syscall(current, context);
audit_filter_inodes(current, context);
if (context->current_state < AUDIT_STATE_RECORD)
goto out;

- audit_return_fixup(context, success, return_code);
audit_log_exit();

out:


2022-08-29 12:15:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 101/158] x86/boot: Don't propagate uninitialized boot_params->cc_blob_address

From: Michael Roth <[email protected]>

commit 4b1c742407571eff58b6de9881889f7ca7c4b4dc upstream.

In some cases, bootloaders will leave boot_params->cc_blob_address
uninitialized rather than zeroing it out. This field is only meant to be
set by the boot/compressed kernel in order to pass information to the
uncompressed kernel when SEV-SNP support is enabled.

Therefore, there are no cases where the bootloader-provided values
should be treated as anything other than garbage. Otherwise, the
uncompressed kernel may attempt to access this bogus address, leading to
a crash during early boot.

Normally, sanitize_boot_params() would be used to clear out such fields
but that happens too late: sev_enable() may have already initialized
it to a valid value that should not be zeroed out. Instead, have
sev_enable() zero it out unconditionally beforehand.

Also ensure this happens for !CONFIG_AMD_MEM_ENCRYPT as well by also
including this handling in the sev_enable() stub function.

[ bp: Massage commit message and comments. ]

Fixes: b190a043c49a ("x86/sev: Add SEV-SNP feature detection/setup")
Reported-by: Jeremi Piotrowski <[email protected]>
Reported-by: [email protected]
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216387
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/boot/compressed/misc.h | 12 +++++++++++-
arch/x86/boot/compressed/sev.c | 8 ++++++++
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 4910bf230d7b..62208ec04ca4 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -132,7 +132,17 @@ void snp_set_page_private(unsigned long paddr);
void snp_set_page_shared(unsigned long paddr);
void sev_prep_identity_maps(unsigned long top_level_pgt);
#else
-static inline void sev_enable(struct boot_params *bp) { }
+static inline void sev_enable(struct boot_params *bp)
+{
+ /*
+ * bp->cc_blob_address should only be set by boot/compressed kernel.
+ * Initialize it to 0 unconditionally (thus here in this stub too) to
+ * ensure that uninitialized values from buggy bootloaders aren't
+ * propagated.
+ */
+ if (bp)
+ bp->cc_blob_address = 0;
+}
static inline void sev_es_shutdown_ghcb(void) { }
static inline bool sev_es_check_ghcb_fault(unsigned long address)
{
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 52f989f6acc2..c93930d5ccbd 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -276,6 +276,14 @@ void sev_enable(struct boot_params *bp)
struct msr m;
bool snp;

+ /*
+ * bp->cc_blob_address should only be set by boot/compressed kernel.
+ * Initialize it to 0 to ensure that uninitialized values from
+ * buggy bootloaders aren't propagated.
+ */
+ if (bp)
+ bp->cc_blob_address = 0;
+
/*
* Setup/preliminary detection of SNP. This will be sanity-checked
* against CPUID/MSR values later.
--
2.37.2



2022-08-29 12:15:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 095/158] btrfs: replace: drop assert for suspended replace

From: Anand Jain <[email protected]>

commit 59a3991984dbc1fc47e5651a265c5200bd85464e upstream.

If the filesystem is mounted with the replace operation in a suspended
state and we try to cancel the suspended replace operation, we hit the
assert. The assert came from commit fe97e2e173af ("btrfs: dev-replace:
replace's scrub must not be running in suspended state") and is actually
not required, so just remove it.

$ mount /dev/sda5 /btrfs

BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'

$ mount -o degraded /dev/sda5 /btrfs <-- success.

$ btrfs replace cancel /btrfs

kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
kernel: ------------[ cut here ]------------
kernel: kernel BUG at fs/btrfs/ctree.h:3750!

After the patch:

$ btrfs replace cancel /btrfs

BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Fixes: fe97e2e173af ("btrfs: dev-replace: replace's scrub must not be running in suspended state")
CC: [email protected] # 5.0+
Signed-off-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/dev-replace.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1128,8 +1128,7 @@ int btrfs_dev_replace_cancel(struct btrf
up_write(&dev_replace->rwsem);

/* Scrub for replace must not be running in suspended state */
- ret = btrfs_scrub_cancel(fs_info);
- ASSERT(ret != -ENOTCONN);
+ btrfs_scrub_cancel(fs_info);

trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {


2022-08-29 12:16:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 107/158] x86/bugs: Add "unknown" reporting for MMIO Stale Data

From: Pawan Gupta <[email protected]>

commit 7df548840c496b0141fb2404b889c346380c2b22 upstream.

Older Intel CPUs that are not in the affected processor list for MMIO
Stale Data vulnerabilities currently report "Not affected" in sysfs,
which may not be correct. Vulnerability status for these older CPUs is
unknown.

Add known-not-affected CPUs to the whitelist. Report "unknown"
mitigation status for CPUs that are not in blacklist, whitelist and also
don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware
immunity to MMIO Stale Data vulnerabilities.

Mitigation is not deployed when the status is unknown.

[ bp: Massage, fixup. ]

Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
Suggested-by: Andrew Cooper <[email protected]>
Suggested-by: Tony Luck <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst | 14 +++
arch/x86/include/asm/cpufeatures.h | 5 -
arch/x86/kernel/cpu/bugs.c | 14 ++-
arch/x86/kernel/cpu/common.c | 42 ++++++----
4 files changed, 56 insertions(+), 19 deletions(-)

--- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
+++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
@@ -230,6 +230,20 @@ The possible values in this file are:
* - 'Mitigation: Clear CPU buffers'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.
+ * - 'Unknown: No mitigations'
+ - The processor vulnerability status is unknown because it is
+ out of Servicing period. Mitigation is not attempted.
+
+Definitions:
+------------
+
+Servicing period: The process of providing functional and security updates to
+Intel processors or platforms, utilizing the Intel Platform Update (IPU)
+process or other similar mechanisms.
+
+End of Servicing Updates (ESU): ESU is the date at which Intel will no
+longer provide Servicing, such as through IPU or other similar update
+processes. ESU dates will typically be aligned to end of quarter.

If the processor is vulnerable then the following information is appended to
the above information:
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -456,7 +456,8 @@
#define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
#define X86_BUG_SRBDS X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
#define X86_BUG_MMIO_STALE_DATA X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */
-#define X86_BUG_RETBLEED X86_BUG(26) /* CPU is affected by RETBleed */
-#define X86_BUG_EIBRS_PBRSB X86_BUG(27) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
+#define X86_BUG_MMIO_UNKNOWN X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
+#define X86_BUG_RETBLEED X86_BUG(27) /* CPU is affected by RETBleed */
+#define X86_BUG_EIBRS_PBRSB X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */

#endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -433,7 +433,8 @@ static void __init mmio_select_mitigatio
u64 ia32_cap;

if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) ||
- cpu_mitigations_off()) {
+ boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN) ||
+ cpu_mitigations_off()) {
mmio_mitigation = MMIO_MITIGATION_OFF;
return;
}
@@ -538,6 +539,8 @@ out:
pr_info("TAA: %s\n", taa_strings[taa_mitigation]);
if (boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA))
pr_info("MMIO Stale Data: %s\n", mmio_strings[mmio_mitigation]);
+ else if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
+ pr_info("MMIO Stale Data: Unknown: No mitigations\n");
}

static void __init md_clear_select_mitigation(void)
@@ -2275,6 +2278,9 @@ static ssize_t tsx_async_abort_show_stat

static ssize_t mmio_stale_data_show_state(char *buf)
{
+ if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
+ return sysfs_emit(buf, "Unknown: No mitigations\n");
+
if (mmio_mitigation == MMIO_MITIGATION_OFF)
return sysfs_emit(buf, "%s\n", mmio_strings[mmio_mitigation]);

@@ -2421,6 +2427,7 @@ static ssize_t cpu_show_common(struct de
return srbds_show_state(buf);

case X86_BUG_MMIO_STALE_DATA:
+ case X86_BUG_MMIO_UNKNOWN:
return mmio_stale_data_show_state(buf);

case X86_BUG_RETBLEED:
@@ -2480,7 +2487,10 @@ ssize_t cpu_show_srbds(struct device *de

ssize_t cpu_show_mmio_stale_data(struct device *dev, struct device_attribute *attr, char *buf)
{
- return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA);
+ if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
+ return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_UNKNOWN);
+ else
+ return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA);
}

ssize_t cpu_show_retbleed(struct device *dev, struct device_attribute *attr, char *buf)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1135,7 +1135,8 @@ static void identify_cpu_without_cpuid(s
#define NO_SWAPGS BIT(6)
#define NO_ITLB_MULTIHIT BIT(7)
#define NO_SPECTRE_V2 BIT(8)
-#define NO_EIBRS_PBRSB BIT(9)
+#define NO_MMIO BIT(9)
+#define NO_EIBRS_PBRSB BIT(10)

#define VULNWL(vendor, family, model, whitelist) \
X86_MATCH_VENDOR_FAM_MODEL(vendor, family, model, whitelist)
@@ -1158,6 +1159,11 @@ static const __initconst struct x86_cpu_
VULNWL(VORTEX, 6, X86_MODEL_ANY, NO_SPECULATION),

/* Intel Family 6 */
+ VULNWL_INTEL(TIGERLAKE, NO_MMIO),
+ VULNWL_INTEL(TIGERLAKE_L, NO_MMIO),
+ VULNWL_INTEL(ALDERLAKE, NO_MMIO),
+ VULNWL_INTEL(ALDERLAKE_L, NO_MMIO),
+
VULNWL_INTEL(ATOM_SALTWELL, NO_SPECULATION | NO_ITLB_MULTIHIT),
VULNWL_INTEL(ATOM_SALTWELL_TABLET, NO_SPECULATION | NO_ITLB_MULTIHIT),
VULNWL_INTEL(ATOM_SALTWELL_MID, NO_SPECULATION | NO_ITLB_MULTIHIT),
@@ -1176,9 +1182,9 @@ static const __initconst struct x86_cpu_
VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
VULNWL_INTEL(ATOM_AIRMONT_NP, NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),

- VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_INTEL(ATOM_GOLDMONT_D, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
+ VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_INTEL(ATOM_GOLDMONT_D, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB),

/*
* Technically, swapgs isn't serializing on AMD (despite it previously
@@ -1193,18 +1199,18 @@ static const __initconst struct x86_cpu_
VULNWL_INTEL(ATOM_TREMONT_D, NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),

/* AMD Family 0xf - 0x12 */
- VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+ VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),

/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
- VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+ VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),

/* Zhaoxin Family 7 */
- VULNWL(CENTAUR, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS),
- VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS),
+ VULNWL(CENTAUR, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO),
+ VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO),
{}
};

@@ -1358,10 +1364,16 @@ static void __init cpu_set_bug_bits(stru
* Affected CPU list is generally enough to enumerate the vulnerability,
* but for virtualization case check for ARCH_CAP MSR bits also, VMM may
* not want the guest to enumerate the bug.
+ *
+ * Set X86_BUG_MMIO_UNKNOWN for CPUs that are neither in the blacklist,
+ * nor in the whitelist and also don't enumerate MSR ARCH_CAP MMIO bits.
*/
- if (cpu_matches(cpu_vuln_blacklist, MMIO) &&
- !arch_cap_mmio_immune(ia32_cap))
- setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
+ if (!arch_cap_mmio_immune(ia32_cap)) {
+ if (cpu_matches(cpu_vuln_blacklist, MMIO))
+ setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
+ else if (!cpu_matches(cpu_vuln_whitelist, NO_MMIO))
+ setup_force_cpu_bug(X86_BUG_MMIO_UNKNOWN);
+ }

if (!cpu_has(c, X86_FEATURE_BTC_NO)) {
if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA))


2022-08-29 12:16:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 122/158] Revert "memcg: cleanup racy sum avoidance code"

From: Shakeel Butt <[email protected]>

commit dbb16df6443c59e8a1ef21c2272fcf387d600ddf upstream.

This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f.

Recently we started running the kernel with the rstat infrastructure on
production traffic and began to see negative memcg stat values. In
particular, the 'sock' stat is the one we observed having a negative
value.

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 18446744073708724224

Re-run after couple of seconds

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 53248

For now we are only seeing this issue on large machines (256 CPUs) and
only with the 'sock' stat. I think the networking stack increases the stat
on one CPU and decreases it on another CPU much more often. So, this
negative sock value is due to the rstat flusher flushing the stats on the
CPU that has seen the decrement of sock but missing the CPU that has seen
the increments. A typical race condition.
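
A toy user-space model of the race described above; this is not the
kernel's rstat implementation, and the names and numbers are made up
purely for illustration:

#include <stdio.h>

int main(void)
{
	long percpu_delta[2] = { 0, 0 };
	long flushed_total = 0;

	percpu_delta[0] += 4096;	/* charge seen on CPU 0 */
	percpu_delta[1] -= 4096;	/* uncharge seen on CPU 1 */

	/*
	 * The flusher folds in CPU 1's delta first and a reader samples
	 * the total before CPU 0's delta has been flushed: the sum is
	 * transiently negative and shows up as a huge unsigned value.
	 */
	flushed_total += percpu_delta[1];
	printf("partial flush: %ld (as unsigned: %lu)\n",
	       flushed_total, (unsigned long)flushed_total);

	flushed_total += percpu_delta[0];
	printf("full flush:    %ld\n", flushed_total);
	return 0;
}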

For an easy stable backport, a revert is the simplest solution. For a
long-term solution, I am thinking of two directions. The first is to
reduce the race window by optimizing the rstat flusher. The second is,
if the reader sees a negative stat value, to force a flush and restart
the stat collection; basically a limited retry.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 96e51ccf1af33e8 ("memcg: cleanup racy sum avoidance code")
Signed-off-by: Shakeel Butt <[email protected]>
Cc: "Michal Koutný" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: <[email protected]> [5.15]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/memcontrol.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -978,19 +978,30 @@ static inline void mod_memcg_page_state(

static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
{
- return READ_ONCE(memcg->vmstats.state[idx]);
+ long x = READ_ONCE(memcg->vmstats.state[idx]);
+#ifdef CONFIG_SMP
+ if (x < 0)
+ x = 0;
+#endif
+ return x;
}

static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
enum node_stat_item idx)
{
struct mem_cgroup_per_node *pn;
+ long x;

if (mem_cgroup_disabled())
return node_page_state(lruvec_pgdat(lruvec), idx);

pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
- return READ_ONCE(pn->lruvec_stats.state[idx]);
+ x = READ_ONCE(pn->lruvec_stats.state[idx]);
+#ifdef CONFIG_SMP
+ if (x < 0)
+ x = 0;
+#endif
+ return x;
}

static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,


2022-08-29 12:16:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 025/158] r8152: fix the RX FIFO settings when suspending

From: Hayes Wang <[email protected]>

[ Upstream commit b75d612014447e04abdf0e37ffb8f2fd8b0b49d6 ]

The RX FIFO would be changed when suspending, so the related settings
have to be modified, too. Otherwise, the flow control would work
abnormally.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216333
Reported-by: Mark Blakeney <[email protected]>
Fixes: cdf0b86b250f ("r8152: fix a WOL issue")
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/usb/r8152.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 46c7954d27629..d142ac8fcf6e2 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -5906,6 +5906,11 @@ static void r8153_enter_oob(struct r8152 *tp)
ocp_data &= ~NOW_IS_OOB;
ocp_write_byte(tp, MCU_TYPE_PLA, PLA_OOB_CTRL, ocp_data);

+ /* RX FIFO settings for OOB */
+ ocp_write_dword(tp, MCU_TYPE_PLA, PLA_RXFIFO_CTRL0, RXFIFO_THR1_OOB);
+ ocp_write_word(tp, MCU_TYPE_PLA, PLA_RXFIFO_CTRL1, RXFIFO_THR2_OOB);
+ ocp_write_word(tp, MCU_TYPE_PLA, PLA_RXFIFO_CTRL2, RXFIFO_THR3_OOB);
+
rtl_disable(tp);
rtl_reset_bmu(tp);

@@ -6544,6 +6549,11 @@ static void rtl8156_down(struct r8152 *tp)
ocp_data &= ~NOW_IS_OOB;
ocp_write_byte(tp, MCU_TYPE_PLA, PLA_OOB_CTRL, ocp_data);

+ /* RX FIFO settings for OOB */
+ ocp_write_word(tp, MCU_TYPE_PLA, PLA_RXFIFO_FULL, 64 / 16);
+ ocp_write_word(tp, MCU_TYPE_PLA, PLA_RX_FIFO_FULL, 1024 / 16);
+ ocp_write_word(tp, MCU_TYPE_PLA, PLA_RX_FIFO_EMPTY, 4096 / 16);
+
rtl_disable(tp);
rtl_reset_bmu(tp);

--
2.35.1



2022-08-29 12:18:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 094/158] btrfs: fix silent failure when deleting root reference

From: Filipe Manana <[email protected]>

commit 47bf225a8d2cccb15f7e8d4a1ed9b757dd86afd7 upstream.

At btrfs_del_root_ref(), if btrfs_search_slot() returns an error, we end
up returning from the function with a value of 0 (success). This happens
because the function returns the value stored in the variable 'err',
which is 0, while the error value we got from btrfs_search_slot() is
stored in the 'ret' variable.

So fix it by setting 'err' with the error value.

Fixes: 8289ed9f93bef2 ("btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling")
CC: [email protected] # 5.16+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/root-tree.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -349,9 +349,10 @@ int btrfs_del_root_ref(struct btrfs_tran
key.offset = ref_id;
again:
ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1);
- if (ret < 0)
+ if (ret < 0) {
+ err = ret;
goto out;
- if (ret == 0) {
+ } else if (ret == 0) {
leaf = path->nodes[0];
ref = btrfs_item_ptr(leaf, path->slots[0],
struct btrfs_root_ref);


2022-08-29 12:18:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 058/158] netfilter: nft_payload: do not truncate csum_offset and csum_type

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 7044ab281febae9e2fa9b0b247693d6026166293 ]

Instead, report ERANGE if csum_offset is too long, and EOPNOTSUPP if the
type is not supported.

Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_payload.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 4fee67abfe2c5..eb0e40c297121 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -740,17 +740,23 @@ static int nft_payload_set_init(const struct nft_ctx *ctx,
const struct nlattr * const tb[])
{
struct nft_payload_set *priv = nft_expr_priv(expr);
+ u32 csum_offset, csum_type = NFT_PAYLOAD_CSUM_NONE;
+ int err;

priv->base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
priv->len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));

if (tb[NFTA_PAYLOAD_CSUM_TYPE])
- priv->csum_type =
- ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE]));
- if (tb[NFTA_PAYLOAD_CSUM_OFFSET])
- priv->csum_offset =
- ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_OFFSET]));
+ csum_type = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE]));
+ if (tb[NFTA_PAYLOAD_CSUM_OFFSET]) {
+ err = nft_parse_u32_check(tb[NFTA_PAYLOAD_CSUM_OFFSET], U8_MAX,
+ &csum_offset);
+ if (err < 0)
+ return err;
+
+ priv->csum_offset = csum_offset;
+ }
if (tb[NFTA_PAYLOAD_CSUM_FLAGS]) {
u32 flags;

@@ -761,7 +767,7 @@ static int nft_payload_set_init(const struct nft_ctx *ctx,
priv->csum_flags = flags;
}

- switch (priv->csum_type) {
+ switch (csum_type) {
case NFT_PAYLOAD_CSUM_NONE:
case NFT_PAYLOAD_CSUM_INET:
break;
@@ -775,6 +781,7 @@ static int nft_payload_set_init(const struct nft_ctx *ctx,
default:
return -EOPNOTSUPP;
}
+ priv->csum_type = csum_type;

return nft_parse_register_load(tb[NFTA_PAYLOAD_SREG], &priv->sreg,
priv->len);
--
2.35.1



2022-08-29 12:18:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 137/158] arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76

From: Zenghui Yu <[email protected]>

commit 5e1e087457c94ad7fafbe1cf6f774c6999ee29d4 upstream.

Since commit 51f559d66527 ("arm64: Enable repeat tlbi workaround on KRYO4XX
gold CPUs"), we failed to detect erratum 1286807 on Cortex-A76 because its
entry in arm64_repeat_tlbi_list[] was accidentally corrupted by this commit.

Fix this issue by creating a separate entry for Kryo4xx Gold.

Fixes: 51f559d66527 ("arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs")
Cc: Shreyas K K <[email protected]>
Signed-off-by: Zenghui Yu <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/kernel/cpu_errata.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -208,6 +208,8 @@ static const struct arm64_cpu_capabiliti
#ifdef CONFIG_ARM64_ERRATUM_1286807
{
ERRATA_MIDR_RANGE(MIDR_CORTEX_A76, 0, 0, 3, 0),
+ },
+ {
/* Kryo4xx Gold (rcpe to rfpe) => (r0p0 to r3p0) */
ERRATA_MIDR_RANGE(MIDR_QCOM_KRYO_4XX_GOLD, 0xc, 0xe, 0xf, 0xe),
},


2022-08-29 12:18:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 142/158] blk-mq: fix io hung due to missing commit_rqs

From: Yu Kuai <[email protected]>

commit 65fac0d54f374625b43a9d6ad1f2c212bd41f518 upstream.

Currently, in virtio_scsi, if 'bd->last' is not set to true while
dispatching a request, such IO will stay in the driver's queue, and the
driver will wait for the block layer to dispatch more requests. However,
if the block layer fails to dispatch more requests, it should trigger
commit_rqs to inform the driver.

There is a problem in blk_mq_try_issue_list_directly() that commit_rqs
won't be called:

// assume that queue_depth is set to 1, list contains two rq
blk_mq_try_issue_list_directly
  blk_mq_request_issue_directly
  // dispatch first rq
  // last is false
    __blk_mq_try_issue_directly
      blk_mq_get_dispatch_budget
      // succeed to get first budget
      __blk_mq_issue_directly
        scsi_queue_rq
          cmd->flags |= SCMD_LAST
          virtscsi_queuecommand
            kick = (sc->flags & SCMD_LAST) != 0
            // kick is false, first rq won't issue to disk
  queued++

  blk_mq_request_issue_directly
  // dispatch second rq
    __blk_mq_try_issue_directly
      blk_mq_get_dispatch_budget
      // failed to get second budget
    ret == BLK_STS_RESOURCE
    blk_mq_request_bypass_insert
  // errors is still 0

  if (!list_empty(list) || errors && ...)
  // won't pass, commit_rqs won't be called

In this situation, the first request relies on the second one to be
dispatched, while the second relies on the first to complete, so both of
them hang.

Fix the problem by also treating 'BLK_STS_*RESOURCE' as 'errors', since
it means the request was not queued successfully.

The same problem exists in blk_mq_dispatch_rq_list(), but there
'BLK_STS_*RESOURCE' can't be treated as 'errors'; fix it by calling
commit_rqs if queue_rq returns 'BLK_STS_*RESOURCE'.
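A condensed userspace sketch of the fixed decision (the queueing machinery is
simulated and the names are illustrative, not the blk-mq code): any request
that did not reach the driver, including one that got a resource-busy status,
must make the final check fire so commit_rqs() kicks the requests that were
already queued:

#include <stdbool.h>
#include <stdio.h>

enum sts { STS_OK, STS_ERROR, STS_RESOURCE };

/* Simulated driver with queue depth 1: the second request is refused. */
static enum sts issue_directly(int rq, bool last)
{
        (void)last;
        return rq == 0 ? STS_OK : STS_RESOURCE;
}

int main(void)
{
        int queued = 0, errors = 0;
        int rq;

        for (rq = 0; rq < 2; rq++) {
                enum sts ret = issue_directly(rq, rq == 1);

                if (ret != STS_OK) {
                        /* Counting the resource-busy status here is the fix:
                         * the driver was told more requests would follow,
                         * but nothing more will arrive. */
                        errors++;
                        if (ret == STS_RESOURCE)
                                break;  /* requeued elsewhere in the real code */
                } else {
                        queued++;
                }
        }

        if (errors && queued)
                printf("commit_rqs(): kick the %d queued request(s)\n", queued);
        return 0;
}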

Fixes: d666ba98f849 ("blk-mq: add mq_ops->commit_rqs()")
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
block/blk-mq.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1925,7 +1925,8 @@ out:
/* If we didn't flush the entire list, we could have told the driver
* there was more coming, but that turned out to be a lie.
*/
- if ((!list_empty(list) || errors) && q->mq_ops->commit_rqs && queued)
+ if ((!list_empty(list) || errors || needs_resource ||
+ ret == BLK_STS_DEV_RESOURCE) && q->mq_ops->commit_rqs && queued)
q->mq_ops->commit_rqs(hctx);
/*
* Any items that need requeuing? Stuff them into hctx->dispatch,
@@ -2678,6 +2679,7 @@ void blk_mq_try_issue_list_directly(stru
list_del_init(&rq->queuelist);
ret = blk_mq_request_issue_directly(rq, list_empty(list));
if (ret != BLK_STS_OK) {
+ errors++;
if (ret == BLK_STS_RESOURCE ||
ret == BLK_STS_DEV_RESOURCE) {
blk_mq_request_bypass_insert(rq, false,
@@ -2685,7 +2687,6 @@ void blk_mq_try_issue_list_directly(stru
break;
}
blk_mq_end_request(rq, ret);
- errors++;
} else
queued++;
}


2022-08-29 12:19:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 091/158] net: lantiq_xrx200: confirm skb is allocated before using

From: Aleksander Jan Bajkowski <[email protected]>

[ Upstream commit c8b043702dc0894c07721c5b019096cebc8c798f ]

xrx200_hw_receive() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.

Add a check in case build_skb() fails to allocate and returns NULL.
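A small userspace sketch of the defensive pattern (build_skb() and
skb_free_frag() are replaced by stand-ins here; this is not the driver code):
when the wrapper allocation fails, the fragment must be freed and the drop
counted before bailing out:

#include <stdio.h>
#include <stdlib.h>

static unsigned long rx_dropped;

/* Stand-in for build_skb(): wraps the fragment but, like the real thing,
 * may fail under memory pressure (simulated via a flag here). */
static void *build_skb_stub(void *buf, int simulate_oom)
{
        return simulate_oom ? NULL : buf;
}

static int rx_one(void *buf, int simulate_oom)
{
        void *skb = build_skb_stub(buf, simulate_oom);

        if (!skb) {             /* previously assumed to never happen */
                free(buf);      /* skb_free_frag(): don't leak the fragment */
                rx_dropped++;
                return -1;      /* the driver returns -ENOMEM here */
        }
        /* ... skb_reserve()/skb_put() would follow in the real driver ... */
        free(skb);
        return 0;
}

int main(void)
{
        int ret = rx_one(malloc(2048), 0);

        printf("ok path:  %d\n", ret);
        ret = rx_one(malloc(2048), 1);
        printf("oom path: %d, dropped=%lu\n", ret, rx_dropped);
        return 0;
}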

Fixes: e015593573b3 ("net: lantiq_xrx200: convert to build_skb")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Aleksander Jan Bajkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/lantiq_xrx200.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/lantiq_xrx200.c b/drivers/net/ethernet/lantiq_xrx200.c
index 5edb68a8aab1e..89314b645c822 100644
--- a/drivers/net/ethernet/lantiq_xrx200.c
+++ b/drivers/net/ethernet/lantiq_xrx200.c
@@ -239,6 +239,12 @@ static int xrx200_hw_receive(struct xrx200_chan *ch)
}

skb = build_skb(buf, priv->rx_skb_size);
+ if (!skb) {
+ skb_free_frag(buf);
+ net_dev->stats.rx_dropped++;
+ return -ENOMEM;
+ }
+
skb_reserve(skb, NET_SKB_PAD);
skb_put(skb, len);

--
2.35.1



2022-08-29 12:20:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 079/158] net: Fix a data-race around gro_normal_batch.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 8db24af3f02ebdbf302196006ebb270c4c3a2706 ]

While reading gro_normal_batch, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
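A userspace sketch of the pattern (READ_ONCE() is redefined locally for
illustration and relies on the GCC/Clang typeof extension; the kernel's real
definition is elsewhere): the tunable is snapshotted exactly once per
decision, so a concurrent writer cannot be observed twice within it:

#include <stdio.h>

/* Simplified userspace stand-in for the kernel's READ_ONCE(). */
#define READ_ONCE(x) (*(const volatile typeof(x) *)&(x))

/* Sysctl-style tunable that another thread may change at any time. */
static int gro_normal_batch = 8;

static void gro_normal_one(int *rx_count, int segs)
{
        *rx_count += segs;
        /* Snapshot the tunable exactly once for this decision; without
         * READ_ONCE() the compiler may reload or tear the access, racing
         * with a concurrent writer. */
        if (*rx_count >= READ_ONCE(gro_normal_batch)) {
                printf("flush list of %d segments\n", *rx_count);
                *rx_count = 0;
        }
}

int main(void)
{
        int rx_count = 0;
        int i;

        for (i = 0; i < 20; i++)
                gro_normal_one(&rx_count, 1);
        return 0;
}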

Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/gro.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index 867656b0739c0..24003dea8fa4d 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -439,7 +439,7 @@ static inline void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb,
{
list_add_tail(&skb->list, &napi->rx_list);
napi->rx_count += segs;
- if (napi->rx_count >= gro_normal_batch)
+ if (napi->rx_count >= READ_ONCE(gro_normal_batch))
gro_normal_list(napi);
}

--
2.35.1



2022-08-29 12:20:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 156/158] riscv: dts: microchip: mpfs: remove bogus card-detect-delay

From: Conor Dooley <[email protected]>

commit 2b55915d27dcaa35f54bad7925af0a76001079bc upstream.

Recent versions of dt-schema warn about a previously undetected
undocumented property:
arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: mmc@20008000: Unevaluated properties are not allowed ('card-detect-delay' was unexpected)
From schema: Documentation/devicetree/bindings/mmc/cdns,sdhci.yaml

There are no GPIOs connected to MSSIO6B4 pin K3 so adding the common
cd-debounce-delay-ms property makes no sense. The Cadence IP has a
register that sets the card detect delay as "DP * tclk". On MPFS, this
clock frequency is not configurable (it must be 200 MHz) and the FPGA
comes out of reset with this register already set.

Fixes: bc47b2217f24 ("riscv: dts: microchip: add the sundance polarberry")
Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board")
Signed-off-by: Conor Dooley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts | 1 -
arch/riscv/boot/dts/microchip/mpfs-polarberry.dts | 1 -
2 files changed, 2 deletions(-)

diff --git a/arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts b/arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts
index ee548ab61a2a..f3f87ed2007f 100644
--- a/arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts
+++ b/arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts
@@ -100,7 +100,6 @@ &mmc {
disable-wp;
cap-sd-highspeed;
cap-mmc-highspeed;
- card-detect-delay = <200>;
mmc-ddr-1_8v;
mmc-hs200-1_8v;
sd-uhs-sdr12;
diff --git a/arch/riscv/boot/dts/microchip/mpfs-polarberry.dts b/arch/riscv/boot/dts/microchip/mpfs-polarberry.dts
index dc11bb8fc833..c87cc2d8fe29 100644
--- a/arch/riscv/boot/dts/microchip/mpfs-polarberry.dts
+++ b/arch/riscv/boot/dts/microchip/mpfs-polarberry.dts
@@ -70,7 +70,6 @@ &mmc {
disable-wp;
cap-sd-highspeed;
cap-mmc-highspeed;
- card-detect-delay = <200>;
mmc-ddr-1_8v;
mmc-hs200-1_8v;
sd-uhs-sdr12;
--
2.37.2



2022-08-29 12:21:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 118/158] mm/mprotect: only reference swap pfn page if type match

From: Peter Xu <[email protected]>

commit 3d2f78f08cd8388035ac375e731ec1ac1b79b09d upstream.

Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:

kernel BUG at include/linux/swapops.h:117!
CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S O L 6.0.0-dbg-DEV #2
RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0
Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6
c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b
48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48
RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282
RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000
RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b
RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000
R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738
R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a
FS: 00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
change_pte_range+0x36e/0x880
change_p4d_range+0x2e8/0x670
change_protection_range+0x14e/0x2c0
mprotect_fixup+0x1ee/0x330
do_mprotect_pkey+0x34c/0x440
__x64_sys_mprotect+0x1d/0x30

It triggers because pfn_swap_entry_to_page() could be called upon e.g. a
genuine swap entry.

Fix it by only calling it when it's a write migration entry where the page*
is used.

[1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VTS+k1w@mail.gmail.com/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Yu Zhao <[email protected]>
Tested-by: Yu Zhao <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/mprotect.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -158,10 +158,11 @@ static unsigned long change_pte_range(st
pages++;
} else if (is_swap_pte(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
- struct page *page = pfn_swap_entry_to_page(entry);
pte_t newpte;

if (is_writable_migration_entry(entry)) {
+ struct page *page = pfn_swap_entry_to_page(entry);
+
/*
* A protection check is difficult so
* just be safe and disable write


2022-08-29 12:23:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 096/158] btrfs: add info when mount fails due to stale replace target

From: Anand Jain <[email protected]>

commit f2c3bec215694fb8bc0ef5010f2a758d1906fc2d upstream.

If the replace target device reappears after the suspended replace is
cancelled, it blocks the mount operation as it can't find the matching
replace-item in the metadata. As shown below,

BTRFS error (device sda5): replace devid present without an active replace item

To overcome this situation, the user can run the command

btrfs device scan --forget <replace target device>

and try the mount command again. Also, to avoid repeating the issue, the
superblock on the devid=0 device must be wiped:

wipefs -a device-path-to-devid=0

This patch adds some info when this situation occurs.

Reported-by: Samuel Greiner <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/[email protected]/T/
CC: [email protected] # 5.0+
Signed-off-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/dev-replace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -165,7 +165,7 @@ no_valid_dev_replace_entry_found:
*/
if (btrfs_find_device(fs_info->fs_devices, &args)) {
btrfs_err(fs_info,
- "replace devid present without an active replace item");
+"replace without active item, run 'device scan --forget' on the target device");
ret = -EUCLEAN;
} else {
dev_replace->srcdev = NULL;


2022-08-29 12:27:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 028/158] ice: xsk: use Rx rings XDP ring when picking NAPI context

From: Maciej Fijalkowski <[email protected]>

[ Upstream commit 9ead7e74bfd6dd54db12ef133b8604add72511de ]

The ice driver allocates per-CPU XDP queues so that the redirect path can
safely use smp_processor_id() as an index into the array. At the same
time, though, XDP rings are used to pick the NAPI context to call
napi_schedule() or set NAPIF_STATE_MISSED. When the user reduces the queue
count, say to 8, while num_possible_cpus() of the underlying platform is
44, queue vectors with correlated NAPI contexts will carry several XDP
queues.

This in turn can result in broken behavior where the NAPI context of
interest will never be scheduled and the AF_XDP socket will not process
any traffic.

To fix this, change the way XDP rings are assigned to Rx rings and use
this information later on when setting the ice_tx_ring::xsk_pool pointer.
For each Rx ring, grab the associated queue vector and walk through the
Tx ring's linked list. Once we stumble upon the XDP ring in it, assign
this ring to ice_rx_ring::xdp_ring.

The previous approach [0] to fixing this issue was aimed at the txonly
scenario because of the described grouping of XDP rings across queue
vectors. So, relying on the Rx ring meant that a NAPI context could be
scheduled with a queue vector whose XDP ring had no associated XSK pool.

[0]: https://lore.kernel.org/netdev/[email protected]/

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ice/ice.h | 36 +++++++++++++++--------
drivers/net/ethernet/intel/ice/ice_lib.c | 4 +--
drivers/net/ethernet/intel/ice/ice_main.c | 25 +++++++++++-----
drivers/net/ethernet/intel/ice/ice_xsk.c | 12 ++++----
4 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 60453b3b8d233..6911cbb7afa50 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -684,8 +684,8 @@ static inline void ice_set_ring_xdp(struct ice_tx_ring *ring)
* ice_xsk_pool - get XSK buffer pool bound to a ring
* @ring: Rx ring to use
*
- * Returns a pointer to xdp_umem structure if there is a buffer pool present,
- * NULL otherwise.
+ * Returns a pointer to xsk_buff_pool structure if there is a buffer pool
+ * present, NULL otherwise.
*/
static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_rx_ring *ring)
{
@@ -699,23 +699,33 @@ static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_rx_ring *ring)
}

/**
- * ice_tx_xsk_pool - get XSK buffer pool bound to a ring
- * @ring: Tx ring to use
+ * ice_tx_xsk_pool - assign XSK buff pool to XDP ring
+ * @vsi: pointer to VSI
+ * @qid: index of a queue to look at XSK buff pool presence
*
- * Returns a pointer to xdp_umem structure if there is a buffer pool present,
- * NULL otherwise. Tx equivalent of ice_xsk_pool.
+ * Sets XSK buff pool pointer on XDP ring.
+ *
+ * XDP ring is picked from Rx ring, whereas Rx ring is picked based on provided
+ * queue id. Reason for doing so is that queue vectors might have assigned more
+ * than one XDP ring, e.g. when user reduced the queue count on netdev; Rx ring
+ * carries a pointer to one of these XDP rings for its own purposes, such as
+ * handling XDP_TX action, therefore we can piggyback here on the
+ * rx_ring->xdp_ring assignment that was done during XDP rings initialization.
*/
-static inline struct xsk_buff_pool *ice_tx_xsk_pool(struct ice_tx_ring *ring)
+static inline void ice_tx_xsk_pool(struct ice_vsi *vsi, u16 qid)
{
- struct ice_vsi *vsi = ring->vsi;
- u16 qid;
+ struct ice_tx_ring *ring;

- qid = ring->q_index - vsi->alloc_txq;
+ ring = vsi->rx_rings[qid]->xdp_ring;
+ if (!ring)
+ return;

- if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps))
- return NULL;
+ if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps)) {
+ ring->xsk_pool = NULL;
+ return;
+ }

- return xsk_get_pool_from_qid(vsi->netdev, qid);
+ ring->xsk_pool = xsk_get_pool_from_qid(vsi->netdev, qid);
}

/**
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index d6aafa272fb0b..6c4e1d45235ef 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1983,8 +1983,8 @@ int ice_vsi_cfg_xdp_txqs(struct ice_vsi *vsi)
if (ret)
return ret;

- ice_for_each_xdp_txq(vsi, i)
- vsi->xdp_rings[i]->xsk_pool = ice_tx_xsk_pool(vsi->xdp_rings[i]);
+ ice_for_each_rxq(vsi, i)
+ ice_tx_xsk_pool(vsi, i);

return ret;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index bfd97a9a8f2e0..3d45e075204e3 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2581,7 +2581,6 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
if (ice_setup_tx_ring(xdp_ring))
goto free_xdp_rings;
ice_set_ring_xdp(xdp_ring);
- xdp_ring->xsk_pool = ice_tx_xsk_pool(xdp_ring);
spin_lock_init(&xdp_ring->tx_lock);
for (j = 0; j < xdp_ring->count; j++) {
tx_desc = ICE_TX_DESC(xdp_ring, j);
@@ -2589,13 +2588,6 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
}
}

- ice_for_each_rxq(vsi, i) {
- if (static_key_enabled(&ice_xdp_locking_key))
- vsi->rx_rings[i]->xdp_ring = vsi->xdp_rings[i % vsi->num_xdp_txq];
- else
- vsi->rx_rings[i]->xdp_ring = vsi->xdp_rings[i];
- }
-
return 0;

free_xdp_rings:
@@ -2685,6 +2677,23 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog)
xdp_rings_rem -= xdp_rings_per_v;
}

+ ice_for_each_rxq(vsi, i) {
+ if (static_key_enabled(&ice_xdp_locking_key)) {
+ vsi->rx_rings[i]->xdp_ring = vsi->xdp_rings[i % vsi->num_xdp_txq];
+ } else {
+ struct ice_q_vector *q_vector = vsi->rx_rings[i]->q_vector;
+ struct ice_tx_ring *ring;
+
+ ice_for_each_tx_ring(ring, q_vector->tx) {
+ if (ice_ring_is_xdp(ring)) {
+ vsi->rx_rings[i]->xdp_ring = ring;
+ break;
+ }
+ }
+ }
+ ice_tx_xsk_pool(vsi, i);
+ }
+
/* omit the scheduler update if in reset path; XDP queues will be
* taken into account at the end of ice_vsi_rebuild, where
* ice_cfg_vsi_lan is being called
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 45f88e6ec25e8..e48e29258450f 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -243,7 +243,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
if (err)
goto free_buf;
ice_set_ring_xdp(xdp_ring);
- xdp_ring->xsk_pool = ice_tx_xsk_pool(xdp_ring);
+ ice_tx_xsk_pool(vsi, q_idx);
}

err = ice_vsi_cfg_rxq(rx_ring);
@@ -359,7 +359,7 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
if (if_running) {
ret = ice_qp_ena(vsi, qid);
if (!ret && pool_present)
- napi_schedule(&vsi->xdp_rings[qid]->q_vector->napi);
+ napi_schedule(&vsi->rx_rings[qid]->xdp_ring->q_vector->napi);
else if (ret)
netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret);
}
@@ -950,13 +950,13 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
if (!ice_is_xdp_ena_vsi(vsi))
return -EINVAL;

- if (queue_id >= vsi->num_txq)
+ if (queue_id >= vsi->num_txq || queue_id >= vsi->num_rxq)
return -EINVAL;

- if (!vsi->xdp_rings[queue_id]->xsk_pool)
- return -EINVAL;
+ ring = vsi->rx_rings[queue_id]->xdp_ring;

- ring = vsi->xdp_rings[queue_id];
+ if (!ring->xsk_pool)
+ return -EINVAL;

/* The idea here is that if NAPI is running, mark a miss, so
* it will run again. If not, trigger an interrupt and
--
2.35.1



2022-08-29 12:28:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 085/158] net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2

From: Lorenzo Bianconi <[email protected]>

[ Upstream commit 0cf731f9ebb5bf6f252055bebf4463a5c0bd490b ]

Properly report the hw rx hash for the mt7986 chipset according to the new
dma descriptor layout.

Fixes: 197c9e9b17b11 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 22 +++++++++++----------
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 5 +++++
2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 6beb3d4873a37..dcf0aac0aa65d 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1513,10 +1513,19 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
skb->dev = netdev;
skb_put(skb, pktlen);

- if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
+ hash = trxd.rxd5 & MTK_RXD5_FOE_ENTRY;
+ if (hash != MTK_RXD5_FOE_ENTRY)
+ skb_set_hash(skb, jhash_1word(hash, 0),
+ PKT_HASH_TYPE_L4);
rxdcsum = &trxd.rxd3;
- else
+ } else {
+ hash = trxd.rxd4 & MTK_RXD4_FOE_ENTRY;
+ if (hash != MTK_RXD4_FOE_ENTRY)
+ skb_set_hash(skb, jhash_1word(hash, 0),
+ PKT_HASH_TYPE_L4);
rxdcsum = &trxd.rxd4;
+ }

if (*rxdcsum & eth->soc->txrx.rx_dma_l4_valid)
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -1525,16 +1534,9 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
skb->protocol = eth_type_trans(skb, netdev);
bytes += pktlen;

- hash = trxd.rxd4 & MTK_RXD4_FOE_ENTRY;
- if (hash != MTK_RXD4_FOE_ENTRY) {
- hash = jhash_1word(hash, 0);
- skb_set_hash(skb, hash, PKT_HASH_TYPE_L4);
- }
-
reason = FIELD_GET(MTK_RXD4_PPE_CPU_REASON, trxd.rxd4);
if (reason == MTK_PPE_CPU_REASON_HIT_UNBIND_RATE_REACHED)
- mtk_ppe_check_skb(eth->ppe, skb,
- trxd.rxd4 & MTK_RXD4_FOE_ENTRY);
+ mtk_ppe_check_skb(eth->ppe, skb, hash);

if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) {
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 0a632896451a4..98d6a6d047e32 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -307,6 +307,11 @@
#define RX_DMA_L4_VALID_PDMA BIT(30) /* when PDMA is used */
#define RX_DMA_SPECIAL_TAG BIT(22)

+/* PDMA descriptor rxd5 */
+#define MTK_RXD5_FOE_ENTRY GENMASK(14, 0)
+#define MTK_RXD5_PPE_CPU_REASON GENMASK(22, 18)
+#define MTK_RXD5_SRC_PORT GENMASK(29, 26)
+
#define RX_DMA_GET_SPORT(x) (((x) >> 19) & 0xf)
#define RX_DMA_GET_SPORT_V2(x) (((x) >> 26) & 0x7)

--
2.35.1



2022-08-29 12:28:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 074/158] net: Fix a data-race around netdev_budget.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 2e0c42374ee32e72948559d2ae2f7ba3dc6b977c ]

While reading netdev_budget, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 34282b93c3f60..a330f93629314 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6647,7 +6647,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
struct softnet_data *sd = this_cpu_ptr(&softnet_data);
unsigned long time_limit = jiffies +
usecs_to_jiffies(netdev_budget_usecs);
- int budget = netdev_budget;
+ int budget = READ_ONCE(netdev_budget);
LIST_HEAD(list);
LIST_HEAD(repoll);

--
2.35.1



2022-08-29 12:28:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 020/158] SUNRPC: RPC level errors should set task->tk_rpc_status

From: Trond Myklebust <[email protected]>

[ Upstream commit ed06fce0b034b2e25bd93430f5c4cbb28036cc1a ]

Fix up a case in call_encode() where we're failing to set
task->tk_rpc_status when an RPC level error occurred.

Fixes: 9c5948c24869 ("SUNRPC: task should be exit if encode return EKEYEXPIRED more times")
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/sunrpc/clnt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 733f9f2260926..c1a01947530f0 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1888,7 +1888,7 @@ call_encode(struct rpc_task *task)
break;
case -EKEYEXPIRED:
if (!task->tk_cred_retry) {
- rpc_exit(task, task->tk_status);
+ rpc_call_rpcerror(task, task->tk_status);
} else {
task->tk_action = call_refresh;
task->tk_cred_retry--;
--
2.35.1



2022-08-29 12:29:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 010/158] mm/hugetlb: support write-faults in shared mappings

From: David Hildenbrand <[email protected]>

commit 1d8d14641fd94a01b20a4abbf2749fd8eddcf57b upstream.

If we ever get a write-fault on a write-protected page in a shared
mapping, we'd be in trouble (again). Instead, we can simply map the page
writable.

And in fact, there is even a way right now to trigger that code via
uffd-wp ever since we started to support it for shmem in 5.19:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

#define HUGETLB_SIZE (2 * 1024 * 1024u)

static char *map;
int uffd;

static int temp_setup_uffd(void)
{
        struct uffdio_api uffdio_api;
        struct uffdio_register uffdio_register;
        struct uffdio_writeprotect uffd_writeprotect;
        struct uffdio_range uffd_range;

        uffd = syscall(__NR_userfaultfd,
                       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
        if (uffd < 0) {
                fprintf(stderr, "syscall() failed: %d\n", errno);
                return -errno;
        }

        uffdio_api.api = UFFD_API;
        uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
        if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
                fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
                return -errno;
        }

        if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
                fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n");
                return -ENOSYS;
        }

        /* Register UFFD-WP */
        uffdio_register.range.start = (unsigned long) map;
        uffdio_register.range.len = HUGETLB_SIZE;
        uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
        if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
                fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
                return -errno;
        }

        /* Writeprotect a single page. */
        uffd_writeprotect.range.start = (unsigned long) map;
        uffd_writeprotect.range.len = HUGETLB_SIZE;
        uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
        if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
                fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
                return -errno;
        }

        /* Unregister UFFD-WP without prior writeunprotection. */
        uffd_range.start = (unsigned long) map;
        uffd_range.len = HUGETLB_SIZE;
        if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_range)) {
                fprintf(stderr, "UFFDIO_UNREGISTER failed: %d\n", errno);
                return -errno;
        }

        return 0;
}

int main(int argc, char **argv)
{
        int fd;

        fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
        if (!fd) {
                fprintf(stderr, "open() failed\n");
                return -errno;
        }
        if (ftruncate(fd, HUGETLB_SIZE)) {
                fprintf(stderr, "ftruncate() failed\n");
                return -errno;
        }

        map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
                fprintf(stderr, "mmap() failed\n");
                return -errno;
        }

        *map = 0;

        if (temp_setup_uffd())
                return 1;

        *map = 0;

        return 0;
}
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.
# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total: 2
HugePages_Free: 1
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0

Reason is that uffd-wp doesn't clear the uffd-wp PTE bit when
unregistering and consequently keeps the PTE writeprotected. Reason for
this is to avoid the additional overhead when unregistering. Note that
this is the case also for !hugetlb and that we will end up with writable
PTEs that still have the uffd-wp PTE bit set once we return from
hugetlb_wp(). I'm not touching the uffd-wp PTE bit for now, because it
seems to be a generic thing -- wp_page_reuse() also doesn't clear it.

VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE indicates
that MAP_SHARED handling was at least envisioned, but could never have
worked as expected.

While at it, make sure that we never end up in hugetlb_wp() on write
faults without VM_WRITE, because we don't support maybe_mkwrite()
semantics as commonly used in the !hugetlb case -- for example, in
wp_page_reuse().

Note that there is no need to do any kind of reservation in
hugetlb_fault() in this case ... because we already have a hugetlb page
mapped R/O that we will simply map writable and we are not dealing with
COW/unsharing.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jamie Liu <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Peter Feiner <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]> [5.19]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/hugetlb.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5232,6 +5232,21 @@ static vm_fault_t hugetlb_wp(struct mm_s
VM_BUG_ON(unshare && (flags & FOLL_WRITE));
VM_BUG_ON(!unshare && !(flags & FOLL_WRITE));

+ /*
+ * hugetlb does not support FOLL_FORCE-style write faults that keep the
+ * PTE mapped R/O such as maybe_mkwrite() would do.
+ */
+ if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
+ return VM_FAULT_SIGSEGV;
+
+ /* Let's take out MAP_SHARED mappings first. */
+ if (vma->vm_flags & VM_MAYSHARE) {
+ if (unlikely(unshare))
+ return 0;
+ set_huge_ptep_writable(vma, haddr, ptep);
+ return 0;
+ }
+
pte = huge_ptep_get(ptep);
old_page = pte_page(pte);

@@ -5766,12 +5781,11 @@ vm_fault_t hugetlb_fault(struct mm_struc
* If we are going to COW/unshare the mapping later, we examine the
* pending reservations for this page now. This will ensure that any
* allocations necessary to record that reservation occur outside the
- * spinlock. For private mappings, we also lookup the pagecache
- * page now as it is used to determine if a reservation has been
- * consumed.
+ * spinlock. Also lookup the pagecache page now as it is used to
+ * determine if a reservation has been consumed.
*/
if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
- !huge_pte_write(entry)) {
+ !(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(entry)) {
if (vma_needs_reservation(h, vma, haddr) < 0) {
ret = VM_FAULT_OOM;
goto out_mutex;
@@ -5779,9 +5793,7 @@ vm_fault_t hugetlb_fault(struct mm_struc
/* Just decrements count, does not deallocate */
vma_end_reservation(h, vma, haddr);

- if (!(vma->vm_flags & VM_MAYSHARE))
- pagecache_page = hugetlbfs_pagecache_page(h,
- vma, haddr);
+ pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr);
}

ptl = huge_pte_lock(h, mm, ptep);


2022-08-29 12:29:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 039/158] net: dsa: microchip: move tag_protocol to ksz_common

From: Arun Ramadoss <[email protected]>

[ Upstream commit 534a0431e9e68959e2c0d71c141d5b911d66ad7c ]

This patch moves the DSA hook get_tag_protocol to the ksz_common file.
The tag_protocol is returned based on dev->chip_id.

Signed-off-by: Arun Ramadoss <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/dsa/microchip/ksz8795.c | 13 +------------
drivers/net/dsa/microchip/ksz9477.c | 14 +-------------
drivers/net/dsa/microchip/ksz_common.c | 24 ++++++++++++++++++++++++
drivers/net/dsa/microchip/ksz_common.h | 2 ++
4 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
index 3cc51ee5fb6cc..041956e3c7b1a 100644
--- a/drivers/net/dsa/microchip/ksz8795.c
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -898,17 +898,6 @@ static void ksz8_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
}
}

-static enum dsa_tag_protocol ksz8_get_tag_protocol(struct dsa_switch *ds,
- int port,
- enum dsa_tag_protocol mp)
-{
- struct ksz_device *dev = ds->priv;
-
- /* ksz88x3 uses the same tag schema as KSZ9893 */
- return ksz_is_ksz88x3(dev) ?
- DSA_TAG_PROTO_KSZ9893 : DSA_TAG_PROTO_KSZ8795;
-}
-
static u32 ksz8_sw_get_phy_flags(struct dsa_switch *ds, int port)
{
/* Silicon Errata Sheet (DS80000830A):
@@ -1394,7 +1383,7 @@ static void ksz8_get_caps(struct dsa_switch *ds, int port,
}

static const struct dsa_switch_ops ksz8_switch_ops = {
- .get_tag_protocol = ksz8_get_tag_protocol,
+ .get_tag_protocol = ksz_get_tag_protocol,
.get_phy_flags = ksz8_sw_get_phy_flags,
.setup = ksz8_setup,
.phy_read = ksz_phy_read16,
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index bcfdd505ca79a..31be767027feb 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -276,18 +276,6 @@ static void ksz9477_port_init_cnt(struct ksz_device *dev, int port)
mutex_unlock(&mib->cnt_mutex);
}

-static enum dsa_tag_protocol ksz9477_get_tag_protocol(struct dsa_switch *ds,
- int port,
- enum dsa_tag_protocol mp)
-{
- enum dsa_tag_protocol proto = DSA_TAG_PROTO_KSZ9477;
- struct ksz_device *dev = ds->priv;
-
- if (dev->features & IS_9893)
- proto = DSA_TAG_PROTO_KSZ9893;
- return proto;
-}
-
static int ksz9477_phy_read16(struct dsa_switch *ds, int addr, int reg)
{
struct ksz_device *dev = ds->priv;
@@ -1329,7 +1317,7 @@ static int ksz9477_setup(struct dsa_switch *ds)
}

static const struct dsa_switch_ops ksz9477_switch_ops = {
- .get_tag_protocol = ksz9477_get_tag_protocol,
+ .get_tag_protocol = ksz_get_tag_protocol,
.setup = ksz9477_setup,
.phy_read = ksz9477_phy_read16,
.phy_write = ksz9477_phy_write16,
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 4511e99823f57..0713a40685fa9 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -930,6 +930,30 @@ void ksz_port_stp_state_set(struct dsa_switch *ds, int port,
}
EXPORT_SYMBOL_GPL(ksz_port_stp_state_set);

+enum dsa_tag_protocol ksz_get_tag_protocol(struct dsa_switch *ds,
+ int port, enum dsa_tag_protocol mp)
+{
+ struct ksz_device *dev = ds->priv;
+ enum dsa_tag_protocol proto = DSA_TAG_PROTO_NONE;
+
+ if (dev->chip_id == KSZ8795_CHIP_ID ||
+ dev->chip_id == KSZ8794_CHIP_ID ||
+ dev->chip_id == KSZ8765_CHIP_ID)
+ proto = DSA_TAG_PROTO_KSZ8795;
+
+ if (dev->chip_id == KSZ8830_CHIP_ID ||
+ dev->chip_id == KSZ9893_CHIP_ID)
+ proto = DSA_TAG_PROTO_KSZ9893;
+
+ if (dev->chip_id == KSZ9477_CHIP_ID ||
+ dev->chip_id == KSZ9897_CHIP_ID ||
+ dev->chip_id == KSZ9567_CHIP_ID)
+ proto = DSA_TAG_PROTO_KSZ9477;
+
+ return proto;
+}
+EXPORT_SYMBOL_GPL(ksz_get_tag_protocol);
+
static int ksz_switch_detect(struct ksz_device *dev)
{
u8 id1, id2;
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index e6bc5fb2b1303..21db6f79035fa 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -231,6 +231,8 @@ int ksz_port_mdb_del(struct dsa_switch *ds, int port,
int ksz_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy);
void ksz_get_strings(struct dsa_switch *ds, int port,
u32 stringset, uint8_t *buf);
+enum dsa_tag_protocol ksz_get_tag_protocol(struct dsa_switch *ds,
+ int port, enum dsa_tag_protocol mp);

/* Common register access functions */

--
2.35.1



2022-08-29 12:31:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 041/158] net: dsa: microchip: move the port mirror to ksz_common

From: Arun Ramadoss <[email protected]>

[ Upstream commit 00a298bbc23876288b1cd04c38752d8e7ed53ae2 ]

This patch updates the common port mirror add/del dsa_switch_ops in
ksz_common.c. The individual switch implementations are invoked through
the ksz_dev_ops function pointers.

Signed-off-by: Arun Ramadoss <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/dsa/microchip/ksz8795.c | 13 ++++++-------
drivers/net/dsa/microchip/ksz9477.c | 12 ++++++------
drivers/net/dsa/microchip/ksz_common.c | 23 +++++++++++++++++++++++
drivers/net/dsa/microchip/ksz_common.h | 10 ++++++++++
4 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
index 16e946dbd9d42..2e3d24a3260e1 100644
--- a/drivers/net/dsa/microchip/ksz8795.c
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -1089,12 +1089,10 @@ static int ksz8_port_vlan_del(struct ksz_device *dev, int port,
return 0;
}

-static int ksz8_port_mirror_add(struct dsa_switch *ds, int port,
+static int ksz8_port_mirror_add(struct ksz_device *dev, int port,
struct dsa_mall_mirror_tc_entry *mirror,
bool ingress, struct netlink_ext_ack *extack)
{
- struct ksz_device *dev = ds->priv;
-
if (ingress) {
ksz_port_cfg(dev, port, P_MIRROR_CTRL, PORT_MIRROR_RX, true);
dev->mirror_rx |= BIT(port);
@@ -1113,10 +1111,9 @@ static int ksz8_port_mirror_add(struct dsa_switch *ds, int port,
return 0;
}

-static void ksz8_port_mirror_del(struct dsa_switch *ds, int port,
+static void ksz8_port_mirror_del(struct ksz_device *dev, int port,
struct dsa_mall_mirror_tc_entry *mirror)
{
- struct ksz_device *dev = ds->priv;
u8 data;

if (mirror->ingress) {
@@ -1400,8 +1397,8 @@ static const struct dsa_switch_ops ksz8_switch_ops = {
.port_fdb_dump = ksz_port_fdb_dump,
.port_mdb_add = ksz_port_mdb_add,
.port_mdb_del = ksz_port_mdb_del,
- .port_mirror_add = ksz8_port_mirror_add,
- .port_mirror_del = ksz8_port_mirror_del,
+ .port_mirror_add = ksz_port_mirror_add,
+ .port_mirror_del = ksz_port_mirror_del,
};

static u32 ksz8_get_port_addr(int port, int offset)
@@ -1464,6 +1461,8 @@ static const struct ksz_dev_ops ksz8_dev_ops = {
.vlan_filtering = ksz8_port_vlan_filtering,
.vlan_add = ksz8_port_vlan_add,
.vlan_del = ksz8_port_vlan_del,
+ .mirror_add = ksz8_port_mirror_add,
+ .mirror_del = ksz8_port_mirror_del,
.shutdown = ksz8_reset_switch,
.init = ksz8_switch_init,
.exit = ksz8_switch_exit,
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index 1bb994a9109cd..cd4a3088e9473 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -819,11 +819,10 @@ static int ksz9477_port_mdb_del(struct dsa_switch *ds, int port,
return ret;
}

-static int ksz9477_port_mirror_add(struct dsa_switch *ds, int port,
+static int ksz9477_port_mirror_add(struct ksz_device *dev, int port,
struct dsa_mall_mirror_tc_entry *mirror,
bool ingress, struct netlink_ext_ack *extack)
{
- struct ksz_device *dev = ds->priv;
u8 data;
int p;

@@ -859,10 +858,9 @@ static int ksz9477_port_mirror_add(struct dsa_switch *ds, int port,
return 0;
}

-static void ksz9477_port_mirror_del(struct dsa_switch *ds, int port,
+static void ksz9477_port_mirror_del(struct ksz_device *dev, int port,
struct dsa_mall_mirror_tc_entry *mirror)
{
- struct ksz_device *dev = ds->priv;
bool in_use = false;
u8 data;
int p;
@@ -1335,8 +1333,8 @@ static const struct dsa_switch_ops ksz9477_switch_ops = {
.port_fdb_del = ksz9477_port_fdb_del,
.port_mdb_add = ksz9477_port_mdb_add,
.port_mdb_del = ksz9477_port_mdb_del,
- .port_mirror_add = ksz9477_port_mirror_add,
- .port_mirror_del = ksz9477_port_mirror_del,
+ .port_mirror_add = ksz_port_mirror_add,
+ .port_mirror_del = ksz_port_mirror_del,
.get_stats64 = ksz_get_stats64,
.port_change_mtu = ksz9477_change_mtu,
.port_max_mtu = ksz9477_max_mtu,
@@ -1412,6 +1410,8 @@ static const struct ksz_dev_ops ksz9477_dev_ops = {
.vlan_filtering = ksz9477_port_vlan_filtering,
.vlan_add = ksz9477_port_vlan_add,
.vlan_del = ksz9477_port_vlan_del,
+ .mirror_add = ksz9477_port_mirror_add,
+ .mirror_del = ksz9477_port_mirror_del,
.shutdown = ksz9477_reset_switch,
.init = ksz9477_switch_init,
.exit = ksz9477_switch_exit,
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 5db2b55152885..676669d353ea6 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -991,6 +991,29 @@ int ksz_port_vlan_del(struct dsa_switch *ds, int port,
}
EXPORT_SYMBOL_GPL(ksz_port_vlan_del);

+int ksz_port_mirror_add(struct dsa_switch *ds, int port,
+ struct dsa_mall_mirror_tc_entry *mirror,
+ bool ingress, struct netlink_ext_ack *extack)
+{
+ struct ksz_device *dev = ds->priv;
+
+ if (!dev->dev_ops->mirror_add)
+ return -EOPNOTSUPP;
+
+ return dev->dev_ops->mirror_add(dev, port, mirror, ingress, extack);
+}
+EXPORT_SYMBOL_GPL(ksz_port_mirror_add);
+
+void ksz_port_mirror_del(struct dsa_switch *ds, int port,
+ struct dsa_mall_mirror_tc_entry *mirror)
+{
+ struct ksz_device *dev = ds->priv;
+
+ if (dev->dev_ops->mirror_del)
+ dev->dev_ops->mirror_del(dev, port, mirror);
+}
+EXPORT_SYMBOL_GPL(ksz_port_mirror_del);
+
static int ksz_switch_detect(struct ksz_device *dev)
{
u8 id1, id2;
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index 1baa270859aa2..c724cbb437e29 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -187,6 +187,11 @@ struct ksz_dev_ops {
struct netlink_ext_ack *extack);
int (*vlan_del)(struct ksz_device *dev, int port,
const struct switchdev_obj_port_vlan *vlan);
+ int (*mirror_add)(struct ksz_device *dev, int port,
+ struct dsa_mall_mirror_tc_entry *mirror,
+ bool ingress, struct netlink_ext_ack *extack);
+ void (*mirror_del)(struct ksz_device *dev, int port,
+ struct dsa_mall_mirror_tc_entry *mirror);
void (*freeze_mib)(struct ksz_device *dev, int port, bool freeze);
void (*port_init_cnt)(struct ksz_device *dev, int port);
int (*shutdown)(struct ksz_device *dev);
@@ -247,6 +252,11 @@ int ksz_port_vlan_add(struct dsa_switch *ds, int port,
struct netlink_ext_ack *extack);
int ksz_port_vlan_del(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan);
+int ksz_port_mirror_add(struct dsa_switch *ds, int port,
+ struct dsa_mall_mirror_tc_entry *mirror,
+ bool ingress, struct netlink_ext_ack *extack);
+void ksz_port_mirror_del(struct dsa_switch *ds, int port,
+ struct dsa_mall_mirror_tc_entry *mirror);

/* Common register access functions */

--
2.35.1



2022-08-29 12:31:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 121/158] fbdev: fbcon: Properly revert changes when vc_resize() failed

From: Shigeru Yoshida <[email protected]>

commit a5a923038d70d2d4a86cb4e3f32625a5ee6e7e24 upstream.

fbcon_do_set_font() calls vc_resize() when the font size is changed.
However, if vc_resize() fails, the current implementation doesn't revert
the font size changes, which leaves the console in an inconsistent state.

syzbot reported an unable-to-handle page fault due to this issue [1].
syzbot's repro uses fault injection that causes a memory allocation
failure, so vc_resize() fails.

This patch fixes the issue by properly reverting the changes to the
font-related data when vc_resize() fails.
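The fix follows the usual save-then-rollback pattern. A minimal userspace
sketch of that pattern (hypothetical struct and field names, not the fbcon
code):

#include <stdio.h>

/* Hypothetical console state; not the real vc_data/fbcon_display layout. */
struct console {
        int font_width, font_height, font_charcount;
};

/* Stand-in for vc_resize(): simulate an allocation failure. */
static int resize(struct console *vc, int cols, int rows)
{
        (void)vc; (void)cols; (void)rows;
        return -12;     /* -ENOMEM */
}

static int set_font(struct console *vc, int w, int h, int charcount)
{
        /* Save everything that is about to be modified ... */
        int old_w = vc->font_width;
        int old_h = vc->font_height;
        int old_cc = vc->font_charcount;
        int ret;

        vc->font_width = w;
        vc->font_height = h;
        vc->font_charcount = charcount;

        ret = resize(vc, 80, 25);
        if (ret) {
                /* ... and restore all of it if a later step fails. */
                vc->font_width = old_w;
                vc->font_height = old_h;
                vc->font_charcount = old_cc;
                return ret;
        }
        return 0;
}

int main(void)
{
        struct console vc = { 8, 16, 256 };
        int ret = set_font(&vc, 12, 22, 512);

        printf("set_font: %d, width is still %d\n", ret, vc.font_width);
        return 0;
}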

Link: https://syzkaller.appspot.com/bug?id=3443d3a1fa6d964dd7310a0cb1696d165a3e07c4 [1]
Reported-by: [email protected]
Signed-off-by: Shigeru Yoshida <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
CC: [email protected] # 5.15+
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/video/fbdev/core/fbcon.c | 27 +++++++++++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)

--- a/drivers/video/fbdev/core/fbcon.c
+++ b/drivers/video/fbdev/core/fbcon.c
@@ -2402,15 +2402,21 @@ static int fbcon_do_set_font(struct vc_d
struct fb_info *info = fbcon_info_from_console(vc->vc_num);
struct fbcon_ops *ops = info->fbcon_par;
struct fbcon_display *p = &fb_display[vc->vc_num];
- int resize;
+ int resize, ret, old_userfont, old_width, old_height, old_charcount;
char *old_data = NULL;

resize = (w != vc->vc_font.width) || (h != vc->vc_font.height);
if (p->userfont)
old_data = vc->vc_font.data;
vc->vc_font.data = (void *)(p->fontdata = data);
+ old_userfont = p->userfont;
if ((p->userfont = userfont))
REFCOUNT(data)++;
+
+ old_width = vc->vc_font.width;
+ old_height = vc->vc_font.height;
+ old_charcount = vc->vc_font.charcount;
+
vc->vc_font.width = w;
vc->vc_font.height = h;
vc->vc_font.charcount = charcount;
@@ -2426,7 +2432,9 @@ static int fbcon_do_set_font(struct vc_d
rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
cols /= w;
rows /= h;
- vc_resize(vc, cols, rows);
+ ret = vc_resize(vc, cols, rows);
+ if (ret)
+ goto err_out;
} else if (con_is_visible(vc)
&& vc->vc_mode == KD_TEXT) {
fbcon_clear_margins(vc, 0);
@@ -2436,6 +2444,21 @@ static int fbcon_do_set_font(struct vc_d
if (old_data && (--REFCOUNT(old_data) == 0))
kfree(old_data - FONT_EXTRA_WORDS * sizeof(int));
return 0;
+
+err_out:
+ p->fontdata = old_data;
+ vc->vc_font.data = (void *)old_data;
+
+ if (userfont) {
+ p->userfont = old_userfont;
+ REFCOUNT(data)--;
+ }
+
+ vc->vc_font.width = old_width;
+ vc->vc_font.height = old_height;
+ vc->vc_font.charcount = old_charcount;
+
+ return ret;
}

/*


2022-08-29 12:33:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 061/158] netfilter: nft_tunnel: restrict it to netdev family

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 01e4092d53bc4fe122a6e4b6d664adbd57528ca3 ]

Only allow to use this expression from NFPROTO_NETDEV family.

Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_tunnel.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c
index d0f9b1d51b0e9..96b03e0bf74ff 100644
--- a/net/netfilter/nft_tunnel.c
+++ b/net/netfilter/nft_tunnel.c
@@ -161,6 +161,7 @@ static const struct nft_expr_ops nft_tunnel_get_ops = {

static struct nft_expr_type nft_tunnel_type __read_mostly = {
.name = "tunnel",
+ .family = NFPROTO_NETDEV,
.ops = &nft_tunnel_get_ops,
.policy = nft_tunnel_policy,
.maxattr = NFTA_TUNNEL_MAX,
--
2.35.1



2022-08-29 12:33:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 089/158] ionic: VF initial random MAC address if no assigned mac

From: R Mohamed Shah <[email protected]>

[ Upstream commit 19058be7c48ceb3e60fa3948e24da1059bd68ee4 ]

Assign a random mac address to the VF interface station
address if it boots with a zero mac address in order to match
similar behavior seen in other VF drivers. Handle the errors
where the older firmware does not allow the VF to set its own
station address.

Newer firmware will allow the VF to set the station mac address
if it hasn't already been set administratively through the PF.
Setting it will also be allowed if the VF has trust.
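The old-firmware detection boils down to a write-then-read-back verification.
A minimal userspace sketch of that idea (the admin-queue calls are simulated
and the names are illustrative, not the ionic driver code):

#include <stdio.h>
#include <string.h>

#define ETH_ALEN 6

/* Simulated device state: old firmware silently ignores the set. */
static unsigned char dev_mac[ETH_ALEN];
static int fw_allows_vf_set;

static int set_attr_mac(const unsigned char *mac)
{
        if (fw_allows_vf_set)
                memcpy(dev_mac, mac, ETH_ALEN);
        return 0;               /* old firmware still reports success */
}

static int get_attr_mac(unsigned char *mac)
{
        memcpy(mac, dev_mac, ETH_ALEN);
        return 0;
}

/* Returns 0 on success, 1 if the device silently ignored the set,
 * mirroring the >0 "old firmware" result described above. */
static int program_mac(const unsigned char *mac)
{
        unsigned char readback[ETH_ALEN];
        int err;

        err = set_attr_mac(mac);
        if (err)
                return err;
        err = get_attr_mac(readback);
        if (err)
                return err;
        return memcmp(readback, mac, ETH_ALEN) ? 1 : 0;
}

int main(void)
{
        unsigned char mac[ETH_ALEN] = { 0x02, 0x11, 0x22, 0x33, 0x44, 0x55 };

        printf("old fw: %d\n", program_mac(mac));       /* 1: set ignored */
        fw_allows_vf_set = 1;
        printf("new fw: %d\n", program_mac(mac));       /* 0: set took effect */
        return 0;
}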

Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
Signed-off-by: R Mohamed Shah <[email protected]>
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
.../net/ethernet/pensando/ionic/ionic_lif.c | 92 ++++++++++++++++++-
1 file changed, 87 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
index d4226999547e8..0be79c5167813 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
@@ -1564,8 +1564,67 @@ static int ionic_set_features(struct net_device *netdev,
return err;
}

+static int ionic_set_attr_mac(struct ionic_lif *lif, u8 *mac)
+{
+ struct ionic_admin_ctx ctx = {
+ .work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
+ .cmd.lif_setattr = {
+ .opcode = IONIC_CMD_LIF_SETATTR,
+ .index = cpu_to_le16(lif->index),
+ .attr = IONIC_LIF_ATTR_MAC,
+ },
+ };
+
+ ether_addr_copy(ctx.cmd.lif_setattr.mac, mac);
+ return ionic_adminq_post_wait(lif, &ctx);
+}
+
+static int ionic_get_attr_mac(struct ionic_lif *lif, u8 *mac_addr)
+{
+ struct ionic_admin_ctx ctx = {
+ .work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
+ .cmd.lif_getattr = {
+ .opcode = IONIC_CMD_LIF_GETATTR,
+ .index = cpu_to_le16(lif->index),
+ .attr = IONIC_LIF_ATTR_MAC,
+ },
+ };
+ int err;
+
+ err = ionic_adminq_post_wait(lif, &ctx);
+ if (err)
+ return err;
+
+ ether_addr_copy(mac_addr, ctx.comp.lif_getattr.mac);
+ return 0;
+}
+
+static int ionic_program_mac(struct ionic_lif *lif, u8 *mac)
+{
+ u8 get_mac[ETH_ALEN];
+ int err;
+
+ err = ionic_set_attr_mac(lif, mac);
+ if (err)
+ return err;
+
+ err = ionic_get_attr_mac(lif, get_mac);
+ if (err)
+ return err;
+
+ /* To deal with older firmware that silently ignores the set attr mac:
+ * doesn't actually change the mac and doesn't return an error, so we
+ * do the get attr to verify whether or not the set actually happened
+ */
+ if (!ether_addr_equal(get_mac, mac))
+ return 1;
+
+ return 0;
+}
+
static int ionic_set_mac_address(struct net_device *netdev, void *sa)
{
+ struct ionic_lif *lif = netdev_priv(netdev);
struct sockaddr *addr = sa;
u8 *mac;
int err;
@@ -1574,6 +1633,14 @@ static int ionic_set_mac_address(struct net_device *netdev, void *sa)
if (ether_addr_equal(netdev->dev_addr, mac))
return 0;

+ err = ionic_program_mac(lif, mac);
+ if (err < 0)
+ return err;
+
+ if (err > 0)
+ netdev_dbg(netdev, "%s: SET and GET ATTR Mac are not equal-due to old FW running\n",
+ __func__);
+
err = eth_prepare_mac_addr_change(netdev, addr);
if (err)
return err;
@@ -3172,6 +3239,7 @@ static int ionic_station_set(struct ionic_lif *lif)
.attr = IONIC_LIF_ATTR_MAC,
},
};
+ u8 mac_address[ETH_ALEN];
struct sockaddr addr;
int err;

@@ -3180,8 +3248,23 @@ static int ionic_station_set(struct ionic_lif *lif)
return err;
netdev_dbg(lif->netdev, "found initial MAC addr %pM\n",
ctx.comp.lif_getattr.mac);
- if (is_zero_ether_addr(ctx.comp.lif_getattr.mac))
- return 0;
+ ether_addr_copy(mac_address, ctx.comp.lif_getattr.mac);
+
+ if (is_zero_ether_addr(mac_address)) {
+ eth_hw_addr_random(netdev);
+ netdev_dbg(netdev, "Random Mac generated: %pM\n", netdev->dev_addr);
+ ether_addr_copy(mac_address, netdev->dev_addr);
+
+ err = ionic_program_mac(lif, mac_address);
+ if (err < 0)
+ return err;
+
+ if (err > 0) {
+ netdev_dbg(netdev, "%s:SET/GET ATTR Mac are not same-due to old FW running\n",
+ __func__);
+ return 0;
+ }
+ }

if (!is_zero_ether_addr(netdev->dev_addr)) {
/* If the netdev mac is non-zero and doesn't match the default
@@ -3189,12 +3272,11 @@ static int ionic_station_set(struct ionic_lif *lif)
* likely here again after a fw-upgrade reset. We need to be
* sure the netdev mac is in our filter list.
*/
- if (!ether_addr_equal(ctx.comp.lif_getattr.mac,
- netdev->dev_addr))
+ if (!ether_addr_equal(mac_address, netdev->dev_addr))
ionic_lif_addr_add(lif, netdev->dev_addr);
} else {
/* Update the netdev mac with the device's mac */
- memcpy(addr.sa_data, ctx.comp.lif_getattr.mac, netdev->addr_len);
+ ether_addr_copy(addr.sa_data, mac_address);
addr.sa_family = AF_INET;
err = eth_prepare_mac_addr_change(netdev, &addr);
if (err) {
--
2.35.1



2022-08-29 12:33:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 024/158] r8152: fix the units of some registers for RTL8156A

From: Hayes Wang <[email protected]>

[ Upstream commit 6dc4df12d741c0fe8f885778a43039e0619b9cd9 ]

The units of PLA_RX_FIFO_FULL and PLA_RX_FIFO_EMPTY are 16 bytes.

Fixes: 195aae321c82 ("r8152: support new chips")
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/usb/r8152.c | 17 ++---------------
1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 0f6efaabaa32b..46c7954d27629 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -6431,21 +6431,8 @@ static void r8156_fc_parameter(struct r8152 *tp)
u32 pause_on = tp->fc_pause_on ? tp->fc_pause_on : fc_pause_on_auto(tp);
u32 pause_off = tp->fc_pause_off ? tp->fc_pause_off : fc_pause_off_auto(tp);

- switch (tp->version) {
- case RTL_VER_10:
- case RTL_VER_11:
- ocp_write_word(tp, MCU_TYPE_PLA, PLA_RX_FIFO_FULL, pause_on / 8);
- ocp_write_word(tp, MCU_TYPE_PLA, PLA_RX_FIFO_EMPTY, pause_off / 8);
- break;
- case RTL_VER_12:
- case RTL_VER_13:
- case RTL_VER_15:
- ocp_write_word(tp, MCU_TYPE_PLA, PLA_RX_FIFO_FULL, pause_on / 16);
- ocp_write_word(tp, MCU_TYPE_PLA, PLA_RX_FIFO_EMPTY, pause_off / 16);
- break;
- default:
- break;
- }
+ ocp_write_word(tp, MCU_TYPE_PLA, PLA_RX_FIFO_FULL, pause_on / 16);
+ ocp_write_word(tp, MCU_TYPE_PLA, PLA_RX_FIFO_EMPTY, pause_off / 16);
}

static void rtl8156_change_mtu(struct r8152 *tp)
--
2.35.1



2022-08-29 12:34:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 049/158] netfilter: ebtables: reject blobs that dont provide all entry points

From: Florian Westphal <[email protected]>

[ Upstream commit 7997eff82828304b780dc0a39707e1946d6f1ebf ]

Harshit Mogalapalli says:
In ebt_do_table() function dereferencing 'private->hook_entry[hook]'
can lead to NULL pointer dereference. [..] Kernel panic:

general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[..]
RIP: 0010:ebt_do_table+0x1dc/0x1ce0
Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88
[..]
Call Trace:
nf_hook_slow+0xb1/0x170
__br_forward+0x289/0x730
maybe_deliver+0x24b/0x380
br_flood+0xc6/0x390
br_dev_xmit+0xa2e/0x12c0

For some reason ebtables rejects blobs that provide entry points that are
not supported by the table, but what it should instead reject is the
opposite: blobs that DO NOT provide an entry point supported by the table.

t->valid_hooks is the bitmask of hooks (input, forward ...) that will see
packets. Providing an entry point that is not supported is harmless
(never called/used), but the inverse isn't: it results in a crash because
the ebtables traverser doesn't expect a NULL blob for a location it is
receiving packets for.

Instead of fixing all the individual checks, do what iptables is doing and
reject all blobs that differ from the expected hooks.
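
As an illustrative sketch of the crash path described above (simplified,
not the verbatim ebt_do_table() code):

	/* 'hook' is one of the bits set in t->valid_hooks, so packets
	 * WILL be delivered to this table from that location.
	 */
	chaininfo = private->hook_entry[hook];	/* NULL if the blob never
						 * provided this entry point
						 */
	nentries = chaininfo->nentries;		/* NULL pointer dereference */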

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Harshit Mogalapalli <[email protected]>
Reported-by: syzkaller <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/linux/netfilter_bridge/ebtables.h | 4 ----
net/bridge/netfilter/ebtable_broute.c | 8 --------
net/bridge/netfilter/ebtable_filter.c | 8 --------
net/bridge/netfilter/ebtable_nat.c | 8 --------
net/bridge/netfilter/ebtables.c | 8 +-------
5 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index a13296d6c7ceb..fd533552a062c 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -94,10 +94,6 @@ struct ebt_table {
struct ebt_replace_kernel *table;
unsigned int valid_hooks;
rwlock_t lock;
- /* e.g. could be the table explicitly only allows certain
- * matches, targets, ... 0 == let it in */
- int (*check)(const struct ebt_table_info *info,
- unsigned int valid_hooks);
/* the data used by the kernel */
struct ebt_table_info *private;
struct nf_hook_ops *ops;
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 1a11064f99907..8f19253024b0a 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -36,18 +36,10 @@ static struct ebt_replace_kernel initial_table = {
.entries = (char *)&initial_chain,
};

-static int check(const struct ebt_table_info *info, unsigned int valid_hooks)
-{
- if (valid_hooks & ~(1 << NF_BR_BROUTING))
- return -EINVAL;
- return 0;
-}
-
static const struct ebt_table broute_table = {
.name = "broute",
.table = &initial_table,
.valid_hooks = 1 << NF_BR_BROUTING,
- .check = check,
.me = THIS_MODULE,
};

diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index cb949436bc0e3..278f324e67524 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -43,18 +43,10 @@ static struct ebt_replace_kernel initial_table = {
.entries = (char *)initial_chains,
};

-static int check(const struct ebt_table_info *info, unsigned int valid_hooks)
-{
- if (valid_hooks & ~FILTER_VALID_HOOKS)
- return -EINVAL;
- return 0;
-}
-
static const struct ebt_table frame_filter = {
.name = "filter",
.table = &initial_table,
.valid_hooks = FILTER_VALID_HOOKS,
- .check = check,
.me = THIS_MODULE,
};

diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 5ee0531ae5061..9066f7f376d57 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -43,18 +43,10 @@ static struct ebt_replace_kernel initial_table = {
.entries = (char *)initial_chains,
};

-static int check(const struct ebt_table_info *info, unsigned int valid_hooks)
-{
- if (valid_hooks & ~NAT_VALID_HOOKS)
- return -EINVAL;
- return 0;
-}
-
static const struct ebt_table frame_nat = {
.name = "nat",
.table = &initial_table,
.valid_hooks = NAT_VALID_HOOKS,
- .check = check,
.me = THIS_MODULE,
};

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index f2dbefb61ce83..9a0ae59cdc500 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1040,8 +1040,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
goto free_iterate;
}

- /* the table doesn't like it */
- if (t->check && (ret = t->check(newinfo, repl->valid_hooks)))
+ if (repl->valid_hooks != t->valid_hooks)
goto free_unlock;

if (repl->num_counters && repl->num_counters != t->private->nentries) {
@@ -1231,11 +1230,6 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table,
if (ret != 0)
goto free_chainstack;

- if (table->check && table->check(newinfo, table->valid_hooks)) {
- ret = -EINVAL;
- goto free_chainstack;
- }
-
table->private = newinfo;
rwlock_init(&table->lock);
mutex_lock(&ebt_mutex);
--
2.35.1



2022-08-29 12:34:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 050/158] netfilter: nft_tproxy: restrict to prerouting hook

From: Florian Westphal <[email protected]>

[ Upstream commit 18bbc3213383a82b05383827f4b1b882e3f0a5a5 ]

TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this.
This fixes a crash (null dereference) when using tproxy from e.g. output.

Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
Reported-by: Shell Chen <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_tproxy.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/net/netfilter/nft_tproxy.c b/net/netfilter/nft_tproxy.c
index 801f013971dfa..a701ad64f10af 100644
--- a/net/netfilter/nft_tproxy.c
+++ b/net/netfilter/nft_tproxy.c
@@ -312,6 +312,13 @@ static int nft_tproxy_dump(struct sk_buff *skb,
return 0;
}

+static int nft_tproxy_validate(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nft_data **data)
+{
+ return nft_chain_validate_hooks(ctx->chain, 1 << NF_INET_PRE_ROUTING);
+}
+
static struct nft_expr_type nft_tproxy_type;
static const struct nft_expr_ops nft_tproxy_ops = {
.type = &nft_tproxy_type,
@@ -321,6 +328,7 @@ static const struct nft_expr_ops nft_tproxy_ops = {
.destroy = nft_tproxy_destroy,
.dump = nft_tproxy_dump,
.reduce = NFT_REDUCE_READONLY,
+ .validate = nft_tproxy_validate,
};

static struct nft_expr_type nft_tproxy_type __read_mostly = {
--
2.35.1



2022-08-29 12:34:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 064/158] netfilter: flowtable: fix stuck flows on cleanup due to pending work

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 9afb4b27349a499483ae0134282cefd0c90f480f ]

To clear the flow table on flow table free, the following sequence
normally happens in order:

1) gc_step work is stopped to disable any further stats/del requests.
2) All flow table entries are set to teardown state.
3) Run gc_step which will queue HW del work for each flow table entry.
4) Waiting for the above del work to finish (flush).
5) Run gc_step again, deleting all entries from the flow table.
6) Flow table is freed.

But if a flow table entry already has pending HW stats or HW add work,
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work,
which might execute after the flow table has been freed.

To fix the above, this patch changes the teardown sequence to:

1) Flush the pending work.
2) Set the teardown flag on all flows in the flowtable.
3) Force a garbage collector run to queue work to remove the flows from
   hardware.
4) Flush this new pending work.
5) Finally, force another garbage collector run to remove the entries
   from the software flowtable.

Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214] dump_stack+0xbb/0x107
[47773.890818] print_address_description.constprop.0+0x18/0x140
[47773.892990] kasan_report.cold+0x7c/0xd8
[47773.894459] kasan_check_range+0x145/0x1a0
[47773.895174] down_read+0x99/0x460
[47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372] process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031] kasan_save_stack+0x1b/0x40
[47773.922730] __kasan_kmalloc+0x7a/0x90
[47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207] tcf_action_init_1+0x45b/0x700
[47773.925987] tcf_action_init+0x453/0x6b0
[47773.926692] tcf_exts_validate+0x3d0/0x600
[47773.927419] fl_change+0x757/0x4a51 [cls_flower]
[47773.928227] tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303] kasan_save_stack+0x1b/0x40
[47773.938039] kasan_set_track+0x1c/0x30
[47773.938731] kasan_set_free_info+0x20/0x30
[47773.939467] __kasan_slab_free+0xe7/0x120
[47773.940194] slab_free_freelist_hook+0x86/0x190
[47773.941038] kfree+0xce/0x3a0
[47773.941644] tcf_ct_flow_table_cleanup_work

Original patch description and stack trace by Paul Blakey.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Paul Blakey <[email protected]>
Tested-by: Paul Blakey <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/netfilter/nf_flow_table.h | 2 ++
net/netfilter/nf_flow_table_core.c | 7 +++----
net/netfilter/nf_flow_table_offload.c | 8 ++++++++
3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 32c25122ab184..9c93e4981b680 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -307,6 +307,8 @@ void nf_flow_offload_stats(struct nf_flowtable *flowtable,
struct flow_offload *flow);

void nf_flow_table_offload_flush(struct nf_flowtable *flowtable);
+void nf_flow_table_offload_flush_cleanup(struct nf_flowtable *flowtable);
+
int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
struct net_device *dev,
enum flow_block_command cmd);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 18453fa25199c..483b18d35cade 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -610,12 +610,11 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
mutex_unlock(&flowtable_lock);

cancel_delayed_work_sync(&flow_table->gc_work);
+ nf_flow_table_offload_flush(flow_table);
+ /* ... no more pending work after this stage ... */
nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
nf_flow_table_gc_run(flow_table);
- nf_flow_table_offload_flush(flow_table);
- if (nf_flowtable_hw_offload(flow_table))
- nf_flow_table_gc_run(flow_table);
-
+ nf_flow_table_offload_flush_cleanup(flow_table);
rhashtable_destroy(&flow_table->rhashtable);
}
EXPORT_SYMBOL_GPL(nf_flow_table_free);
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 11b6e19420920..4d1169b634c5f 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1063,6 +1063,14 @@ void nf_flow_offload_stats(struct nf_flowtable *flowtable,
flow_offload_queue_work(offload);
}

+void nf_flow_table_offload_flush_cleanup(struct nf_flowtable *flowtable)
+{
+ if (nf_flowtable_hw_offload(flowtable)) {
+ flush_workqueue(nf_flow_offload_del_wq);
+ nf_flow_table_gc_run(flowtable);
+ }
+}
+
void nf_flow_table_offload_flush(struct nf_flowtable *flowtable)
{
if (nf_flowtable_hw_offload(flowtable)) {
--
2.35.1



2022-08-29 12:35:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 059/158] netfilter: nf_tables: do not leave chain stats enabled on error

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 43eb8949cfdffa764b92bc6c54b87cbe5b0003fe ]

An error might occur later in the nf_tables_addchain() codepath, so enable
the static key only after the transaction has been created.

Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nf_tables_api.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index e171257739c2f..b2c89e8c2a655 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2195,9 +2195,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
struct netlink_ext_ack *extack)
{
const struct nlattr * const *nla = ctx->nla;
+ struct nft_stats __percpu *stats = NULL;
struct nft_table *table = ctx->table;
struct nft_base_chain *basechain;
- struct nft_stats __percpu *stats;
struct net *net = ctx->net;
char name[NFT_NAME_MAXLEN];
struct nft_rule_blob *blob;
@@ -2235,7 +2235,6 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
return PTR_ERR(stats);
}
rcu_assign_pointer(basechain->stats, stats);
- static_branch_inc(&nft_counters_enabled);
}

err = nft_basechain_init(basechain, family, &hook, flags);
@@ -2318,6 +2317,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
goto err_unregister_hook;
}

+ if (stats)
+ static_branch_inc(&nft_counters_enabled);
+
table->use++;

return 0;
--
2.35.1



2022-08-29 12:35:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 126/158] smb3: missing inode locks in punch hole

From: David Howells <[email protected]>

commit ba0803050d610d5072666be727bca5e03e55b242 upstream.

smb3 fallocate punch hole was not grabbing the inode or filemap_invalidate
locks, so it could race with the pagemap reinstantiating the page.

Cc: [email protected]
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/cifs/smb2ops.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -3671,7 +3671,7 @@ static long smb3_zero_range(struct file
static long smb3_punch_hole(struct file *file, struct cifs_tcon *tcon,
loff_t offset, loff_t len)
{
- struct inode *inode;
+ struct inode *inode = file_inode(file);
struct cifsFileInfo *cfile = file->private_data;
struct file_zero_data_information fsctl_buf;
long rc;
@@ -3680,14 +3680,12 @@ static long smb3_punch_hole(struct file

xid = get_xid();

- inode = d_inode(cfile->dentry);
-
+ inode_lock(inode);
/* Need to make file sparse, if not already, before freeing range. */
/* Consider adding equivalent for compressed since it could also work */
if (!smb2_set_sparse(xid, tcon, cfile, inode, set_sparse)) {
rc = -EOPNOTSUPP;
- free_xid(xid);
- return rc;
+ goto out;
}

filemap_invalidate_lock(inode->i_mapping);
@@ -3707,8 +3705,10 @@ static long smb3_punch_hole(struct file
true /* is_fctl */, (char *)&fsctl_buf,
sizeof(struct file_zero_data_information),
CIFSMaxBufSize, NULL, NULL);
- free_xid(xid);
filemap_invalidate_unlock(inode->i_mapping);
+out:
+ inode_unlock(inode);
+ free_xid(xid);
return rc;
}



2022-08-29 12:35:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 157/158] riscv: dts: microchip: mpfs: remove pci axi address translation property

From: Conor Dooley <[email protected]>

commit e4009c5fa77b4356aa37ce002e9f9952dfd7a615 upstream.

An AXI master address translation table property was inadvertently
added to the device tree & this was not caught by dtbs_check at the
time. Remove the property - it should not be in mpfs.dtsi anyway, as
it would be more suitable in -fabric.dtsi, and it does not actually apply
to the version of the reference design we are using for upstream.

Link: https://www.microsemi.com/document-portal/doc_download/1245812-polarfire-fpga-and-polarfire-soc-fpga-pci-express-user-guide # Section 1.3.3
Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Signed-off-by: Conor Dooley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/riscv/boot/dts/microchip/mpfs.dtsi | 1 -
1 file changed, 1 deletion(-)

--- a/arch/riscv/boot/dts/microchip/mpfs.dtsi
+++ b/arch/riscv/boot/dts/microchip/mpfs.dtsi
@@ -446,7 +446,6 @@
ranges = <0x3000000 0x0 0x8000000 0x20 0x8000000 0x0 0x80000000>;
msi-parent = <&pcie>;
msi-controller;
- microchip,axi-m-atr0 = <0x10 0x0>;
status = "disabled";
pcie_intc: interrupt-controller {
#address-cells = <0>;


2022-08-29 12:35:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 047/158] bonding: 802.3ad: fix no transmission of LACPDUs

From: Jonathan Toppins <[email protected]>

[ Upstream commit d745b5062ad2b5da90a5e728d7ca884fc07315fd ]

This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, resulting
in the periodic timer never firing.

To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.

line 44: ip link add fbond type bond mode 4 miimon 200 \
xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr

line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
bond->dev->dev_addr = new_mac

line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr

line 60: ip link set veth1-bond down master fbond
given:
params.ad_actor_system = 0
params.mode = BOND_MODE_8023AD
ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
bond_enslave
-> bond_3ad_initialize(); because first slave
-> if ad.system.sys_mac_addr != bond->dev->dev_addr
return
results:
Nothing is run in bond_3ad_initialize() because dev_addr equals
sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
never initialized anywhere else.

The if check around the contents of bond_3ad_initialize() is no longer
needed due to commit 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings
changes immediately"), which sets ad.system.sys_mac_addr whenever any of
the bonding parameters whose set function calls
bond_3ad_update_ad_actor_settings() is changed. This is because if
ad.system.sys_mac_addr is zero it will be set to the current bond MAC
address, which causes the if check to never be true.

Fixes: 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately")
Signed-off-by: Jonathan Toppins <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/bonding/bond_3ad.c | 38 ++++++++++++++--------------------
1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index d7fb33c078e81..1f0120cbe9e80 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2007,30 +2007,24 @@ void bond_3ad_initiate_agg_selection(struct bonding *bond, int timeout)
*/
void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution)
{
- /* check that the bond is not initialized yet */
- if (!MAC_ADDRESS_EQUAL(&(BOND_AD_INFO(bond).system.sys_mac_addr),
- bond->dev->dev_addr)) {
-
- BOND_AD_INFO(bond).aggregator_identifier = 0;
-
- BOND_AD_INFO(bond).system.sys_priority =
- bond->params.ad_actor_sys_prio;
- if (is_zero_ether_addr(bond->params.ad_actor_system))
- BOND_AD_INFO(bond).system.sys_mac_addr =
- *((struct mac_addr *)bond->dev->dev_addr);
- else
- BOND_AD_INFO(bond).system.sys_mac_addr =
- *((struct mac_addr *)bond->params.ad_actor_system);
+ BOND_AD_INFO(bond).aggregator_identifier = 0;
+ BOND_AD_INFO(bond).system.sys_priority =
+ bond->params.ad_actor_sys_prio;
+ if (is_zero_ether_addr(bond->params.ad_actor_system))
+ BOND_AD_INFO(bond).system.sys_mac_addr =
+ *((struct mac_addr *)bond->dev->dev_addr);
+ else
+ BOND_AD_INFO(bond).system.sys_mac_addr =
+ *((struct mac_addr *)bond->params.ad_actor_system);

- /* initialize how many times this module is called in one
- * second (should be about every 100ms)
- */
- ad_ticks_per_sec = tick_resolution;
+ /* initialize how many times this module is called in one
+ * second (should be about every 100ms)
+ */
+ ad_ticks_per_sec = tick_resolution;

- bond_3ad_initiate_agg_selection(bond,
- AD_AGGREGATOR_SELECTION_TIMER *
- ad_ticks_per_sec);
- }
+ bond_3ad_initiate_agg_selection(bond,
+ AD_AGGREGATOR_SELECTION_TIMER *
+ ad_ticks_per_sec);
}

/**
--
2.35.1



2022-08-29 12:36:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 037/158] net: dsa: microchip: ksz9477: cleanup the ksz9477_switch_detect

From: Arun Ramadoss <[email protected]>

[ Upstream commit 27faa0aa85f6696d411bbbebaed9f0f723c2a175 ]

ksz9477_switch_detect() detects the chip id from location 0x00 and also
checks gigabit compatibility and the number of ports based on the
global_options0 register. To prepare for a common ksz switch detect
function, everything other than the chip id read is moved to
ksz9477_switch_init().

Signed-off-by: Arun Ramadoss <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/dsa/microchip/ksz9477.c | 48 +++++++++++++----------------
1 file changed, 22 insertions(+), 26 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index ebad795e4e95f..876a801ac23a4 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -1365,12 +1365,30 @@ static u32 ksz9477_get_port_addr(int port, int offset)

static int ksz9477_switch_detect(struct ksz_device *dev)
{
- u8 data8;
- u8 id_hi;
- u8 id_lo;
u32 id32;
int ret;

+ /* read chip id */
+ ret = ksz_read32(dev, REG_CHIP_ID0__1, &id32);
+ if (ret)
+ return ret;
+
+ dev_dbg(dev->dev, "Switch detect: ID=%08x\n", id32);
+
+ dev->chip_id = id32 & 0x00FFFF00;
+
+ return 0;
+}
+
+static int ksz9477_switch_init(struct ksz_device *dev)
+{
+ u8 data8;
+ int ret;
+
+ dev->ds->ops = &ksz9477_switch_ops;
+
+ dev->port_mask = (1 << dev->info->port_cnt) - 1;
+
/* turn off SPI DO Edge select */
ret = ksz_read8(dev, REG_SW_GLOBAL_SERIAL_CTRL_0, &data8);
if (ret)
@@ -1381,10 +1399,6 @@ static int ksz9477_switch_detect(struct ksz_device *dev)
if (ret)
return ret;

- /* read chip id */
- ret = ksz_read32(dev, REG_CHIP_ID0__1, &id32);
- if (ret)
- return ret;
ret = ksz_read8(dev, REG_GLOBAL_OPTIONS, &data8);
if (ret)
return ret;
@@ -1395,10 +1409,7 @@ static int ksz9477_switch_detect(struct ksz_device *dev)
/* Default capability is gigabit capable. */
dev->features = GBIT_SUPPORT;

- dev_dbg(dev->dev, "Switch detect: ID=%08x%02x\n", id32, data8);
- id_hi = (u8)(id32 >> 16);
- id_lo = (u8)(id32 >> 8);
- if ((id_lo & 0xf) == 3) {
+ if (dev->chip_id == KSZ9893_CHIP_ID) {
/* Chip is from KSZ9893 design. */
dev_info(dev->dev, "Found KSZ9893\n");
dev->features |= IS_9893;
@@ -1416,21 +1427,6 @@ static int ksz9477_switch_detect(struct ksz_device *dev)
if (!(data8 & SW_GIGABIT_ABLE))
dev->features &= ~GBIT_SUPPORT;
}
-
- /* Change chip id to known ones so it can be matched against them. */
- id32 = (id_hi << 16) | (id_lo << 8);
-
- dev->chip_id = id32;
-
- return 0;
-}
-
-static int ksz9477_switch_init(struct ksz_device *dev)
-{
- dev->ds->ops = &ksz9477_switch_ops;
-
- dev->port_mask = (1 << dev->info->port_cnt) - 1;
-
return 0;
}

--
2.35.1



2022-08-29 12:36:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 123/158] shmem: update folio if shmem_replace_page() updates the page

From: Matthew Wilcox (Oracle) <[email protected]>

commit 9dfb3b8d655022760ca68af11821f1c63aa547c3 upstream.

If we allocate a new page, we need to make sure that our folio matches
that new page.

If we do end up in this code path, we store the wrong page in the shmem
inode's page cache, and I would rather imagine that data corruption
ensues.

This will be solved by changing shmem_replace_page() to
shmem_replace_folio(), but this is the minimal fix.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: da08e9b79323 ("mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/shmem.c | 1 +
1 file changed, 1 insertion(+)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1771,6 +1771,7 @@ static int shmem_swapin_folio(struct ino

if (shmem_should_replace_folio(folio, gfp)) {
error = shmem_replace_page(&page, gfp, info, index);
+ folio = page_folio(page);
if (error)
goto failed;
}


2022-08-29 12:36:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 138/158] binder_alloc: add missing mmap_lock calls when using the VMA

From: Liam Howlett <[email protected]>

commit 44e602b4e52f70f04620bbbf4fe46ecb40170bde upstream.

Take the mmap_read_lock() when using the VMA in binder_alloc_print_pages()
and when checking for a VMA in binder_alloc_new_buf_locked().

It is worth noting that binder_alloc_new_buf_locked() drops the VMA read lock
after it verifies a VMA exists, but the lock may be taken again deeper in the
call stack, if necessary.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: a43cfc87caaf ("android: binder: stop saving a pointer to the VMA")
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: Ondrej Mosnacek <[email protected]>
Reported-by: <[email protected]>
Acked-by: Carlos Llamas <[email protected]>
Tested-by: Ondrej Mosnacek <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Christian Brauner (Microsoft) <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hridya Valsaraju <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Martijn Coenen <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Todd Kjos <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: "Arve Hjønnevåg" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/android/binder_alloc.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)

--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -395,12 +395,15 @@ static struct binder_buffer *binder_allo
size_t size, data_offsets_size;
int ret;

+ mmap_read_lock(alloc->vma_vm_mm);
if (!binder_alloc_get_vma(alloc)) {
+ mmap_read_unlock(alloc->vma_vm_mm);
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%d: binder_alloc_buf, no vma\n",
alloc->pid);
return ERR_PTR(-ESRCH);
}
+ mmap_read_unlock(alloc->vma_vm_mm);

data_offsets_size = ALIGN(data_size, sizeof(void *)) +
ALIGN(offsets_size, sizeof(void *));
@@ -922,17 +925,25 @@ void binder_alloc_print_pages(struct seq
* Make sure the binder_alloc is fully initialized, otherwise we might
* read inconsistent state.
*/
- if (binder_alloc_get_vma(alloc) != NULL) {
- for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
- page = &alloc->pages[i];
- if (!page->page_ptr)
- free++;
- else if (list_empty(&page->lru))
- active++;
- else
- lru++;
- }
+
+ mmap_read_lock(alloc->vma_vm_mm);
+ if (binder_alloc_get_vma(alloc) == NULL) {
+ mmap_read_unlock(alloc->vma_vm_mm);
+ goto uninitialized;
+ }
+
+ mmap_read_unlock(alloc->vma_vm_mm);
+ for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
+ page = &alloc->pages[i];
+ if (!page->page_ptr)
+ free++;
+ else if (list_empty(&page->lru))
+ active++;
+ else
+ lru++;
}
+
+uninitialized:
mutex_unlock(&alloc->mutex);
seq_printf(m, " pages: %d:%d:%d\n", active, lru, free);
seq_printf(m, " pages high watermark: %zu\n", alloc->pages_high);


2022-08-29 12:37:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 052/158] bnxt_en: set missing reload flag in devlink features

From: Vikas Gupta <[email protected]>

[ Upstream commit 574b2bb9692fd3d45ed631ac447176d4679f3010 ]

Add the missing devlink_set_features() call so that the reload_down and
reload_up callbacks can function.

Fixes: 228ea8c187d8 ("bnxt_en: implement devlink dev reload driver_reinit")
Reviewed-by: Somnath Kotur <[email protected]>
Signed-off-by: Vikas Gupta <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 6b3d4f4c2a75f..d83be40785b89 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -1246,6 +1246,7 @@ int bnxt_dl_register(struct bnxt *bp)
if (rc)
goto err_dl_port_unreg;

+ devlink_set_features(dl, DEVLINK_F_RELOAD);
out:
devlink_register(dl);
return 0;
--
2.35.1



2022-08-29 12:37:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 070/158] net: Fix data-races around sysctl_optmem_max.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 7de6d09f51917c829af2b835aba8bb5040f8e86a ]

While reading sysctl_optmem_max, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
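
The pattern used in the hunks below is to take a single READ_ONCE()
snapshot per check so that both comparisons see the same value; a
simplified sketch (allow() is only a placeholder, not kernel code):

	/* racy: sysctl_optmem_max may change between the two plain reads */
	if (size <= sysctl_optmem_max &&
	    atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max)
		allow();

	/* fixed: read the sysctl once and reuse the snapshot */
	int optmem_max = READ_ONCE(sysctl_optmem_max);

	if (size <= optmem_max &&
	    atomic_read(&sk->sk_omem_alloc) + size < optmem_max)
		allow();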

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/bpf_sk_storage.c | 5 +++--
net/core/filter.c | 9 +++++----
net/core/sock.c | 8 +++++---
net/ipv4/ip_sockglue.c | 6 +++---
net/ipv6/ipv6_sockglue.c | 4 ++--
5 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 1b7f385643b4c..94374d529ea42 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -310,11 +310,12 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
static int bpf_sk_storage_charge(struct bpf_local_storage_map *smap,
void *owner, u32 size)
{
+ int optmem_max = READ_ONCE(sysctl_optmem_max);
struct sock *sk = (struct sock *)owner;

/* same check as in sock_kmalloc() */
- if (size <= sysctl_optmem_max &&
- atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max) {
+ if (size <= optmem_max &&
+ atomic_read(&sk->sk_omem_alloc) + size < optmem_max) {
atomic_add(size, &sk->sk_omem_alloc);
return 0;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 60c854e7d98ba..063176428086b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1214,10 +1214,11 @@ void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp)
static bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp)
{
u32 filter_size = bpf_prog_size(fp->prog->len);
+ int optmem_max = READ_ONCE(sysctl_optmem_max);

/* same check as in sock_kmalloc() */
- if (filter_size <= sysctl_optmem_max &&
- atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) {
+ if (filter_size <= optmem_max &&
+ atomic_read(&sk->sk_omem_alloc) + filter_size < optmem_max) {
atomic_add(filter_size, &sk->sk_omem_alloc);
return true;
}
@@ -1548,7 +1549,7 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk)
if (IS_ERR(prog))
return PTR_ERR(prog);

- if (bpf_prog_size(prog->len) > sysctl_optmem_max)
+ if (bpf_prog_size(prog->len) > READ_ONCE(sysctl_optmem_max))
err = -ENOMEM;
else
err = reuseport_attach_prog(sk, prog);
@@ -1615,7 +1616,7 @@ int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk)
}
} else {
/* BPF_PROG_TYPE_SOCKET_FILTER */
- if (bpf_prog_size(prog->len) > sysctl_optmem_max) {
+ if (bpf_prog_size(prog->len) > READ_ONCE(sysctl_optmem_max)) {
err = -ENOMEM;
goto err_prog_put;
}
diff --git a/net/core/sock.c b/net/core/sock.c
index 62f69bc3a0e6e..d672e63a5c2d4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2535,7 +2535,7 @@ struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,

/* small safe race: SKB_TRUESIZE may differ from final skb->truesize */
if (atomic_read(&sk->sk_omem_alloc) + SKB_TRUESIZE(size) >
- sysctl_optmem_max)
+ READ_ONCE(sysctl_optmem_max))
return NULL;

skb = alloc_skb(size, priority);
@@ -2553,8 +2553,10 @@ struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
*/
void *sock_kmalloc(struct sock *sk, int size, gfp_t priority)
{
- if ((unsigned int)size <= sysctl_optmem_max &&
- atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max) {
+ int optmem_max = READ_ONCE(sysctl_optmem_max);
+
+ if ((unsigned int)size <= optmem_max &&
+ atomic_read(&sk->sk_omem_alloc) + size < optmem_max) {
void *mem;
/* First do the add, to avoid the race if kmalloc
* might sleep.
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index a8a323ecbb54b..e49a61a053a68 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -772,7 +772,7 @@ static int ip_set_mcast_msfilter(struct sock *sk, sockptr_t optval, int optlen)

if (optlen < GROUP_FILTER_SIZE(0))
return -EINVAL;
- if (optlen > sysctl_optmem_max)
+ if (optlen > READ_ONCE(sysctl_optmem_max))
return -ENOBUFS;

gsf = memdup_sockptr(optval, optlen);
@@ -808,7 +808,7 @@ static int compat_ip_set_mcast_msfilter(struct sock *sk, sockptr_t optval,

if (optlen < size0)
return -EINVAL;
- if (optlen > sysctl_optmem_max - 4)
+ if (optlen > READ_ONCE(sysctl_optmem_max) - 4)
return -ENOBUFS;

p = kmalloc(optlen + 4, GFP_KERNEL);
@@ -1233,7 +1233,7 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname,

if (optlen < IP_MSFILTER_SIZE(0))
goto e_inval;
- if (optlen > sysctl_optmem_max) {
+ if (optlen > READ_ONCE(sysctl_optmem_max)) {
err = -ENOBUFS;
break;
}
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 222f6bf220ba0..e0dcc7a193df2 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -210,7 +210,7 @@ static int ipv6_set_mcast_msfilter(struct sock *sk, sockptr_t optval,

if (optlen < GROUP_FILTER_SIZE(0))
return -EINVAL;
- if (optlen > sysctl_optmem_max)
+ if (optlen > READ_ONCE(sysctl_optmem_max))
return -ENOBUFS;

gsf = memdup_sockptr(optval, optlen);
@@ -244,7 +244,7 @@ static int compat_ipv6_set_mcast_msfilter(struct sock *sk, sockptr_t optval,

if (optlen < size0)
return -EINVAL;
- if (optlen > sysctl_optmem_max - 4)
+ if (optlen > READ_ONCE(sysctl_optmem_max) - 4)
return -ENOBUFS;

p = kmalloc(optlen + 4, GFP_KERNEL);
--
2.35.1



2022-08-29 12:37:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 093/158] net: lantiq_xrx200: restore buffer if memory allocation failed

From: Aleksander Jan Bajkowski <[email protected]>

[ Upstream commit c9c3b1775f80fa21f5bff874027d2ccb10f5d90c ]

In a situation where memory allocation fails, an invalid buffer address
is stored. When this descriptor is used again, the system panics in the
build_skb() function when accessing memory.

Fixes: 7ea6cd16f159 ("lantiq: net: fix duplicated skb in rx descriptor ring")
Signed-off-by: Aleksander Jan Bajkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/lantiq_xrx200.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/lantiq_xrx200.c b/drivers/net/ethernet/lantiq_xrx200.c
index 25adce7f0c7c0..57f27cc7724e7 100644
--- a/drivers/net/ethernet/lantiq_xrx200.c
+++ b/drivers/net/ethernet/lantiq_xrx200.c
@@ -193,6 +193,7 @@ static int xrx200_alloc_buf(struct xrx200_chan *ch, void *(*alloc)(unsigned int

ch->rx_buff[ch->dma.desc] = alloc(priv->rx_skb_size);
if (!ch->rx_buff[ch->dma.desc]) {
+ ch->rx_buff[ch->dma.desc] = buf;
ret = -ENOMEM;
goto skip;
}
--
2.35.1



2022-08-29 12:37:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 141/158] Documentation/ABI: Mention retbleed vulnerability info file for sysfs

From: Salvatore Bonaccorso <[email protected]>

commit 00da0cb385d05a89226e150a102eb49d8abb0359 upstream.

While reporting for the AMD retbleed vulnerability was added in

6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability")

the new sysfs file was not mentioned so far in the ABI documentation for
sysfs-devices-system-cpu. Fix that.

Fixes: 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability")
Signed-off-by: Salvatore Bonaccorso <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
1 file changed, 1 insertion(+)

--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -527,6 +527,7 @@ What: /sys/devices/system/cpu/vulnerabi
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
+ /sys/devices/system/cpu/vulnerabilities/retbleed
Date: January 2018
Contact: Linux kernel mailing list <[email protected]>
Description: Information about CPU vulnerabilities


2022-08-29 12:38:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 042/158] net: dsa: microchip: update the ksz_phylink_get_caps

From: Arun Ramadoss <[email protected]>

[ Upstream commit 7012033ce10e0968e6cb82709aa0ed7f2080b61e ]

This patch assigns phylink_get_caps in ksz8795 and ksz9477 to
ksz_phylink_get_caps, and updates their mac_capabilities in the
respective ksz_dev_ops.

Signed-off-by: Arun Ramadoss <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/dsa/microchip/ksz8795.c | 9 +++------
drivers/net/dsa/microchip/ksz9477.c | 7 +++----
drivers/net/dsa/microchip/ksz_common.c | 3 +++
drivers/net/dsa/microchip/ksz_common.h | 2 ++
4 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
index 2e3d24a3260e1..c771797fd902f 100644
--- a/drivers/net/dsa/microchip/ksz8795.c
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -1353,13 +1353,9 @@ static int ksz8_setup(struct dsa_switch *ds)
return ksz8_handle_global_errata(ds);
}

-static void ksz8_get_caps(struct dsa_switch *ds, int port,
+static void ksz8_get_caps(struct ksz_device *dev, int port,
struct phylink_config *config)
{
- struct ksz_device *dev = ds->priv;
-
- ksz_phylink_get_caps(ds, port, config);
-
config->mac_capabilities = MAC_10 | MAC_100;

/* Silicon Errata Sheet (DS80000830A):
@@ -1381,7 +1377,7 @@ static const struct dsa_switch_ops ksz8_switch_ops = {
.setup = ksz8_setup,
.phy_read = ksz_phy_read16,
.phy_write = ksz_phy_write16,
- .phylink_get_caps = ksz8_get_caps,
+ .phylink_get_caps = ksz_phylink_get_caps,
.phylink_mac_link_down = ksz_mac_link_down,
.port_enable = ksz_enable_port,
.get_strings = ksz_get_strings,
@@ -1463,6 +1459,7 @@ static const struct ksz_dev_ops ksz8_dev_ops = {
.vlan_del = ksz8_port_vlan_del,
.mirror_add = ksz8_port_mirror_add,
.mirror_del = ksz8_port_mirror_del,
+ .get_caps = ksz8_get_caps,
.shutdown = ksz8_reset_switch,
.init = ksz8_switch_init,
.exit = ksz8_switch_exit,
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index cd4a3088e9473..125124fdefbf4 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -1082,11 +1082,9 @@ static void ksz9477_phy_errata_setup(struct ksz_device *dev, int port)
ksz9477_port_mmd_write(dev, port, 0x1c, 0x20, 0xeeee);
}

-static void ksz9477_get_caps(struct dsa_switch *ds, int port,
+static void ksz9477_get_caps(struct ksz_device *dev, int port,
struct phylink_config *config)
{
- ksz_phylink_get_caps(ds, port, config);
-
config->mac_capabilities = MAC_10 | MAC_100 | MAC_1000FD |
MAC_ASYM_PAUSE | MAC_SYM_PAUSE;
}
@@ -1316,7 +1314,7 @@ static const struct dsa_switch_ops ksz9477_switch_ops = {
.phy_read = ksz9477_phy_read16,
.phy_write = ksz9477_phy_write16,
.phylink_mac_link_down = ksz_mac_link_down,
- .phylink_get_caps = ksz9477_get_caps,
+ .phylink_get_caps = ksz_phylink_get_caps,
.port_enable = ksz_enable_port,
.get_strings = ksz_get_strings,
.get_ethtool_stats = ksz_get_ethtool_stats,
@@ -1412,6 +1410,7 @@ static const struct ksz_dev_ops ksz9477_dev_ops = {
.vlan_del = ksz9477_port_vlan_del,
.mirror_add = ksz9477_port_mirror_add,
.mirror_del = ksz9477_port_mirror_del,
+ .get_caps = ksz9477_get_caps,
.shutdown = ksz9477_reset_switch,
.init = ksz9477_switch_init,
.exit = ksz9477_switch_exit,
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 676669d353ea6..0f02c62b02685 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -456,6 +456,9 @@ void ksz_phylink_get_caps(struct dsa_switch *ds, int port,
if (dev->info->internal_phy[port])
__set_bit(PHY_INTERFACE_MODE_INTERNAL,
config->supported_interfaces);
+
+ if (dev->dev_ops->get_caps)
+ dev->dev_ops->get_caps(dev, port, config);
}
EXPORT_SYMBOL_GPL(ksz_phylink_get_caps);

diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index c724cbb437e29..10f9ef2dbf1ca 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -192,6 +192,8 @@ struct ksz_dev_ops {
bool ingress, struct netlink_ext_ack *extack);
void (*mirror_del)(struct ksz_device *dev, int port,
struct dsa_mall_mirror_tc_entry *mirror);
+ void (*get_caps)(struct ksz_device *dev, int port,
+ struct phylink_config *config);
void (*freeze_mib)(struct ksz_device *dev, int port, bool freeze);
void (*port_init_cnt)(struct ksz_device *dev, int port);
int (*shutdown)(struct ksz_device *dev);
--
2.35.1



2022-08-29 12:38:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 054/158] bnxt_en: fix LRO/GRO_HW features in ndo_fix_features callback

From: Vikas Gupta <[email protected]>

[ Upstream commit 366c304741729e64d778c80555d9eb422cf5cc89 ]

LRO/GRO_HW should be disabled if there is an attached XDP program.
BNXT_FLAG_TPA reflects the current LRO/GRO_HW setting, so using
BNXT_FLAG_TPA to decide whether to clear LRO/GRO_HW means that once
these features are disabled they stay disabled and cannot be re-enabled.

Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Signed-off-by: Vikas Gupta <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index cf9b00576ed36..964354536f9ce 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -11183,10 +11183,7 @@ static netdev_features_t bnxt_fix_features(struct net_device *dev,
if ((features & NETIF_F_NTUPLE) && !bnxt_rfs_capable(bp))
features &= ~NETIF_F_NTUPLE;

- if (bp->flags & BNXT_FLAG_NO_AGG_RINGS)
- features &= ~(NETIF_F_LRO | NETIF_F_GRO_HW);
-
- if (!(bp->flags & BNXT_FLAG_TPA))
+ if ((bp->flags & BNXT_FLAG_NO_AGG_RINGS) || bp->xdp_prog)
features &= ~(NETIF_F_LRO | NETIF_F_GRO_HW);

if (!(features & NETIF_F_GRO))
--
2.35.1



2022-08-29 12:38:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 103/158] perf/x86/lbr: Enable the branch type for the Arch LBR by default

From: Kan Liang <[email protected]>

commit 32ba156df1b1c8804a4e5be5339616945eafea22 upstream.

On the platform with Arch LBR, the HW raw branch type encoding may leak
to the perf tool when the SAVE_TYPE option is not set.

In the intel_pmu_store_lbr(), the HW raw branch type is stored in
lbr_entries[].type. If the SAVE_TYPE option is set, the
lbr_entries[].type will be converted into the generic PERF_BR_* type
in the intel_pmu_lbr_filter() and exposed to the user tools.
But if the SAVE_TYPE option is NOT set by the user, the current perf
kernel doesn't clear the field. The HW raw branch type leaks.

There are two solutions to fix the issue for the Arch LBR.
One is to clear the field if the SAVE_TYPE option is NOT set.
The other solution is to unconditionally convert the branch type and
expose the generic type to the user tools.

The latter is implemented here, because
- The branch type is valuable information. I don't see a case where
you would not benefit from the branch type. (Stephane Eranian)
- Not having the branch type DOES NOT save any space in the
branch record (Stephane Eranian)
- The Arch LBR HW can retrieve the common branch types from the
LBR_INFO. It doesn't require the high overhead SW disassemble.

Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Reported-by: Stephane Eranian <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/events/intel/lbr.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1097,6 +1097,14 @@ static int intel_pmu_setup_hw_lbr_filter

if (static_cpu_has(X86_FEATURE_ARCH_LBR)) {
reg->config = mask;
+
+ /*
+ * The Arch LBR HW can retrieve the common branch types
+ * from the LBR_INFO. It doesn't require the high overhead
+ * SW disassemble.
+ * Enable the branch type by default for the Arch LBR.
+ */
+ reg->reg |= X86_BR_TYPE_SAVE;
return 0;
}



2022-08-29 12:38:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 065/158] net: Fix data-races around sysctl_[rw]mem_(max|default).

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 1227c1771dd2ad44318aa3ab9e3a293b3f34ff2a ]

While reading sysctl_[rw]mem_(max|default), they can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/filter.c | 4 ++--
net/core/sock.c | 8 ++++----
net/ipv4/ip_output.c | 2 +-
net/ipv4/tcp_output.c | 2 +-
net/netfilter/ipvs/ip_vs_sync.c | 4 ++--
5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 74f05ed6aff29..60c854e7d98ba 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5036,14 +5036,14 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
/* Only some socketops are supported */
switch (optname) {
case SO_RCVBUF:
- val = min_t(u32, val, sysctl_rmem_max);
+ val = min_t(u32, val, READ_ONCE(sysctl_rmem_max));
val = min_t(int, val, INT_MAX / 2);
sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
WRITE_ONCE(sk->sk_rcvbuf,
max_t(int, val * 2, SOCK_MIN_RCVBUF));
break;
case SO_SNDBUF:
- val = min_t(u32, val, sysctl_wmem_max);
+ val = min_t(u32, val, READ_ONCE(sysctl_wmem_max));
val = min_t(int, val, INT_MAX / 2);
sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
WRITE_ONCE(sk->sk_sndbuf,
diff --git a/net/core/sock.c b/net/core/sock.c
index 2ff40dd0a7a65..62f69bc3a0e6e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1100,7 +1100,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
* play 'guess the biggest size' games. RCVBUF/SNDBUF
* are treated in BSD as hints
*/
- val = min_t(u32, val, sysctl_wmem_max);
+ val = min_t(u32, val, READ_ONCE(sysctl_wmem_max));
set_sndbuf:
/* Ensure val * 2 fits into an int, to prevent max_t()
* from treating it as a negative value.
@@ -1132,7 +1132,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
* play 'guess the biggest size' games. RCVBUF/SNDBUF
* are treated in BSD as hints
*/
- __sock_set_rcvbuf(sk, min_t(u32, val, sysctl_rmem_max));
+ __sock_set_rcvbuf(sk, min_t(u32, val, READ_ONCE(sysctl_rmem_max)));
break;

case SO_RCVBUFFORCE:
@@ -3307,8 +3307,8 @@ void sock_init_data(struct socket *sock, struct sock *sk)
timer_setup(&sk->sk_timer, NULL, 0);

sk->sk_allocation = GFP_KERNEL;
- sk->sk_rcvbuf = sysctl_rmem_default;
- sk->sk_sndbuf = sysctl_wmem_default;
+ sk->sk_rcvbuf = READ_ONCE(sysctl_rmem_default);
+ sk->sk_sndbuf = READ_ONCE(sysctl_wmem_default);
sk->sk_state = TCP_CLOSE;
sk_set_socket(sk, sock);

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 00b4bf26fd932..da8b3cc67234d 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1712,7 +1712,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,

sk->sk_protocol = ip_hdr(skb)->protocol;
sk->sk_bound_dev_if = arg->bound_dev_if;
- sk->sk_sndbuf = sysctl_wmem_default;
+ sk->sk_sndbuf = READ_ONCE(sysctl_wmem_default);
ipc.sockc.mark = fl4.flowi4_mark;
err = ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base,
len, 0, &ipc, &rt, MSG_DONTWAIT);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index aed0c5f828bef..84314de754f87 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -239,7 +239,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
if (wscale_ok) {
/* Set window scaling on max possible window */
space = max_t(u32, space, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
- space = max_t(u32, space, sysctl_rmem_max);
+ space = max_t(u32, space, READ_ONCE(sysctl_rmem_max));
space = min_t(u32, space, *window_clamp);
*rcv_wscale = clamp_t(int, ilog2(space) - 15,
0, TCP_MAX_WSCALE);
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 9d43277b8b4fe..a56fd0b5a430a 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1280,12 +1280,12 @@ static void set_sock_size(struct sock *sk, int mode, int val)
lock_sock(sk);
if (mode) {
val = clamp_t(int, val, (SOCK_MIN_SNDBUF + 1) / 2,
- sysctl_wmem_max);
+ READ_ONCE(sysctl_wmem_max));
sk->sk_sndbuf = val * 2;
sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
} else {
val = clamp_t(int, val, (SOCK_MIN_RCVBUF + 1) / 2,
- sysctl_rmem_max);
+ READ_ONCE(sysctl_rmem_max));
sk->sk_rcvbuf = val * 2;
sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
}
--
2.35.1



2022-08-29 12:39:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 045/158] net: phy: Don't WARN for PHY_READY state in mdio_bus_phy_resume()

From: Xiaolei Wang <[email protected]>

[ Upstream commit 6dbe852c379ff032a70a6b13a91914918c82cb07 ]

Some MAC drivers set mac_managed_pm to true in their
->ndo_open() callback. So before mac_managed_pm is set to true,
we still want to leverage mdio_bus_phy_suspend()/resume() for
the phy device suspend and resume. In this case, the phy device is
in PHY_READY, and we shouldn't warn about this. It also seems that
the check of mac_managed_pm in the WARN_ON is redundant since we already
check it at the entry of mdio_bus_phy_resume(), so drop it.

Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Xiaolei Wang <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/phy/phy_device.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 608de5a94165f..f90a21781d8d6 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -316,11 +316,11 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev)

phydev->suspended_by_mdio_bus = 0;

- /* If we managed to get here with the PHY state machine in a state other
- * than PHY_HALTED this is an indication that something went wrong and
- * we should most likely be using MAC managed PM and we are not.
+ /* If we manged to get here with the PHY state machine in a state neither
+ * PHY_HALTED nor PHY_READY this is an indication that something went wrong
+ * and we should most likely be using MAC managed PM and we are not.
*/
- WARN_ON(phydev->state != PHY_HALTED && !phydev->mac_managed_pm);
+ WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY);

ret = phy_init_hw(phydev);
if (ret < 0)
--
2.35.1



2022-08-29 12:39:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 056/158] netfilter: nf_tables: make table handle allocation per-netns friendly

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit ab482c6b66a4a8c0a8c0b0f577a785cf9ff1c2e2 ]

The mutex is per-netns, so move table_handle to the pernet area.

*read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0:
nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921

Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-by: Abhishek Shah <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/netfilter/nf_tables.h | 1 +
net/netfilter/nf_tables_api.c | 3 +--
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index b8890ace0f879..0daad6e63ccb2 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1635,6 +1635,7 @@ struct nftables_pernet {
struct list_head module_list;
struct list_head notify_list;
struct mutex commit_mutex;
+ u64 table_handle;
unsigned int base_seq;
u8 validate_state;
};
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 8b6ee9df817fb..e171257739c2f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -32,7 +32,6 @@ static LIST_HEAD(nf_tables_objects);
static LIST_HEAD(nf_tables_flowtables);
static LIST_HEAD(nf_tables_destroy_list);
static DEFINE_SPINLOCK(nf_tables_destroy_list_lock);
-static u64 table_handle;

enum {
NFT_VALIDATE_SKIP = 0,
@@ -1235,7 +1234,7 @@ static int nf_tables_newtable(struct sk_buff *skb, const struct nfnl_info *info,
INIT_LIST_HEAD(&table->flowtables);
table->family = family;
table->flags = flags;
- table->handle = ++table_handle;
+ table->handle = ++nft_net->table_handle;
if (table->flags & NFT_TABLE_F_OWNER)
table->nlpid = NETLINK_CB(skb).portid;

--
2.35.1



2022-08-29 12:40:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 066/158] net: Fix data-races around weight_p and dev_weight_[rt]x_bias.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit bf955b5ab8f6f7b0632cdef8e36b14e4f6e77829 ]

While reading weight_p, it can be changed concurrently. Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time. So, we
need to use READ_ONCE() and WRITE_ONCE() for its access. Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().

Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/dev.c | 2 +-
net/core/sysctl_net_core.c | 15 +++++++++------
net/sched/sch_generic.c | 2 +-
3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 30a1603a7225c..2a7d81cd9e2ea 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5917,7 +5917,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
net_rps_action_and_irq_enable(sd);
}

- napi->weight = dev_rx_weight;
+ napi->weight = READ_ONCE(dev_rx_weight);
while (again) {
struct sk_buff *skb;

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 71a13596ea2bf..725891527814c 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -234,14 +234,17 @@ static int set_default_qdisc(struct ctl_table *table, int write,
static int proc_do_dev_weight(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- int ret;
+ static DEFINE_MUTEX(dev_weight_mutex);
+ int ret, weight;

+ mutex_lock(&dev_weight_mutex);
ret = proc_dointvec(table, write, buffer, lenp, ppos);
- if (ret != 0)
- return ret;
-
- dev_rx_weight = weight_p * dev_weight_rx_bias;
- dev_tx_weight = weight_p * dev_weight_tx_bias;
+ if (!ret && write) {
+ weight = READ_ONCE(weight_p);
+ WRITE_ONCE(dev_rx_weight, weight * dev_weight_rx_bias);
+ WRITE_ONCE(dev_tx_weight, weight * dev_weight_tx_bias);
+ }
+ mutex_unlock(&dev_weight_mutex);

return ret;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index dba0b3e24af5e..a64c3c1541118 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -409,7 +409,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)

void __qdisc_run(struct Qdisc *q)
{
- int quota = dev_tx_weight;
+ int quota = READ_ONCE(dev_tx_weight);
int packets;

while (qdisc_restart(q, &packets)) {
--
2.35.1



2022-08-29 12:40:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 114/158] mm/damon/dbgfs: avoid duplicate context directory creation

From: Badari Pulavarty <[email protected]>

commit d26f60703606ab425eee9882b32a1781a8bed74d upstream.

When a user tries to create a DAMON context via the DAMON debugfs interface
with the name of an already existing context, the context directory creation
fails, but a new context is created and added to the internal data
structure, due to the absence of a directory creation success check. As a
result, memory could leak and DAMON cannot be turned on. An example test
case is as below:

# cd /sys/kernel/debug/damon/
# echo "off" > monitor_on
# echo paddr > target_ids
# echo "abc" > mk_context
# echo "abc" > mk_context
# echo $$ > abc/target_ids
# echo "on" > monitor_on <<< fails

The return value of 'debugfs_create_dir()' is expected to be ignored in
general, but this is an exceptional case, as the DAMON feature depends
on the debugfs functionality and has the potential duplicate name
issue. This commit therefore fixes the issue by checking for directory
creation failure and immediately returning the error in that case.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: Badari Pulavarty <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [ 5.15.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/damon/dbgfs.c | 3 +++
1 file changed, 3 insertions(+)

--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -787,6 +787,9 @@ static int dbgfs_mk_context(char *name)
return -ENOENT;

new_dir = debugfs_create_dir(name, root);
+ /* Below check is required for a potential duplicated name case */
+ if (IS_ERR(new_dir))
+ return PTR_ERR(new_dir);
dbgfs_dirs[dbgfs_nr_ctxs] = new_dir;

new_ctx = dbgfs_new_ctx();


2022-08-29 12:40:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 086/158] rxrpc: Fix locking in rxrpc's sendmsg

From: David Howells <[email protected]>

[ Upstream commit b0f571ecd7943423c25947439045f0d352ca3dbf ]

Fix three bugs in rxrpc's sendmsg implementation:

(1) rxrpc_new_client_call() should release the socket lock when returning
an error from rxrpc_get_call_slot().

(2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
held in the event that we're interrupted by a signal whilst waiting
for tx space on the socket or relocking the call mutex afterwards.

Fix this by: (a) moving the unlock/lock of the call mutex up to
rxrpc_send_data() such that the lock is not held around all of
rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
whether we return with the lock dropped. Note that this means
recvmsg() will not block on this call whilst we're waiting.

(3) After dropping and regaining the call mutex, rxrpc_send_data() needs
to go and recheck the state of the tx_pending buffer and the
tx_total_len check in case we raced with another sendmsg() on the same
call.

Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg(). There's probably no need to make recvmsg() wait
for sendmsg(). It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.

Without fix (2), something like the following can be induced:

WARNING: bad unlock balance detected!
5.16.0-rc6-syzkaller #0 Not tainted
-------------------------------------
syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
but there are no more locks to release!

other info that might help us debug this:
no locks held by syz-executor011/3597.
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
__lock_release kernel/locking/lockdep.c:5306 [inline]
lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
__mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: [email protected]
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Marc Dionne <[email protected]>
Tested-by: [email protected]
cc: Hawkins Jiawei <[email protected]>
cc: Khalid Masum <[email protected]>
cc: Dan Carpenter <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/rxrpc/call_object.c | 4 +-
net/rxrpc/sendmsg.c | 92 ++++++++++++++++++++++++-----------------
2 files changed, 57 insertions(+), 39 deletions(-)

diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 84d0a41096450..6401cdf7a6246 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -285,8 +285,10 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx,
_enter("%p,%lx", rx, p->user_call_ID);

limiter = rxrpc_get_call_slot(p, gfp);
- if (!limiter)
+ if (!limiter) {
+ release_sock(&rx->sk);
return ERR_PTR(-ERESTARTSYS);
+ }

call = rxrpc_alloc_client_call(rx, srx, gfp, debug_id);
if (IS_ERR(call)) {
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 1d38e279e2efa..3c3a626459deb 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -51,10 +51,7 @@ static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx,
return sock_intr_errno(*timeo);

trace_rxrpc_transmit(call, rxrpc_transmit_wait);
- mutex_unlock(&call->user_mutex);
*timeo = schedule_timeout(*timeo);
- if (mutex_lock_interruptible(&call->user_mutex) < 0)
- return sock_intr_errno(*timeo);
}
}

@@ -290,37 +287,48 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
static int rxrpc_send_data(struct rxrpc_sock *rx,
struct rxrpc_call *call,
struct msghdr *msg, size_t len,
- rxrpc_notify_end_tx_t notify_end_tx)
+ rxrpc_notify_end_tx_t notify_end_tx,
+ bool *_dropped_lock)
{
struct rxrpc_skb_priv *sp;
struct sk_buff *skb;
struct sock *sk = &rx->sk;
+ enum rxrpc_call_state state;
long timeo;
- bool more;
- int ret, copied;
+ bool more = msg->msg_flags & MSG_MORE;
+ int ret, copied = 0;

timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);

/* this should be in poll */
sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);

+reload:
+ ret = -EPIPE;
if (sk->sk_shutdown & SEND_SHUTDOWN)
- return -EPIPE;
-
- more = msg->msg_flags & MSG_MORE;
-
+ goto maybe_error;
+ state = READ_ONCE(call->state);
+ ret = -ESHUTDOWN;
+ if (state >= RXRPC_CALL_COMPLETE)
+ goto maybe_error;
+ ret = -EPROTO;
+ if (state != RXRPC_CALL_CLIENT_SEND_REQUEST &&
+ state != RXRPC_CALL_SERVER_ACK_REQUEST &&
+ state != RXRPC_CALL_SERVER_SEND_REPLY)
+ goto maybe_error;
+
+ ret = -EMSGSIZE;
if (call->tx_total_len != -1) {
- if (len > call->tx_total_len)
- return -EMSGSIZE;
- if (!more && len != call->tx_total_len)
- return -EMSGSIZE;
+ if (len - copied > call->tx_total_len)
+ goto maybe_error;
+ if (!more && len - copied != call->tx_total_len)
+ goto maybe_error;
}

skb = call->tx_pending;
call->tx_pending = NULL;
rxrpc_see_skb(skb, rxrpc_skb_seen);

- copied = 0;
do {
/* Check to see if there's a ping ACK to reply to. */
if (call->ackr_reason == RXRPC_ACK_PING_RESPONSE)
@@ -331,16 +339,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,

_debug("alloc");

- if (!rxrpc_check_tx_space(call, NULL)) {
- ret = -EAGAIN;
- if (msg->msg_flags & MSG_DONTWAIT)
- goto maybe_error;
- ret = rxrpc_wait_for_tx_window(rx, call,
- &timeo,
- msg->msg_flags & MSG_WAITALL);
- if (ret < 0)
- goto maybe_error;
- }
+ if (!rxrpc_check_tx_space(call, NULL))
+ goto wait_for_space;

/* Work out the maximum size of a packet. Assume that
* the security header is going to be in the padded
@@ -468,6 +468,27 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
efault:
ret = -EFAULT;
goto out;
+
+wait_for_space:
+ ret = -EAGAIN;
+ if (msg->msg_flags & MSG_DONTWAIT)
+ goto maybe_error;
+ mutex_unlock(&call->user_mutex);
+ *_dropped_lock = true;
+ ret = rxrpc_wait_for_tx_window(rx, call, &timeo,
+ msg->msg_flags & MSG_WAITALL);
+ if (ret < 0)
+ goto maybe_error;
+ if (call->interruptibility == RXRPC_INTERRUPTIBLE) {
+ if (mutex_lock_interruptible(&call->user_mutex) < 0) {
+ ret = sock_intr_errno(timeo);
+ goto maybe_error;
+ }
+ } else {
+ mutex_lock(&call->user_mutex);
+ }
+ *_dropped_lock = false;
+ goto reload;
}

/*
@@ -629,6 +650,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
enum rxrpc_call_state state;
struct rxrpc_call *call;
unsigned long now, j;
+ bool dropped_lock = false;
int ret;

struct rxrpc_send_params p = {
@@ -737,21 +759,13 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
ret = rxrpc_send_abort_packet(call);
} else if (p.command != RXRPC_CMD_SEND_DATA) {
ret = -EINVAL;
- } else if (rxrpc_is_client_call(call) &&
- state != RXRPC_CALL_CLIENT_SEND_REQUEST) {
- /* request phase complete for this client call */
- ret = -EPROTO;
- } else if (rxrpc_is_service_call(call) &&
- state != RXRPC_CALL_SERVER_ACK_REQUEST &&
- state != RXRPC_CALL_SERVER_SEND_REPLY) {
- /* Reply phase not begun or not complete for service call. */
- ret = -EPROTO;
} else {
- ret = rxrpc_send_data(rx, call, msg, len, NULL);
+ ret = rxrpc_send_data(rx, call, msg, len, NULL, &dropped_lock);
}

out_put_unlock:
- mutex_unlock(&call->user_mutex);
+ if (!dropped_lock)
+ mutex_unlock(&call->user_mutex);
error_put:
rxrpc_put_call(call, rxrpc_call_put);
_leave(" = %d", ret);
@@ -779,6 +793,7 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call,
struct msghdr *msg, size_t len,
rxrpc_notify_end_tx_t notify_end_tx)
{
+ bool dropped_lock = false;
int ret;

_enter("{%d,%s},", call->debug_id, rxrpc_call_states[call->state]);
@@ -796,7 +811,7 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call,
case RXRPC_CALL_SERVER_ACK_REQUEST:
case RXRPC_CALL_SERVER_SEND_REPLY:
ret = rxrpc_send_data(rxrpc_sk(sock->sk), call, msg, len,
- notify_end_tx);
+ notify_end_tx, &dropped_lock);
break;
case RXRPC_CALL_COMPLETE:
read_lock_bh(&call->state_lock);
@@ -810,7 +825,8 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call,
break;
}

- mutex_unlock(&call->user_mutex);
+ if (!dropped_lock)
+ mutex_unlock(&call->user_mutex);
_leave(" = %d", ret);
return ret;
}
--
2.35.1



2022-08-29 12:41:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 014/158] xfrm: clone missing x->lastused in xfrm_do_migrate

From: Antony Antony <[email protected]>

[ Upstream commit 6aa811acdb76facca0b705f4e4c1d948ccb6af8b ]

x->lastused was not cloned in xfrm_do_migrate. Add it to clone during
migrate.

Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/xfrm/xfrm_state.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index ccfb172eb5b8d..11d89af9cb55a 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1592,6 +1592,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
x->replay = orig->replay;
x->preplay = orig->preplay;
x->mapping_maxage = orig->mapping_maxage;
+ x->lastused = orig->lastused;
x->new_mapping = 0;
x->new_mapping_sport = 0;

--
2.35.1



2022-08-29 12:41:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 006/158] parisc: Fix exception handler for fldw and fstw instructions

From: Helge Deller <[email protected]>

commit 7ae1f5508d9a33fd58ed3059bd2d569961e3b8bd upstream.

The exception handler is broken for unaligned memory accesses with fldw
and fstw instructions, because it trashes or randomly uses some other
floating point register than the one specified in the instruction word
on loads and stores.

The instruction "fldw 0(addr),%fr22L" (and the other fldw/fstw
instructions) encode the target register (%fr22) in the rightmost 5 bits
of the instruction word. The 7th rightmost bit of the instruction word
defines if the left or right half of %fr22 should be used.

While processing unaligned address accesses, the FR3() define is used to
extract the offset into the local floating-point register set. But the
calculation in FR3() was buggy, so that for example instead of %fr22,
register %fr12 [((22 * 2) & 0x1f) = 12] was used.

This bug has been in the parisc kernel since forever and I wonder why it
wasn't detected earlier. Interestingly, I noticed this bug just because
the libime debian package failed to build on *native* hardware, while it
successfully built in qemu.

This patch corrects the bitshift and masking calculation in FR3().
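
For illustration only, a small userspace sketch (assuming an instruction
word whose low 5 bits encode %fr22 and whose bit 6, the half-register
selector, is clear) compares the two versions of the macro:

  #include <stdio.h>

  #define OLD_FR3(i) ((((i) << 1) & 0x1f) | (((i) >> 6) & 1))  /* buggy */
  #define NEW_FR3(i) ((((i) & 0x1f) << 1) | (((i) >> 6) & 1))  /* fixed */

  int main(void)
  {
          unsigned int insn = 22;  /* low 5 bits = 22 (%fr22), bit 6 = 0 */

          /* old: (22 * 2) & 0x1f = 12 -> slot of %fr12 (wrong register) */
          printf("old FR3 = %u\n", OLD_FR3(insn));
          /* new: (22 & 0x1f) * 2 = 44 -> slot of %fr22 (correct) */
          printf("new FR3 = %u\n", NEW_FR3(insn));
          return 0;
  }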

Signed-off-by: Helge Deller <[email protected]>
Cc: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/parisc/kernel/unaligned.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/parisc/kernel/unaligned.c
+++ b/arch/parisc/kernel/unaligned.c
@@ -93,7 +93,7 @@
#define R1(i) (((i)>>21)&0x1f)
#define R2(i) (((i)>>16)&0x1f)
#define R3(i) ((i)&0x1f)
-#define FR3(i) ((((i)<<1)&0x1f)|(((i)>>6)&1))
+#define FR3(i) ((((i)&0x1f)<<1)|(((i)>>6)&1))
#define IM(i,n) (((i)>>1&((1<<(n-1))-1))|((i)&1?((0-1L)<<(n-1)):0))
#define IM5_2(i) IM((i)>>16,5)
#define IM5_3(i) IM((i),5)


2022-08-29 12:41:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 100/158] btrfs: update generation of hole file extent item when merging holes

From: Filipe Manana <[email protected]>

commit e6e3dec6c3c288d556b991a85d5d8e3ee71e9046 upstream.

When punching a hole into a file range that is adjacent with a hole and we
are not using the no-holes feature, we expand the range of the adjacent
file extent item that represents a hole, to save metadata space.

However we don't update the generation of hole file extent item, which
means a full fsync will not log that file extent item if the fsync happens
in a later transaction (since commit 7f30c07288bb9e ("btrfs: stop copying
old file extents when doing a full fsync")).

For example, if we do this:

$ mkfs.btrfs -f -O ^no-holes /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0xab 2M 2M" /mnt/foobar
$ sync

We end up with 2 file extent items in our file:

1) One that represents the hole for the file range [0, 2M), with a
generation of 7;

2) Another one that represents an extent covering the range [2M, 4M).

After that if we do the following:

$ xfs_io -c "fpunch 2M 2M" /mnt/foobar

We end up with a single file extent item in the file, which represents a
hole for the range [0, 4M) and with a generation of 7 - because we end up
dropping the data extent for the range [2M, 4M) and then update the file
extent item that represented the hole at [0, 2M), by increasing its
length from 2M to 4M.

Then doing a full fsync and power failing:

$ xfs_io -c "fsync" /mnt/foobar
<power failure>

will result in the full fsync not logging the file extent item that
represents the hole for the range [0, 4M), because its generation is 7,
which is lower than the generation of the current transaction (8).
As a consequence, after mounting again the filesystem (after log replay),
the region [2M, 4M) does not have a hole, it still points to the
previous data extent.

So fix this by always updating the generation of existing file extent
items representing holes when we merge/expand them. This solves the
problem and it's the same approach as when we merge prealloc extents that
got written (at btrfs_mark_extent_written()). Setting the generation to
the current transaction's generation is also what we do when merging
the new hole extent map with the previous one or the next one.

A test case for fstests, covering both cases of hole file extent item
merging (to the left and to the right), will be sent soon.

Fixes: 7f30c07288bb9e ("btrfs: stop copying old file extents when doing a full fsync")
CC: [email protected] # 5.18+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/file.c | 2 ++
1 file changed, 2 insertions(+)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2483,6 +2483,7 @@ static int fill_holes(struct btrfs_trans
btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
btrfs_set_file_extent_offset(leaf, fi, 0);
+ btrfs_set_file_extent_generation(leaf, fi, trans->transid);
btrfs_mark_buffer_dirty(leaf);
goto out;
}
@@ -2499,6 +2500,7 @@ static int fill_holes(struct btrfs_trans
btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
btrfs_set_file_extent_offset(leaf, fi, 0);
+ btrfs_set_file_extent_generation(leaf, fi, trans->transid);
btrfs_mark_buffer_dirty(leaf);
goto out;
}


2022-08-29 12:41:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 032/158] net/mlx5: Disable irq when locking lag_lock

From: Vlad Buslov <[email protected]>

[ Upstream commit 8e93f29422ffe968d7161f91acdf0d47f5323727 ]

The lag_lock is taken from both process and softirq contexts, which results
in the lockdep warning[0] about a potential deadlock. However, just disabling
softirqs by using the *_bh spinlock API is not enough, since it will cause a
warning in some contexts where the lock is obtained with hard irqs
disabled. To fix the issue, save the current irq state, disable irqs before
obtaining the lock, and re-enable irqs from the saved state after releasing it.

[0]:

[Sun Aug 7 13:12:29 2022] ================================
[Sun Aug 7 13:12:29 2022] WARNING: inconsistent lock state
[Sun Aug 7 13:12:29 2022] 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 Not tainted
[Sun Aug 7 13:12:29 2022] --------------------------------
[Sun Aug 7 13:12:29 2022] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[Sun Aug 7 13:12:29 2022] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[Sun Aug 7 13:12:29 2022] ffffffffa06dc0d8 (lag_lock){+.?.}-{2:2}, at: mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] {SOFTIRQ-ON-W} state was registered at:
[Sun Aug 7 13:12:29 2022] lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] _raw_spin_lock+0x2c/0x40
[Sun Aug 7 13:12:29 2022] mlx5_lag_add_netdev+0x13b/0x480 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_nic_enable+0x114/0x470 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_attach_netdev+0x30e/0x6a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_resume+0x105/0x160 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_probe+0xac3/0x14f0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] auxiliary_bus_probe+0x9d/0xe0
[Sun Aug 7 13:12:29 2022] really_probe+0x1e0/0xaa0
[Sun Aug 7 13:12:29 2022] __driver_probe_device+0x219/0x480
[Sun Aug 7 13:12:29 2022] driver_probe_device+0x49/0x130
[Sun Aug 7 13:12:29 2022] __driver_attach+0x1e4/0x4d0
[Sun Aug 7 13:12:29 2022] bus_for_each_dev+0x11e/0x1a0
[Sun Aug 7 13:12:29 2022] bus_add_driver+0x3f4/0x5a0
[Sun Aug 7 13:12:29 2022] driver_register+0x20f/0x390
[Sun Aug 7 13:12:29 2022] __auxiliary_driver_register+0x14e/0x260
[Sun Aug 7 13:12:29 2022] mlx5e_init+0x38/0x90 [mlx5_core]
[Sun Aug 7 13:12:29 2022] vhost_iotlb_itree_augment_rotate+0xcb/0x180 [vhost_iotlb]
[Sun Aug 7 13:12:29 2022] do_one_initcall+0xc4/0x400
[Sun Aug 7 13:12:29 2022] do_init_module+0x18a/0x620
[Sun Aug 7 13:12:29 2022] load_module+0x563a/0x7040
[Sun Aug 7 13:12:29 2022] __do_sys_finit_module+0x122/0x1d0
[Sun Aug 7 13:12:29 2022] do_syscall_64+0x3d/0x90
[Sun Aug 7 13:12:29 2022] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[Sun Aug 7 13:12:29 2022] irq event stamp: 3596508
[Sun Aug 7 13:12:29 2022] hardirqs last enabled at (3596508): [<ffffffff813687c2>] __local_bh_enable_ip+0xa2/0x100
[Sun Aug 7 13:12:29 2022] hardirqs last disabled at (3596507): [<ffffffff813687da>] __local_bh_enable_ip+0xba/0x100
[Sun Aug 7 13:12:29 2022] softirqs last enabled at (3596488): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022] softirqs last disabled at (3596495): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022]
other info that might help us debug this:
[Sun Aug 7 13:12:29 2022] Possible unsafe locking scenario:

[Sun Aug 7 13:12:29 2022] CPU0
[Sun Aug 7 13:12:29 2022] ----
[Sun Aug 7 13:12:29 2022] lock(lag_lock);
[Sun Aug 7 13:12:29 2022] <Interrupt>
[Sun Aug 7 13:12:29 2022] lock(lag_lock);
[Sun Aug 7 13:12:29 2022]
*** DEADLOCK ***

[Sun Aug 7 13:12:29 2022] 4 locks held by swapper/0/0:
[Sun Aug 7 13:12:29 2022] #0: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: mlx5e_napi_poll+0x43/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] #1: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0x2d7/0xd60
[Sun Aug 7 13:12:29 2022] #2: ffff888144a18b58 (&br->hash_lock){+.-.}-{2:2}, at: br_fdb_update+0x301/0x570
[Sun Aug 7 13:12:29 2022] #3: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain+0x5/0x1d0
[Sun Aug 7 13:12:29 2022]
stack backtrace:
[Sun Aug 7 13:12:29 2022] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0_for_upstream_debug_2022_08_04_16_06 #1
[Sun Aug 7 13:12:29 2022] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[Sun Aug 7 13:12:29 2022] Call Trace:
[Sun Aug 7 13:12:29 2022] <IRQ>
[Sun Aug 7 13:12:29 2022] dump_stack_lvl+0x57/0x7d
[Sun Aug 7 13:12:29 2022] mark_lock.part.0.cold+0x5f/0x92
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? unwind_next_frame+0x1c4/0x1b50
[Sun Aug 7 13:12:29 2022] ? secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug 7 13:12:29 2022] ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? stack_access_ok+0x1d0/0x1d0
[Sun Aug 7 13:12:29 2022] ? start_kernel+0x3a7/0x3c5
[Sun Aug 7 13:12:29 2022] __lock_acquire+0x1260/0x6720
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? mark_lock.part.0+0xed/0x3060
[Sun Aug 7 13:12:29 2022] ? stack_trace_save+0x91/0xc0
[Sun Aug 7 13:12:29 2022] lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? lockdep_hardirqs_on_prepare+0x400/0x400
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] _raw_spin_lock+0x2c/0x40
[Sun Aug 7 13:12:29 2022] ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5_esw_bridge_rep_vport_num_vhca_id_get+0x1a0/0x600 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5_esw_bridge_update_work+0x90/0x90 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] mlx5_esw_bridge_switchdev_event+0x185/0x8f0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5_esw_bridge_port_obj_attr_set+0x3e0/0x3e0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] atomic_notifier_call_chain+0xd7/0x1d0
[Sun Aug 7 13:12:29 2022] br_switchdev_fdb_notify+0xea/0x100
[Sun Aug 7 13:12:29 2022] ? br_switchdev_set_port_flag+0x310/0x310
[Sun Aug 7 13:12:29 2022] fdb_notify+0x11b/0x150
[Sun Aug 7 13:12:29 2022] br_fdb_update+0x34c/0x570
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? br_fdb_add_local+0x50/0x50
[Sun Aug 7 13:12:29 2022] ? br_allowed_ingress+0x5f/0x1070
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] br_handle_frame_finish+0x786/0x18e0
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] ? sctp_inet_bind_verify+0x4d/0x190
[Sun Aug 7 13:12:29 2022] ? xlog_unpack_data+0x2e0/0x310
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] br_nf_hook_thresh+0x227/0x380 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? setup_pre_routing+0x460/0x460 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? br_nf_pre_routing_ipv6+0x48b/0x69c [br_netfilter]
[Sun Aug 7 13:12:29 2022] br_nf_pre_routing_finish_ipv6+0x5c2/0xbf0 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] br_nf_pre_routing_ipv6+0x4c6/0x69c [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_validate_ipv6+0x9e0/0x9e0 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_nf_forward_arp+0xb70/0xb70 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_nf_pre_routing+0xacf/0x1160 [br_netfilter]
[Sun Aug 7 13:12:29 2022] br_handle_frame+0x8a9/0x1270
[Sun Aug 7 13:12:29 2022] ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? bond_handle_frame+0xf9/0xac0 [bonding]
[Sun Aug 7 13:12:29 2022] ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug 7 13:12:29 2022] __netif_receive_skb_core+0x7c0/0x2c70
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] ? generic_xdp_tx+0x5b0/0x5b0
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] __netif_receive_skb_list_core+0x2d7/0x8a0
[Sun Aug 7 13:12:29 2022] ? lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] ? process_backlog+0x960/0x960
[Sun Aug 7 13:12:29 2022] ? lockdep_hardirqs_on_prepare+0x129/0x400
[Sun Aug 7 13:12:29 2022] ? kvm_clock_get_cycles+0x14/0x20
[Sun Aug 7 13:12:29 2022] netif_receive_skb_list_internal+0x5f4/0xd60
[Sun Aug 7 13:12:29 2022] ? do_xdp_generic+0x150/0x150
[Sun Aug 7 13:12:29 2022] ? mlx5e_poll_rx_cq+0xf6b/0x2960 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5e_poll_ico_cq+0x3d/0x1590 [mlx5_core]
[Sun Aug 7 13:12:29 2022] napi_complete_done+0x188/0x710
[Sun Aug 7 13:12:29 2022] mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? __queue_work+0x53c/0xeb0
[Sun Aug 7 13:12:29 2022] __napi_poll+0x9f/0x540
[Sun Aug 7 13:12:29 2022] net_rx_action+0x420/0xb70
[Sun Aug 7 13:12:29 2022] ? napi_threaded_poll+0x470/0x470
[Sun Aug 7 13:12:29 2022] ? __common_interrupt+0x79/0x1a0
[Sun Aug 7 13:12:29 2022] __do_softirq+0x271/0x92c
[Sun Aug 7 13:12:29 2022] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022] common_interrupt+0x7d/0xa0
[Sun Aug 7 13:12:29 2022] </IRQ>
[Sun Aug 7 13:12:29 2022] <TASK>
[Sun Aug 7 13:12:29 2022] asm_common_interrupt+0x22/0x40
[Sun Aug 7 13:12:29 2022] RIP: 0010:default_idle+0x42/0x60
[Sun Aug 7 13:12:29 2022] Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 6b f1 22 02 85 c0 7e 07 0f 00 2d 80 3b 4a 00 fb f4 <c3> 48 c7 c7 e0 07 7e 85 e8 21 bd 40 fe eb de 66 66 2e 0f 1f 84 00
[Sun Aug 7 13:12:29 2022] RSP: 0018:ffffffff84407e18 EFLAGS: 00000242
[Sun Aug 7 13:12:29 2022] RAX: 0000000000000001 RBX: ffffffff84ec4a68 RCX: 1ffffffff0afc0fc
[Sun Aug 7 13:12:29 2022] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835b1fac
[Sun Aug 7 13:12:29 2022] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8884d2c44ac3
[Sun Aug 7 13:12:29 2022] R10: ffffed109a588958 R11: 00000000ffffffff R12: 0000000000000000
[Sun Aug 7 13:12:29 2022] R13: ffffffff84efac20 R14: 0000000000000000 R15: dffffc0000000000
[Sun Aug 7 13:12:29 2022] ? default_idle_call+0xcc/0x460
[Sun Aug 7 13:12:29 2022] default_idle_call+0xec/0x460
[Sun Aug 7 13:12:29 2022] do_idle+0x394/0x450
[Sun Aug 7 13:12:29 2022] ? arch_cpu_idle_exit+0x40/0x40
[Sun Aug 7 13:12:29 2022] cpu_startup_entry+0x19/0x20
[Sun Aug 7 13:12:29 2022] rest_init+0x156/0x250
[Sun Aug 7 13:12:29 2022] arch_call_rest_init+0xf/0x15
[Sun Aug 7 13:12:29 2022] start_kernel+0x3a7/0x3c5
[Sun Aug 7 13:12:29 2022] secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug 7 13:12:29 2022] </TASK>

Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
.../net/ethernet/mellanox/mlx5/core/lag/lag.c | 55 +++++++++++--------
1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index c520edb942ca5..d98acd68af2ec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1067,30 +1067,32 @@ static void mlx5_ldev_add_netdev(struct mlx5_lag *ldev,
struct net_device *netdev)
{
unsigned int fn = mlx5_get_dev_index(dev);
+ unsigned long flags;

if (fn >= ldev->ports)
return;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev->pf[fn].netdev = netdev;
ldev->tracker.netdev_state[fn].link_up = 0;
ldev->tracker.netdev_state[fn].tx_enabled = 0;
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);
}

static void mlx5_ldev_remove_netdev(struct mlx5_lag *ldev,
struct net_device *netdev)
{
+ unsigned long flags;
int i;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
for (i = 0; i < ldev->ports; i++) {
if (ldev->pf[i].netdev == netdev) {
ldev->pf[i].netdev = NULL;
break;
}
}
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);
}

static void mlx5_ldev_add_mdev(struct mlx5_lag *ldev,
@@ -1246,12 +1248,13 @@ void mlx5_lag_add_netdev(struct mlx5_core_dev *dev,
bool mlx5_lag_is_roce(struct mlx5_core_dev *dev)
{
struct mlx5_lag *ldev;
+ unsigned long flags;
bool res;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
res = ldev && __mlx5_lag_is_roce(ldev);
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);

return res;
}
@@ -1260,12 +1263,13 @@ EXPORT_SYMBOL(mlx5_lag_is_roce);
bool mlx5_lag_is_active(struct mlx5_core_dev *dev)
{
struct mlx5_lag *ldev;
+ unsigned long flags;
bool res;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
res = ldev && __mlx5_lag_is_active(ldev);
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);

return res;
}
@@ -1274,13 +1278,14 @@ EXPORT_SYMBOL(mlx5_lag_is_active);
bool mlx5_lag_is_master(struct mlx5_core_dev *dev)
{
struct mlx5_lag *ldev;
+ unsigned long flags;
bool res;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
res = ldev && __mlx5_lag_is_active(ldev) &&
dev == ldev->pf[MLX5_LAG_P1].dev;
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);

return res;
}
@@ -1289,12 +1294,13 @@ EXPORT_SYMBOL(mlx5_lag_is_master);
bool mlx5_lag_is_sriov(struct mlx5_core_dev *dev)
{
struct mlx5_lag *ldev;
+ unsigned long flags;
bool res;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
res = ldev && __mlx5_lag_is_sriov(ldev);
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);

return res;
}
@@ -1303,13 +1309,14 @@ EXPORT_SYMBOL(mlx5_lag_is_sriov);
bool mlx5_lag_is_shared_fdb(struct mlx5_core_dev *dev)
{
struct mlx5_lag *ldev;
+ unsigned long flags;
bool res;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
res = ldev && __mlx5_lag_is_sriov(ldev) &&
test_bit(MLX5_LAG_MODE_FLAG_SHARED_FDB, &ldev->mode_flags);
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);

return res;
}
@@ -1352,9 +1359,10 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
{
struct net_device *ndev = NULL;
struct mlx5_lag *ldev;
+ unsigned long flags;
int i;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);

if (!(ldev && __mlx5_lag_is_roce(ldev)))
@@ -1373,7 +1381,7 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
dev_hold(ndev);

unlock:
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);

return ndev;
}
@@ -1383,10 +1391,11 @@ u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
struct net_device *slave)
{
struct mlx5_lag *ldev;
+ unsigned long flags;
u8 port = 0;
int i;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
if (!(ldev && __mlx5_lag_is_roce(ldev)))
goto unlock;
@@ -1401,7 +1410,7 @@ u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
port = ldev->v2p_map[port * ldev->buckets];

unlock:
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);
return port;
}
EXPORT_SYMBOL(mlx5_lag_get_slave_port);
@@ -1422,8 +1431,9 @@ struct mlx5_core_dev *mlx5_lag_get_peer_mdev(struct mlx5_core_dev *dev)
{
struct mlx5_core_dev *peer_dev = NULL;
struct mlx5_lag *ldev;
+ unsigned long flags;

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
if (!ldev)
goto unlock;
@@ -1433,7 +1443,7 @@ struct mlx5_core_dev *mlx5_lag_get_peer_mdev(struct mlx5_core_dev *dev)
ldev->pf[MLX5_LAG_P1].dev;

unlock:
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);
return peer_dev;
}
EXPORT_SYMBOL(mlx5_lag_get_peer_mdev);
@@ -1446,6 +1456,7 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
int outlen = MLX5_ST_SZ_BYTES(query_cong_statistics_out);
struct mlx5_core_dev **mdev;
struct mlx5_lag *ldev;
+ unsigned long flags;
int num_ports;
int ret, i, j;
void *out;
@@ -1462,7 +1473,7 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,

memset(values, 0, sizeof(*values) * num_counters);

- spin_lock(&lag_lock);
+ spin_lock_irqsave(&lag_lock, flags);
ldev = mlx5_lag_dev(dev);
if (ldev && __mlx5_lag_is_active(ldev)) {
num_ports = ldev->ports;
@@ -1472,7 +1483,7 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
num_ports = 1;
mdev[MLX5_LAG_P1] = dev;
}
- spin_unlock(&lag_lock);
+ spin_unlock_irqrestore(&lag_lock, flags);

for (i = 0; i < num_ports; ++i) {
u32 in[MLX5_ST_SZ_DW(query_cong_statistics_in)] = {};
--
2.35.1



2022-08-29 12:41:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 105/158] x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry

From: Chen Zhongjin <[email protected]>

commit fc2e426b1161761561624ebd43ce8c8d2fa058da upstream.

When meeting ftrace trampolines during ORC unwinding, the unwinder uses the
address of ftrace_{regs_}call to find the ORC entry, which gets the next
frame at sp+176.

If an IRQ hits at sub $0xa8,%rsp, the next frame should be sp+8 instead of
sp+176. This makes the unwinder skip the correct frame and throw warnings
such as "wrong direction" or "can't access registers", etc., depending on
the content of the incorrect frame address.

By adding the offset *ip - ops->trampoline* to the base address
ftrace_{regs_}caller, we can get the correct address at which to find the
ORC entry.

Also change "caller" to "tramp_addr" so the variable name matches its
content.

[ mingo: Clarified the changelog a bit. ]

Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
Signed-off-by: Chen Zhongjin <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -93,22 +93,27 @@ static struct orc_entry *orc_find(unsign
static struct orc_entry *orc_ftrace_find(unsigned long ip)
{
struct ftrace_ops *ops;
- unsigned long caller;
+ unsigned long tramp_addr, offset;

ops = ftrace_ops_trampoline(ip);
if (!ops)
return NULL;

+ /* Set tramp_addr to the start of the code copied by the trampoline */
if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
- caller = (unsigned long)ftrace_regs_call;
+ tramp_addr = (unsigned long)ftrace_regs_caller;
else
- caller = (unsigned long)ftrace_call;
+ tramp_addr = (unsigned long)ftrace_caller;
+
+ /* Now place tramp_addr to the location within the trampoline ip is at */
+ offset = ip - ops->trampoline;
+ tramp_addr += offset;

/* Prevent unlikely recursion */
- if (ip == caller)
+ if (ip == tramp_addr)
return NULL;

- return orc_find(caller);
+ return orc_find(tramp_addr);
}
#else
static struct orc_entry *orc_ftrace_find(unsigned long ip)


2022-08-29 12:42:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 011/158] mt76: mt7921: fix command timeout in AP stop period

From: Deren Wu <[email protected]>

commit 9d958b60ebc2434f2b7eae83d77849e22d1059eb upstream.

Because the AP is stopped improperly, the mt7921 driver can hit a random
command timeout caused by a chip firmware problem. Migrate the AP start/stop
process to .start_ap/.stop_ap and configure the BSS network settings in both
hooks.

The new flow is shown below.

* AP start
  .start_ap()
    configure BSS network resource
    set BSS to connected state
  .bss_info_changed()
    enable fw beacon offload

* AP stop
  .bss_info_changed()
    disable fw beacon offload (skip this command)
  .stop_ap()
    set BSS to disconnected state (beacon offload disabled automatically)
    destroy BSS network resource

Fixes: 116c69603b01 ("mt76: mt7921: Add AP mode support")
Signed-off-by: Sean Wang <[email protected]>
Signed-off-by: Deren Wu <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c | 2
drivers/net/wireless/mediatek/mt76/mt7921/main.c | 47 +++++++++++++++----
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c | 5 --
drivers/net/wireless/mediatek/mt76/mt7921/mt7921.h | 2
4 files changed, 43 insertions(+), 13 deletions(-)

--- a/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
@@ -1403,6 +1403,8 @@ int mt76_connac_mcu_uni_add_bss(struct m
else
conn_type = CONNECTION_INFRA_AP;
basic_req.basic.conn_type = cpu_to_le32(conn_type);
+ /* Fully active/deactivate BSS network in AP mode only */
+ basic_req.basic.active = enable;
break;
case NL80211_IFTYPE_STATION:
if (vif->p2p)
--- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
@@ -653,15 +653,6 @@ static void mt7921_bss_info_changed(stru
}
}

- if (changed & BSS_CHANGED_BEACON_ENABLED && info->enable_beacon) {
- struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
-
- mt76_connac_mcu_uni_add_bss(phy->mt76, vif, &mvif->sta.wcid,
- true);
- mt7921_mcu_sta_update(dev, NULL, vif, true,
- MT76_STA_INFO_STATE_NONE);
- }
-
if (changed & (BSS_CHANGED_BEACON |
BSS_CHANGED_BEACON_ENABLED))
mt7921_mcu_uni_add_beacon_offload(dev, hw, vif,
@@ -1500,6 +1491,42 @@ mt7921_channel_switch_beacon(struct ieee
mt7921_mutex_release(dev);
}

+static int
+mt7921_start_ap(struct ieee80211_hw *hw, struct ieee80211_vif *vif)
+{
+ struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
+ struct mt7921_phy *phy = mt7921_hw_phy(hw);
+ struct mt7921_dev *dev = mt7921_hw_dev(hw);
+ int err;
+
+ err = mt76_connac_mcu_uni_add_bss(phy->mt76, vif, &mvif->sta.wcid,
+ true);
+ if (err)
+ return err;
+
+ err = mt7921_mcu_set_bss_pm(dev, vif, true);
+ if (err)
+ return err;
+
+ return mt7921_mcu_sta_update(dev, NULL, vif, true,
+ MT76_STA_INFO_STATE_NONE);
+}
+
+static void
+mt7921_stop_ap(struct ieee80211_hw *hw, struct ieee80211_vif *vif)
+{
+ struct mt7921_vif *mvif = (struct mt7921_vif *)vif->drv_priv;
+ struct mt7921_phy *phy = mt7921_hw_phy(hw);
+ struct mt7921_dev *dev = mt7921_hw_dev(hw);
+ int err;
+
+ err = mt7921_mcu_set_bss_pm(dev, vif, false);
+ if (err)
+ return;
+
+ mt76_connac_mcu_uni_add_bss(phy->mt76, vif, &mvif->sta.wcid, false);
+}
+
const struct ieee80211_ops mt7921_ops = {
.tx = mt7921_tx,
.start = mt7921_start,
@@ -1510,6 +1537,8 @@ const struct ieee80211_ops mt7921_ops =
.conf_tx = mt7921_conf_tx,
.configure_filter = mt7921_configure_filter,
.bss_info_changed = mt7921_bss_info_changed,
+ .start_ap = mt7921_start_ap,
+ .stop_ap = mt7921_stop_ap,
.sta_state = mt7921_sta_state,
.sta_pre_rcu_remove = mt76_sta_pre_rcu_remove,
.set_key = mt7921_set_key,
--- a/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
@@ -1020,7 +1020,7 @@ mt7921_mcu_uni_bss_bcnft(struct mt7921_d
&bcnft_req, sizeof(bcnft_req), true);
}

-static int
+int
mt7921_mcu_set_bss_pm(struct mt7921_dev *dev, struct ieee80211_vif *vif,
bool enable)
{
@@ -1049,9 +1049,6 @@ mt7921_mcu_set_bss_pm(struct mt7921_dev
};
int err;

- if (vif->type != NL80211_IFTYPE_STATION)
- return 0;
-
err = mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(SET_BSS_ABORT),
&req_hdr, sizeof(req_hdr), false);
if (err < 0 || !enable)
--- a/drivers/net/wireless/mediatek/mt76/mt7921/mt7921.h
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/mt7921.h
@@ -280,6 +280,8 @@ int mt7921_wpdma_reset(struct mt7921_dev
int mt7921_wpdma_reinit_cond(struct mt7921_dev *dev);
void mt7921_dma_cleanup(struct mt7921_dev *dev);
int mt7921_run_firmware(struct mt7921_dev *dev);
+int mt7921_mcu_set_bss_pm(struct mt7921_dev *dev, struct ieee80211_vif *vif,
+ bool enable);
int mt7921_mcu_sta_update(struct mt7921_dev *dev, struct ieee80211_sta *sta,
struct ieee80211_vif *vif, bool enable,
enum mt76_sta_info_state state);


2022-08-29 12:42:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 055/158] netfilter: nf_tables: disallow updates of implicit chain

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 5dc52d83baac30decf5f3b371d5eb41dfa1d1412 ]

Updates on existing implicit chain make no sense, disallow this.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nf_tables_api.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 4bd6e9427c918..8b6ee9df817fb 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2574,6 +2574,9 @@ static int nf_tables_newchain(struct sk_buff *skb, const struct nfnl_info *info,
nft_ctx_init(&ctx, net, skb, info->nlh, family, table, chain, nla);

if (chain != NULL) {
+ if (chain->flags & NFT_CHAIN_BINDING)
+ return -EINVAL;
+
if (info->nlh->nlmsg_flags & NLM_F_EXCL) {
NL_SET_BAD_ATTR(extack, attr);
return -EEXIST;
--
2.35.1



2022-08-29 12:42:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 128/158] xen/privcmd: fix error exit of privcmd_ioctl_dm_op()

From: Juergen Gross <[email protected]>

commit c5deb27895e017a0267de0a20d140ad5fcc55a54 upstream.

The error exit of privcmd_ioctl_dm_op() is calling unlock_pages()
potentially with pages being NULL, leading to a NULL dereference.

Additionally lock_pages() doesn't check for pin_user_pages_fast()
having been completely successful, resulting in potentially not
locking all pages into memory. This could result in sporadic failures
when using the related memory in user mode.

Fix all of that by always calling unlock_pages() with the real number of
pinned pages, which will be zero when pages is NULL, and by checking that
the number of pages pinned by pin_user_pages_fast() matches the expected
number of pages.
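
For illustration, a small userspace sketch of the resume logic (with a
made-up pin_some() helper standing in for pin_user_pages_fast() and
made-up buffer sizes) shows how the offset/index bookkeeping walks through
partially pinned buffers:

  #include <stdio.h>

  /* Hypothetical stand-in for pin_user_pages_fast(): pins at most two
   * pages per call, so larger buffers need several attempts. */
  static int pin_some(unsigned int requested)
  {
          return requested > 2 ? 2 : (int)requested;
  }

  int main(void)
  {
          unsigned int pages_per_buf[] = { 5, 1, 3 };  /* made-up sizes */
          unsigned int num = 3, i, off = 0, pinned = 0;

          for (i = 0; i < num; ) {
                  unsigned int requested = pages_per_buf[i] - off;
                  int count = pin_some(requested);

                  if (count <= 0)
                          return 1;  /* would be -EFAULT in the kernel */

                  pinned += count;
                  /* Partial pin: keep index i, remember how far we got. */
                  off = (requested == (unsigned int)count) ? 0 : off + count;
                  /* Advance to the next buffer only when fully pinned. */
                  i += !off;
          }
          printf("pinned %u pages in total\n", pinned);  /* prints 9 */
          return 0;
  }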

Cc: <[email protected]>
Fixes: ab520be8cd5d ("xen/privcmd: Add IOCTL_PRIVCMD_DM_OP")
Reported-by: Rustam Subkhankulov <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Oleksandr Tyshchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/privcmd.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -581,27 +581,30 @@ static int lock_pages(
struct privcmd_dm_op_buf kbufs[], unsigned int num,
struct page *pages[], unsigned int nr_pages, unsigned int *pinned)
{
- unsigned int i;
+ unsigned int i, off = 0;

- for (i = 0; i < num; i++) {
+ for (i = 0; i < num; ) {
unsigned int requested;
int page_count;

requested = DIV_ROUND_UP(
offset_in_page(kbufs[i].uptr) + kbufs[i].size,
- PAGE_SIZE);
+ PAGE_SIZE) - off;
if (requested > nr_pages)
return -ENOSPC;

page_count = pin_user_pages_fast(
- (unsigned long) kbufs[i].uptr,
+ (unsigned long)kbufs[i].uptr + off * PAGE_SIZE,
requested, FOLL_WRITE, pages);
- if (page_count < 0)
- return page_count;
+ if (page_count <= 0)
+ return page_count ? : -EFAULT;

*pinned += page_count;
nr_pages -= page_count;
pages += page_count;
+
+ off = (requested == page_count) ? 0 : off + page_count;
+ i += !off;
}

return 0;
@@ -677,10 +680,8 @@ static long privcmd_ioctl_dm_op(struct f
}

rc = lock_pages(kbufs, kdata.num, pages, nr_pages, &pinned);
- if (rc < 0) {
- nr_pages = pinned;
+ if (rc < 0)
goto out;
- }

for (i = 0; i < kdata.num; i++) {
set_xen_guest_handle(xbufs[i].h, kbufs[i].uptr);
@@ -692,7 +693,7 @@ static long privcmd_ioctl_dm_op(struct f
xen_preemptible_hcall_end();

out:
- unlock_pages(pages, nr_pages);
+ unlock_pages(pages, pinned);
kfree(xbufs);
kfree(pages);
kfree(kbufs);


2022-08-29 12:42:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 145/158] perf/x86/intel/ds: Fix precise store latency handling

From: Stephane Eranian <[email protected]>

commit d4bdb0bebc5ba3299d74f123c782d99cd4e25c49 upstream.

With the existing code in store_latency_data(), the memory operation (mem_op)
returned to the user is always OP_LOAD when in fact it should be OP_STORE.
This comes from the fact that the function simply grabs the information
from a data source map which covers only load accesses. Intel 12th gen CPUs
offer precise store sampling that captures both the data source and latency.
Therefore the code can use the data source mapping table but must override
the memory operation to reflect stores instead of loads.

Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/events/intel/ds.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -291,6 +291,7 @@ static u64 load_latency_data(struct perf
static u64 store_latency_data(struct perf_event *event, u64 status)
{
union intel_x86_pebs_dse dse;
+ union perf_mem_data_src src;
u64 val;

dse.val = status;
@@ -304,7 +305,14 @@ static u64 store_latency_data(struct per

val |= P(BLK, NA);

- return val;
+ /*
+ * the pebs_data_source table is only for loads
+ * so override the mem_op to say STORE instead
+ */
+ src.val = val;
+ src.mem_op = P(OP,STORE);
+
+ return src.val;
}

struct pebs_record_core {


2022-08-29 12:42:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 150/158] arm64/sme: Don't flush SVE register state when handling SME traps

From: Mark Brown <[email protected]>

commit 714f3cbd70a4db9f9b7fe5b8a032896ed33fb824 upstream.

Currently as part of handling a SME access trap we flush the SVE register
state. This is not needed and would corrupt register state if the task has
access to the SVE registers already. For non-streaming mode accesses the
required flushing will be done in the SVE access trap. For streaming
mode SVE register accesses the architecture guarantees that the register
state will be flushed when streaming mode is entered or exited so there is
no need for us to do so. Simply remove the register initialisation.

Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Mark Brown <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/kernel/fpsimd.c | 11 -----------
1 file changed, 11 deletions(-)

--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1463,17 +1463,6 @@ void do_sme_acc(unsigned long esr, struc
fpsimd_bind_task_to_cpu();
}

- /*
- * If SVE was not already active initialise the SVE registers,
- * any non-shared state between the streaming and regular SVE
- * registers is architecturally guaranteed to be zeroed when
- * we enter streaming mode. We do not need to initialize ZA
- * since ZA must be disabled at this point and enabling ZA is
- * architecturally defined to zero ZA.
- */
- if (system_supports_sve() && !test_thread_flag(TIF_SVE))
- sve_init_regs();
-
put_cpu_fpsimd_context();
}



2022-08-29 12:42:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 088/158] ionic: fix up issues with handling EAGAIN on FW cmds

From: Shannon Nelson <[email protected]>

[ Upstream commit 0fc4dd452d6c14828eed6369155c75c0ac15bab3 ]

In looping on FW update tests we occasionally see the
FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
waiting for the FW activate step to finish inside the FW. The
firmware is complaining that the done bit is set when a new
dev_cmd is going to be processed.

Doing a clean on the cmd registers and doorbell before exiting
the wait-for-done and cleaning the done bit before the sleep
prevents this from occurring.

Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/pensando/ionic/ionic_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_main.c b/drivers/net/ethernet/pensando/ionic/ionic_main.c
index 4029b4e021f86..56f93b0305519 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_main.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_main.c
@@ -474,8 +474,8 @@ static int __ionic_dev_cmd_wait(struct ionic *ionic, unsigned long max_seconds,
ionic_opcode_to_str(opcode), opcode,
ionic_error_to_str(err), err);

- msleep(1000);
iowrite32(0, &idev->dev_cmd_regs->done);
+ msleep(1000);
iowrite32(1, &idev->dev_cmd_regs->doorbell);
goto try_again;
}
@@ -488,6 +488,8 @@ static int __ionic_dev_cmd_wait(struct ionic *ionic, unsigned long max_seconds,
return ionic_error_to_errno(err);
}

+ ionic_dev_cmd_clean(ionic);
+
return 0;
}

--
2.35.1



2022-08-29 12:43:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 153/158] scsi: core: Fix passthrough retry counter handling

From: Mike Christie <[email protected]>

commit fac8e558da9485e13a0ae0488aa0b8a8c307cd34 upstream.

Passthrough users will set the scsi_cmnd->allowed value and expect up to
$allowed retries. The problem is that before:

commit 6aded12b10e0 ("scsi: core: Remove struct scsi_request")

we used to set the retries on the scsi_request then copy them over to
scsi_cmnd->allowed in scsi_setup_scsi_cmnd. With that patch we now set
scsi_cmnd->allowed to 0 in scsi_prepare_cmd and overwrite what the
passthrough user set.

This moves the allowed initialization to after the blk_rq_is_passthrough()
check so it's only done for the non-passthrough path where the ULD
init_command will normally set an allowed value it prefers.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 6aded12b10e0 ("scsi: core: Remove struct scsi_request")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/scsi_lib.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1549,7 +1549,6 @@ static blk_status_t scsi_prepare_cmd(str
scsi_init_command(sdev, cmd);

cmd->eh_eflags = 0;
- cmd->allowed = 0;
cmd->prot_type = 0;
cmd->prot_flags = 0;
cmd->submitter = 0;
@@ -1600,6 +1599,8 @@ static blk_status_t scsi_prepare_cmd(str
return ret;
}

+ /* Usually overridden by the ULP */
+ cmd->allowed = 0;
memset(cmd->cmnd, 0, sizeof(cmd->cmnd));
return scsi_cmd_to_driver(cmd)->init_command(cmd);
}


2022-08-29 12:43:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 092/158] net: lantiq_xrx200: fix lock under memory pressure

From: Aleksander Jan Bajkowski <[email protected]>

[ Upstream commit c4b6e9341f930e4dd089231c0414758f5f1f9dbd ]

When the xrx200_hw_receive() function returns -ENOMEM, the NAPI poll
function immediately returns an error.
This is incorrect for two reasons:
* the function terminates without enabling interrupts or scheduling NAPI,
* the error code (-ENOMEM) is returned instead of the number of received
packets.

After the first memory allocation failure occurs, packet reception is
locked due to disabled interrupts from DMA.
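
A minimal sketch of the poll contract the fix restores, using hypothetical
example_hw_receive() and example_enable_irq() helpers rather than the real
driver code, looks roughly like this:

  #include <linux/netdevice.h>

  /* Hypothetical helpers standing in for the real driver code. */
  static int example_hw_receive(void);   /* 0 on success, may return -ENOMEM */
  static void example_enable_irq(void);  /* re-arm the RX DMA interrupt */

  static int example_poll_rx(struct napi_struct *napi, int budget)
  {
          int rx = 0;

          while (rx < budget) {
                  int ret = example_hw_receive();

                  if (ret)
                          break;  /* was: return ret; which locked up RX */
                  rx++;
          }

          if (rx < budget && napi_complete_done(napi, rx))
                  example_enable_irq();

          return rx;  /* packets handled, never an error code */
  }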

Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/lantiq_xrx200.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/lantiq_xrx200.c b/drivers/net/ethernet/lantiq_xrx200.c
index 89314b645c822..25adce7f0c7c0 100644
--- a/drivers/net/ethernet/lantiq_xrx200.c
+++ b/drivers/net/ethernet/lantiq_xrx200.c
@@ -294,7 +294,7 @@ static int xrx200_poll_rx(struct napi_struct *napi, int budget)
if (ret == XRX200_DMA_PACKET_IN_PROGRESS)
continue;
if (ret != XRX200_DMA_PACKET_COMPLETE)
- return ret;
+ break;
rx++;
} else {
break;
--
2.35.1



2022-08-29 12:43:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 109/158] x86/PAT: Have pat_enabled() properly reflect state when running on Xen

From: Jan Beulich <[email protected]>

commit 72cbc8f04fe2fa93443c0fcccb7ad91dfea3d9ce upstream.

After the commit named in the Fixes: tag, pat_enabled() returns false
(because PAT initialization is suppressed in the absence of MTRRs being
announced to be available).

This has become a problem: the i915 driver now fails to initialize when
running PV on Xen (i915_gem_object_pin_map() is where I located the
induced failure), and its error handling is flaky enough to (at least
sometimes) result in a hung system.

Yet even beyond that problem the keying of the use of WC mappings to
pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
graphics frame buffer accesses would have been quite a bit less optimal
than possible.

Arrange for the function to return true in such environments, without
undermining the rest of PAT MSR management logic considering PAT to be
disabled: specifically, no writes to the PAT MSR should occur.

For the new boolean to live in .init.data, init_cache_modes() also needs
moving to .init.text (where it could/should have lived already before).

[ bp: This is the "small fix" variant for stable. It'll get replaced
with a proper PAT and MTRR detection split upstream but that is too
involved for a stable backport.
- additional touchups to commit msg. Use cpu_feature_enabled(). ]

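For context, the WC keying mentioned above works roughly as follows on
x86 (a simplified sketch, not a verbatim copy of the header), so with
pat_enabled() stuck at false, write-combined PCI mappings of frame
buffers were refused even though the underlying PAT configuration would
have supported them:

/* sketch: the PCI mmap code asks the architecture whether WC is usable */
#define arch_can_pci_mmap_wc()  pat_enabled()
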
Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/mm/pat/memtype.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -62,6 +62,7 @@

static bool __read_mostly pat_bp_initialized;
static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
+static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
static bool __read_mostly pat_bp_enabled;
static bool __read_mostly pat_cm_initialized;

@@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
static int __init nopat(char *str)
{
pat_disable("PAT support disabled via boot option.");
+ pat_force_disabled = true;
return 0;
}
early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
}

-void init_cache_modes(void)
+void __init init_cache_modes(void)
{
u64 pat = 0;

@@ -313,6 +315,12 @@ void init_cache_modes(void)
*/
pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+ } else if (!pat_force_disabled && cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+ /*
+ * Clearly PAT is enabled underneath. Allow pat_enabled() to
+ * reflect this.
+ */
+ pat_bp_enabled = true;
}

__init_cache_modes(pat);


2022-08-29 12:57:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 132/158] Revert "zram: remove double compression logic"

From: Jiri Slaby <[email protected]>

commit 37887783b3fef877bf34b8992c9199864da4afcb upstream.

This reverts commit e7be8d1dd983156b ("zram: remove double compression
logic") as it causes zram failures. It does not revert cleanly because
PTR_ERR handling was introduced in the meantime; that is handled here
with the appropriate IS_ERR check.

When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled (GFP_NOIO).
After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried.

So when the failure occurs under memory pressure, the overlaying
filesystem such as ext2 (mounted by ext4 module in this case) can emit
failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744

With direct reclaim, memory is actually reclaimed and the allocation
eventually succeeds. In the worst case, the OOM killer is invoked, which
is the proper outcome if the user sets up zram too large compared to the
available RAM.

This exact diff doesn't apply cleanly to 5.19 (stable); see the PTR_ERR
note above. Use a direct revert of e7be8d1dd983 instead.

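To spell out the gfp difference the revert restores: the fast path only
wakes kswapd, while the retry uses GFP_NOIO, which also sets
__GFP_DIRECT_RECLAIM so the allocation may sleep and reclaim memory
itself. The macro names below are made up for illustration; the flag
values mirror the restored code and the mainline gfp definitions:

#define FAST_PATH_GFP   (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN | \
                         __GFP_HIGHMEM | __GFP_MOVABLE)
#define SLOW_PATH_GFP   (GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE)
                        /* GFP_NOIO == __GFP_RECLAIM
                         *          == __GFP_KSWAPD_RECLAIM | __GFP_DIRECT_RECLAIM */
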
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Alexey Romanov <[email protected]>
Cc: Dmitry Rokosov <[email protected]>
Cc: Lukas Czerner <[email protected]>
Cc: <[email protected]> [5.19]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/zram/zram_drv.c | 42 ++++++++++++++++++++++++++++++++----------
drivers/block/zram/zram_drv.h | 1 +
2 files changed, 33 insertions(+), 10 deletions(-)

--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1144,14 +1144,15 @@ static ssize_t bd_stat_show(struct devic
static ssize_t debug_stat_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
- int version = 2;
+ int version = 1;
struct zram *zram = dev_to_zram(dev);
ssize_t ret;

down_read(&zram->init_lock);
ret = scnprintf(buf, PAGE_SIZE,
- "version: %d\n%8llu\n",
+ "version: %d\n%8llu %8llu\n",
version,
+ (u64)atomic64_read(&zram->stats.writestall),
(u64)atomic64_read(&zram->stats.miss_free));
up_read(&zram->init_lock);

@@ -1367,6 +1368,7 @@ static int __zram_bvec_write(struct zram
}
kunmap_atomic(mem);

+compress_again:
zstrm = zcomp_stream_get(zram->comp);
src = kmap_atomic(page);
ret = zcomp_compress(zstrm, src, &comp_len);
@@ -1375,20 +1377,39 @@ static int __zram_bvec_write(struct zram
if (unlikely(ret)) {
zcomp_stream_put(zram->comp);
pr_err("Compression failed! err=%d\n", ret);
+ zs_free(zram->mem_pool, handle);
return ret;
}

if (comp_len >= huge_class_size)
comp_len = PAGE_SIZE;
-
- handle = zs_malloc(zram->mem_pool, comp_len,
- __GFP_KSWAPD_RECLAIM |
- __GFP_NOWARN |
- __GFP_HIGHMEM |
- __GFP_MOVABLE);
-
- if (unlikely(!handle)) {
+ /*
+ * handle allocation has 2 paths:
+ * a) fast path is executed with preemption disabled (for
+ * per-cpu streams) and has __GFP_DIRECT_RECLAIM bit clear,
+ * since we can't sleep;
+ * b) slow path enables preemption and attempts to allocate
+ * the page with __GFP_DIRECT_RECLAIM bit set. we have to
+ * put per-cpu compression stream and, thus, to re-do
+ * the compression once handle is allocated.
+ *
+ * if we have a 'non-null' handle here then we are coming
+ * from the slow path and handle has already been allocated.
+ */
+ if (!handle)
+ handle = zs_malloc(zram->mem_pool, comp_len,
+ __GFP_KSWAPD_RECLAIM |
+ __GFP_NOWARN |
+ __GFP_HIGHMEM |
+ __GFP_MOVABLE);
+ if (!handle) {
zcomp_stream_put(zram->comp);
+ atomic64_inc(&zram->stats.writestall);
+ handle = zs_malloc(zram->mem_pool, comp_len,
+ GFP_NOIO | __GFP_HIGHMEM |
+ __GFP_MOVABLE);
+ if (handle)
+ goto compress_again;
return -ENOMEM;
}

@@ -1946,6 +1967,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);

+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
ret = device_add_disk(NULL, zram->disk, zram_disk_groups);
if (ret)
goto out_cleanup_disk;
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -81,6 +81,7 @@ struct zram_stats {
atomic64_t huge_pages_since; /* no. of huge pages since zram set up */
atomic64_t pages_stored; /* no. of pages currently stored */
atomic_long_t max_used_pages; /* no. of maximum pages stored */
+ atomic64_t writestall; /* no. of write slow paths */
atomic64_t miss_free; /* no. of missed free */
#ifdef CONFIG_ZRAM_WRITEBACK
atomic64_t bd_count; /* no. of pages in backing device */


2022-08-29 12:57:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 087/158] ionic: clear broken state on generation change

From: Shannon Nelson <[email protected]>

[ Upstream commit 9cb9dadb8f45c67e4310e002c2f221b70312b293 ]

A case was found in heavy testing where a link flap happens just before
a firmware Recovery event and the driver gets stuck in the BROKEN state.
This happens when the driver is interrupted by a FW generation change
while coming back up from the link flap, causing the call to
ionic_start_queues() in ionic_link_status_check() to fail. This can be
addressed by having the fw_up code clear the BROKEN bit if seen, rather
than waiting for a user to manually force the interface down and then
back up.

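The pattern used is the usual atomic test-and-clear: it clears the
sticky flag and reports whether it was set, so the recovery message is
logged only when the BROKEN state is actually being cleared. A generic
sketch, where EXAMPLE_BROKEN and 'flags' are illustrative names rather
than the driver's own:

if (test_and_clear_bit(EXAMPLE_BROKEN, flags))  /* atomic: old value returned */
        pr_info("clearing broken state\n");     /* logged only if it was set */
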
Fixes: 9e8eaf8427b6 ("ionic: stop watchdog when in broken state")
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/pensando/ionic/ionic_lif.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
index 1443f788ee37c..d4226999547e8 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
@@ -2963,6 +2963,9 @@ static void ionic_lif_handle_fw_up(struct ionic_lif *lif)

mutex_lock(&lif->queue_lock);

+ if (test_and_clear_bit(IONIC_LIF_F_BROKEN, lif->state))
+ dev_info(ionic->dev, "FW Up: clearing broken state\n");
+
err = ionic_qcqs_alloc(lif);
if (err)
goto err_unlock;
--
2.35.1



2022-08-29 12:58:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 026/158] nfc: pn533: Fix use-after-free bugs caused by pn532_cmd_timeout

From: Duoming Zhou <[email protected]>

[ Upstream commit f1e941dbf80a9b8bab0bffbc4cbe41cc7f4c6fb6 ]

When the pn532 UART device is detached, pn532_uart_remove() is called.
However, nothing in pn532_uart_remove() deletes the cmd_timeout timer,
which can lead to use-after-free bugs. The race is shown below:

      (thread 1)             |      (thread 2)
                             |  pn532_uart_send_frame
pn532_uart_remove            |    mod_timer(&pn532->cmd_timeout,...)
  ...                        |  (wait a time)
  kfree(pn532)  //FREE       |  pn532_cmd_timeout
                             |    pn532_uart_send_frame
                             |      pn532->...  //USE

This patch adds del_timer_sync() in pn532_uart_remove() to prevent the
use-after-free. Moreover, pn53x_unregister_nfc() is well synchronized:
it sets nfc_dev->shutting_down to true, and after that no syscall can
restart the cmd_timeout timer.

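The general teardown rule the fix follows is that no timer callback may
still be runnable once the object it dereferences has been freed. A
minimal sketch, with example_dev and example_remove() as hypothetical
names:

static void example_remove(struct example_dev *dev)
{
        del_timer_sync(&dev->cmd_timeout);      /* wait out any running handler */
        kfree(dev);                             /* only now is freeing safe */
}
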
Fixes: c656aa4c27b1 ("nfc: pn533: add UART phy driver")
Signed-off-by: Duoming Zhou <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/nfc/pn533/uart.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/nfc/pn533/uart.c b/drivers/nfc/pn533/uart.c
index 2caf997f9bc94..07596bf5f7d6d 100644
--- a/drivers/nfc/pn533/uart.c
+++ b/drivers/nfc/pn533/uart.c
@@ -310,6 +310,7 @@ static void pn532_uart_remove(struct serdev_device *serdev)
pn53x_unregister_nfc(pn532->priv);
serdev_device_close(serdev);
pn53x_common_clean(pn532->priv);
+ del_timer_sync(&pn532->cmd_timeout);
kfree_skb(pn532->recv_skb);
kfree(pn532);
}
--
2.35.1



2022-08-29 12:59:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 076/158] net: Fix a data-race around netdev_budget_usecs.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit fa45d484c52c73f79db2c23b0cdfc6c6455093ad ]

netdev_budget_usecs can be changed concurrently while it is being read.
Thus, add READ_ONCE() to its reader.

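The annotation pattern, sketched with a made-up variable name; the
WRITE_ONCE() shown on the writer side is the usual counterpart and an
assumption here, not part of this patch:

/* writer side, e.g. a sysctl handler */
WRITE_ONCE(some_tunable, new_value);

/* reader side, e.g. the softirq path */
limit = jiffies + usecs_to_jiffies(READ_ONCE(some_tunable));
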
Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a330f93629314..19baeaf65a646 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6646,7 +6646,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
{
struct softnet_data *sd = this_cpu_ptr(&softnet_data);
unsigned long time_limit = jiffies +
- usecs_to_jiffies(netdev_budget_usecs);
+ usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
int budget = READ_ONCE(netdev_budget);
LIST_HEAD(list);
LIST_HEAD(repoll);
--
2.35.1



2022-08-29 13:00:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 082/158] ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter

From: Jacob Keller <[email protected]>

[ Upstream commit 25d7a5f5a6bb15a2dae0a3f39ea5dda215024726 ]

The ixgbe_ptp_start_cyclecounter is intended to be called whenever the
cyclecounter parameters need to be changed.

Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x
devices"), this function has cleared the SYSTIME registers and reset the
TSAUXC DISABLE_SYSTIME bit.

While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
them during ixgbe_ptp_start_cyclecounter. This function may be called
during both reset and link status change. When link changes, the SYSTIME
counter is still operating normally, but the cyclecounter should be updated
to account for the possibly changed parameters.

Clearing SYSTIME when link changes causes the timecounter to jump because
the cycle counter now reads zero.

Extract the SYSTIME initialization out to a new function and call this
during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
an unnecessary reset of the current time.

This also restores the original SYSTIME clearing that occurred during
ixgbe_ptp_reset before the commit above.

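The resulting ordering in ixgbe_ptp_reset() then looks roughly like the
sketch below (paraphrased from the diff), while the link-change path
calls only ixgbe_ptp_start_cyclecounter() and therefore no longer zeroes
SYSTIME:

ixgbe_ptp_start_cyclecounter(adapter);  /* recompute cyclecounter parameters */
ixgbe_ptp_init_systime(adapter);        /* zero and enable SYSTIME: reset only */

spin_lock_irqsave(&adapter->tmreg_lock, flags);
timecounter_init(&adapter->hw_tc, &adapter->hw_cc,
                 ktime_to_ns(ktime_get_real()));
spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
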
Reported-by: Steve Payne <[email protected]>
Reported-by: Ilya Evenbach <[email protected]>
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 59 +++++++++++++++-----
1 file changed, 46 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index 336426a67ac1b..38cda659f65f4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -1208,7 +1208,6 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter)
struct cyclecounter cc;
unsigned long flags;
u32 incval = 0;
- u32 tsauxc = 0;
u32 fuse0 = 0;

/* For some of the boards below this mask is technically incorrect.
@@ -1243,18 +1242,6 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter)
case ixgbe_mac_x550em_a:
case ixgbe_mac_X550:
cc.read = ixgbe_ptp_read_X550;
-
- /* enable SYSTIME counter */
- IXGBE_WRITE_REG(hw, IXGBE_SYSTIMR, 0);
- IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0);
- IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0);
- tsauxc = IXGBE_READ_REG(hw, IXGBE_TSAUXC);
- IXGBE_WRITE_REG(hw, IXGBE_TSAUXC,
- tsauxc & ~IXGBE_TSAUXC_DISABLE_SYSTIME);
- IXGBE_WRITE_REG(hw, IXGBE_TSIM, IXGBE_TSIM_TXTS);
- IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_TIMESYNC);
-
- IXGBE_WRITE_FLUSH(hw);
break;
case ixgbe_mac_X540:
cc.read = ixgbe_ptp_read_82599;
@@ -1286,6 +1273,50 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter)
spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
}

+/**
+ * ixgbe_ptp_init_systime - Initialize SYSTIME registers
+ * @adapter: the ixgbe private board structure
+ *
+ * Initialize and start the SYSTIME registers.
+ */
+static void ixgbe_ptp_init_systime(struct ixgbe_adapter *adapter)
+{
+ struct ixgbe_hw *hw = &adapter->hw;
+ u32 tsauxc;
+
+ switch (hw->mac.type) {
+ case ixgbe_mac_X550EM_x:
+ case ixgbe_mac_x550em_a:
+ case ixgbe_mac_X550:
+ tsauxc = IXGBE_READ_REG(hw, IXGBE_TSAUXC);
+
+ /* Reset SYSTIME registers to 0 */
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIMR, 0);
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0);
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0);
+
+ /* Reset interrupt settings */
+ IXGBE_WRITE_REG(hw, IXGBE_TSIM, IXGBE_TSIM_TXTS);
+ IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_TIMESYNC);
+
+ /* Activate the SYSTIME counter */
+ IXGBE_WRITE_REG(hw, IXGBE_TSAUXC,
+ tsauxc & ~IXGBE_TSAUXC_DISABLE_SYSTIME);
+ break;
+ case ixgbe_mac_X540:
+ case ixgbe_mac_82599EB:
+ /* Reset SYSTIME registers to 0 */
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0);
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0);
+ break;
+ default:
+ /* Other devices aren't supported */
+ return;
+ };
+
+ IXGBE_WRITE_FLUSH(hw);
+}
+
/**
* ixgbe_ptp_reset
* @adapter: the ixgbe private board structure
@@ -1312,6 +1343,8 @@ void ixgbe_ptp_reset(struct ixgbe_adapter *adapter)

ixgbe_ptp_start_cyclecounter(adapter);

+ ixgbe_ptp_init_systime(adapter);
+
spin_lock_irqsave(&adapter->tmreg_lock, flags);
timecounter_init(&adapter->hw_tc, &adapter->hw_cc,
ktime_to_ns(ktime_get_real()));
--
2.35.1



2022-08-29 13:00:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 104/158] x86/entry: Fix entry_INT80_compat for Xen PV guests

From: Juergen Gross <[email protected]>

commit 5b9f0c4df1c1152403c738373fb063e9ffdac0a1 upstream.

Commit

c89191ce67ef ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS")

missed one use of SWAPGS in entry_INT80_compat(). Removing the SWAPGS
macro left the asm with a bare "swapgs", since the assembler also accepts
instructions spelled in capital letters.

This in turn leads to splats in Xen PV guests like:

[ 36.145223] general protection fault, maybe for address 0x2d: 0000 [#1] PREEMPT SMP NOPTI
[ 36.145794] CPU: 2 PID: 1847 Comm: ld-linux.so.2 Not tainted 5.19.1-1-default #1 \
openSUSE Tumbleweed f3b44bfb672cdb9f235aff53b57724eba8b9411b
[ 36.146608] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 11/14/2013
[ 36.148126] RIP: e030:entry_INT80_compat+0x3/0xa3

Fix that by open coding this single instance of the SWAPGS macro.

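For readers unfamiliar with the macro: ALTERNATIVE "old", "new", FEATURE
emits "old" and patches it to "new" at boot on CPUs that have FEATURE.
The open-coded line below is therefore a native swapgs that gets patched
out on Xen PV, matching what the removed SWAPGS macro used to expand to
(a sketch based on its pre-removal definition):

ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV     /* native: swapgs; Xen PV: patched out */
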
Fixes: c89191ce67ef ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS")
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Cc: <[email protected]> # 5.19
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/entry/entry_64_compat.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -311,7 +311,7 @@ SYM_CODE_START(entry_INT80_compat)
* Interrupts are off on entry.
*/
ASM_CLAC /* Do this early to minimize exposure */
- SWAPGS
+ ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV

/*
* User tracing code (ptrace or signal handlers) might assume that


2022-08-29 13:01:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 008/158] kprobes: dont call disarm_kprobe() for disabled kprobes

From: Kuniyuki Iwashima <[email protected]>

commit 9c80e79906b4ca440d09e7f116609262bb747909 upstream.

The assumption in __disable_kprobe() is wrong, and it could try to disarm
an already disarmed kprobe and fire the WARN_ONCE() below. [0] We can
easily reproduce this issue.

1. Write 0 to /sys/kernel/debug/kprobes/enabled.

# echo 0 > /sys/kernel/debug/kprobes/enabled

2. Run execsnoop. At this time, one kprobe is disabled.

# /usr/share/bcc/tools/execsnoop &
[1] 2460
PCOMM PID PPID RET ARGS

# cat /sys/kernel/debug/kprobes/list
ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE]
ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE]

3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes
kprobes_all_disarmed to false but does not arm the disabled kprobe.

# echo 1 > /sys/kernel/debug/kprobes/enabled

# cat /sys/kernel/debug/kprobes/list
ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE]
ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE]

4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the
disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace().

# fg
/usr/share/bcc/tools/execsnoop
^C

Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses
some cleanups and leaves the aggregated kprobe in the hash table. Then,
__unregister_trace_kprobe() initialises tk->rp.kp.list and creates an
infinite loop like this.

aggregated kprobe.list -> kprobe.list -.
                               ^        |
                               '--------'

In this situation, these commands fall into the infinite loop and result
in RCU stall or soft lockup.

cat /sys/kernel/debug/kprobes/list :
        show_kprobe_addr() enters the infinite loop with RCU.

/usr/share/bcc/tools/execsnoop :
        warn_kprobe_rereg() holds kprobe_mutex, and __get_valid_kprobe()
        is stuck in the loop.

To avoid the issue, make sure we don't call disarm_kprobe() for disabled
kprobes.

[0]
Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2)
WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
Modules linked in: ena
CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28
Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94
RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001
RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff
RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff
R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40
R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000
FS: 00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
__disable_kprobe (kernel/kprobes.c:1716)
disable_kprobe (kernel/kprobes.c:2392)
__disable_trace_kprobe (kernel/trace/trace_kprobe.c:340)
disable_trace_kprobe (kernel/trace/trace_kprobe.c:429)
perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168)
perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295)
_free_event (kernel/events/core.c:4971)
perf_event_release_kernel (kernel/events/core.c:5176)
perf_release (kernel/events/core.c:5186)
__fput (fs/file_table.c:321)
task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1))
exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201)
syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
do_syscall_64 (arch/x86/entry/common.c:87)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7fe7ff210654
Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc
RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654
RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008
RBP: 0000000000000000 R08: 94ae31d6fda838a4 R09: 00007fe8001c9d30
R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600
R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560
</TASK>

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 69d54b916d83 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reported-by: Ayushman Dutta <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Kuniyuki Iwashima <[email protected]>
Cc: Kuniyuki Iwashima <[email protected]>
Cc: Ayushman Dutta <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/kprobes.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1707,11 +1707,12 @@ static struct kprobe *__disable_kprobe(s
/* Try to disarm and disable this/parent probe */
if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
/*
- * If 'kprobes_all_disarmed' is set, 'orig_p'
- * should have already been disarmed, so
- * skip unneed disarming process.
+ * Don't be lazy here. Even if 'kprobes_all_disarmed'
+ * is false, 'orig_p' might not have been armed yet.
+ * Note arm_all_kprobes() __tries__ to arm all kprobes
+ * on the best effort basis.
*/
- if (!kprobes_all_disarmed) {
+ if (!kprobes_all_disarmed && !kprobe_disabled(orig_p)) {
ret = disarm_kprobe(orig_p, true);
if (ret) {
p->flags &= ~KPROBE_FLAG_DISABLED;


2022-08-29 13:02:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 060/158] netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 5f3b7aae14a706d0d7da9f9e39def52ff5fc3d39 ]

As originally intended, restrict the extension to the supported families.

Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_osf.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c
index 5eed18f90b020..175d666c8d87e 100644
--- a/net/netfilter/nft_osf.c
+++ b/net/netfilter/nft_osf.c
@@ -115,9 +115,21 @@ static int nft_osf_validate(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nft_data **data)
{
- return nft_chain_validate_hooks(ctx->chain, (1 << NF_INET_LOCAL_IN) |
- (1 << NF_INET_PRE_ROUTING) |
- (1 << NF_INET_FORWARD));
+ unsigned int hooks;
+
+ switch (ctx->family) {
+ case NFPROTO_IPV4:
+ case NFPROTO_IPV6:
+ case NFPROTO_INET:
+ hooks = (1 << NF_INET_LOCAL_IN) |
+ (1 << NF_INET_PRE_ROUTING) |
+ (1 << NF_INET_FORWARD);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return nft_chain_validate_hooks(ctx->chain, hooks);
}

static bool nft_osf_reduce(struct nft_regs_track *track,
--
2.35.1



2022-08-29 13:13:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.19 038/158] net: dsa: microchip: move switch chip_id detection to ksz_common

From: Arun Ramadoss <[email protected]>

[ Upstream commit 91a98917a8839923d404a77c21646ca5fc9e330a ]

KSZ87xx and KSZ88xx expose their chip_id at register location 0x00,
while KSZ9477-compatible switches and the LAN937x share the same chip_id
detection at locations 0x01 and 0x02. To provide common switch detection
for all KSZ switches, introduce the ksz_switch_detect() function.

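A short sketch of how the common detect code pulls the fields apart; the
masks mirror the definitions added to ksz_common.h below, and 'id16'
stands for the 16-bit value read from register 0x00:

#define SW_FAMILY_ID_M  GENMASK(15, 8)  /* 0x87 = KSZ87xx, 0x88 = KSZ88xx */
#define SW_CHIP_ID_M    GENMASK(7, 4)

u8 family = FIELD_GET(SW_FAMILY_ID_M, id16);
u8 chip   = FIELD_GET(SW_CHIP_ID_M, id16);
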
Signed-off-by: Arun Ramadoss <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/dsa/microchip/ksz8795.c | 48 +--------------
drivers/net/dsa/microchip/ksz8795_reg.h | 16 -----
drivers/net/dsa/microchip/ksz9477.c | 21 -------
drivers/net/dsa/microchip/ksz9477_reg.h | 1 -
drivers/net/dsa/microchip/ksz_common.c | 78 +++++++++++++++++++++++--
drivers/net/dsa/microchip/ksz_common.h | 19 +++++-
6 files changed, 93 insertions(+), 90 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
index 12a599d5e61a4..3cc51ee5fb6cc 100644
--- a/drivers/net/dsa/microchip/ksz8795.c
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -1272,7 +1272,7 @@ static void ksz8_config_cpu_port(struct dsa_switch *ds)
continue;
if (!ksz_is_ksz88x3(dev)) {
ksz_pread8(dev, i, regs[P_REMOTE_STATUS], &remote);
- if (remote & PORT_FIBER_MODE)
+ if (remote & KSZ8_PORT_FIBER_MODE)
p->fiber = 1;
}
if (p->fiber)
@@ -1424,51 +1424,6 @@ static u32 ksz8_get_port_addr(int port, int offset)
return PORT_CTRL_ADDR(port, offset);
}

-static int ksz8_switch_detect(struct ksz_device *dev)
-{
- u8 id1, id2;
- u16 id16;
- int ret;
-
- /* read chip id */
- ret = ksz_read16(dev, REG_CHIP_ID0, &id16);
- if (ret)
- return ret;
-
- id1 = id16 >> 8;
- id2 = id16 & SW_CHIP_ID_M;
-
- switch (id1) {
- case KSZ87_FAMILY_ID:
- if ((id2 != CHIP_ID_94 && id2 != CHIP_ID_95))
- return -ENODEV;
-
- if (id2 == CHIP_ID_95) {
- u8 val;
-
- id2 = 0x95;
- ksz_read8(dev, REG_PORT_STATUS_0, &val);
- if (val & PORT_FIBER_MODE)
- id2 = 0x65;
- } else if (id2 == CHIP_ID_94) {
- id2 = 0x94;
- }
- break;
- case KSZ88_FAMILY_ID:
- if (id2 != CHIP_ID_63)
- return -ENODEV;
- break;
- default:
- dev_err(dev->dev, "invalid family id: %d\n", id1);
- return -ENODEV;
- }
- id16 &= ~0xff;
- id16 |= id2;
- dev->chip_id = id16;
-
- return 0;
-}
-
static int ksz8_switch_init(struct ksz_device *dev)
{
struct ksz8 *ksz8 = dev->priv;
@@ -1522,7 +1477,6 @@ static const struct ksz_dev_ops ksz8_dev_ops = {
.freeze_mib = ksz8_freeze_mib,
.port_init_cnt = ksz8_port_init_cnt,
.shutdown = ksz8_reset_switch,
- .detect = ksz8_switch_detect,
.init = ksz8_switch_init,
.exit = ksz8_switch_exit,
};
diff --git a/drivers/net/dsa/microchip/ksz8795_reg.h b/drivers/net/dsa/microchip/ksz8795_reg.h
index 4109433b6b6c2..b8f6ad7581bcd 100644
--- a/drivers/net/dsa/microchip/ksz8795_reg.h
+++ b/drivers/net/dsa/microchip/ksz8795_reg.h
@@ -14,23 +14,10 @@
#define KS_PRIO_M 0x3
#define KS_PRIO_S 2

-#define REG_CHIP_ID0 0x00
-
-#define KSZ87_FAMILY_ID 0x87
-#define KSZ88_FAMILY_ID 0x88
-
-#define REG_CHIP_ID1 0x01
-
-#define SW_CHIP_ID_M 0xF0
-#define SW_CHIP_ID_S 4
#define SW_REVISION_M 0x0E
#define SW_REVISION_S 1
#define SW_START 0x01

-#define CHIP_ID_94 0x60
-#define CHIP_ID_95 0x90
-#define CHIP_ID_63 0x30
-
#define KSZ8863_REG_SW_RESET 0x43

#define KSZ8863_GLOBAL_SOFTWARE_RESET BIT(4)
@@ -217,8 +204,6 @@
#define REG_PORT_4_STATUS_0 0x48

/* For KSZ8765. */
-#define PORT_FIBER_MODE BIT(7)
-
#define PORT_REMOTE_ASYM_PAUSE BIT(5)
#define PORT_REMOTE_SYM_PAUSE BIT(4)
#define PORT_REMOTE_100BTX_FD BIT(3)
@@ -322,7 +307,6 @@

#define REG_PORT_CTRL_5 0x05

-#define REG_PORT_STATUS_0 0x08
#define REG_PORT_STATUS_1 0x09
#define REG_PORT_LINK_MD_CTRL 0x0A
#define REG_PORT_LINK_MD_RESULT 0x0B
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index 876a801ac23a4..bcfdd505ca79a 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -1363,23 +1363,6 @@ static u32 ksz9477_get_port_addr(int port, int offset)
return PORT_CTRL_ADDR(port, offset);
}

-static int ksz9477_switch_detect(struct ksz_device *dev)
-{
- u32 id32;
- int ret;
-
- /* read chip id */
- ret = ksz_read32(dev, REG_CHIP_ID0__1, &id32);
- if (ret)
- return ret;
-
- dev_dbg(dev->dev, "Switch detect: ID=%08x\n", id32);
-
- dev->chip_id = id32 & 0x00FFFF00;
-
- return 0;
-}
-
static int ksz9477_switch_init(struct ksz_device *dev)
{
u8 data8;
@@ -1410,8 +1393,6 @@ static int ksz9477_switch_init(struct ksz_device *dev)
dev->features = GBIT_SUPPORT;

if (dev->chip_id == KSZ9893_CHIP_ID) {
- /* Chip is from KSZ9893 design. */
- dev_info(dev->dev, "Found KSZ9893\n");
dev->features |= IS_9893;

/* Chip does not support gigabit. */
@@ -1419,7 +1400,6 @@ static int ksz9477_switch_init(struct ksz_device *dev)
dev->features &= ~GBIT_SUPPORT;
dev->phy_port_cnt = 2;
} else {
- dev_info(dev->dev, "Found KSZ9477 or compatible\n");
/* Chip uses new XMII register definitions. */
dev->features |= NEW_XMII;

@@ -1446,7 +1426,6 @@ static const struct ksz_dev_ops ksz9477_dev_ops = {
.freeze_mib = ksz9477_freeze_mib,
.port_init_cnt = ksz9477_port_init_cnt,
.shutdown = ksz9477_reset_switch,
- .detect = ksz9477_switch_detect,
.init = ksz9477_switch_init,
.exit = ksz9477_switch_exit,
};
diff --git a/drivers/net/dsa/microchip/ksz9477_reg.h b/drivers/net/dsa/microchip/ksz9477_reg.h
index 7a2c8d4767aff..077e35ab11b54 100644
--- a/drivers/net/dsa/microchip/ksz9477_reg.h
+++ b/drivers/net/dsa/microchip/ksz9477_reg.h
@@ -25,7 +25,6 @@

#define REG_CHIP_ID2__1 0x0002

-#define CHIP_ID_63 0x63
#define CHIP_ID_66 0x66
#define CHIP_ID_67 0x67
#define CHIP_ID_77 0x77
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 92a500e1ccd21..4511e99823f57 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -930,6 +930,72 @@ void ksz_port_stp_state_set(struct dsa_switch *ds, int port,
}
EXPORT_SYMBOL_GPL(ksz_port_stp_state_set);

+static int ksz_switch_detect(struct ksz_device *dev)
+{
+ u8 id1, id2;
+ u16 id16;
+ u32 id32;
+ int ret;
+
+ /* read chip id */
+ ret = ksz_read16(dev, REG_CHIP_ID0, &id16);
+ if (ret)
+ return ret;
+
+ id1 = FIELD_GET(SW_FAMILY_ID_M, id16);
+ id2 = FIELD_GET(SW_CHIP_ID_M, id16);
+
+ switch (id1) {
+ case KSZ87_FAMILY_ID:
+ if (id2 == KSZ87_CHIP_ID_95) {
+ u8 val;
+
+ dev->chip_id = KSZ8795_CHIP_ID;
+
+ ksz_read8(dev, KSZ8_PORT_STATUS_0, &val);
+ if (val & KSZ8_PORT_FIBER_MODE)
+ dev->chip_id = KSZ8765_CHIP_ID;
+ } else if (id2 == KSZ87_CHIP_ID_94) {
+ dev->chip_id = KSZ8794_CHIP_ID;
+ } else {
+ return -ENODEV;
+ }
+ break;
+ case KSZ88_FAMILY_ID:
+ if (id2 == KSZ88_CHIP_ID_63)
+ dev->chip_id = KSZ8830_CHIP_ID;
+ else
+ return -ENODEV;
+ break;
+ default:
+ ret = ksz_read32(dev, REG_CHIP_ID0, &id32);
+ if (ret)
+ return ret;
+
+ dev->chip_rev = FIELD_GET(SW_REV_ID_M, id32);
+ id32 &= ~0xFF;
+
+ switch (id32) {
+ case KSZ9477_CHIP_ID:
+ case KSZ9897_CHIP_ID:
+ case KSZ9893_CHIP_ID:
+ case KSZ9567_CHIP_ID:
+ case LAN9370_CHIP_ID:
+ case LAN9371_CHIP_ID:
+ case LAN9372_CHIP_ID:
+ case LAN9373_CHIP_ID:
+ case LAN9374_CHIP_ID:
+ dev->chip_id = id32;
+ break;
+ default:
+ dev_err(dev->dev,
+ "unsupported switch detected %x)\n", id32);
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+
struct ksz_device *ksz_switch_alloc(struct device *base, void *priv)
{
struct dsa_switch *ds;
@@ -986,10 +1052,9 @@ int ksz_switch_register(struct ksz_device *dev,
mutex_init(&dev->alu_mutex);
mutex_init(&dev->vlan_mutex);

- dev->dev_ops = ops;
-
- if (dev->dev_ops->detect(dev))
- return -EINVAL;
+ ret = ksz_switch_detect(dev);
+ if (ret)
+ return ret;

info = ksz_lookup_info(dev->chip_id);
if (!info)
@@ -998,10 +1063,15 @@ int ksz_switch_register(struct ksz_device *dev,
/* Update the compatible info with the probed one */
dev->info = info;

+ dev_info(dev->dev, "found switch: %s, rev %i\n",
+ dev->info->dev_name, dev->chip_rev);
+
ret = ksz_check_device_id(dev);
if (ret)
return ret;

+ dev->dev_ops = ops;
+
ret = dev->dev_ops->init(dev);
if (ret)
return ret;
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index 8500eaedad67a..e6bc5fb2b1303 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -90,6 +90,7 @@ struct ksz_device {

/* chip specific data */
u32 chip_id;
+ u8 chip_rev;
int cpu_port; /* port connected to CPU */
int phy_port_cnt;
phy_interface_t compat_interface;
@@ -182,7 +183,6 @@ struct ksz_dev_ops {
void (*freeze_mib)(struct ksz_device *dev, int port, bool freeze);
void (*port_init_cnt)(struct ksz_device *dev, int port);
int (*shutdown)(struct ksz_device *dev);
- int (*detect)(struct ksz_device *dev);
int (*init)(struct ksz_device *dev);
void (*exit)(struct ksz_device *dev);
};
@@ -353,6 +353,23 @@ static inline void ksz_regmap_unlock(void *__mtx)
#define PORT_RX_ENABLE BIT(1)
#define PORT_LEARN_DISABLE BIT(0)

+/* Switch ID Defines */
+#define REG_CHIP_ID0 0x00
+
+#define SW_FAMILY_ID_M GENMASK(15, 8)
+#define KSZ87_FAMILY_ID 0x87
+#define KSZ88_FAMILY_ID 0x88
+
+#define KSZ8_PORT_STATUS_0 0x08
+#define KSZ8_PORT_FIBER_MODE BIT(7)
+
+#define SW_CHIP_ID_M GENMASK(7, 4)
+#define KSZ87_CHIP_ID_94 0x6
+#define KSZ87_CHIP_ID_95 0x9
+#define KSZ88_CHIP_ID_63 0x3
+
+#define SW_REV_ID_M GENMASK(7, 4)
+
/* Regmap tables generation */
#define KSZ_SPI_OP_RD 3
#define KSZ_SPI_OP_WR 2
--
2.35.1



2022-08-29 18:38:32

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review



On 8/29/2022 3:57 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.19.6-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels and build tested on
BMIPS_GENERIC:

Tested-by: Florian Fainelli <[email protected]>
--
Florian

2022-08-29 21:09:43

by Ron Economos

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On 8/29/22 3:57 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.19.6-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Built and booted successfully on RISC-V RV64 (HiFive Unmatched).

Tested-by: Ron Economos <[email protected]>

2022-08-29 22:16:43

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On 8/29/22 04:57, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.19.6-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

Tested-by: Shuah Khan <[email protected]>

thanks,
-- Shuah

2022-08-29 23:33:04

by Zan Aziz

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On Mon, Aug 29, 2022 at 1:03 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.19.6-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Hi Greg,

Compiled and booted on my test system Lenovo P50s: Intel Core i7
No emergency and critical messages in the dmesg

./perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.692 [sec]

# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes

Total time: 8.521 [sec]

8.521891 usecs/op
117344 ops/sec

Tested-by: Zan Aziz <[email protected]>

Thanks
-Zan

2022-08-30 01:43:06

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On Mon, Aug 29, 2022 at 12:57:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>

Build results:
total: 150 pass: 150 fail: 0
Qemu test results:
total: 489 pass: 489 fail: 0

Tested-by: Guenter Roeck <[email protected]>

Guenter

2022-08-30 02:38:43

by Ronald Warsow

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

Hello Greg

5.19.6-rc1

compiles, boots and runs here on x86_64
(Intel i5-11400, Fedora 36)

Thanks

Tested-by: Ronald Warsow <[email protected]>

2022-08-30 03:02:17

by Daniel Díaz

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

Hello!

On 29/08/22 05:57, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.19.6-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro's test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <[email protected]>

## Build
* kernel: 5.19.6-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.19.y
* git commit: 5cfa6cf5bd86ffb0f956ff2ac4876ec6905dd3cf
* git describe: v5.19.4-161-g5cfa6cf5bd86
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.19.y/build/v5.19.4-161-g5cfa6cf5bd86

## No test regressions (compared to v5.19.5)

## No metric regressions (compared to v5.19.5)

## No test fixes (compared to v5.19.5)

## No metric fixes (compared to v5.19.5)

## Test result summary
total: 112530, pass: 101420, fail: 894, skip: 9953, xfail: 263

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 306 total, 303 passed, 3 failed
* arm64: 68 total, 65 passed, 3 failed
* i386: 57 total, 51 passed, 6 failed
* mips: 50 total, 47 passed, 3 failed
* parisc: 14 total, 14 passed, 0 failed
* powerpc: 65 total, 56 passed, 9 failed
* riscv: 32 total, 26 passed, 6 failed
* s390: 22 total, 20 passed, 2 failed
* sh: 26 total, 24 passed, 2 failed
* sparc: 14 total, 14 passed, 0 failed
* x86_64: 61 total, 58 passed, 3 failed

## Test suites summary
* fwts
* igt-gpu-tools
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-fsx
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-open-posix-tests
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* network-basic-tests
* packetdrill
* rcutorture
* v4l2-compliance
* vdso


Greetings!

Daniel Díaz
[email protected]

--
Linaro LKFT
https://lkft.linaro.org

2022-08-30 11:17:34

by Sudip Mukherjee

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

Hi Greg,

On Mon, Aug 29, 2022 at 12:57:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.

Build test (gcc version 12.2.1 20220819):
mips: 59 configs -> 1 failure
arm: 99 configs -> no failure
arm64: 3 configs -> no failure
x86_64: 4 configs -> no failure
alpha allmodconfig -> no failure
csky allmodconfig -> fails
powerpc allmodconfig -> no failure
riscv allmodconfig -> no failure
s390 allmodconfig -> no failure
xtensa allmodconfig -> no failure

mips and csky are known failure. Fix not yet in mainline.

Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). No regression. [2]
mips: Booted on ci20 board. No regression. [3]

DRM warnings in rpi4b, now fixed in mainline.
Will need:
258e483a4d5e ("drm/vc4: hdmi: Rework power up")
72e2329e7c9b ("drm/vc4: hdmi: Depends on CONFIG_PM")

[1]. https://openqa.qa.codethink.co.uk/tests/1731
[2]. https://openqa.qa.codethink.co.uk/tests/1734
[3]. https://openqa.qa.codethink.co.uk/tests/1736

Tested-by: Sudip Mukherjee <[email protected]>

--
Regards
Sudip

2022-08-30 12:22:16

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On Mon, Aug 29, 2022 at 12:57:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>

Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0) and
powerpc (ps3_defconfig, GCC 12.1.0).

Tested-by: Bagas Sanjaya <[email protected]>

--
An old man doll... just what I always wanted! - Clara



2022-08-30 12:28:30

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On Mon, Aug 29, 2022 at 12:57:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>

Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0) and
powerpc (ps3_defconfig, GCC 12.1.0).

Tested-by: Bagas Sanjaya <[email protected]>

--
An old man doll... just what I always wanted! - Clara



2022-08-30 14:51:53

by Rudi Heitbaum

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On Mon, Aug 29, 2022 at 12:57:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.

Hi Greg,

5.19.6-rc1 tested.

Run tested on:
- Intel Tiger Lake x86_64 (nuc11 i7-1165G7)

In addition - build tested for:
- Allwinner A64
- Allwinner H3
- Allwinner H5
- Allwinner H6
- NXP iMX6
- NXP iMX8
- Qualcomm Dragonboard
- Rockchip RK3288
- Rockchip RK3328
- Rockchip RK3399pro
- Samsung Exynos

Tested-by: Rudi Heitbaum <[email protected]>
--
Rudi

2022-08-30 22:06:34

by Justin Forbes

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On Mon, Aug 29, 2022 at 12:57:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.19.6-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le,
s390x, x86_64), and boot tested x86_64. No regressions noted.

Tested-by: Justin M. Forbes <[email protected]>

2022-08-31 06:03:26

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On 29. 08. 22, 12:57, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.19.6 release.
> There are 158 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.19.6-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.19.y
> and the diffstat can be found below.

openSUSE configs¹⁾ all green.

Tested-by: Jiri Slaby <[email protected]>

¹⁾ armv6hl armv7hl arm64 i386 ppc64 ppc64le riscv64 s390x x86_64

--
js
suse labs

2022-09-01 09:58:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5.19 000/158] 5.19.6-rc1 review

On Tue, Aug 30, 2022 at 11:37:26AM +0100, Sudip Mukherjee (Codethink) wrote:
> Hi Greg,
>
> On Mon, Aug 29, 2022 at 12:57:30PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.19.6 release.
> > There are 158 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> > Anything received after that time might be too late.
>
> Build test (gcc version 12.2.1 20220819):
> mips: 59 configs -> 1 failure
> arm: 99 configs -> no failure
> arm64: 3 configs -> no failure
> x86_64: 4 configs -> no failure
> alpha allmodconfig -> no failure
> csky allmodconfig -> fails
> powerpc allmodconfig -> no failure
> riscv allmodconfig -> no failure
> s390 allmodconfig -> no failure
> xtensa allmodconfig -> no failure
>
> mips and csky are known failure. Fix not yet in mainline.
>
> Boot test:
> x86_64: Booted on my test laptop. No regression.
> x86_64: Booted on qemu. No regression. [1]
> arm64: Booted on rpi4b (4GB model). No regression. [2]
> mips: Booted on ci20 board. No regression. [3]
>
> DRM warnings in rpi4b, now fixed in mainline.
> Will need:
> 258e483a4d5e ("drm/vc4: hdmi: Rework power up")
> 72e2329e7c9b ("drm/vc4: hdmi: Depends on CONFIG_PM")

Both now queued up, thanks!

greg k-h

2022-09-02 18:24:31

by Chuck Zmudzinski

[permalink] [raw]
Subject: Re: [PATCH 5.19 109/158] x86/PAT: Have pat_enabled() properly reflect state when running on Xen

On 8/29/2022 6:59 AM, Greg Kroah-Hartman wrote:
> From: Jan Beulich <[email protected]>
>
> commit 72cbc8f04fe2fa93443c0fcccb7ad91dfea3d9ce upstream.
>
> After commit ID in the Fixes: tag, pat_enabled() returns false (because
> of PAT initialization being suppressed in the absence of MTRRs being
> announced to be available).
>
> This has become a problem: the i915 driver now fails to initialize when
> running PV on Xen (i915_gem_object_pin_map() is where I located the
> induced failure), and its error handling is flaky enough to (at least
> sometimes) result in a hung system.
>
> Yet even beyond that problem the keying of the use of WC mappings to
> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> graphics frame buffer accesses would have been quite a bit less optimal
> than possible.
>
> Arrange for the function to return true in such environments, without
> undermining the rest of PAT MSR management logic considering PAT to be
> disabled: specifically, no writes to the PAT MSR should occur.
>
> For the new boolean to live in .init.data, init_cache_modes() also needs
> moving to .init.text (where it could/should have lived already before).
>
> [ bp: This is the "small fix" variant for stable. It'll get replaced
> with a proper PAT and MTRR detection split upstream but that is too
> involved for a stable backport.
> - additional touchups to commit msg. Use cpu_feature_enabled(). ]
>
> Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
> Signed-off-by: Jan Beulich <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Acked-by: Ingo Molnar <[email protected]>
> Cc: <[email protected]>
> Cc: Juergen Gross <[email protected]>
> Cc: Lucas De Marchi <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> arch/x86/mm/pat/memtype.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -62,6 +62,7 @@
>
> static bool __read_mostly pat_bp_initialized;
> static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> static bool __read_mostly pat_bp_enabled;
> static bool __read_mostly pat_cm_initialized;
>
> @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
> static int __init nopat(char *str)
> {
> pat_disable("PAT support disabled via boot option.");
> + pat_force_disabled = true;
> return 0;
> }
> early_param("nopat", nopat);
> @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
> wrmsrl(MSR_IA32_CR_PAT, pat);
> }
>
> -void init_cache_modes(void)
> +void __init init_cache_modes(void)
> {
> u64 pat = 0;
>
> @@ -313,6 +315,12 @@ void init_cache_modes(void)
> */
> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> + } else if (!pat_force_disabled && cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
> + /*
> + * Clearly PAT is enabled underneath. Allow pat_enabled() to
> + * reflect this.
> + */
> + pat_bp_enabled = true;
> }
>
> __init_cache_modes(pat);
>
>

Dear Boris, Jan, Juergen, Thorsten, and Greg,

Just a note to say thanks to everyone who worked on this regression.
I just upgraded my Xen PV dom0 on fc36 that was affected with the
Fedora build of 5.19.6, which has the fix, and it works fine.

Thanks again,

Chuck