2022-08-29 11:51:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 00/86] 5.10.140-rc1 review

This is the start of the stable review cycle for the 5.10.140 release.
There are 86 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.140-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 5.10.140-rc1

Daniel Borkmann <[email protected]>
bpf: Don't use tnum_range on array range checking for poke descriptors

Saurabh Sengar <[email protected]>
scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq

Kiwoong Kim <[email protected]>
scsi: ufs: core: Enable link lost interrupt

Stephane Eranian <[email protected]>
perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU

James Clark <[email protected]>
perf python: Fix build when PYTHON_CONFIG is user supplied

Yu Kuai <[email protected]>
blk-mq: fix io hung due to missing commit_rqs

Salvatore Bonaccorso <[email protected]>
Documentation/ABI: Mention retbleed vulnerability info file for sysfs

Zenghui Yu <[email protected]>
arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76

Guoqing Jiang <[email protected]>
md: call __md_stop_writes in md_stop

Guoqing Jiang <[email protected]>
Revert "md-raid: destroy the bitmap after destroying the thread"

David Hildenbrand <[email protected]>
mm/hugetlb: fix hugetlb not supporting softdirty tracking

Juergen Gross <[email protected]>
xen/privcmd: fix error exit of privcmd_ioctl_dm_op()

Riwen Lu <[email protected]>
ACPI: processor: Remove freq Qos request for all CPUs

Brian Foster <[email protected]>
s390: fix double free of GS and RI CBs on fork() failure

Quanyang Wang <[email protected]>
asm-generic: sections: refactor memory_intersects

Siddh Raman Pant <[email protected]>
loop: Check for overflow while configuring loop

Pawan Gupta <[email protected]>
x86/bugs: Add "unknown" reporting for MMIO Stale Data

Chen Zhongjin <[email protected]>
x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry

Kan Liang <[email protected]>
perf/x86/lbr: Enable the branch type for the Arch LBR by default

Goldwyn Rodrigues <[email protected]>
btrfs: check if root is readonly while setting security xattr

Anand Jain <[email protected]>
btrfs: add info when mount fails due to stale replace target

Anand Jain <[email protected]>
btrfs: replace: drop assert for suspended replace

Filipe Manana <[email protected]>
btrfs: fix silent failure when deleting root reference

Shannon Nelson <[email protected]>
ionic: fix up issues with handling EAGAIN on FW cmds

David Howells <[email protected]>
rxrpc: Fix locking in rxrpc's sendmsg

Jacob Keller <[email protected]>
ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around sysctl_somaxconn.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_devconf_inherit_init_net.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around netdev_budget_usecs.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around netdev_budget.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around sysctl_net_busy_read.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around sysctl_net_busy_poll.

Kuniyuki Iwashima <[email protected]>
net: Fix a data-race around sysctl_tstamp_allow_data.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_optmem_max.

Martin KaFai Lau <[email protected]>
bpf: Folding omem_charge() into sk_storage_charge()

Kuniyuki Iwashima <[email protected]>
ratelimit: Fix data-races in ___ratelimit().

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around netdev_tstamp_prequeue.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around netdev_max_backlog.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around weight_p and dev_weight_[rt]x_bias.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_[rw]mem_(max|default).

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_[rw]mem(_offset)?.

Eric Dumazet <[email protected]>
tcp: tweak len/truesize ratio for coalesce candidates

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: disallow binding to already bound chain

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: disallow jump to implicit chain from set element

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: upfront validation of data via nft_data_init()

Jeremy Sowden <[email protected]>
netfilter: bitwise: improve error goto labels

Pablo Neira Ayuso <[email protected]>
netfilter: nft_cmp: optimize comparison for 16-bytes

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: consolidate rule verdict trace call

Colin Ian King <[email protected]>
netfilter: nftables: remove redundant assignment of variable err

Pablo Neira Ayuso <[email protected]>
netfilter: nft_tunnel: restrict it to netdev family

Pablo Neira Ayuso <[email protected]>
netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families

Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: do not leave chain stats enabled on error

Pablo Neira Ayuso <[email protected]>
netfilter: nft_payload: do not truncate csum_offset and csum_type

Pablo Neira Ayuso <[email protected]>
netfilter: nft_payload: report ERANGE for too long offset and length

Vikas Gupta <[email protected]>
bnxt_en: fix NQ resource accounting during vf creation on 57500 chips

Florian Westphal <[email protected]>
netfilter: ebtables: reject blobs that don't provide all entry points

Maciej Żenczykowski <[email protected]>
net: ipvtap - add __init/__exit annotations to module init/exit funcs

Jonathan Toppins <[email protected]>
bonding: 802.3ad: fix no transmission of LACPDUs

Sergei Antonov <[email protected]>
net: moxa: get rid of asymmetry in DMA mapping/unmapping

Alex Elder <[email protected]>
net: ipa: don't assume SMEM is page-aligned

Vlad Buslov <[email protected]>
net/mlx5e: Properly disable vlan strip on non-UL reps

Maciej Fijalkowski <[email protected]>
ice: xsk: prohibit usage of non-balanced queue id

Maciej Fijalkowski <[email protected]>
ice: xsk: Force rings to be sized to power of 2

Duoming Zhou <[email protected]>
nfc: pn533: Fix use-after-free bugs caused by pn532_cmd_timeout

Bernard Pidoux <[email protected]>
rose: check NULL rose_loopback_neigh->loopback

Peter Xu <[email protected]>
mm/smaps: don't access young/dirty bit if pte unpresent

Miaohe Lin <[email protected]>
mm/huge_memory.c: use helper function migration_entry_to_page()

Trond Myklebust <[email protected]>
SUNRPC: RPC level errors should set task->tk_rpc_status

Olga Kornievskaia <[email protected]>
NFSv4.2 fix problems with __nfs42_ssc_open

Trond Myklebust <[email protected]>
NFS: Don't allocate nfs_fattr on the stack in __nfs42_ssc_open()

Nikolay Aleksandrov <[email protected]>
xfrm: policy: fix metadata dst->dev xmit null pointer dereference

Herbert Xu <[email protected]>
af_key: Do not call xfrm_probe_algs in parallel

Antony Antony <[email protected]>
xfrm: clone missing x->lastused in xfrm_do_migrate

Xin Xiong <[email protected]>
xfrm: fix refcount leak in __xfrm_policy_check()

Hui Su <[email protected]>
kernel/sched: Remove dl_boosted flag comment

Darrick J. Wong <[email protected]>
xfs: only bother with sync_filesystem during readonly remount

Darrick J. Wong <[email protected]>
xfs: return errors in xfs_fs_sync_fs

Darrick J. Wong <[email protected]>
vfs: make sync_filesystem return errors from ->sync_fs

Christoph Hellwig <[email protected]>
fs: remove __sync_filesystem

Darrick J. Wong <[email protected]>
xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*

Dan Carpenter <[email protected]>
xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list()

Basavaraj Natikar <[email protected]>
pinctrl: amd: Don't save/restore interrupt status and wake status bits

Randy Dunlap <[email protected]>
kernel/sys_ni: add compat entry for fadvise64_64

Helge Deller <[email protected]>
parisc: Fix exception handler for fldw and fstw instructions

Gaosheng Cui <[email protected]>
audit: fix potential double free on error path from fsnotify_add_inode_mark


-------------

Diffstat:

Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
.../hw-vuln/processor_mmio_stale_data.rst | 14 ++
Documentation/admin-guide/sysctl/net.rst | 2 +-
Makefile | 4 +-
arch/arm64/kernel/cpu_errata.c | 2 +
arch/parisc/kernel/unaligned.c | 2 +-
arch/s390/kernel/process.c | 22 +++-
arch/x86/events/intel/lbr.c | 8 ++
arch/x86/events/intel/uncore_snb.c | 18 ++-
arch/x86/include/asm/cpufeatures.h | 5 +-
arch/x86/kernel/cpu/bugs.c | 14 +-
arch/x86/kernel/cpu/common.c | 42 +++---
arch/x86/kernel/unwind_orc.c | 15 ++-
block/blk-mq.c | 5 +-
drivers/acpi/processor_thermal.c | 2 +-
drivers/block/loop.c | 5 +
drivers/md/md.c | 3 +-
drivers/net/bonding/bond_3ad.c | 38 +++---
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +-
drivers/net/ethernet/intel/ice/ice_xsk.c | 14 ++
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 59 +++++++--
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +
drivers/net/ethernet/moxa/moxart_ether.c | 11 +-
drivers/net/ethernet/pensando/ionic/ionic_main.c | 4 +-
drivers/net/ipa/ipa_mem.c | 2 +-
drivers/net/ipvlan/ipvtap.c | 4 +-
drivers/nfc/pn533/uart.c | 1 +
drivers/pinctrl/pinctrl-amd.c | 11 +-
drivers/scsi/storvsc_drv.c | 2 +-
drivers/scsi/ufs/ufshci.h | 6 +-
drivers/xen/privcmd.c | 21 +--
fs/btrfs/dev-replace.c | 5 +-
fs/btrfs/root-tree.c | 5 +-
fs/btrfs/xattr.c | 3 +
fs/nfs/nfs4file.c | 16 ++-
fs/proc/task_mmu.c | 7 +-
fs/sync.c | 48 +++----
fs/xfs/xfs_ioctl.c | 4 +-
fs/xfs/xfs_ioctl.h | 5 +-
fs/xfs/xfs_super.c | 13 +-
include/asm-generic/sections.h | 7 +-
include/linux/netdevice.h | 20 ++-
include/linux/netfilter_bridge/ebtables.h | 4 -
include/linux/sched.h | 4 -
include/net/busy_poll.h | 2 +-
include/net/netfilter/nf_tables.h | 9 +-
include/net/netfilter/nf_tables_core.h | 9 ++
include/net/sock.h | 8 +-
kernel/audit_fsnotify.c | 1 +
kernel/bpf/verifier.c | 10 +-
kernel/sys_ni.c | 1 +
lib/ratelimit.c | 12 +-
mm/huge_memory.c | 4 +-
mm/mmap.c | 8 +-
net/bridge/netfilter/ebtable_broute.c | 8 --
net/bridge/netfilter/ebtable_filter.c | 8 --
net/bridge/netfilter/ebtable_nat.c | 8 --
net/bridge/netfilter/ebtables.c | 8 +-
net/core/bpf_sk_storage.c | 24 ++--
net/core/dev.c | 18 +--
net/core/filter.c | 13 +-
net/core/gro_cells.c | 2 +-
net/core/skbuff.c | 2 +-
net/core/sock.c | 18 +--
net/core/sysctl_net_core.c | 15 ++-
net/decnet/af_decnet.c | 4 +-
net/ipv4/devinet.c | 16 ++-
net/ipv4/ip_output.c | 2 +-
net/ipv4/ip_sockglue.c | 6 +-
net/ipv4/tcp.c | 6 +-
net/ipv4/tcp_input.c | 51 ++++++--
net/ipv4/tcp_output.c | 4 +-
net/ipv6/addrconf.c | 5 +-
net/ipv6/ipv6_sockglue.c | 4 +-
net/key/af_key.c | 3 +
net/mptcp/protocol.c | 6 +-
net/netfilter/ipvs/ip_vs_sync.c | 4 +-
net/netfilter/nf_tables_api.c | 90 +++++++------
net/netfilter/nf_tables_core.c | 55 +++++++-
net/netfilter/nft_bitwise.c | 67 +++++-----
net/netfilter/nft_cmp.c | 142 +++++++++++++++++----
net/netfilter/nft_immediate.c | 22 +++-
net/netfilter/nft_osf.c | 18 ++-
net/netfilter/nft_payload.c | 29 +++--
net/netfilter/nft_range.c | 27 ++--
net/netfilter/nft_tunnel.c | 1 +
net/rose/rose_loopback.c | 3 +-
net/rxrpc/call_object.c | 4 +-
net/rxrpc/sendmsg.c | 92 +++++++------
net/sched/sch_generic.c | 2 +-
net/socket.c | 2 +-
net/sunrpc/clnt.c | 2 +-
net/tipc/socket.c | 2 +-
net/xfrm/espintcp.c | 2 +-
net/xfrm/xfrm_input.c | 2 +-
net/xfrm/xfrm_policy.c | 3 +-
net/xfrm/xfrm_state.c | 1 +
tools/perf/Makefile.config | 2 +-
98 files changed, 874 insertions(+), 475 deletions(-)



2022-08-29 11:54:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 45/86] net: Fix data-races around sysctl_[rw]mem(_offset)?.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 02739545951ad4c1215160db7fbf9b7a918d3c0b ]

While reading these sysctl variables, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

- .sysctl_rmem
- .sysctl_rwmem
- .sysctl_rmem_offset
- .sysctl_wmem_offset
- sysctl_tcp_rmem[1, 2]
- sysctl_tcp_wmem[1, 2]
- sysctl_decnet_rmem[1]
- sysctl_decnet_wmem[1]
- sysctl_tipc_rmem[1]

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/sock.h | 8 ++++----
net/decnet/af_decnet.c | 4 ++--
net/ipv4/tcp.c | 6 +++---
net/ipv4/tcp_input.c | 13 +++++++------
net/ipv4/tcp_output.c | 2 +-
net/mptcp/protocol.c | 6 +++---
net/tipc/socket.c | 2 +-
7 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 333131f47ac13..d31c2b9107e54 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2678,18 +2678,18 @@ static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto)
{
/* Does this proto have per netns sysctl_wmem ? */
if (proto->sysctl_wmem_offset)
- return *(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset);
+ return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset));

- return *proto->sysctl_wmem;
+ return READ_ONCE(*proto->sysctl_wmem);
}

static inline int sk_get_rmem0(const struct sock *sk, const struct proto *proto)
{
/* Does this proto have per netns sysctl_rmem ? */
if (proto->sysctl_rmem_offset)
- return *(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset);
+ return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset));

- return *proto->sysctl_rmem;
+ return READ_ONCE(*proto->sysctl_rmem);
}

/* Default TCP Small queue budget is ~1 ms of data (1sec >> 10)
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index dc92a67baea39..7d542eb461729 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -480,8 +480,8 @@ static struct sock *dn_alloc_sock(struct net *net, struct socket *sock, gfp_t gf
sk->sk_family = PF_DECnet;
sk->sk_protocol = 0;
sk->sk_allocation = gfp;
- sk->sk_sndbuf = sysctl_decnet_wmem[1];
- sk->sk_rcvbuf = sysctl_decnet_rmem[1];
+ sk->sk_sndbuf = READ_ONCE(sysctl_decnet_wmem[1]);
+ sk->sk_rcvbuf = READ_ONCE(sysctl_decnet_rmem[1]);

/* Initialization of DECnet Session Control Port */
scp = DN_SK(sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 78460eb39b3af..bfeb05f62b94f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -451,8 +451,8 @@ void tcp_init_sock(struct sock *sk)

icsk->icsk_sync_mss = tcp_sync_mss;

- WRITE_ONCE(sk->sk_sndbuf, sock_net(sk)->ipv4.sysctl_tcp_wmem[1]);
- WRITE_ONCE(sk->sk_rcvbuf, sock_net(sk)->ipv4.sysctl_tcp_rmem[1]);
+ WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1]));
+ WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]));

sk_sockets_allocated_inc(sk);
sk->sk_route_forced_caps = NETIF_F_GSO;
@@ -1711,7 +1711,7 @@ int tcp_set_rcvlowat(struct sock *sk, int val)
if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
cap = sk->sk_rcvbuf >> 1;
else
- cap = sock_net(sk)->ipv4.sysctl_tcp_rmem[2] >> 1;
+ cap = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1;
val = min(val, cap);
WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 33a3fb04ac4df..41b44b311e8a0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -425,7 +425,7 @@ static void tcp_sndbuf_expand(struct sock *sk)

if (sk->sk_sndbuf < sndmem)
WRITE_ONCE(sk->sk_sndbuf,
- min(sndmem, sock_net(sk)->ipv4.sysctl_tcp_wmem[2]));
+ min(sndmem, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[2])));
}

/* 2. Tuning advertised window (window_clamp, rcv_ssthresh)
@@ -460,7 +460,7 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb,
struct tcp_sock *tp = tcp_sk(sk);
/* Optimize this! */
int truesize = tcp_win_from_space(sk, skbtruesize) >> 1;
- int window = tcp_win_from_space(sk, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1;
+ int window = tcp_win_from_space(sk, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])) >> 1;

while (tp->rcv_ssthresh <= window) {
if (truesize <= skb->len)
@@ -565,16 +565,17 @@ static void tcp_clamp_window(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
struct net *net = sock_net(sk);
+ int rmem2;

icsk->icsk_ack.quick = 0;
+ rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);

- if (sk->sk_rcvbuf < net->ipv4.sysctl_tcp_rmem[2] &&
+ if (sk->sk_rcvbuf < rmem2 &&
!(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
!tcp_under_memory_pressure(sk) &&
sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)) {
WRITE_ONCE(sk->sk_rcvbuf,
- min(atomic_read(&sk->sk_rmem_alloc),
- net->ipv4.sysctl_tcp_rmem[2]));
+ min(atomic_read(&sk->sk_rmem_alloc), rmem2));
}
if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss);
@@ -736,7 +737,7 @@ void tcp_rcv_space_adjust(struct sock *sk)

do_div(rcvwin, tp->advmss);
rcvbuf = min_t(u64, rcvwin * rcvmem,
- sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
if (rcvbuf > sk->sk_rcvbuf) {
WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4c9274cb92d55..c90c9541996bf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -238,7 +238,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
*rcv_wscale = 0;
if (wscale_ok) {
/* Set window scaling on max possible window */
- space = max_t(u32, space, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+ space = max_t(u32, space, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
space = max_t(u32, space, sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
*rcv_wscale = clamp_t(int, ilog2(space) - 15,
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index d0e91aa7b30e5..e61c85873ea2f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1439,7 +1439,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)

do_div(rcvwin, advmss);
rcvbuf = min_t(u64, rcvwin * rcvmem,
- sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));

if (rcvbuf > sk->sk_rcvbuf) {
u32 window_clamp;
@@ -1872,8 +1872,8 @@ static int mptcp_init_sock(struct sock *sk)
return ret;

sk_sockets_allocated_inc(sk);
- sk->sk_rcvbuf = sock_net(sk)->ipv4.sysctl_tcp_rmem[1];
- sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[1];
+ sk->sk_rcvbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]);
+ sk->sk_sndbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1]);

return 0;
}
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 38256aabf4f1d..8f3c9fbb99165 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -504,7 +504,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
timer_setup(&sk->sk_timer, tipc_sk_timeout, 0);
sk->sk_shutdown = 0;
sk->sk_backlog_rcv = tipc_sk_backlog_rcv;
- sk->sk_rcvbuf = sysctl_tipc_rmem[1];
+ sk->sk_rcvbuf = READ_ONCE(sysctl_tipc_rmem[1]);
sk->sk_data_ready = tipc_data_ready;
sk->sk_write_space = tipc_write_space;
sk->sk_destruct = tipc_sock_destruct;
--
2.35.1



2022-08-29 11:55:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 30/86] netfilter: ebtables: reject blobs that dont provide all entry points

From: Florian Westphal <[email protected]>

[ Upstream commit 7997eff82828304b780dc0a39707e1946d6f1ebf ]

Harshit Mogalapalli says:
In ebt_do_table() function dereferencing 'private->hook_entry[hook]'
can lead to NULL pointer dereference. [..] Kernel panic:

general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[..]
RIP: 0010:ebt_do_table+0x1dc/0x1ce0
Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88
[..]
Call Trace:
nf_hook_slow+0xb1/0x170
__br_forward+0x289/0x730
maybe_deliver+0x24b/0x380
br_flood+0xc6/0x390
br_dev_xmit+0xa2e/0x12c0

For some reason ebtables rejects blobs that provide entry points that are
not supported by the table, but what it should instead reject is the
opposite: blobs that DO NOT provide an entry point supported by the table.

t->valid_hooks is the bitmask of hooks (input, forward ...) that will see
packets. Providing an entry point that is not support is harmless
(never called/used), but the inverse isn't: it results in a crash
because the ebtables traverser doesn't expect a NULL blob for a location
its receiving packets for.

Instead of fixing all the individual checks, do what iptables is doing and
reject all blobs that differ from the expected hooks.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Harshit Mogalapalli <[email protected]>
Reported-by: syzkaller <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/linux/netfilter_bridge/ebtables.h | 4 ----
net/bridge/netfilter/ebtable_broute.c | 8 --------
net/bridge/netfilter/ebtable_filter.c | 8 --------
net/bridge/netfilter/ebtable_nat.c | 8 --------
net/bridge/netfilter/ebtables.c | 8 +-------
5 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 3a956145a25cb..a18fb73a2b772 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -94,10 +94,6 @@ struct ebt_table {
struct ebt_replace_kernel *table;
unsigned int valid_hooks;
rwlock_t lock;
- /* e.g. could be the table explicitly only allows certain
- * matches, targets, ... 0 == let it in */
- int (*check)(const struct ebt_table_info *info,
- unsigned int valid_hooks);
/* the data used by the kernel */
struct ebt_table_info *private;
struct module *me;
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 32bc2821027f3..57f91efce0f73 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -36,18 +36,10 @@ static struct ebt_replace_kernel initial_table = {
.entries = (char *)&initial_chain,
};

-static int check(const struct ebt_table_info *info, unsigned int valid_hooks)
-{
- if (valid_hooks & ~(1 << NF_BR_BROUTING))
- return -EINVAL;
- return 0;
-}
-
static const struct ebt_table broute_table = {
.name = "broute",
.table = &initial_table,
.valid_hooks = 1 << NF_BR_BROUTING,
- .check = check,
.me = THIS_MODULE,
};

diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index bcf982e12f16b..7f2e620f4978f 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -43,18 +43,10 @@ static struct ebt_replace_kernel initial_table = {
.entries = (char *)initial_chains,
};

-static int check(const struct ebt_table_info *info, unsigned int valid_hooks)
-{
- if (valid_hooks & ~FILTER_VALID_HOOKS)
- return -EINVAL;
- return 0;
-}
-
static const struct ebt_table frame_filter = {
.name = "filter",
.table = &initial_table,
.valid_hooks = FILTER_VALID_HOOKS,
- .check = check,
.me = THIS_MODULE,
};

diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 0d092773f8161..1743a105485c4 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -43,18 +43,10 @@ static struct ebt_replace_kernel initial_table = {
.entries = (char *)initial_chains,
};

-static int check(const struct ebt_table_info *info, unsigned int valid_hooks)
-{
- if (valid_hooks & ~NAT_VALID_HOOKS)
- return -EINVAL;
- return 0;
-}
-
static const struct ebt_table frame_nat = {
.name = "nat",
.table = &initial_table,
.valid_hooks = NAT_VALID_HOOKS,
- .check = check,
.me = THIS_MODULE,
};

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index d481ff24a1501..310740cc684ad 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -999,8 +999,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
goto free_iterate;
}

- /* the table doesn't like it */
- if (t->check && (ret = t->check(newinfo, repl->valid_hooks)))
+ if (repl->valid_hooks != t->valid_hooks)
goto free_unlock;

if (repl->num_counters && repl->num_counters != t->private->nentries) {
@@ -1186,11 +1185,6 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table,
if (ret != 0)
goto free_chainstack;

- if (table->check && table->check(newinfo, table->valid_hooks)) {
- ret = -EINVAL;
- goto free_chainstack;
- }
-
table->private = newinfo;
rwlock_init(&table->lock);
mutex_lock(&ebt_mutex);
--
2.35.1



2022-08-29 11:57:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 67/86] btrfs: check if root is readonly while setting security xattr

From: Goldwyn Rodrigues <[email protected]>

commit b51111271b0352aa596c5ae8faf06939e91b3b68 upstream.

For a filesystem which has btrfs read-only property set to true, all
write operations including xattr should be denied. However, security
xattr can still be changed even if btrfs ro property is true.

This happens because xattr_permission() does not have any restrictions
on security.*, system.* and in some cases trusted.* from VFS and
the decision is left to the underlying filesystem. See comments in
xattr_permission() for more details.

This patch checks if the root is read-only before performing the set
xattr operation.

Testcase:

DEV=/dev/vdb
MNT=/mnt

mkfs.btrfs -f $DEV
mount $DEV $MNT
echo "file one" > $MNT/f1

setfattr -n "security.one" -v 2 $MNT/f1
btrfs property set /mnt ro true

setfattr -n "security.one" -v 1 $MNT/f1

umount $MNT

CC: [email protected] # 4.9+
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Goldwyn Rodrigues <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/xattr.c | 3 +++
1 file changed, 3 insertions(+)

--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -389,6 +389,9 @@ static int btrfs_xattr_handler_set(const
const char *name, const void *buffer,
size_t size, int flags)
{
+ if (btrfs_root_readonly(BTRFS_I(inode)->root))
+ return -EROFS;
+
name = xattr_full_name(handler, name);
return btrfs_setxattr_trans(inode, name, buffer, size, flags);
}


2022-08-29 11:58:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 28/86] bonding: 802.3ad: fix no transmission of LACPDUs

From: Jonathan Toppins <[email protected]>

[ Upstream commit d745b5062ad2b5da90a5e728d7ca884fc07315fd ]

This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, resulting
in the periodic timer to never fire.

To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.

line 44: ip link add fbond type bond mode 4 miimon 200 \
xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr

line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
bond->dev->dev_addr = new_mac

line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr

line 60: ip link set veth1-bond down master fbond
given:
params.ad_actor_system = 0
params.mode = BOND_MODE_8023AD
ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
bond_enslave
-> bond_3ad_initialize(); because first slave
-> if ad.system.sys_mac_addr != bond->dev->dev_addr
return
results:
Nothing is run in bond_3ad_initialize() because dev_addr equals
sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
never initialized anywhere else.

The if check around the contents of bond_3ad_initialize() is no longer
needed due to commit 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings
changes immediately") which sets ad.system.sys_mac_addr if any one of
the bonding parameters whos set function calls
bond_3ad_update_ad_actor_settings(). This is because if
ad.system.sys_mac_addr is zero it will be set to the current bond mac
address, this causes the if check to never be true.

Fixes: 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately")
Signed-off-by: Jonathan Toppins <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/bonding/bond_3ad.c | 38 ++++++++++++++--------------------
1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 325b20729d8ba..b0f8d551b61db 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1988,30 +1988,24 @@ void bond_3ad_initiate_agg_selection(struct bonding *bond, int timeout)
*/
void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution)
{
- /* check that the bond is not initialized yet */
- if (!MAC_ADDRESS_EQUAL(&(BOND_AD_INFO(bond).system.sys_mac_addr),
- bond->dev->dev_addr)) {
-
- BOND_AD_INFO(bond).aggregator_identifier = 0;
-
- BOND_AD_INFO(bond).system.sys_priority =
- bond->params.ad_actor_sys_prio;
- if (is_zero_ether_addr(bond->params.ad_actor_system))
- BOND_AD_INFO(bond).system.sys_mac_addr =
- *((struct mac_addr *)bond->dev->dev_addr);
- else
- BOND_AD_INFO(bond).system.sys_mac_addr =
- *((struct mac_addr *)bond->params.ad_actor_system);
+ BOND_AD_INFO(bond).aggregator_identifier = 0;
+ BOND_AD_INFO(bond).system.sys_priority =
+ bond->params.ad_actor_sys_prio;
+ if (is_zero_ether_addr(bond->params.ad_actor_system))
+ BOND_AD_INFO(bond).system.sys_mac_addr =
+ *((struct mac_addr *)bond->dev->dev_addr);
+ else
+ BOND_AD_INFO(bond).system.sys_mac_addr =
+ *((struct mac_addr *)bond->params.ad_actor_system);

- /* initialize how many times this module is called in one
- * second (should be about every 100ms)
- */
- ad_ticks_per_sec = tick_resolution;
+ /* initialize how many times this module is called in one
+ * second (should be about every 100ms)
+ */
+ ad_ticks_per_sec = tick_resolution;

- bond_3ad_initiate_agg_selection(bond,
- AD_AGGREGATOR_SELECTION_TIMER *
- ad_ticks_per_sec);
- }
+ bond_3ad_initiate_agg_selection(bond,
+ AD_AGGREGATOR_SELECTION_TIMER *
+ ad_ticks_per_sec);
}

/**
--
2.35.1



2022-08-29 12:05:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 60/86] net: Fix a data-race around sysctl_somaxconn.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ]

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index d52c265ad449b..bcf68b150fe29 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1670,7 +1670,7 @@ int __sys_listen(int fd, int backlog)

sock = sockfd_lookup_light(fd, &err, &fput_needed);
if (sock) {
- somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
+ somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn);
if ((unsigned int)backlog > somaxconn)
backlog = somaxconn;

--
2.35.1



2022-08-29 12:05:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 40/86] netfilter: bitwise: improve error goto labels

From: Jeremy Sowden <[email protected]>

[ Upstream commit 00bd435208e5201eb935d273052930bd3b272b6f ]

Replace two labels (`err1` and `err2`) with more informative ones.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_bitwise.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c
index 47b0dba95054f..d0c648b64cd40 100644
--- a/net/netfilter/nft_bitwise.c
+++ b/net/netfilter/nft_bitwise.c
@@ -109,22 +109,23 @@ static int nft_bitwise_init_bool(struct nft_bitwise *priv,
return err;
if (mask.type != NFT_DATA_VALUE || mask.len != priv->len) {
err = -EINVAL;
- goto err1;
+ goto err_mask_release;
}

err = nft_data_init(NULL, &priv->xor, sizeof(priv->xor), &xor,
tb[NFTA_BITWISE_XOR]);
if (err < 0)
- goto err1;
+ goto err_mask_release;
if (xor.type != NFT_DATA_VALUE || xor.len != priv->len) {
err = -EINVAL;
- goto err2;
+ goto err_xor_release;
}

return 0;
-err2:
+
+err_xor_release:
nft_data_release(&priv->xor, xor.type);
-err1:
+err_mask_release:
nft_data_release(&priv->mask, mask.type);
return err;
}
--
2.35.1



2022-08-29 12:05:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 82/86] perf python: Fix build when PYTHON_CONFIG is user supplied

From: James Clark <[email protected]>

commit bc9e7fe313d5e56d4d5f34bcc04d1165f94f86fb upstream.

The previous change to Python autodetection had a small mistake where
the auto value was used to determine the Python binary, rather than the
user supplied value. The Python binary is only used for one part of the
build process, rather than the final linking, so it was producing
correct builds in most scenarios, especially when the auto detected
value matched what the user wanted, or the system only had a valid set
of Pythons.

Change it so that the Python binary path is derived from either the
PYTHON_CONFIG value or PYTHON value, depending on what is specified by
the user. This was the original intention.

This error was spotted in a build failure an odd cross compilation
environment after commit 4c41cb46a732fe82 ("perf python: Prefer
python3") was merged.

Fixes: 630af16eee495f58 ("perf tools: Use Python devtools for version autodetection rather than runtime")
Signed-off-by: James Clark <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/perf/Makefile.config | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -248,7 +248,7 @@ endif
# defined. get-executable-or-default fails with an error if the first argument is supplied but
# doesn't exist.
override PYTHON_CONFIG := $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON_AUTO))
-override PYTHON := $(call get-executable-or-default,PYTHON,$(subst -config,,$(PYTHON_AUTO)))
+override PYTHON := $(call get-executable-or-default,PYTHON,$(subst -config,,$(PYTHON_CONFIG)))

grep-libs = $(filter -l%,$(1))
strip-libs = $(filter-out -l%,$(1))


2022-08-29 12:06:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 79/86] arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76

From: Zenghui Yu <[email protected]>

commit 5e1e087457c94ad7fafbe1cf6f774c6999ee29d4 upstream.

Since commit 51f559d66527 ("arm64: Enable repeat tlbi workaround on KRYO4XX
gold CPUs"), we failed to detect erratum 1286807 on Cortex-A76 because its
entry in arm64_repeat_tlbi_list[] was accidently corrupted by this commit.

Fix this issue by creating a separate entry for Kryo4xx Gold.

Fixes: 51f559d66527 ("arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs")
Cc: Shreyas K K <[email protected]>
Signed-off-by: Zenghui Yu <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/kernel/cpu_errata.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -220,6 +220,8 @@ static const struct arm64_cpu_capabiliti
#ifdef CONFIG_ARM64_ERRATUM_1286807
{
ERRATA_MIDR_RANGE(MIDR_CORTEX_A76, 0, 0, 3, 0),
+ },
+ {
/* Kryo4xx Gold (rcpe to rfpe) => (r0p0 to r3p0) */
ERRATA_MIDR_RANGE(MIDR_QCOM_KRYO_4XX_GOLD, 0xc, 0xe, 0xf, 0xe),
},


2022-08-29 12:07:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 51/86] bpf: Folding omem_charge() into sk_storage_charge()

From: Martin KaFai Lau <[email protected]>

[ Upstream commit 9e838b02b0bb795793f12049307a354e28b5749c ]

sk_storage_charge() is the only user of omem_charge().
This patch simplifies it by folding omem_charge() into
sk_storage_charge().

Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: KP Singh <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/bpf_sk_storage.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 5f773624948ff..39c5a059d1c2b 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -15,18 +15,6 @@

DEFINE_BPF_STORAGE_CACHE(sk_cache);

-static int omem_charge(struct sock *sk, unsigned int size)
-{
- /* same check as in sock_kmalloc() */
- if (size <= sysctl_optmem_max &&
- atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max) {
- atomic_add(size, &sk->sk_omem_alloc);
- return 0;
- }
-
- return -ENOMEM;
-}
-
static struct bpf_local_storage_data *
sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
{
@@ -316,7 +304,16 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
static int sk_storage_charge(struct bpf_local_storage_map *smap,
void *owner, u32 size)
{
- return omem_charge(owner, size);
+ struct sock *sk = (struct sock *)owner;
+
+ /* same check as in sock_kmalloc() */
+ if (size <= sysctl_optmem_max &&
+ atomic_read(&sk->sk_omem_alloc) + size < sysctl_optmem_max) {
+ atomic_add(size, &sk->sk_omem_alloc);
+ return 0;
+ }
+
+ return -ENOMEM;
}

static void sk_storage_uncharge(struct bpf_local_storage_map *smap,
--
2.35.1



2022-08-29 12:08:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 07/86] fs: remove __sync_filesystem

From: Christoph Hellwig <[email protected]>

commit 9a208ba5c9afa62c7b1e9c6f5e783066e84e2d3c upstream.

[backported for dependency]

There is no clear benefit in having this helper vs just open coding it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/sync.c | 38 +++++++++++++++++---------------------
1 file changed, 17 insertions(+), 21 deletions(-)

--- a/fs/sync.c
+++ b/fs/sync.c
@@ -22,25 +22,6 @@
SYNC_FILE_RANGE_WAIT_AFTER)

/*
- * Do the filesystem syncing work. For simple filesystems
- * writeback_inodes_sb(sb) just dirties buffers with inodes so we have to
- * submit IO for these buffers via __sync_blockdev(). This also speeds up the
- * wait == 1 case since in that case write_inode() functions do
- * sync_dirty_buffer() and thus effectively write one block at a time.
- */
-static int __sync_filesystem(struct super_block *sb, int wait)
-{
- if (wait)
- sync_inodes_sb(sb);
- else
- writeback_inodes_sb(sb, WB_REASON_SYNC);
-
- if (sb->s_op->sync_fs)
- sb->s_op->sync_fs(sb, wait);
- return __sync_blockdev(sb->s_bdev, wait);
-}
-
-/*
* Write out and wait upon all dirty data associated with this
* superblock. Filesystem data as well as the underlying block
* device. Takes the superblock lock.
@@ -61,10 +42,25 @@ int sync_filesystem(struct super_block *
if (sb_rdonly(sb))
return 0;

- ret = __sync_filesystem(sb, 0);
+ /*
+ * Do the filesystem syncing work. For simple filesystems
+ * writeback_inodes_sb(sb) just dirties buffers with inodes so we have
+ * to submit I/O for these buffers via __sync_blockdev(). This also
+ * speeds up the wait == 1 case since in that case write_inode()
+ * methods call sync_dirty_buffer() and thus effectively write one block
+ * at a time.
+ */
+ writeback_inodes_sb(sb, WB_REASON_SYNC);
+ if (sb->s_op->sync_fs)
+ sb->s_op->sync_fs(sb, 0);
+ ret = __sync_blockdev(sb->s_bdev, 0);
if (ret < 0)
return ret;
- return __sync_filesystem(sb, 1);
+
+ sync_inodes_sb(sb);
+ if (sb->s_op->sync_fs)
+ sb->s_op->sync_fs(sb, 1);
+ return __sync_blockdev(sb->s_bdev, 1);
}
EXPORT_SYMBOL(sync_filesystem);



2022-08-29 12:08:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 68/86] perf/x86/lbr: Enable the branch type for the Arch LBR by default

From: Kan Liang <[email protected]>

commit 32ba156df1b1c8804a4e5be5339616945eafea22 upstream.

On the platform with Arch LBR, the HW raw branch type encoding may leak
to the perf tool when the SAVE_TYPE option is not set.

In the intel_pmu_store_lbr(), the HW raw branch type is stored in
lbr_entries[].type. If the SAVE_TYPE option is set, the
lbr_entries[].type will be converted into the generic PERF_BR_* type
in the intel_pmu_lbr_filter() and exposed to the user tools.
But if the SAVE_TYPE option is NOT set by the user, the current perf
kernel doesn't clear the field. The HW raw branch type leaks.

There are two solutions to fix the issue for the Arch LBR.
One is to clear the field if the SAVE_TYPE option is NOT set.
The other solution is to unconditionally convert the branch type and
expose the generic type to the user tools.

The latter is implemented here, because
- The branch type is valuable information. I don't see a case where
you would not benefit from the branch type. (Stephane Eranian)
- Not having the branch type DOES NOT save any space in the
branch record (Stephane Eranian)
- The Arch LBR HW can retrieve the common branch types from the
LBR_INFO. It doesn't require the high overhead SW disassemble.

Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Reported-by: Stephane Eranian <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/events/intel/lbr.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1114,6 +1114,14 @@ static int intel_pmu_setup_hw_lbr_filter

if (static_cpu_has(X86_FEATURE_ARCH_LBR)) {
reg->config = mask;
+
+ /*
+ * The Arch LBR HW can retrieve the common branch types
+ * from the LBR_INFO. It doesn't require the high overhead
+ * SW disassemble.
+ * Enable the branch type by default for the Arch LBR.
+ */
+ reg->reg |= X86_BR_TYPE_SAVE;
return 0;
}



2022-08-29 12:09:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 66/86] btrfs: add info when mount fails due to stale replace target

From: Anand Jain <[email protected]>

commit f2c3bec215694fb8bc0ef5010f2a758d1906fc2d upstream.

If the replace target device reappears after the suspended replace is
cancelled, it blocks the mount operation as it can't find the matching
replace-item in the metadata. As shown below,

BTRFS error (device sda5): replace devid present without an active replace item

To overcome this situation, the user can run the command

btrfs device scan --forget <replace target device>

and try the mount command again. And also, to avoid repeating the issue,
superblock on the devid=0 must be wiped.

wipefs -a device-path-to-devid=0.

This patch adds some info when this situation occurs.

Reported-by: Samuel Greiner <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/[email protected]/T/
CC: [email protected] # 5.0+
Signed-off-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/dev-replace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -161,7 +161,7 @@ no_valid_dev_replace_entry_found:
if (btrfs_find_device(fs_info->fs_devices,
BTRFS_DEV_REPLACE_DEVID, NULL, NULL, false)) {
btrfs_err(fs_info,
- "replace devid present without an active replace item");
+"replace without active item, run 'device scan --forget' on the target device");
ret = -EUCLEAN;
} else {
dev_replace->srcdev = NULL;


2022-08-29 12:09:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 41/86] netfilter: nf_tables: upfront validation of data via nft_data_init()

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 341b6941608762d8235f3fd1e45e4d7114ed8c2c ]

Instead of parsing the data and then validate that type and length are
correct, pass a description of the expected data so it can be validated
upfront before parsing it to bail out earlier.

This patch adds a new .size field to specify the maximum size of the
data area. The .len field is optional and it is used as an input/output
field, it provides the specific length of the expected data in the input
path. If then .len field is not specified, then obtained length from the
netlink attribute is stored. This is required by cmp, bitwise, range and
immediate, which provide no netlink attribute that describes the data
length. The immediate expression uses the destination register type to
infer the expected data type.

Relying on opencoded validation of the expected data might lead to
subtle bugs as described in 7e6bc1f6cabc ("netfilter: nf_tables:
stricter validation of element data").

Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/netfilter/nf_tables.h | 4 +-
net/netfilter/nf_tables_api.c | 78 ++++++++++++++++---------------
net/netfilter/nft_bitwise.c | 66 +++++++++++++-------------
net/netfilter/nft_cmp.c | 44 ++++++++---------
net/netfilter/nft_immediate.c | 22 +++++++--
net/netfilter/nft_range.c | 27 +++++------
6 files changed, 126 insertions(+), 115 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index b9948e7861f22..6c062b2509b9b 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -178,11 +178,11 @@ struct nft_ctx {

struct nft_data_desc {
enum nft_data_types type;
+ unsigned int size;
unsigned int len;
};

-int nft_data_init(const struct nft_ctx *ctx,
- struct nft_data *data, unsigned int size,
+int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data,
struct nft_data_desc *desc, const struct nlattr *nla);
void nft_data_hold(const struct nft_data *data, enum nft_data_types type);
void nft_data_release(const struct nft_data *data, enum nft_data_types type);
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 456988b5c076e..df79ea6004a59 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4841,19 +4841,13 @@ static int nft_setelem_parse_flags(const struct nft_set *set,
static int nft_setelem_parse_key(struct nft_ctx *ctx, struct nft_set *set,
struct nft_data *key, struct nlattr *attr)
{
- struct nft_data_desc desc;
- int err;
-
- err = nft_data_init(ctx, key, NFT_DATA_VALUE_MAXLEN, &desc, attr);
- if (err < 0)
- return err;
-
- if (desc.type != NFT_DATA_VALUE || desc.len != set->klen) {
- nft_data_release(key, desc.type);
- return -EINVAL;
- }
+ struct nft_data_desc desc = {
+ .type = NFT_DATA_VALUE,
+ .size = NFT_DATA_VALUE_MAXLEN,
+ .len = set->klen,
+ };

- return 0;
+ return nft_data_init(ctx, key, &desc, attr);
}

static int nft_setelem_parse_data(struct nft_ctx *ctx, struct nft_set *set,
@@ -4862,24 +4856,17 @@ static int nft_setelem_parse_data(struct nft_ctx *ctx, struct nft_set *set,
struct nlattr *attr)
{
u32 dtype;
- int err;
-
- err = nft_data_init(ctx, data, NFT_DATA_VALUE_MAXLEN, desc, attr);
- if (err < 0)
- return err;

if (set->dtype == NFT_DATA_VERDICT)
dtype = NFT_DATA_VERDICT;
else
dtype = NFT_DATA_VALUE;

- if (dtype != desc->type ||
- set->dlen != desc->len) {
- nft_data_release(data, desc->type);
- return -EINVAL;
- }
+ desc->type = dtype;
+ desc->size = NFT_DATA_VALUE_MAXLEN;
+ desc->len = set->dlen;

- return 0;
+ return nft_data_init(ctx, data, desc, attr);
}

static int nft_get_set_elem(struct nft_ctx *ctx, struct nft_set *set,
@@ -8697,7 +8684,7 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
}

desc->len = sizeof(data->verdict);
- desc->type = NFT_DATA_VERDICT;
+
return 0;
}

@@ -8750,20 +8737,25 @@ int nft_verdict_dump(struct sk_buff *skb, int type, const struct nft_verdict *v)
}

static int nft_value_init(const struct nft_ctx *ctx,
- struct nft_data *data, unsigned int size,
- struct nft_data_desc *desc, const struct nlattr *nla)
+ struct nft_data *data, struct nft_data_desc *desc,
+ const struct nlattr *nla)
{
unsigned int len;

len = nla_len(nla);
if (len == 0)
return -EINVAL;
- if (len > size)
+ if (len > desc->size)
return -EOVERFLOW;
+ if (desc->len) {
+ if (len != desc->len)
+ return -EINVAL;
+ } else {
+ desc->len = len;
+ }

nla_memcpy(data->data, nla, len);
- desc->type = NFT_DATA_VALUE;
- desc->len = len;
+
return 0;
}

@@ -8783,7 +8775,6 @@ static const struct nla_policy nft_data_policy[NFTA_DATA_MAX + 1] = {
*
* @ctx: context of the expression using the data
* @data: destination struct nft_data
- * @size: maximum data length
* @desc: data description
* @nla: netlink attribute containing data
*
@@ -8793,24 +8784,35 @@ static const struct nla_policy nft_data_policy[NFTA_DATA_MAX + 1] = {
* The caller can indicate that it only wants to accept data of type
* NFT_DATA_VALUE by passing NULL for the ctx argument.
*/
-int nft_data_init(const struct nft_ctx *ctx,
- struct nft_data *data, unsigned int size,
+int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data,
struct nft_data_desc *desc, const struct nlattr *nla)
{
struct nlattr *tb[NFTA_DATA_MAX + 1];
int err;

+ if (WARN_ON_ONCE(!desc->size))
+ return -EINVAL;
+
err = nla_parse_nested_deprecated(tb, NFTA_DATA_MAX, nla,
nft_data_policy, NULL);
if (err < 0)
return err;

- if (tb[NFTA_DATA_VALUE])
- return nft_value_init(ctx, data, size, desc,
- tb[NFTA_DATA_VALUE]);
- if (tb[NFTA_DATA_VERDICT] && ctx != NULL)
- return nft_verdict_init(ctx, data, desc, tb[NFTA_DATA_VERDICT]);
- return -EINVAL;
+ if (tb[NFTA_DATA_VALUE]) {
+ if (desc->type != NFT_DATA_VALUE)
+ return -EINVAL;
+
+ err = nft_value_init(ctx, data, desc, tb[NFTA_DATA_VALUE]);
+ } else if (tb[NFTA_DATA_VERDICT] && ctx != NULL) {
+ if (desc->type != NFT_DATA_VERDICT)
+ return -EINVAL;
+
+ err = nft_verdict_init(ctx, data, desc, tb[NFTA_DATA_VERDICT]);
+ } else {
+ err = -EINVAL;
+ }
+
+ return err;
}
EXPORT_SYMBOL_GPL(nft_data_init);

diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c
index d0c648b64cd40..d6ab7aa14adc2 100644
--- a/net/netfilter/nft_bitwise.c
+++ b/net/netfilter/nft_bitwise.c
@@ -93,7 +93,16 @@ static const struct nla_policy nft_bitwise_policy[NFTA_BITWISE_MAX + 1] = {
static int nft_bitwise_init_bool(struct nft_bitwise *priv,
const struct nlattr *const tb[])
{
- struct nft_data_desc mask, xor;
+ struct nft_data_desc mask = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(priv->mask),
+ .len = priv->len,
+ };
+ struct nft_data_desc xor = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(priv->xor),
+ .len = priv->len,
+ };
int err;

if (tb[NFTA_BITWISE_DATA])
@@ -103,37 +112,30 @@ static int nft_bitwise_init_bool(struct nft_bitwise *priv,
!tb[NFTA_BITWISE_XOR])
return -EINVAL;

- err = nft_data_init(NULL, &priv->mask, sizeof(priv->mask), &mask,
- tb[NFTA_BITWISE_MASK]);
+ err = nft_data_init(NULL, &priv->mask, &mask, tb[NFTA_BITWISE_MASK]);
if (err < 0)
return err;
- if (mask.type != NFT_DATA_VALUE || mask.len != priv->len) {
- err = -EINVAL;
- goto err_mask_release;
- }

- err = nft_data_init(NULL, &priv->xor, sizeof(priv->xor), &xor,
- tb[NFTA_BITWISE_XOR]);
+ err = nft_data_init(NULL, &priv->xor, &xor, tb[NFTA_BITWISE_XOR]);
if (err < 0)
- goto err_mask_release;
- if (xor.type != NFT_DATA_VALUE || xor.len != priv->len) {
- err = -EINVAL;
- goto err_xor_release;
- }
+ goto err_xor_err;

return 0;

-err_xor_release:
- nft_data_release(&priv->xor, xor.type);
-err_mask_release:
+err_xor_err:
nft_data_release(&priv->mask, mask.type);
+
return err;
}

static int nft_bitwise_init_shift(struct nft_bitwise *priv,
const struct nlattr *const tb[])
{
- struct nft_data_desc d;
+ struct nft_data_desc desc = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(priv->data),
+ .len = sizeof(u32),
+ };
int err;

if (tb[NFTA_BITWISE_MASK] ||
@@ -143,13 +145,12 @@ static int nft_bitwise_init_shift(struct nft_bitwise *priv,
if (!tb[NFTA_BITWISE_DATA])
return -EINVAL;

- err = nft_data_init(NULL, &priv->data, sizeof(priv->data), &d,
- tb[NFTA_BITWISE_DATA]);
+ err = nft_data_init(NULL, &priv->data, &desc, tb[NFTA_BITWISE_DATA]);
if (err < 0)
return err;
- if (d.type != NFT_DATA_VALUE || d.len != sizeof(u32) ||
- priv->data.data[0] >= BITS_PER_TYPE(u32)) {
- nft_data_release(&priv->data, d.type);
+
+ if (priv->data.data[0] >= BITS_PER_TYPE(u32)) {
+ nft_data_release(&priv->data, desc.type);
return -EINVAL;
}

@@ -291,22 +292,21 @@ static const struct nft_expr_ops nft_bitwise_ops = {
static int
nft_bitwise_extract_u32_data(const struct nlattr * const tb, u32 *out)
{
- struct nft_data_desc desc;
struct nft_data data;
- int err = 0;
+ struct nft_data_desc desc = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(data),
+ .len = sizeof(u32),
+ };
+ int err;

- err = nft_data_init(NULL, &data, sizeof(data), &desc, tb);
+ err = nft_data_init(NULL, &data, &desc, tb);
if (err < 0)
return err;

- if (desc.type != NFT_DATA_VALUE || desc.len != sizeof(u32)) {
- err = -EINVAL;
- goto err;
- }
*out = data.data[0];
-err:
- nft_data_release(&data, desc.type);
- return err;
+
+ return 0;
}

static int nft_bitwise_fast_init(const struct nft_ctx *ctx,
diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c
index 917072af09df9..461763a571f20 100644
--- a/net/netfilter/nft_cmp.c
+++ b/net/netfilter/nft_cmp.c
@@ -73,20 +73,16 @@ static int nft_cmp_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
const struct nlattr * const tb[])
{
struct nft_cmp_expr *priv = nft_expr_priv(expr);
- struct nft_data_desc desc;
+ struct nft_data_desc desc = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(priv->data),
+ };
int err;

- err = nft_data_init(NULL, &priv->data, sizeof(priv->data), &desc,
- tb[NFTA_CMP_DATA]);
+ err = nft_data_init(NULL, &priv->data, &desc, tb[NFTA_CMP_DATA]);
if (err < 0)
return err;

- if (desc.type != NFT_DATA_VALUE) {
- err = -EINVAL;
- nft_data_release(&priv->data, desc.type);
- return err;
- }
-
err = nft_parse_register_load(tb[NFTA_CMP_SREG], &priv->sreg, desc.len);
if (err < 0)
return err;
@@ -201,12 +197,14 @@ static int nft_cmp_fast_init(const struct nft_ctx *ctx,
const struct nlattr * const tb[])
{
struct nft_cmp_fast_expr *priv = nft_expr_priv(expr);
- struct nft_data_desc desc;
struct nft_data data;
+ struct nft_data_desc desc = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(data),
+ };
int err;

- err = nft_data_init(NULL, &data, sizeof(data), &desc,
- tb[NFTA_CMP_DATA]);
+ err = nft_data_init(NULL, &data, &desc, tb[NFTA_CMP_DATA]);
if (err < 0)
return err;

@@ -299,11 +297,13 @@ static int nft_cmp16_fast_init(const struct nft_ctx *ctx,
const struct nlattr * const tb[])
{
struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr);
- struct nft_data_desc desc;
+ struct nft_data_desc desc = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(priv->data),
+ };
int err;

- err = nft_data_init(NULL, &priv->data, sizeof(priv->data), &desc,
- tb[NFTA_CMP_DATA]);
+ err = nft_data_init(NULL, &priv->data, &desc, tb[NFTA_CMP_DATA]);
if (err < 0)
return err;

@@ -365,8 +365,11 @@ const struct nft_expr_ops nft_cmp16_fast_ops = {
static const struct nft_expr_ops *
nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
{
- struct nft_data_desc desc;
struct nft_data data;
+ struct nft_data_desc desc = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(data),
+ };
enum nft_cmp_ops op;
u8 sreg;
int err;
@@ -389,14 +392,10 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
return ERR_PTR(-EINVAL);
}

- err = nft_data_init(NULL, &data, sizeof(data), &desc,
- tb[NFTA_CMP_DATA]);
+ err = nft_data_init(NULL, &data, &desc, tb[NFTA_CMP_DATA]);
if (err < 0)
return ERR_PTR(err);

- if (desc.type != NFT_DATA_VALUE)
- goto err1;
-
sreg = ntohl(nla_get_be32(tb[NFTA_CMP_SREG]));

if (op == NFT_CMP_EQ || op == NFT_CMP_NEQ) {
@@ -408,9 +407,6 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
return &nft_cmp16_fast_ops;
}
return &nft_cmp_ops;
-err1:
- nft_data_release(&data, desc.type);
- return ERR_PTR(-EINVAL);
}

struct nft_expr_type nft_cmp_type __read_mostly = {
diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c
index d0f67d325bdfd..fcdbc5ed3f367 100644
--- a/net/netfilter/nft_immediate.c
+++ b/net/netfilter/nft_immediate.c
@@ -29,20 +29,36 @@ static const struct nla_policy nft_immediate_policy[NFTA_IMMEDIATE_MAX + 1] = {
[NFTA_IMMEDIATE_DATA] = { .type = NLA_NESTED },
};

+static enum nft_data_types nft_reg_to_type(const struct nlattr *nla)
+{
+ enum nft_data_types type;
+ u8 reg;
+
+ reg = ntohl(nla_get_be32(nla));
+ if (reg == NFT_REG_VERDICT)
+ type = NFT_DATA_VERDICT;
+ else
+ type = NFT_DATA_VALUE;
+
+ return type;
+}
+
static int nft_immediate_init(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nlattr * const tb[])
{
struct nft_immediate_expr *priv = nft_expr_priv(expr);
- struct nft_data_desc desc;
+ struct nft_data_desc desc = {
+ .size = sizeof(priv->data),
+ };
int err;

if (tb[NFTA_IMMEDIATE_DREG] == NULL ||
tb[NFTA_IMMEDIATE_DATA] == NULL)
return -EINVAL;

- err = nft_data_init(ctx, &priv->data, sizeof(priv->data), &desc,
- tb[NFTA_IMMEDIATE_DATA]);
+ desc.type = nft_reg_to_type(tb[NFTA_IMMEDIATE_DREG]);
+ err = nft_data_init(ctx, &priv->data, &desc, tb[NFTA_IMMEDIATE_DATA]);
if (err < 0)
return err;

diff --git a/net/netfilter/nft_range.c b/net/netfilter/nft_range.c
index e4a1c44d7f513..e6bbe32c323df 100644
--- a/net/netfilter/nft_range.c
+++ b/net/netfilter/nft_range.c
@@ -51,7 +51,14 @@ static int nft_range_init(const struct nft_ctx *ctx, const struct nft_expr *expr
const struct nlattr * const tb[])
{
struct nft_range_expr *priv = nft_expr_priv(expr);
- struct nft_data_desc desc_from, desc_to;
+ struct nft_data_desc desc_from = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(priv->data_from),
+ };
+ struct nft_data_desc desc_to = {
+ .type = NFT_DATA_VALUE,
+ .size = sizeof(priv->data_to),
+ };
int err;
u32 op;

@@ -61,26 +68,16 @@ static int nft_range_init(const struct nft_ctx *ctx, const struct nft_expr *expr
!tb[NFTA_RANGE_TO_DATA])
return -EINVAL;

- err = nft_data_init(NULL, &priv->data_from, sizeof(priv->data_from),
- &desc_from, tb[NFTA_RANGE_FROM_DATA]);
+ err = nft_data_init(NULL, &priv->data_from, &desc_from,
+ tb[NFTA_RANGE_FROM_DATA]);
if (err < 0)
return err;

- if (desc_from.type != NFT_DATA_VALUE) {
- err = -EINVAL;
- goto err1;
- }
-
- err = nft_data_init(NULL, &priv->data_to, sizeof(priv->data_to),
- &desc_to, tb[NFTA_RANGE_TO_DATA]);
+ err = nft_data_init(NULL, &priv->data_to, &desc_to,
+ tb[NFTA_RANGE_TO_DATA]);
if (err < 0)
goto err1;

- if (desc_to.type != NFT_DATA_VALUE) {
- err = -EINVAL;
- goto err2;
- }
-
if (desc_from.len != desc_to.len) {
err = -EINVAL;
goto err2;
--
2.35.1



2022-08-29 12:09:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 85/86] scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq

From: Saurabh Sengar <[email protected]>

commit d957e7ffb2c72410bcc1a514153a46719255a5da upstream.

storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it
doesn't need to make forward progress under memory pressure. Marking this
workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a
non-WQ_MEM_RECLAIM workqueue. In the current state it causes the following
warning:

[ 14.506347] ------------[ cut here ]------------
[ 14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn
[ 14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130
[ 14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu
[ 14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
[ 14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun
[ 14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130
<-snip->
[ 14.506408] Call Trace:
[ 14.506412] __flush_work+0xf1/0x1c0
[ 14.506414] __cancel_work_timer+0x12f/0x1b0
[ 14.506417] ? kernfs_put+0xf0/0x190
[ 14.506418] cancel_delayed_work_sync+0x13/0x20
[ 14.506420] disk_block_events+0x78/0x80
[ 14.506421] del_gendisk+0x3d/0x2f0
[ 14.506423] sr_remove+0x28/0x70
[ 14.506427] device_release_driver_internal+0xef/0x1c0
[ 14.506428] device_release_driver+0x12/0x20
[ 14.506429] bus_remove_device+0xe1/0x150
[ 14.506431] device_del+0x167/0x380
[ 14.506432] __scsi_remove_device+0x11d/0x150
[ 14.506433] scsi_remove_device+0x26/0x40
[ 14.506434] storvsc_remove_lun+0x40/0x60
[ 14.506436] process_one_work+0x209/0x400
[ 14.506437] worker_thread+0x34/0x400
[ 14.506439] kthread+0x121/0x140
[ 14.506440] ? process_one_work+0x400/0x400
[ 14.506441] ? kthread_park+0x90/0x90
[ 14.506443] ret_from_fork+0x35/0x40
[ 14.506445] ---[ end trace 2d9633159fdc6ee7 ]---

Link: https://lore.kernel.org/r/[email protected]
Fixes: 436ad9413353 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun")
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Saurabh Sengar <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/storvsc_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1997,7 +1997,7 @@ static int storvsc_probe(struct hv_devic
*/
host_dev->handle_error_wq =
alloc_ordered_workqueue("storvsc_error_wq_%d",
- WQ_MEM_RECLAIM,
+ 0,
host->host_no);
if (!host_dev->handle_error_wq) {
ret = -ENOMEM;


2022-08-29 12:09:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 16/86] NFS: Dont allocate nfs_fattr on the stack in __nfs42_ssc_open()

From: Trond Myklebust <[email protected]>

[ Upstream commit 156cd28562a4e8ca454d11b234d9f634a45d6390 ]

The preferred behaviour is always to allocate struct nfs_fattr from the
slab.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/nfs/nfs4file.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 9fdecd9090493..4928eaa0d4c02 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -321,7 +321,7 @@ static int read_name_gen = 1;
static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt,
struct nfs_fh *src_fh, nfs4_stateid *stateid)
{
- struct nfs_fattr fattr;
+ struct nfs_fattr *fattr = nfs_alloc_fattr();
struct file *filep, *res;
struct nfs_server *server;
struct inode *r_ino = NULL;
@@ -332,9 +332,10 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt,

server = NFS_SERVER(ss_mnt->mnt_root->d_inode);

- nfs_fattr_init(&fattr);
+ if (!fattr)
+ return ERR_PTR(-ENOMEM);

- status = nfs4_proc_getattr(server, src_fh, &fattr, NULL, NULL);
+ status = nfs4_proc_getattr(server, src_fh, fattr, NULL, NULL);
if (status < 0) {
res = ERR_PTR(status);
goto out;
@@ -347,7 +348,7 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt,
goto out;
snprintf(read_name, len, SSC_READ_NAME_BODY, read_name_gen++);

- r_ino = nfs_fhget(ss_mnt->mnt_root->d_inode->i_sb, src_fh, &fattr,
+ r_ino = nfs_fhget(ss_mnt->mnt_root->d_inode->i_sb, src_fh, fattr,
NULL);
if (IS_ERR(r_ino)) {
res = ERR_CAST(r_ino);
@@ -392,6 +393,7 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt,
out_free_name:
kfree(read_name);
out:
+ nfs_free_fattr(fattr);
return res;
out_stateowner:
nfs4_put_state_owner(sp);
--
2.35.1



2022-08-29 12:11:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 12/86] xfrm: fix refcount leak in __xfrm_policy_check()

From: Xin Xiong <[email protected]>

[ Upstream commit 9c9cb23e00ddf45679b21b4dacc11d1ae7961ebe ]

The issue happens on an error path in __xfrm_policy_check(). When the
fetching process of the object `pols[1]` fails, the function simply
returns 0, forgetting to decrement the reference count of `pols[0]`,
which is incremented earlier by either xfrm_sk_policy_lookup() or
xfrm_policy_lookup(). This may result in memory leaks.

Fix it by decreasing the reference count of `pols[0]` in that path.

Fixes: 134b0fc544ba ("IPsec: propagate security module errors up from flow_cache_lookup")
Signed-off-by: Xin Xiong <[email protected]>
Signed-off-by: Xin Tan <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/xfrm/xfrm_policy.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 603b05ed7eb4c..2cd66f3e52386 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -3641,6 +3641,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
if (pols[1]) {
if (IS_ERR(pols[1])) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
+ xfrm_pol_put(pols[0]);
return 0;
}
pols[1]->curlft.use_time = ktime_get_real_seconds();
--
2.35.1



2022-08-29 12:11:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 46/86] net: Fix data-races around sysctl_[rw]mem_(max|default).

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 1227c1771dd2ad44318aa3ab9e3a293b3f34ff2a ]

While reading sysctl_[rw]mem_(max|default), they can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/filter.c | 4 ++--
net/core/sock.c | 8 ++++----
net/ipv4/ip_output.c | 2 +-
net/ipv4/tcp_output.c | 2 +-
net/netfilter/ipvs/ip_vs_sync.c | 4 ++--
5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 815edf7bc4390..6a90c1eb6f67e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4713,14 +4713,14 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
/* Only some socketops are supported */
switch (optname) {
case SO_RCVBUF:
- val = min_t(u32, val, sysctl_rmem_max);
+ val = min_t(u32, val, READ_ONCE(sysctl_rmem_max));
val = min_t(int, val, INT_MAX / 2);
sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
WRITE_ONCE(sk->sk_rcvbuf,
max_t(int, val * 2, SOCK_MIN_RCVBUF));
break;
case SO_SNDBUF:
- val = min_t(u32, val, sysctl_wmem_max);
+ val = min_t(u32, val, READ_ONCE(sysctl_wmem_max));
val = min_t(int, val, INT_MAX / 2);
sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
WRITE_ONCE(sk->sk_sndbuf,
diff --git a/net/core/sock.c b/net/core/sock.c
index 6d9af4ef93d7a..25d25dcd0c3db 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -887,7 +887,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
* play 'guess the biggest size' games. RCVBUF/SNDBUF
* are treated in BSD as hints
*/
- val = min_t(u32, val, sysctl_wmem_max);
+ val = min_t(u32, val, READ_ONCE(sysctl_wmem_max));
set_sndbuf:
/* Ensure val * 2 fits into an int, to prevent max_t()
* from treating it as a negative value.
@@ -919,7 +919,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
* play 'guess the biggest size' games. RCVBUF/SNDBUF
* are treated in BSD as hints
*/
- __sock_set_rcvbuf(sk, min_t(u32, val, sysctl_rmem_max));
+ __sock_set_rcvbuf(sk, min_t(u32, val, READ_ONCE(sysctl_rmem_max)));
break;

case SO_RCVBUFFORCE:
@@ -2974,8 +2974,8 @@ void sock_init_data(struct socket *sock, struct sock *sk)
timer_setup(&sk->sk_timer, NULL, 0);

sk->sk_allocation = GFP_KERNEL;
- sk->sk_rcvbuf = sysctl_rmem_default;
- sk->sk_sndbuf = sysctl_wmem_default;
+ sk->sk_rcvbuf = READ_ONCE(sysctl_rmem_default);
+ sk->sk_sndbuf = READ_ONCE(sysctl_wmem_default);
sk->sk_state = TCP_CLOSE;
sk_set_socket(sk, sock);

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f77b0af3cb657..0dbf950de832f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1721,7 +1721,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,

sk->sk_protocol = ip_hdr(skb)->protocol;
sk->sk_bound_dev_if = arg->bound_dev_if;
- sk->sk_sndbuf = sysctl_wmem_default;
+ sk->sk_sndbuf = READ_ONCE(sysctl_wmem_default);
ipc.sockc.mark = fl4.flowi4_mark;
err = ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base,
len, 0, &ipc, &rt, MSG_DONTWAIT);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c90c9541996bf..48fce999dc612 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -239,7 +239,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
if (wscale_ok) {
/* Set window scaling on max possible window */
space = max_t(u32, space, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
- space = max_t(u32, space, sysctl_rmem_max);
+ space = max_t(u32, space, READ_ONCE(sysctl_rmem_max));
space = min_t(u32, space, *window_clamp);
*rcv_wscale = clamp_t(int, ilog2(space) - 15,
0, TCP_MAX_WSCALE);
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 16b48064f715e..daab857c52a80 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1280,12 +1280,12 @@ static void set_sock_size(struct sock *sk, int mode, int val)
lock_sock(sk);
if (mode) {
val = clamp_t(int, val, (SOCK_MIN_SNDBUF + 1) / 2,
- sysctl_wmem_max);
+ READ_ONCE(sysctl_wmem_max));
sk->sk_sndbuf = val * 2;
sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
} else {
val = clamp_t(int, val, (SOCK_MIN_RCVBUF + 1) / 2,
- sysctl_rmem_max);
+ READ_ONCE(sysctl_rmem_max));
sk->sk_rcvbuf = val * 2;
sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
}
--
2.35.1



2022-08-29 12:11:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 14/86] af_key: Do not call xfrm_probe_algs in parallel

From: Herbert Xu <[email protected]>

[ Upstream commit ba953a9d89a00c078b85f4b190bc1dde66fe16b5 ]

When namespace support was added to xfrm/afkey, it caused the
previously single-threaded call to xfrm_probe_algs to become
multi-threaded. This is buggy and needs to be fixed with a mutex.

Reported-by: Abhishek Shah <[email protected]>
Fixes: 283bc9f35bbb ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/key/af_key.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 2aa16a171285b..05e2710988883 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1701,9 +1701,12 @@ static int pfkey_register(struct sock *sk, struct sk_buff *skb, const struct sad
pfk->registered |= (1<<hdr->sadb_msg_satype);
}

+ mutex_lock(&pfkey_mutex);
xfrm_probe_algs();

supp_skb = compose_sadb_supported(hdr, GFP_KERNEL | __GFP_ZERO);
+ mutex_unlock(&pfkey_mutex);
+
if (!supp_skb) {
if (hdr->sadb_msg_satype != SADB_SATYPE_UNSPEC)
pfk->registered &= ~(1<<hdr->sadb_msg_satype);
--
2.35.1



2022-08-29 12:11:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 18/86] SUNRPC: RPC level errors should set task->tk_rpc_status

From: Trond Myklebust <[email protected]>

[ Upstream commit ed06fce0b034b2e25bd93430f5c4cbb28036cc1a ]

Fix up a case in call_encode() where we're failing to set
task->tk_rpc_status when an RPC level error occurred.

Fixes: 9c5948c24869 ("SUNRPC: task should be exit if encode return EKEYEXPIRED more times")
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/sunrpc/clnt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index c5af31312e0cf..78c6648af7827 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1867,7 +1867,7 @@ call_encode(struct rpc_task *task)
break;
case -EKEYEXPIRED:
if (!task->tk_cred_retry) {
- rpc_exit(task, task->tk_status);
+ rpc_call_rpcerror(task, task->tk_status);
} else {
task->tk_action = call_refresh;
task->tk_cred_retry--;
--
2.35.1



2022-08-29 12:11:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 44/86] tcp: tweak len/truesize ratio for coalesce candidates

From: Eric Dumazet <[email protected]>

[ Upstream commit 240bfd134c592791fdceba1ce7fc3f973c33df2d ]

tcp_grow_window() is using skb->len/skb->truesize to increase tp->rcv_ssthresh
which has a direct impact on advertized window sizes.

We added TCP coalescing in linux-3.4 & linux-3.5:

Instead of storing skbs with one or two MSS in receive queue (or OFO queue),
we try to append segments together to reduce memory overhead.

High performance network drivers tend to cook skb with 3 parts :

1) sk_buff structure (256 bytes)
2) skb->head contains room to copy headers as needed, and skb_shared_info
3) page fragment(s) containing the ~1514 bytes frame (or more depending on MTU)

Once coalesced into a previous skb, 1) and 2) are freed.

We can therefore tweak the way we compute len/truesize ratio knowing
that skb->truesize is inflated by 1) and 2) soon to be freed.

This is done only for in-order skb, or skb coalesced into OFO queue.

The result is that low rate flows no longer pay the memory price of having
low GRO aggregation factor. Same result for drivers not using GRO.

This is critical to allow a big enough receiver window,
typically tcp_rmem[2] / 2.

We have been using this at Google for about 5 years, it is due time
to make it upstream.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d35e88b5ffcbe..33a3fb04ac4df 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -454,11 +454,12 @@ static void tcp_sndbuf_expand(struct sock *sk)
*/

/* Slow part of check#2. */
-static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb)
+static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb,
+ unsigned int skbtruesize)
{
struct tcp_sock *tp = tcp_sk(sk);
/* Optimize this! */
- int truesize = tcp_win_from_space(sk, skb->truesize) >> 1;
+ int truesize = tcp_win_from_space(sk, skbtruesize) >> 1;
int window = tcp_win_from_space(sk, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1;

while (tp->rcv_ssthresh <= window) {
@@ -471,7 +472,27 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb)
return 0;
}

-static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb)
+/* Even if skb appears to have a bad len/truesize ratio, TCP coalescing
+ * can play nice with us, as sk_buff and skb->head might be either
+ * freed or shared with up to MAX_SKB_FRAGS segments.
+ * Only give a boost to drivers using page frag(s) to hold the frame(s),
+ * and if no payload was pulled in skb->head before reaching us.
+ */
+static u32 truesize_adjust(bool adjust, const struct sk_buff *skb)
+{
+ u32 truesize = skb->truesize;
+
+ if (adjust && !skb_headlen(skb)) {
+ truesize -= SKB_TRUESIZE(skb_end_offset(skb));
+ /* paranoid check, some drivers might be buggy */
+ if (unlikely((int)truesize < (int)skb->len))
+ truesize = skb->truesize;
+ }
+ return truesize;
+}
+
+static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb,
+ bool adjust)
{
struct tcp_sock *tp = tcp_sk(sk);
int room;
@@ -480,15 +501,16 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb)

/* Check #1 */
if (room > 0 && !tcp_under_memory_pressure(sk)) {
+ unsigned int truesize = truesize_adjust(adjust, skb);
int incr;

/* Check #2. Increase window, if skb with such overhead
* will fit to rcvbuf in future.
*/
- if (tcp_win_from_space(sk, skb->truesize) <= skb->len)
+ if (tcp_win_from_space(sk, truesize) <= skb->len)
incr = 2 * tp->advmss;
else
- incr = __tcp_grow_window(sk, skb);
+ incr = __tcp_grow_window(sk, skb, truesize);

if (incr) {
incr = max_t(int, incr, 2 * skb->len);
@@ -782,7 +804,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
tcp_ecn_check_ce(sk, skb);

if (skb->len >= 128)
- tcp_grow_window(sk, skb);
+ tcp_grow_window(sk, skb, true);
}

/* Called to compute a smoothed rtt estimate. The data fed to this
@@ -4761,7 +4783,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
* and trigger fast retransmit.
*/
if (tcp_is_sack(tp))
- tcp_grow_window(sk, skb);
+ tcp_grow_window(sk, skb, true);
kfree_skb_partial(skb, fragstolen);
skb = NULL;
goto add_sack;
@@ -4849,7 +4871,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
* and trigger fast retransmit.
*/
if (tcp_is_sack(tp))
- tcp_grow_window(sk, skb);
+ tcp_grow_window(sk, skb, false);
skb_condense(skb);
skb_set_owner_r(skb, sk);
}
--
2.35.1



2022-08-29 12:12:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 69/86] x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry

From: Chen Zhongjin <[email protected]>

commit fc2e426b1161761561624ebd43ce8c8d2fa058da upstream.

When meeting ftrace trampolines in ORC unwinding, unwinder uses address
of ftrace_{regs_}call address to find the ORC entry, which gets next frame at
sp+176.

If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be
sp+8 instead of 176. It makes unwinder skip correct frame and throw
warnings such as "wrong direction" or "can't access registers", etc,
depending on the content of the incorrect frame address.

By adding the base address ftrace_{regs_}caller with the offset
*ip - ops->trampoline*, we can get the correct address to find the ORC entry.

Also change "caller" to "tramp_addr" to make variable name conform to
its content.

[ mingo: Clarified the changelog a bit. ]

Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
Signed-off-by: Chen Zhongjin <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -93,22 +93,27 @@ static struct orc_entry *orc_find(unsign
static struct orc_entry *orc_ftrace_find(unsigned long ip)
{
struct ftrace_ops *ops;
- unsigned long caller;
+ unsigned long tramp_addr, offset;

ops = ftrace_ops_trampoline(ip);
if (!ops)
return NULL;

+ /* Set tramp_addr to the start of the code copied by the trampoline */
if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
- caller = (unsigned long)ftrace_regs_call;
+ tramp_addr = (unsigned long)ftrace_regs_caller;
else
- caller = (unsigned long)ftrace_call;
+ tramp_addr = (unsigned long)ftrace_caller;
+
+ /* Now place tramp_addr to the location within the trampoline ip is at */
+ offset = ip - ops->trampoline;
+ tramp_addr += offset;

/* Prevent unlikely recursion */
- if (ip == caller)
+ if (ip == tramp_addr)
return NULL;

- return orc_find(caller);
+ return orc_find(tramp_addr);
}
#else
static struct orc_entry *orc_ftrace_find(unsigned long ip)


2022-08-29 12:12:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 77/86] Revert "md-raid: destroy the bitmap after destroying the thread"

From: Guoqing Jiang <[email protected]>

commit 1d258758cf06a0734482989911d184dd5837ed4e upstream.

This reverts commit e151db8ecfb019b7da31d076130a794574c89f6f. Because it
obviously breaks clustered raid as noticed by Neil though it fixed KASAN
issue for dm-raid, let's revert it and fix KASAN issue in next commit.

[1]. https://lore.kernel.org/linux-raid/[email protected]/T/#m95ac225cab7409f66c295772483d091084a6d470

Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/md/md.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6278,11 +6278,11 @@ static void mddev_detach(struct mddev *m
static void __md_stop(struct mddev *mddev)
{
struct md_personality *pers = mddev->pers;
+ md_bitmap_destroy(mddev);
mddev_detach(mddev);
/* Ensure ->event_work is done */
if (mddev->event_work.func)
flush_workqueue(md_misc_wq);
- md_bitmap_destroy(mddev);
spin_lock(&mddev->lock);
mddev->pers = NULL;
spin_unlock(&mddev->lock);


2022-08-29 12:12:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 86/86] bpf: Dont use tnum_range on array range checking for poke descriptors

From: Daniel Borkmann <[email protected]>

commit a657182a5c5150cdfacb6640aad1d2712571a409 upstream.

Hsin-Wei reported a KASAN splat triggered by their BPF runtime fuzzer which
is based on a customized syzkaller:

BUG: KASAN: slab-out-of-bounds in bpf_int_jit_compile+0x1257/0x13f0
Read of size 8 at addr ffff888004e90b58 by task syz-executor.0/1489
CPU: 1 PID: 1489 Comm: syz-executor.0 Not tainted 5.19.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x9c/0xc9
print_address_description.constprop.0+0x1f/0x1f0
? bpf_int_jit_compile+0x1257/0x13f0
kasan_report.cold+0xeb/0x197
? kvmalloc_node+0x170/0x200
? bpf_int_jit_compile+0x1257/0x13f0
bpf_int_jit_compile+0x1257/0x13f0
? arch_prepare_bpf_dispatcher+0xd0/0xd0
? rcu_read_lock_sched_held+0x43/0x70
bpf_prog_select_runtime+0x3e8/0x640
? bpf_obj_name_cpy+0x149/0x1b0
bpf_prog_load+0x102f/0x2220
? __bpf_prog_put.constprop.0+0x220/0x220
? find_held_lock+0x2c/0x110
? __might_fault+0xd6/0x180
? lock_downgrade+0x6e0/0x6e0
? lock_is_held_type+0xa6/0x120
? __might_fault+0x147/0x180
__sys_bpf+0x137b/0x6070
? bpf_perf_link_attach+0x530/0x530
? new_sync_read+0x600/0x600
? __fget_files+0x255/0x450
? lock_downgrade+0x6e0/0x6e0
? fput+0x30/0x1a0
? ksys_write+0x1a8/0x260
__x64_sys_bpf+0x7a/0xc0
? syscall_enter_from_user_mode+0x21/0x70
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f917c4e2c2d

The problem here is that a range of tnum_range(0, map->max_entries - 1) has
limited ability to represent the concrete tight range with the tnum as the
set of resulting states from value + mask can result in a superset of the
actual intended range, and as such a tnum_in(range, reg->var_off) check may
yield true when it shouldn't, for example tnum_range(0, 2) would result in
00XX -> v = 0000, m = 0011 such that the intended set of {0, 1, 2} is here
represented by a less precise superset of {0, 1, 2, 3}. As the register is
known const scalar, really just use the concrete reg->var_off.value for the
upper index check.

Fixes: d2e4c1e6c294 ("bpf: Constant map key tracking for prog array pokes")
Reported-by: Hsin-Wei Hung <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Shung-Hsi Yu <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/r/984b37f9fdf7ac36831d2137415a4a915744c1b6.1661462653.git.daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/bpf/verifier.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5282,8 +5282,7 @@ record_func_key(struct bpf_verifier_env
struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];
struct bpf_reg_state *regs = cur_regs(env), *reg;
struct bpf_map *map = meta->map_ptr;
- struct tnum range;
- u64 val;
+ u64 val, max;
int err;

if (func_id != BPF_FUNC_tail_call)
@@ -5293,10 +5292,11 @@ record_func_key(struct bpf_verifier_env
return -EINVAL;
}

- range = tnum_range(0, map->max_entries - 1);
reg = &regs[BPF_REG_3];
+ val = reg->var_off.value;
+ max = map->max_entries;

- if (!register_is_const(reg) || !tnum_in(range, reg->var_off)) {
+ if (!(register_is_const(reg) && val < max)) {
bpf_map_key_store(aux, BPF_MAP_KEY_POISON);
return 0;
}
@@ -5304,8 +5304,6 @@ record_func_key(struct bpf_verifier_env
err = mark_chain_precision(env, BPF_REG_3);
if (err)
return err;
-
- val = reg->var_off.value;
if (bpf_map_key_unseen(aux))
bpf_map_key_store(aux, val);
else if (!bpf_map_key_poisoned(aux) &&


2022-08-29 12:12:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 64/86] btrfs: fix silent failure when deleting root reference

From: Filipe Manana <[email protected]>

commit 47bf225a8d2cccb15f7e8d4a1ed9b757dd86afd7 upstream.

At btrfs_del_root_ref(), if btrfs_search_slot() returns an error, we end
up returning from the function with a value of 0 (success). This happens
because the function returns the value stored in the variable 'err',
which is 0, while the error value we got from btrfs_search_slot() is
stored in the 'ret' variable.

So fix it by setting 'err' with the error value.

Fixes: 8289ed9f93bef2 ("btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling")
CC: [email protected] # 5.16+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/root-tree.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -336,9 +336,10 @@ int btrfs_del_root_ref(struct btrfs_tran
key.offset = ref_id;
again:
ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1);
- if (ret < 0)
+ if (ret < 0) {
+ err = ret;
goto out;
- if (ret == 0) {
+ } else if (ret == 0) {
leaf = path->nodes[0];
ref = btrfs_item_ptr(leaf, path->slots[0],
struct btrfs_root_ref);


2022-08-29 12:13:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 84/86] scsi: ufs: core: Enable link lost interrupt

From: Kiwoong Kim <[email protected]>

commit 6d17a112e9a63ff6a5edffd1676b99e0ffbcd269 upstream.

Link lost is treated as fatal error with commit c99b9b230149 ("scsi: ufs:
Treat link loss as fatal error"), but the event isn't registered as
interrupt source. Enable it.

Link: https://lore.kernel.org/r/[email protected]
Fixes: c99b9b230149 ("scsi: ufs: Treat link loss as fatal error")
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Kiwoong Kim <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/ufs/ufshci.h | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

--- a/drivers/scsi/ufs/ufshci.h
+++ b/drivers/scsi/ufs/ufshci.h
@@ -129,11 +129,7 @@ enum {

#define UFSHCD_UIC_MASK (UIC_COMMAND_COMPL | UFSHCD_UIC_PWR_MASK)

-#define UFSHCD_ERROR_MASK (UIC_ERROR |\
- DEVICE_FATAL_ERROR |\
- CONTROLLER_FATAL_ERROR |\
- SYSTEM_BUS_FATAL_ERROR |\
- CRYPTO_ENGINE_FATAL_ERROR)
+#define UFSHCD_ERROR_MASK (UIC_ERROR | INT_FATAL_ERRORS)

#define INT_FATAL_ERRORS (DEVICE_FATAL_ERROR |\
CONTROLLER_FATAL_ERROR |\


2022-08-29 12:13:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 31/86] bnxt_en: fix NQ resource accounting during vf creation on 57500 chips

From: Vikas Gupta <[email protected]>

[ Upstream commit 09a89cc59ad67794a11e1d3dd13c5b3172adcc51 ]

There are 2 issues:

1. We should decrement hw_resc->max_nqs instead of hw_resc->max_irqs
with the number of NQs assigned to the VFs. The IRQs are fixed
on each function and cannot be re-assigned. Only the NQs are being
assigned to the VFs.

2. vf_msix is the total number of NQs to be assigned to the VFs. So
we should decrement vf_msix from hw_resc->max_nqs.

Fixes: b16b68918674 ("bnxt_en: Add SR-IOV support for 57500 chips.")
Signed-off-by: Vikas Gupta <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 23b80aa171dd0..819f9df9425c6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -599,7 +599,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs, bool reset)
hw_resc->max_stat_ctxs -= le16_to_cpu(req.min_stat_ctx) * n;
hw_resc->max_vnics -= le16_to_cpu(req.min_vnics) * n;
if (bp->flags & BNXT_FLAG_CHIP_P5)
- hw_resc->max_irqs -= vf_msix * n;
+ hw_resc->max_nqs -= vf_msix;

rc = pf->active_vfs;
}
--
2.35.1



2022-08-29 12:13:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 57/86] net: Fix a data-race around netdev_budget_usecs.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit fa45d484c52c73f79db2c23b0cdfc6c6455093ad ]

While reading netdev_budget_usecs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c4eb1b666a21c..8355cc5e11a98 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6879,7 +6879,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
{
struct softnet_data *sd = this_cpu_ptr(&softnet_data);
unsigned long time_limit = jiffies +
- usecs_to_jiffies(netdev_budget_usecs);
+ usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
int budget = READ_ONCE(netdev_budget);
LIST_HEAD(list);
LIST_HEAD(repoll);
--
2.35.1



2022-08-29 12:13:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 17/86] NFSv4.2 fix problems with __nfs42_ssc_open

From: Olga Kornievskaia <[email protected]>

[ Upstream commit fcfc8be1e9cf2f12b50dce8b579b3ae54443a014 ]

A destination server while doing a COPY shouldn't accept using the
passed in filehandle if its not a regular filehandle.

If alloc_file_pseudo() has failed, we need to decrement a reference
on the newly created inode, otherwise it leaks.

Reported-by: Al Viro <[email protected]>
Fixes: ec4b092508982 ("NFS: inter ssc open")
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/nfs/nfs4file.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 4928eaa0d4c02..70cd0d764c447 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -341,6 +341,11 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt,
goto out;
}

+ if (!S_ISREG(fattr->mode)) {
+ res = ERR_PTR(-EBADF);
+ goto out;
+ }
+
res = ERR_PTR(-ENOMEM);
len = strlen(SSC_READ_NAME_BODY) + 16;
read_name = kzalloc(len, GFP_NOFS);
@@ -359,6 +364,7 @@ static struct file *__nfs42_ssc_open(struct vfsmount *ss_mnt,
r_ino->i_fop);
if (IS_ERR(filep)) {
res = ERR_CAST(filep);
+ iput(r_ino);
goto out_free_name;
}
filep->f_mode |= FMODE_READ;
--
2.35.1



2022-08-29 12:13:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 76/86] mm/hugetlb: fix hugetlb not supporting softdirty tracking

From: David Hildenbrand <[email protected]>

commit f96f7a40874d7c746680c0b9f57cef2262ae551f upstream.

Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.

I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found two case where we could currently get such faults and would
erroneously map an anon page into a shared mapping.

Reproducers part of the patches.

I propose to backport both fixes to stable trees. The first fix needs a
small adjustment.


This patch (of 2):

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get away
with that because e.g., mprotect() should always do the right thing and
map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>

#define HUGETLB_SIZE (2 * 1024 * 1024u)

static void clear_softdirty(void)
{
int fd = open("/proc/self/clear_refs", O_WRONLY);
const char *ctrl = "4";
int ret;

if (fd < 0) {
fprintf(stderr, "open(clear_refs) failed\n");
exit(1);
}
ret = write(fd, ctrl, strlen(ctrl));
if (ret != strlen(ctrl)) {
fprintf(stderr, "write(clear_refs) failed\n");
exit(1);
}
close(fd);
}

int main(int argc, char **argv)
{
char *map;
int fd;

fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
if (!fd) {
fprintf(stderr, "open() failed\n");
return -errno;
}
if (ftruncate(fd, HUGETLB_SIZE)) {
fprintf(stderr, "ftruncate() failed\n");
return -errno;
}

map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
return -errno;
}

*map = 0;

if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
fprintf(stderr, "mmprotect() failed\n");
return -errno;
}

clear_softdirty();

if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
fprintf(stderr, "mmprotect() failed\n");
return -errno;
}

*map = 0;

return 0;
}
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.
# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total: 2
HugePages_Free: 1
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0

Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support softdirty tracking.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Peter Feiner <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Jamie Liu <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]> [3.18+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/mmap.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1694,8 +1694,12 @@ int vma_wants_writenotify(struct vm_area
pgprot_val(vm_pgprot_modify(vm_page_prot, vm_flags)))
return 0;

- /* Do we need to track softdirty? */
- if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY))
+ /*
+ * Do we need to track softdirty? hugetlb does not support softdirty
+ * tracking yet.
+ */
+ if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY) &&
+ !is_vm_hugetlb_page(vma))
return 1;

/* Specialty mapping? */


2022-08-29 12:14:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 73/86] s390: fix double free of GS and RI CBs on fork() failure

From: Brian Foster <[email protected]>

commit 13cccafe0edcd03bf1c841de8ab8a1c8e34f77d9 upstream.

The pointers for guarded storage and runtime instrumentation control
blocks are stored in the thread_struct of the associated task. These
pointers are initially copied on fork() via arch_dup_task_struct()
and then cleared via copy_thread() before fork() returns. If fork()
happens to fail after the initial task dup and before copy_thread(),
the newly allocated task and associated thread_struct memory are
freed via free_task() -> arch_release_task_struct(). This results in
a double free of the guarded storage and runtime info structs
because the fields in the failed task still refer to memory
associated with the source task.

This problem can manifest as a BUG_ON() in set_freepointer() (with
CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled)
when running trinity syscall fuzz tests on s390x. To avoid this
problem, clear the associated pointer fields in
arch_dup_task_struct() immediately after the new task is copied.
Note that the RI flag is still cleared in copy_thread() because it
resides in thread stack memory and that is where stack info is
copied.

Signed-off-by: Brian Foster <[email protected]>
Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling")
Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling")
Cc: <[email protected]> # 4.15
Reviewed-by: Gerald Schaefer <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kernel/process.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)

--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -77,6 +77,18 @@ int arch_dup_task_struct(struct task_str

memcpy(dst, src, arch_task_struct_size);
dst->thread.fpu.regs = dst->thread.fpu.fprs;
+
+ /*
+ * Don't transfer over the runtime instrumentation or the guarded
+ * storage control block pointers. These fields are cleared here instead
+ * of in copy_thread() to avoid premature freeing of associated memory
+ * on fork() failure. Wait to clear the RI flag because ->stack still
+ * refers to the source thread.
+ */
+ dst->thread.ri_cb = NULL;
+ dst->thread.gs_cb = NULL;
+ dst->thread.gs_bc_cb = NULL;
+
return 0;
}

@@ -134,13 +146,11 @@ int copy_thread(unsigned long clone_flag
frame->childregs.flags = 0;
if (new_stackp)
frame->childregs.gprs[15] = new_stackp;
-
- /* Don't copy runtime instrumentation info */
- p->thread.ri_cb = NULL;
+ /*
+ * Clear the runtime instrumentation flag after the above childregs
+ * copy. The CB pointer was already cleared in arch_dup_task_struct().
+ */
frame->childregs.psw.mask &= ~PSW_MASK_RI;
- /* Don't copy guarded storage control block */
- p->thread.gs_cb = NULL;
- p->thread.gs_bc_cb = NULL;

/* Set a new TLS ? */
if (clone_flags & CLONE_SETTLS) {


2022-08-29 12:14:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 39/86] netfilter: nft_cmp: optimize comparison for 16-bytes

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 23f68d462984bfda47c7bf663dca347e8e3df549 ]

Allow up to 16-byte comparisons with a new cmp fast version. Use two
64-bit words and calculate the mask representing the bits to be
compared. Make sure the comparison is 64-bit aligned and avoid
out-of-bound memory access on registers.

Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/netfilter/nf_tables_core.h | 9 +++
net/netfilter/nf_tables_core.c | 16 ++++
net/netfilter/nft_cmp.c | 102 ++++++++++++++++++++++++-
3 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index fd10a7862fdc6..ce75121782bf7 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -38,6 +38,14 @@ struct nft_cmp_fast_expr {
bool inv;
};

+struct nft_cmp16_fast_expr {
+ struct nft_data data;
+ struct nft_data mask;
+ u8 sreg;
+ u8 len;
+ bool inv;
+};
+
struct nft_immediate_expr {
struct nft_data data;
u8 dreg;
@@ -55,6 +63,7 @@ static inline u32 nft_cmp_fast_mask(unsigned int len)
}

extern const struct nft_expr_ops nft_cmp_fast_ops;
+extern const struct nft_expr_ops nft_cmp16_fast_ops;

struct nft_payload {
enum nft_payload_bases base:8;
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 6dd27c8cd4253..9dc18429ed875 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -67,6 +67,20 @@ static void nft_cmp_fast_eval(const struct nft_expr *expr,
regs->verdict.code = NFT_BREAK;
}

+static void nft_cmp16_fast_eval(const struct nft_expr *expr,
+ struct nft_regs *regs)
+{
+ const struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr);
+ const u64 *reg_data = (const u64 *)&regs->data[priv->sreg];
+ const u64 *mask = (const u64 *)&priv->mask;
+ const u64 *data = (const u64 *)&priv->data;
+
+ if (((reg_data[0] & mask[0]) == data[0] &&
+ ((reg_data[1] & mask[1]) == data[1])) ^ priv->inv)
+ return;
+ regs->verdict.code = NFT_BREAK;
+}
+
static noinline void __nft_trace_verdict(struct nft_traceinfo *info,
const struct nft_chain *chain,
const struct nft_regs *regs)
@@ -215,6 +229,8 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
nft_rule_for_each_expr(expr, last, rule) {
if (expr->ops == &nft_cmp_fast_ops)
nft_cmp_fast_eval(expr, &regs);
+ else if (expr->ops == &nft_cmp16_fast_ops)
+ nft_cmp16_fast_eval(expr, &regs);
else if (expr->ops == &nft_bitwise_fast_ops)
nft_bitwise_fast_eval(expr, &regs);
else if (expr->ops != &nft_payload_fast_ops ||
diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c
index 47b6d05f1ae69..917072af09df9 100644
--- a/net/netfilter/nft_cmp.c
+++ b/net/netfilter/nft_cmp.c
@@ -272,12 +272,103 @@ const struct nft_expr_ops nft_cmp_fast_ops = {
.offload = nft_cmp_fast_offload,
};

+static u32 nft_cmp_mask(u32 bitlen)
+{
+ return (__force u32)cpu_to_le32(~0U >> (sizeof(u32) * BITS_PER_BYTE - bitlen));
+}
+
+static void nft_cmp16_fast_mask(struct nft_data *data, unsigned int bitlen)
+{
+ int len = bitlen / BITS_PER_BYTE;
+ int i, words = len / sizeof(u32);
+
+ for (i = 0; i < words; i++) {
+ data->data[i] = 0xffffffff;
+ bitlen -= sizeof(u32) * BITS_PER_BYTE;
+ }
+
+ if (len % sizeof(u32))
+ data->data[i++] = nft_cmp_mask(bitlen);
+
+ for (; i < 4; i++)
+ data->data[i] = 0;
+}
+
+static int nft_cmp16_fast_init(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nlattr * const tb[])
+{
+ struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr);
+ struct nft_data_desc desc;
+ int err;
+
+ err = nft_data_init(NULL, &priv->data, sizeof(priv->data), &desc,
+ tb[NFTA_CMP_DATA]);
+ if (err < 0)
+ return err;
+
+ err = nft_parse_register_load(tb[NFTA_CMP_SREG], &priv->sreg, desc.len);
+ if (err < 0)
+ return err;
+
+ nft_cmp16_fast_mask(&priv->mask, desc.len * BITS_PER_BYTE);
+ priv->inv = ntohl(nla_get_be32(tb[NFTA_CMP_OP])) != NFT_CMP_EQ;
+ priv->len = desc.len;
+
+ return 0;
+}
+
+static int nft_cmp16_fast_offload(struct nft_offload_ctx *ctx,
+ struct nft_flow_rule *flow,
+ const struct nft_expr *expr)
+{
+ const struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr);
+ struct nft_cmp_expr cmp = {
+ .data = priv->data,
+ .sreg = priv->sreg,
+ .len = priv->len,
+ .op = priv->inv ? NFT_CMP_NEQ : NFT_CMP_EQ,
+ };
+
+ return __nft_cmp_offload(ctx, flow, &cmp);
+}
+
+static int nft_cmp16_fast_dump(struct sk_buff *skb, const struct nft_expr *expr)
+{
+ const struct nft_cmp16_fast_expr *priv = nft_expr_priv(expr);
+ enum nft_cmp_ops op = priv->inv ? NFT_CMP_NEQ : NFT_CMP_EQ;
+
+ if (nft_dump_register(skb, NFTA_CMP_SREG, priv->sreg))
+ goto nla_put_failure;
+ if (nla_put_be32(skb, NFTA_CMP_OP, htonl(op)))
+ goto nla_put_failure;
+
+ if (nft_data_dump(skb, NFTA_CMP_DATA, &priv->data,
+ NFT_DATA_VALUE, priv->len) < 0)
+ goto nla_put_failure;
+ return 0;
+
+nla_put_failure:
+ return -1;
+}
+
+
+const struct nft_expr_ops nft_cmp16_fast_ops = {
+ .type = &nft_cmp_type,
+ .size = NFT_EXPR_SIZE(sizeof(struct nft_cmp16_fast_expr)),
+ .eval = NULL, /* inlined */
+ .init = nft_cmp16_fast_init,
+ .dump = nft_cmp16_fast_dump,
+ .offload = nft_cmp16_fast_offload,
+};
+
static const struct nft_expr_ops *
nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
{
struct nft_data_desc desc;
struct nft_data data;
enum nft_cmp_ops op;
+ u8 sreg;
int err;

if (tb[NFTA_CMP_SREG] == NULL ||
@@ -306,9 +397,16 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
if (desc.type != NFT_DATA_VALUE)
goto err1;

- if (desc.len <= sizeof(u32) && (op == NFT_CMP_EQ || op == NFT_CMP_NEQ))
- return &nft_cmp_fast_ops;
+ sreg = ntohl(nla_get_be32(tb[NFTA_CMP_SREG]));

+ if (op == NFT_CMP_EQ || op == NFT_CMP_NEQ) {
+ if (desc.len <= sizeof(u32))
+ return &nft_cmp_fast_ops;
+ else if (desc.len <= sizeof(data) &&
+ ((sreg >= NFT_REG_1 && sreg <= NFT_REG_4) ||
+ (sreg >= NFT_REG32_00 && sreg <= NFT_REG32_12 && sreg % 2 == 0)))
+ return &nft_cmp16_fast_ops;
+ }
return &nft_cmp_ops;
err1:
nft_data_release(&data, desc.type);
--
2.35.1



2022-08-29 12:15:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 22/86] nfc: pn533: Fix use-after-free bugs caused by pn532_cmd_timeout

From: Duoming Zhou <[email protected]>

[ Upstream commit f1e941dbf80a9b8bab0bffbc4cbe41cc7f4c6fb6 ]

When the pn532 uart device is detaching, the pn532_uart_remove()
is called. But there are no functions in pn532_uart_remove() that
could delete the cmd_timeout timer, which will cause use-after-free
bugs. The process is shown below:

(thread 1) | (thread 2)
| pn532_uart_send_frame
pn532_uart_remove | mod_timer(&pn532->cmd_timeout,...)
... | (wait a time)
kfree(pn532) //FREE | pn532_cmd_timeout
| pn532_uart_send_frame
| pn532->... //USE

This patch adds del_timer_sync() in pn532_uart_remove() in order to
prevent the use-after-free bugs. What's more, the pn53x_unregister_nfc()
is well synchronized, it sets nfc_dev->shutting_down to true and there
are no syscalls could restart the cmd_timeout timer.

Fixes: c656aa4c27b1 ("nfc: pn533: add UART phy driver")
Signed-off-by: Duoming Zhou <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/nfc/pn533/uart.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/nfc/pn533/uart.c b/drivers/nfc/pn533/uart.c
index a0665d8ea85bc..e92535ebb5287 100644
--- a/drivers/nfc/pn533/uart.c
+++ b/drivers/nfc/pn533/uart.c
@@ -310,6 +310,7 @@ static void pn532_uart_remove(struct serdev_device *serdev)
pn53x_unregister_nfc(pn532->priv);
serdev_device_close(serdev);
pn53x_common_clean(pn532->priv);
+ del_timer_sync(&pn532->cmd_timeout);
kfree_skb(pn532->recv_skb);
kfree(pn532);
}
--
2.35.1



2022-08-29 12:15:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 37/86] netfilter: nftables: remove redundant assignment of variable err

From: Colin Ian King <[email protected]>

[ Upstream commit 626899a02e6afcd4b2ce5c0551092e3554cec4aa ]

The variable err is being assigned a value that is never read,
the same error number is being returned at the error return
path via label err1. Clean up the code by removing the assignment.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_cmp.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c
index b529c0e865466..47b6d05f1ae69 100644
--- a/net/netfilter/nft_cmp.c
+++ b/net/netfilter/nft_cmp.c
@@ -303,10 +303,8 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
if (err < 0)
return ERR_PTR(err);

- if (desc.type != NFT_DATA_VALUE) {
- err = -EINVAL;
+ if (desc.type != NFT_DATA_VALUE)
goto err1;
- }

if (desc.len <= sizeof(u32) && (op == NFT_CMP_EQ || op == NFT_CMP_NEQ))
return &nft_cmp_fast_ops;
--
2.35.1



2022-08-29 12:16:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 56/86] net: Fix a data-race around netdev_budget.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 2e0c42374ee32e72948559d2ae2f7ba3dc6b977c ]

While reading netdev_budget, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1ea75768c5b23..c4eb1b666a21c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6880,7 +6880,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
struct softnet_data *sd = this_cpu_ptr(&softnet_data);
unsigned long time_limit = jiffies +
usecs_to_jiffies(netdev_budget_usecs);
- int budget = netdev_budget;
+ int budget = READ_ONCE(netdev_budget);
LIST_HEAD(list);
LIST_HEAD(repoll);

--
2.35.1



2022-08-29 12:18:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 78/86] md: call __md_stop_writes in md_stop

From: Guoqing Jiang <[email protected]>

commit 0dd84b319352bb8ba64752d4e45396d8b13e6018 upstream.

>From the link [1], we can see raid1d was running even after the path
raid_dtr -> md_stop -> __md_stop.

Let's stop write first in destructor to align with normal md-raid to
fix the KASAN issue.

[1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZcec6hQFm8nhg@mail.gmail.com/T/#m7f12bf90481c02c6d2da68c64aeed4779b7df74a

Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop")
Reported-by: Mikulas Patocka <[email protected]>
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/md/md.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6299,6 +6299,7 @@ void md_stop(struct mddev *mddev)
/* stop the array and free an attached data structures.
* This is called from dm-raid
*/
+ __md_stop_writes(mddev);
__md_stop(mddev);
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);


2022-08-29 12:19:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 75/86] xen/privcmd: fix error exit of privcmd_ioctl_dm_op()

From: Juergen Gross <[email protected]>

commit c5deb27895e017a0267de0a20d140ad5fcc55a54 upstream.

The error exit of privcmd_ioctl_dm_op() is calling unlock_pages()
potentially with pages being NULL, leading to a NULL dereference.

Additionally lock_pages() doesn't check for pin_user_pages_fast()
having been completely successful, resulting in potentially not
locking all pages into memory. This could result in sporadic failures
when using the related memory in user mode.

Fix all of that by calling unlock_pages() always with the real number
of pinned pages, which will be zero in case pages being NULL, and by
checking the number of pages pinned by pin_user_pages_fast() matching
the expected number of pages.

Cc: <[email protected]>
Fixes: ab520be8cd5d ("xen/privcmd: Add IOCTL_PRIVCMD_DM_OP")
Reported-by: Rustam Subkhankulov <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Oleksandr Tyshchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/privcmd.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -581,27 +581,30 @@ static int lock_pages(
struct privcmd_dm_op_buf kbufs[], unsigned int num,
struct page *pages[], unsigned int nr_pages, unsigned int *pinned)
{
- unsigned int i;
+ unsigned int i, off = 0;

- for (i = 0; i < num; i++) {
+ for (i = 0; i < num; ) {
unsigned int requested;
int page_count;

requested = DIV_ROUND_UP(
offset_in_page(kbufs[i].uptr) + kbufs[i].size,
- PAGE_SIZE);
+ PAGE_SIZE) - off;
if (requested > nr_pages)
return -ENOSPC;

page_count = pin_user_pages_fast(
- (unsigned long) kbufs[i].uptr,
+ (unsigned long)kbufs[i].uptr + off * PAGE_SIZE,
requested, FOLL_WRITE, pages);
- if (page_count < 0)
- return page_count;
+ if (page_count <= 0)
+ return page_count ? : -EFAULT;

*pinned += page_count;
nr_pages -= page_count;
pages += page_count;
+
+ off = (requested == page_count) ? 0 : off + page_count;
+ i += !off;
}

return 0;
@@ -677,10 +680,8 @@ static long privcmd_ioctl_dm_op(struct f
}

rc = lock_pages(kbufs, kdata.num, pages, nr_pages, &pinned);
- if (rc < 0) {
- nr_pages = pinned;
+ if (rc < 0)
goto out;
- }

for (i = 0; i < kdata.num; i++) {
set_xen_guest_handle(xbufs[i].h, kbufs[i].uptr);
@@ -692,7 +693,7 @@ static long privcmd_ioctl_dm_op(struct f
xen_preemptible_hcall_end();

out:
- unlock_pages(pages, nr_pages);
+ unlock_pages(pages, pinned);
kfree(xbufs);
kfree(pages);
kfree(kbufs);


2022-08-29 12:27:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 53/86] net: Fix a data-race around sysctl_tstamp_allow_data.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit d2154b0afa73c0159b2856f875c6b4fe7cf6a95e ]

While reading sysctl_tstamp_allow_data, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/skbuff.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 48b6438f2a3d9..635cabcf8794f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4691,7 +4691,7 @@ static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly)
{
bool ret;

- if (likely(sysctl_tstamp_allow_data || tsonly))
+ if (likely(READ_ONCE(sysctl_tstamp_allow_data) || tsonly))
return true;

read_lock_bh(&sk->sk_callback_lock);
--
2.35.1



2022-08-29 12:28:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 36/86] netfilter: nft_tunnel: restrict it to netdev family

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 01e4092d53bc4fe122a6e4b6d664adbd57528ca3 ]

Only allow to use this expression from NFPROTO_NETDEV family.

Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_tunnel.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c
index 3b27926d5382c..2ee50996da8cc 100644
--- a/net/netfilter/nft_tunnel.c
+++ b/net/netfilter/nft_tunnel.c
@@ -133,6 +133,7 @@ static const struct nft_expr_ops nft_tunnel_get_ops = {

static struct nft_expr_type nft_tunnel_type __read_mostly = {
.name = "tunnel",
+ .family = NFPROTO_NETDEV,
.ops = &nft_tunnel_get_ops,
.policy = nft_tunnel_policy,
.maxattr = NFTA_TUNNEL_MAX,
--
2.35.1



2022-08-29 12:28:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 54/86] net: Fix a data-race around sysctl_net_busy_poll.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit c42b7cddea47503411bfb5f2f93a4154aaffa2d9 ]

While reading sysctl_net_busy_poll, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 060212928670 ("net: add low latency socket poll")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/busy_poll.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 716b7c5f6fdd9..36e5e75e71720 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -31,7 +31,7 @@ extern unsigned int sysctl_net_busy_poll __read_mostly;

static inline bool net_busy_loop_on(void)
{
- return sysctl_net_busy_poll;
+ return READ_ONCE(sysctl_net_busy_poll);
}

static inline bool sk_can_busy_loop(const struct sock *sk)
--
2.35.1



2022-08-29 12:30:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 35/86] netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 5f3b7aae14a706d0d7da9f9e39def52ff5fc3d39 ]

As it was originally intended, restrict extension to supported families.

Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_osf.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c
index d82677e83400b..720dc9fba6d4f 100644
--- a/net/netfilter/nft_osf.c
+++ b/net/netfilter/nft_osf.c
@@ -115,9 +115,21 @@ static int nft_osf_validate(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nft_data **data)
{
- return nft_chain_validate_hooks(ctx->chain, (1 << NF_INET_LOCAL_IN) |
- (1 << NF_INET_PRE_ROUTING) |
- (1 << NF_INET_FORWARD));
+ unsigned int hooks;
+
+ switch (ctx->family) {
+ case NFPROTO_IPV4:
+ case NFPROTO_IPV6:
+ case NFPROTO_INET:
+ hooks = (1 << NF_INET_LOCAL_IN) |
+ (1 << NF_INET_PRE_ROUTING) |
+ (1 << NF_INET_FORWARD);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return nft_chain_validate_hooks(ctx->chain, hooks);
}

static struct nft_expr_type nft_osf_type;
--
2.35.1



2022-08-29 12:32:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 25/86] net/mlx5e: Properly disable vlan strip on non-UL reps

From: Vlad Buslov <[email protected]>

[ Upstream commit f37044fd759b6bc40b6398a978e0b1acdf717372 ]

When querying mlx5 non-uplink representors capabilities with ethtool
rx-vlan-offload is marked as "off [fixed]". However, it is actually always
enabled because mlx5e_params->vlan_strip_disable is 0 by default when
initializing struct mlx5e_params instance. Fix the issue by explicitly
setting the vlan_strip_disable to 'true' for non-uplink representors.

Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 304435e561170..b991f03c7e991 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -706,6 +706,8 @@ static void mlx5e_build_rep_params(struct net_device *netdev)

params->num_tc = 1;
params->tunneled_offload_en = false;
+ if (rep->vport != MLX5_VPORT_UPLINK)
+ params->vlan_strip_disable = true;

mlx5_query_min_inline(mdev, &params->tx_min_inline_mode);

--
2.35.1



2022-08-29 12:32:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 81/86] blk-mq: fix io hung due to missing commit_rqs

From: Yu Kuai <[email protected]>

commit 65fac0d54f374625b43a9d6ad1f2c212bd41f518 upstream.

Currently, in virtio_scsi, if 'bd->last' is not set to true while
dispatching request, such io will stay in driver's queue, and driver
will wait for block layer to dispatch more rqs. However, if block
layer failed to dispatch more rq, it should trigger commit_rqs to
inform driver.

There is a problem in blk_mq_try_issue_list_directly() that commit_rqs
won't be called:

// assume that queue_depth is set to 1, list contains two rq
blk_mq_try_issue_list_directly
blk_mq_request_issue_directly
// dispatch first rq
// last is false
__blk_mq_try_issue_directly
blk_mq_get_dispatch_budget
// succeed to get first budget
__blk_mq_issue_directly
scsi_queue_rq
cmd->flags |= SCMD_LAST
virtscsi_queuecommand
kick = (sc->flags & SCMD_LAST) != 0
// kick is false, first rq won't issue to disk
queued++

blk_mq_request_issue_directly
// dispatch second rq
__blk_mq_try_issue_directly
blk_mq_get_dispatch_budget
// failed to get second budget
ret == BLK_STS_RESOURCE
blk_mq_request_bypass_insert
// errors is still 0

if (!list_empty(list) || errors && ...)
// won't pass, commit_rqs won't be called

In this situation, first rq relied on second rq to dispatch, while
second rq relied on first rq to complete, thus they will both hung.

Fix the problem by also treat 'BLK_STS_*RESOURCE' as 'errors' since
it means that request is not queued successfully.

Same problem exists in blk_mq_dispatch_rq_list(), 'BLK_STS_*RESOURCE'
can't be treated as 'errors' here, fix the problem by calling
commit_rqs if queue_rq return 'BLK_STS_*RESOURCE'.

Fixes: d666ba98f849 ("blk-mq: add mq_ops->commit_rqs()")
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
block/blk-mq.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1402,7 +1402,8 @@ out:
/* If we didn't flush the entire list, we could have told the driver
* there was more coming, but that turned out to be a lie.
*/
- if ((!list_empty(list) || errors) && q->mq_ops->commit_rqs && queued)
+ if ((!list_empty(list) || errors || needs_resource ||
+ ret == BLK_STS_DEV_RESOURCE) && q->mq_ops->commit_rqs && queued)
q->mq_ops->commit_rqs(hctx);
/*
* Any items that need requeuing? Stuff them into hctx->dispatch,
@@ -2080,6 +2081,7 @@ void blk_mq_try_issue_list_directly(stru
list_del_init(&rq->queuelist);
ret = blk_mq_request_issue_directly(rq, list_empty(list));
if (ret != BLK_STS_OK) {
+ errors++;
if (ret == BLK_STS_RESOURCE ||
ret == BLK_STS_DEV_RESOURCE) {
blk_mq_request_bypass_insert(rq, false,
@@ -2087,7 +2089,6 @@ void blk_mq_try_issue_list_directly(stru
break;
}
blk_mq_end_request(rq, ret);
- errors++;
} else
queued++;
}


2022-08-29 12:32:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 58/86] net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit af67508ea6cbf0e4ea27f8120056fa2efce127dd ]

While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/linux/netdevice.h | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ed2d531400051..c05701f89d6dd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -633,9 +633,14 @@ extern int sysctl_devconf_inherit_init_net;
*/
static inline bool net_has_fallback_tunnels(const struct net *net)
{
- return !IS_ENABLED(CONFIG_SYSCTL) ||
- !sysctl_fb_tunnels_only_for_init_net ||
- (net == &init_net && sysctl_fb_tunnels_only_for_init_net == 1);
+#if IS_ENABLED(CONFIG_SYSCTL)
+ int fb_tunnels_only_for_init_net = READ_ONCE(sysctl_fb_tunnels_only_for_init_net);
+
+ return !fb_tunnels_only_for_init_net ||
+ (net_eq(net, &init_net) && fb_tunnels_only_for_init_net == 1);
+#else
+ return true;
+#endif
}

static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
--
2.35.1



2022-08-29 12:32:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 26/86] net: ipa: dont assume SMEM is page-aligned

From: Alex Elder <[email protected]>

[ Upstream commit b8d4380365c515d8e0351f2f46d371738dd19be1 ]

In ipa_smem_init(), a Qualcomm SMEM region is allocated (if needed)
and then its virtual address is fetched using qcom_smem_get(). The
physical address associated with that region is also fetched.

The physical address is adjusted so that it is page-aligned, and an
attempt is made to update the size of the region to compensate for
any non-zero adjustment.

But that adjustment isn't done properly. The physical address is
aligned twice, and as a result the size is never actually adjusted.

Fix this by *not* aligning the "addr" local variable, and instead
making the "phys" local variable be the adjusted "addr" value.

Fixes: a0036bb413d5b ("net: ipa: define SMEM memory region for IPA")
Signed-off-by: Alex Elder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ipa/ipa_mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ipa/ipa_mem.c b/drivers/net/ipa/ipa_mem.c
index a78d66051a17d..25a8d029f2075 100644
--- a/drivers/net/ipa/ipa_mem.c
+++ b/drivers/net/ipa/ipa_mem.c
@@ -414,7 +414,7 @@ static int ipa_smem_init(struct ipa *ipa, u32 item, size_t size)
}

/* Align the address down and the size up to a page boundary */
- addr = qcom_smem_virt_to_phys(virt) & PAGE_MASK;
+ addr = qcom_smem_virt_to_phys(virt);
phys = addr & PAGE_MASK;
size = PAGE_ALIGN(size + addr - phys);
iova = phys; /* We just want a direct mapping */
--
2.35.1



2022-08-29 12:32:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 48/86] net: Fix data-races around netdev_max_backlog.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 5dcd08cd19912892586c6082d56718333e2d19db ]

While reading netdev_max_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

While at it, we remove the unnecessary spaces in the doc.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
Documentation/admin-guide/sysctl/net.rst | 2 +-
net/core/dev.c | 4 ++--
net/core/gro_cells.c | 2 +-
net/xfrm/espintcp.c | 2 +-
net/xfrm/xfrm_input.c | 2 +-
5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index f2ab8a5b6a4b8..7f553859dba82 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -271,7 +271,7 @@ poll cycle or the number of packets processed reaches netdev_budget.
netdev_max_backlog
------------------

-Maximum number of packets, queued on the INPUT side, when the interface
+Maximum number of packets, queued on the INPUT side, when the interface
receives packets faster than kernel can process them.

netdev_rss_key
diff --git a/net/core/dev.c b/net/core/dev.c
index 701a1afc91ff1..215c43aecc67e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4516,7 +4516,7 @@ static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
struct softnet_data *sd;
unsigned int old_flow, new_flow;

- if (qlen < (netdev_max_backlog >> 1))
+ if (qlen < (READ_ONCE(netdev_max_backlog) >> 1))
return false;

sd = this_cpu_ptr(&softnet_data);
@@ -4564,7 +4564,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
if (!netif_running(skb->dev))
goto drop;
qlen = skb_queue_len(&sd->input_pkt_queue);
- if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) {
+ if (qlen <= READ_ONCE(netdev_max_backlog) && !skb_flow_limit(skb, qlen)) {
if (qlen) {
enqueue:
__skb_queue_tail(&sd->input_pkt_queue, skb);
diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
index 6eb2e5ec2c506..2f66f3f295630 100644
--- a/net/core/gro_cells.c
+++ b/net/core/gro_cells.c
@@ -26,7 +26,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)

cell = this_cpu_ptr(gcells->cells);

- if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
+ if (skb_queue_len(&cell->napi_skbs) > READ_ONCE(netdev_max_backlog)) {
drop:
atomic_long_inc(&dev->rx_dropped);
kfree_skb(skb);
diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index 1f08ebf7d80c5..24ca49ecebea3 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -170,7 +170,7 @@ int espintcp_queue_out(struct sock *sk, struct sk_buff *skb)
{
struct espintcp_ctx *ctx = espintcp_getctx(sk);

- if (skb_queue_len(&ctx->out_queue) >= netdev_max_backlog)
+ if (skb_queue_len(&ctx->out_queue) >= READ_ONCE(netdev_max_backlog))
return -ENOBUFS;

__skb_queue_tail(&ctx->out_queue, skb);
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 61e6220ddd5ae..77e82033ad700 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -782,7 +782,7 @@ int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,

trans = this_cpu_ptr(&xfrm_trans_tasklet);

- if (skb_queue_len(&trans->queue) >= netdev_max_backlog)
+ if (skb_queue_len(&trans->queue) >= READ_ONCE(netdev_max_backlog))
return -ENOBUFS;

BUILD_BUG_ON(sizeof(struct xfrm_trans_cb) > sizeof(skb->cb));
--
2.35.1



2022-08-29 12:32:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 61/86] ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter

From: Jacob Keller <[email protected]>

[ Upstream commit 25d7a5f5a6bb15a2dae0a3f39ea5dda215024726 ]

The ixgbe_ptp_start_cyclecounter is intended to be called whenever the
cyclecounter parameters need to be changed.

Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x
devices"), this function has cleared the SYSTIME registers and reset the
TSAUXC DISABLE_SYSTIME bit.

While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
them during ixgbe_ptp_start_cyclecounter. This function may be called
during both reset and link status change. When link changes, the SYSTIME
counter is still operating normally, but the cyclecounter should be updated
to account for the possibly changed parameters.

Clearing SYSTIME when link changes causes the timecounter to jump because
the cycle counter now reads zero.

Extract the SYSTIME initialization out to a new function and call this
during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
an unnecessary reset of the current time.

This also restores the original SYSTIME clearing that occurred during
ixgbe_ptp_reset before the commit above.

Reported-by: Steve Payne <[email protected]>
Reported-by: Ilya Evenbach <[email protected]>
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 59 +++++++++++++++-----
1 file changed, 46 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index 22a874eee2e84..8b7f300355710 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -1211,7 +1211,6 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter)
struct cyclecounter cc;
unsigned long flags;
u32 incval = 0;
- u32 tsauxc = 0;
u32 fuse0 = 0;

/* For some of the boards below this mask is technically incorrect.
@@ -1246,18 +1245,6 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter)
case ixgbe_mac_x550em_a:
case ixgbe_mac_X550:
cc.read = ixgbe_ptp_read_X550;
-
- /* enable SYSTIME counter */
- IXGBE_WRITE_REG(hw, IXGBE_SYSTIMR, 0);
- IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0);
- IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0);
- tsauxc = IXGBE_READ_REG(hw, IXGBE_TSAUXC);
- IXGBE_WRITE_REG(hw, IXGBE_TSAUXC,
- tsauxc & ~IXGBE_TSAUXC_DISABLE_SYSTIME);
- IXGBE_WRITE_REG(hw, IXGBE_TSIM, IXGBE_TSIM_TXTS);
- IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_TIMESYNC);
-
- IXGBE_WRITE_FLUSH(hw);
break;
case ixgbe_mac_X540:
cc.read = ixgbe_ptp_read_82599;
@@ -1289,6 +1276,50 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter)
spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
}

+/**
+ * ixgbe_ptp_init_systime - Initialize SYSTIME registers
+ * @adapter: the ixgbe private board structure
+ *
+ * Initialize and start the SYSTIME registers.
+ */
+static void ixgbe_ptp_init_systime(struct ixgbe_adapter *adapter)
+{
+ struct ixgbe_hw *hw = &adapter->hw;
+ u32 tsauxc;
+
+ switch (hw->mac.type) {
+ case ixgbe_mac_X550EM_x:
+ case ixgbe_mac_x550em_a:
+ case ixgbe_mac_X550:
+ tsauxc = IXGBE_READ_REG(hw, IXGBE_TSAUXC);
+
+ /* Reset SYSTIME registers to 0 */
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIMR, 0);
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0);
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0);
+
+ /* Reset interrupt settings */
+ IXGBE_WRITE_REG(hw, IXGBE_TSIM, IXGBE_TSIM_TXTS);
+ IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_TIMESYNC);
+
+ /* Activate the SYSTIME counter */
+ IXGBE_WRITE_REG(hw, IXGBE_TSAUXC,
+ tsauxc & ~IXGBE_TSAUXC_DISABLE_SYSTIME);
+ break;
+ case ixgbe_mac_X540:
+ case ixgbe_mac_82599EB:
+ /* Reset SYSTIME registers to 0 */
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0);
+ IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0);
+ break;
+ default:
+ /* Other devices aren't supported */
+ return;
+ };
+
+ IXGBE_WRITE_FLUSH(hw);
+}
+
/**
* ixgbe_ptp_reset
* @adapter: the ixgbe private board structure
@@ -1315,6 +1346,8 @@ void ixgbe_ptp_reset(struct ixgbe_adapter *adapter)

ixgbe_ptp_start_cyclecounter(adapter);

+ ixgbe_ptp_init_systime(adapter);
+
spin_lock_irqsave(&adapter->tmreg_lock, flags);
timecounter_init(&adapter->hw_tc, &adapter->hw_cc,
ktime_to_ns(ktime_get_real()));
--
2.35.1



2022-08-29 12:32:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 20/86] mm/smaps: dont access young/dirty bit if pte unpresent

From: Peter Xu <[email protected]>

[ Upstream commit efd4149342db2df41b1bbe68972ead853b30e444 ]

These bits should only be valid when the ptes are present. Introducing
two booleans for it and set it to false when !pte_present() for both pte
and pmd accountings.

The bug is found during code reading and no real world issue reported, but
logically such an error can cause incorrect readings for either smaps or
smaps_rollup output on quite a few fields.

For example, it could cause over-estimate on values like Shared_Dirty,
Private_Dirty, Referenced. Or it could also cause under-estimate on
values like LazyFree, Shared_Clean, Private_Clean.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1d4d9e0cbd0 ("proc/smaps: carefully handle migration entries")
Fixes: c94b6923fa0a ("/proc/PID/smaps: Add PMD migration entry parsing")
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Huang Ying <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/proc/task_mmu.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ba98371e9d164..ef18f0d71b11b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -503,10 +503,12 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
struct vm_area_struct *vma = walk->vma;
bool locked = !!(vma->vm_flags & VM_LOCKED);
struct page *page = NULL;
- bool migration = false;
+ bool migration = false, young = false, dirty = false;

if (pte_present(*pte)) {
page = vm_normal_page(vma, addr, *pte);
+ young = pte_young(*pte);
+ dirty = pte_dirty(*pte);
} else if (is_swap_pte(*pte)) {
swp_entry_t swpent = pte_to_swp_entry(*pte);

@@ -540,8 +542,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
if (!page)
return;

- smaps_account(mss, page, false, pte_young(*pte), pte_dirty(*pte),
- locked, migration);
+ smaps_account(mss, page, false, young, dirty, locked, migration);
}

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
--
2.35.1



2022-08-29 12:33:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 70/86] x86/bugs: Add "unknown" reporting for MMIO Stale Data

From: Pawan Gupta <[email protected]>

commit 7df548840c496b0141fb2404b889c346380c2b22 upstream.

Older Intel CPUs that are not in the affected processor list for MMIO
Stale Data vulnerabilities currently report "Not affected" in sysfs,
which may not be correct. Vulnerability status for these older CPUs is
unknown.

Add known-not-affected CPUs to the whitelist. Report "unknown"
mitigation status for CPUs that are not in blacklist, whitelist and also
don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware
immunity to MMIO Stale Data vulnerabilities.

Mitigation is not deployed when the status is unknown.

[ bp: Massage, fixup. ]

Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
Suggested-by: Andrew Cooper <[email protected]>
Suggested-by: Tony Luck <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst | 14 +++
arch/x86/include/asm/cpufeatures.h | 5 -
arch/x86/kernel/cpu/bugs.c | 14 ++-
arch/x86/kernel/cpu/common.c | 42 ++++++----
4 files changed, 56 insertions(+), 19 deletions(-)

--- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
+++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
@@ -230,6 +230,20 @@ The possible values in this file are:
* - 'Mitigation: Clear CPU buffers'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.
+ * - 'Unknown: No mitigations'
+ - The processor vulnerability status is unknown because it is
+ out of Servicing period. Mitigation is not attempted.
+
+Definitions:
+------------
+
+Servicing period: The process of providing functional and security updates to
+Intel processors or platforms, utilizing the Intel Platform Update (IPU)
+process or other similar mechanisms.
+
+End of Servicing Updates (ESU): ESU is the date at which Intel will no
+longer provide Servicing, such as through IPU or other similar update
+processes. ESU dates will typically be aligned to end of quarter.

If the processor is vulnerable then the following information is appended to
the above information:
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -429,7 +429,8 @@
#define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
#define X86_BUG_SRBDS X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
#define X86_BUG_MMIO_STALE_DATA X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */
-#define X86_BUG_RETBLEED X86_BUG(26) /* CPU is affected by RETBleed */
-#define X86_BUG_EIBRS_PBRSB X86_BUG(27) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
+#define X86_BUG_MMIO_UNKNOWN X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
+#define X86_BUG_RETBLEED X86_BUG(27) /* CPU is affected by RETBleed */
+#define X86_BUG_EIBRS_PBRSB X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */

#endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -424,7 +424,8 @@ static void __init mmio_select_mitigatio
u64 ia32_cap;

if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) ||
- cpu_mitigations_off()) {
+ boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN) ||
+ cpu_mitigations_off()) {
mmio_mitigation = MMIO_MITIGATION_OFF;
return;
}
@@ -529,6 +530,8 @@ out:
pr_info("TAA: %s\n", taa_strings[taa_mitigation]);
if (boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA))
pr_info("MMIO Stale Data: %s\n", mmio_strings[mmio_mitigation]);
+ else if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
+ pr_info("MMIO Stale Data: Unknown: No mitigations\n");
}

static void __init md_clear_select_mitigation(void)
@@ -2198,6 +2201,9 @@ static ssize_t tsx_async_abort_show_stat

static ssize_t mmio_stale_data_show_state(char *buf)
{
+ if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
+ return sysfs_emit(buf, "Unknown: No mitigations\n");
+
if (mmio_mitigation == MMIO_MITIGATION_OFF)
return sysfs_emit(buf, "%s\n", mmio_strings[mmio_mitigation]);

@@ -2344,6 +2350,7 @@ static ssize_t cpu_show_common(struct de
return srbds_show_state(buf);

case X86_BUG_MMIO_STALE_DATA:
+ case X86_BUG_MMIO_UNKNOWN:
return mmio_stale_data_show_state(buf);

case X86_BUG_RETBLEED:
@@ -2403,7 +2410,10 @@ ssize_t cpu_show_srbds(struct device *de

ssize_t cpu_show_mmio_stale_data(struct device *dev, struct device_attribute *attr, char *buf)
{
- return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA);
+ if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
+ return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_UNKNOWN);
+ else
+ return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA);
}

ssize_t cpu_show_retbleed(struct device *dev, struct device_attribute *attr, char *buf)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1024,7 +1024,8 @@ static void identify_cpu_without_cpuid(s
#define NO_SWAPGS BIT(6)
#define NO_ITLB_MULTIHIT BIT(7)
#define NO_SPECTRE_V2 BIT(8)
-#define NO_EIBRS_PBRSB BIT(9)
+#define NO_MMIO BIT(9)
+#define NO_EIBRS_PBRSB BIT(10)

#define VULNWL(vendor, family, model, whitelist) \
X86_MATCH_VENDOR_FAM_MODEL(vendor, family, model, whitelist)
@@ -1045,6 +1046,11 @@ static const __initconst struct x86_cpu_
VULNWL(NSC, 5, X86_MODEL_ANY, NO_SPECULATION),

/* Intel Family 6 */
+ VULNWL_INTEL(TIGERLAKE, NO_MMIO),
+ VULNWL_INTEL(TIGERLAKE_L, NO_MMIO),
+ VULNWL_INTEL(ALDERLAKE, NO_MMIO),
+ VULNWL_INTEL(ALDERLAKE_L, NO_MMIO),
+
VULNWL_INTEL(ATOM_SALTWELL, NO_SPECULATION | NO_ITLB_MULTIHIT),
VULNWL_INTEL(ATOM_SALTWELL_TABLET, NO_SPECULATION | NO_ITLB_MULTIHIT),
VULNWL_INTEL(ATOM_SALTWELL_MID, NO_SPECULATION | NO_ITLB_MULTIHIT),
@@ -1063,9 +1069,9 @@ static const __initconst struct x86_cpu_
VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
VULNWL_INTEL(ATOM_AIRMONT_NP, NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),

- VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_INTEL(ATOM_GOLDMONT_D, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
+ VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_INTEL(ATOM_GOLDMONT_D, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB),

/*
* Technically, swapgs isn't serializing on AMD (despite it previously
@@ -1080,18 +1086,18 @@ static const __initconst struct x86_cpu_
VULNWL_INTEL(ATOM_TREMONT_D, NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),

/* AMD Family 0xf - 0x12 */
- VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+ VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),

/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
- VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+ VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
+ VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),

/* Zhaoxin Family 7 */
- VULNWL(CENTAUR, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS),
- VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS),
+ VULNWL(CENTAUR, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO),
+ VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO),
{}
};

@@ -1245,10 +1251,16 @@ static void __init cpu_set_bug_bits(stru
* Affected CPU list is generally enough to enumerate the vulnerability,
* but for virtualization case check for ARCH_CAP MSR bits also, VMM may
* not want the guest to enumerate the bug.
+ *
+ * Set X86_BUG_MMIO_UNKNOWN for CPUs that are neither in the blacklist,
+ * nor in the whitelist and also don't enumerate MSR ARCH_CAP MMIO bits.
*/
- if (cpu_matches(cpu_vuln_blacklist, MMIO) &&
- !arch_cap_mmio_immune(ia32_cap))
- setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
+ if (!arch_cap_mmio_immune(ia32_cap)) {
+ if (cpu_matches(cpu_vuln_blacklist, MMIO))
+ setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
+ else if (!cpu_matches(cpu_vuln_whitelist, NO_MMIO))
+ setup_force_cpu_bug(X86_BUG_MMIO_UNKNOWN);
+ }

if (!cpu_has(c, X86_FEATURE_BTC_NO)) {
if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA))


2022-08-29 12:33:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 42/86] netfilter: nf_tables: disallow jump to implicit chain from set element

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit f323ef3a0d49e147365284bc1f02212e617b7f09 ]

Extend struct nft_data_desc to add a flag field that specifies
nft_data_init() is being called for set element data.

Use it to disallow jump to implicit chain from set element, only jump
to chain via immediate expression is allowed.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/netfilter/nf_tables.h | 5 +++++
net/netfilter/nf_tables_api.c | 4 ++++
2 files changed, 9 insertions(+)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 6c062b2509b9b..e66fee99ed3ea 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -176,10 +176,15 @@ struct nft_ctx {
bool report;
};

+enum nft_data_desc_flags {
+ NFT_DATA_DESC_SETELEM = (1 << 0),
+};
+
struct nft_data_desc {
enum nft_data_types type;
unsigned int size;
unsigned int len;
+ unsigned int flags;
};

int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index df79ea6004a59..b36728cfc5d81 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4865,6 +4865,7 @@ static int nft_setelem_parse_data(struct nft_ctx *ctx, struct nft_set *set,
desc->type = dtype;
desc->size = NFT_DATA_VALUE_MAXLEN;
desc->len = set->dlen;
+ desc->flags = NFT_DATA_DESC_SETELEM;

return nft_data_init(ctx, data, desc, attr);
}
@@ -8677,6 +8678,9 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
return PTR_ERR(chain);
if (nft_is_base_chain(chain))
return -EOPNOTSUPP;
+ if (desc->flags & NFT_DATA_DESC_SETELEM &&
+ chain->flags & NFT_CHAIN_BINDING)
+ return -EINVAL;

chain->use++;
data->verdict.chain = chain;
--
2.35.1



2022-08-29 12:34:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 43/86] netfilter: nf_tables: disallow binding to already bound chain

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit e02f0d3970404bfea385b6edb86f2d936db0ea2b ]

Update nft_data_init() to report EINVAL if chain is already bound.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Gwangun Jung <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nf_tables_api.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index b36728cfc5d81..1b039476e4d6a 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -8678,6 +8678,8 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
return PTR_ERR(chain);
if (nft_is_base_chain(chain))
return -EOPNOTSUPP;
+ if (nft_chain_is_bound(chain))
+ return -EINVAL;
if (desc->flags & NFT_DATA_DESC_SETELEM &&
chain->flags & NFT_CHAIN_BINDING)
return -EINVAL;
--
2.35.1



2022-08-29 12:35:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 59/86] net: Fix data-races around sysctl_devconf_inherit_init_net.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit a5612ca10d1aa05624ebe72633e0c8c792970833 ]

While reading sysctl_devconf_inherit_init_net, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/linux/netdevice.h | 9 +++++++++
net/ipv4/devinet.c | 16 ++++++++++------
net/ipv6/addrconf.c | 5 ++---
3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c05701f89d6dd..6564fb4ac49e1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -643,6 +643,15 @@ static inline bool net_has_fallback_tunnels(const struct net *net)
#endif
}

+static inline int net_inherit_devconf(void)
+{
+#if IS_ENABLED(CONFIG_SYSCTL)
+ return READ_ONCE(sysctl_devconf_inherit_init_net);
+#else
+ return 0;
+#endif
+}
+
static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
{
#if defined(CONFIG_XPS) && defined(CONFIG_NUMA)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 148ef484a66ce..8f17538755507 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2668,23 +2668,27 @@ static __net_init int devinet_init_net(struct net *net)
#endif

if (!net_eq(net, &init_net)) {
- if (IS_ENABLED(CONFIG_SYSCTL) &&
- sysctl_devconf_inherit_init_net == 3) {
+ switch (net_inherit_devconf()) {
+ case 3:
/* copy from the current netns */
memcpy(all, current->nsproxy->net_ns->ipv4.devconf_all,
sizeof(ipv4_devconf));
memcpy(dflt,
current->nsproxy->net_ns->ipv4.devconf_dflt,
sizeof(ipv4_devconf_dflt));
- } else if (!IS_ENABLED(CONFIG_SYSCTL) ||
- sysctl_devconf_inherit_init_net != 2) {
- /* inherit == 0 or 1: copy from init_net */
+ break;
+ case 0:
+ case 1:
+ /* copy from init_net */
memcpy(all, init_net.ipv4.devconf_all,
sizeof(ipv4_devconf));
memcpy(dflt, init_net.ipv4.devconf_dflt,
sizeof(ipv4_devconf_dflt));
+ break;
+ case 2:
+ /* use compiled values */
+ break;
}
- /* else inherit == 2: use compiled values */
}

#ifdef CONFIG_SYSCTL
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 05317e6f48f8a..ed1e5bfc97b31 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -7042,9 +7042,8 @@ static int __net_init addrconf_init_net(struct net *net)
if (!dflt)
goto err_alloc_dflt;

- if (IS_ENABLED(CONFIG_SYSCTL) &&
- !net_eq(net, &init_net)) {
- switch (sysctl_devconf_inherit_init_net) {
+ if (!net_eq(net, &init_net)) {
+ switch (net_inherit_devconf()) {
case 1: /* copy from init_net */
memcpy(all, init_net.ipv6.devconf_all,
sizeof(ipv6_devconf));
--
2.35.1



2022-08-29 12:36:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 34/86] netfilter: nf_tables: do not leave chain stats enabled on error

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 43eb8949cfdffa764b92bc6c54b87cbe5b0003fe ]

Error might occur later in the nf_tables_addchain() codepath, enable
static key only after transaction has been created.

Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nf_tables_api.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 30bd4b867912c..456988b5c076e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1999,9 +1999,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
u8 policy, u32 flags)
{
const struct nlattr * const *nla = ctx->nla;
+ struct nft_stats __percpu *stats = NULL;
struct nft_table *table = ctx->table;
struct nft_base_chain *basechain;
- struct nft_stats __percpu *stats;
struct net *net = ctx->net;
char name[NFT_NAME_MAXLEN];
struct nft_trans *trans;
@@ -2037,7 +2037,6 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
return PTR_ERR(stats);
}
rcu_assign_pointer(basechain->stats, stats);
- static_branch_inc(&nft_counters_enabled);
}

err = nft_basechain_init(basechain, family, &hook, flags);
@@ -2120,6 +2119,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
goto err_unregister_hook;
}

+ if (stats)
+ static_branch_inc(&nft_counters_enabled);
+
table->use++;

return 0;
--
2.35.1



2022-08-29 12:36:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 33/86] netfilter: nft_payload: do not truncate csum_offset and csum_type

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 7044ab281febae9e2fa9b0b247693d6026166293 ]

Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type
is not support.

Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_payload.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index bdb07362c0ef0..551e0d6cf63d4 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -660,17 +660,23 @@ static int nft_payload_set_init(const struct nft_ctx *ctx,
const struct nlattr * const tb[])
{
struct nft_payload_set *priv = nft_expr_priv(expr);
+ u32 csum_offset, csum_type = NFT_PAYLOAD_CSUM_NONE;
+ int err;

priv->base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
priv->len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));

if (tb[NFTA_PAYLOAD_CSUM_TYPE])
- priv->csum_type =
- ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE]));
- if (tb[NFTA_PAYLOAD_CSUM_OFFSET])
- priv->csum_offset =
- ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_OFFSET]));
+ csum_type = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE]));
+ if (tb[NFTA_PAYLOAD_CSUM_OFFSET]) {
+ err = nft_parse_u32_check(tb[NFTA_PAYLOAD_CSUM_OFFSET], U8_MAX,
+ &csum_offset);
+ if (err < 0)
+ return err;
+
+ priv->csum_offset = csum_offset;
+ }
if (tb[NFTA_PAYLOAD_CSUM_FLAGS]) {
u32 flags;

@@ -681,7 +687,7 @@ static int nft_payload_set_init(const struct nft_ctx *ctx,
priv->csum_flags = flags;
}

- switch (priv->csum_type) {
+ switch (csum_type) {
case NFT_PAYLOAD_CSUM_NONE:
case NFT_PAYLOAD_CSUM_INET:
break;
@@ -695,6 +701,7 @@ static int nft_payload_set_init(const struct nft_ctx *ctx,
default:
return -EOPNOTSUPP;
}
+ priv->csum_type = csum_type;

return nft_parse_register_load(tb[NFTA_PAYLOAD_SREG], &priv->sreg,
priv->len);
--
2.35.1



2022-08-29 12:36:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 11/86] kernel/sched: Remove dl_boosted flag comment

From: Hui Su <[email protected]>

commit 0e3872499de1a1230cef5221607d71aa09264bd5 upstream.

since commit 2279f540ea7d ("sched/deadline: Fix priority
inheritance with multiple scheduling classes"), we should not
keep it here.

Signed-off-by: Hui Su <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/sched.h | 4 ----
1 file changed, 4 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -542,10 +542,6 @@ struct sched_dl_entity {
* task has to wait for a replenishment to be performed at the
* next firing of dl_timer.
*
- * @dl_boosted tells if we are boosted due to DI. If so we are
- * outside bandwidth enforcement mechanism (but only until we
- * exit the critical section);
- *
* @dl_yielded tells if task gave up the CPU before consuming
* all its available runtime during the last job.
*


2022-08-29 12:37:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 24/86] ice: xsk: prohibit usage of non-balanced queue id

From: Maciej Fijalkowski <[email protected]>

[ Upstream commit 5a42f112d367bb4700a8a41f5c12724fde6bfbb9 ]

Fix the following scenario:
1. ethtool -L $IFACE rx 8 tx 96
2. xdpsock -q 10 -t -z

Above refers to a case where user would like to attach XSK socket in
txonly mode at a queue id that does not have a corresponding Rx queue.
At this moment ice's XSK logic is tightly bound to act on a "queue pair",
e.g. both Tx and Rx queues at a given queue id are disabled/enabled and
both of them will get XSK pool assigned, which is broken for the presented
queue configuration. This results in the splat included at the bottom,
which is basically an OOB access to Rx ring array.

To fix this, allow using the ids only in scope of "combined" queues
reported by ethtool. However, logic should be rewritten to allow such
configurations later on, which would end up as a complete rewrite of the
control path, so let us go with this temporary fix.

[420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082
[420160.566359] #PF: supervisor read access in kernel mode
[420160.572657] #PF: error_code(0x0000) - not-present page
[420160.579002] PGD 0 P4D 0
[420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI
[420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G OE 5.19.0-rc7+ #10
[420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420160.693211] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420160.703645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420160.740707] PKRU: 55555554
[420160.745960] Call Trace:
[420160.750962] <TASK>
[420160.755597] ? kmalloc_large_node+0x79/0x90
[420160.762703] ? __kmalloc_node+0x3f5/0x4b0
[420160.769341] xp_assign_dev+0xfd/0x210
[420160.775661] ? shmem_file_read_iter+0x29a/0x420
[420160.782896] xsk_bind+0x152/0x490
[420160.788943] __sys_bind+0xd0/0x100
[420160.795097] ? exit_to_user_mode_prepare+0x20/0x120
[420160.802801] __x64_sys_bind+0x16/0x20
[420160.809298] do_syscall_64+0x38/0x90
[420160.815741] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[420160.823731] RIP: 0033:0x7fa86a0dd2fb
[420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48
[420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb
[420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003
[420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000
[420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0
[420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000
[420160.919817] </TASK>
[420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice]
[420160.977576] CR2: 0000000000000082
[420160.985037] ---[ end trace 0000000000000000 ]---
[420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420161.201505] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420161.213628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420161.257052] PKRU: 55555554

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_xsk.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 4bb62950d92de..59963b901be0f 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -371,6 +371,12 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
bool if_running, pool_present = !!pool;
int ret = 0, pool_failure = 0;

+ if (qid >= vsi->num_rxq || qid >= vsi->num_txq) {
+ netdev_err(vsi->netdev, "Please use queue id in scope of combined queues count\n");
+ pool_failure = -EINVAL;
+ goto failure;
+ }
+
if (!is_power_of_2(vsi->rx_rings[qid]->count) ||
!is_power_of_2(vsi->tx_rings[qid]->count)) {
netdev_err(vsi->netdev, "Please align ring sizes to power of 2\n");
--
2.35.1



2022-08-29 12:38:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 74/86] ACPI: processor: Remove freq Qos request for all CPUs

From: Riwen Lu <[email protected]>

commit 36527b9d882362567ceb4eea8666813280f30e6f upstream.

The freq Qos request would be removed repeatedly if the cpufreq policy
relates to more than one CPU. Then, it would cause the "called for unknown
object" warning.

Remove the freq Qos request for each CPU relates to the cpufreq policy,
instead of removing repeatedly for the last CPU of it.

Fixes: a1bb46c36ce3 ("ACPI: processor: Add QoS requests for all CPUs")
Reported-by: Jeremy Linton <[email protected]>
Tested-by: Jeremy Linton <[email protected]>
Signed-off-by: Riwen Lu <[email protected]>
Cc: 5.4+ <[email protected]> # 5.4+
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/acpi/processor_thermal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/acpi/processor_thermal.c
+++ b/drivers/acpi/processor_thermal.c
@@ -148,7 +148,7 @@ void acpi_thermal_cpufreq_exit(struct cp
unsigned int cpu;

for_each_cpu(cpu, policy->related_cpus) {
- struct acpi_processor *pr = per_cpu(processors, policy->cpu);
+ struct acpi_processor *pr = per_cpu(processors, cpu);

if (pr)
freq_qos_remove_request(&pr->thermal_req);


2022-08-29 12:38:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 13/86] xfrm: clone missing x->lastused in xfrm_do_migrate

From: Antony Antony <[email protected]>

[ Upstream commit 6aa811acdb76facca0b705f4e4c1d948ccb6af8b ]

x->lastused was not cloned in xfrm_do_migrate. Add it to clone during
migrate.

Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/xfrm/xfrm_state.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index bc0bbb1571cef..fdbd56ed4bd52 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1557,6 +1557,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
x->replay = orig->replay;
x->preplay = orig->preplay;
x->mapping_maxage = orig->mapping_maxage;
+ x->lastused = orig->lastused;
x->new_mapping = 0;
x->new_mapping_sport = 0;

--
2.35.1



2022-08-29 12:39:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 21/86] rose: check NULL rose_loopback_neigh->loopback

From: Bernard Pidoux <[email protected]>

[ Upstream commit 3c53cd65dece47dd1f9d3a809f32e59d1d87b2b8 ]

Commit 3b3fd068c56e3fbea30090859216a368398e39bf added NULL check for
`rose_loopback_neigh->dev` in rose_loopback_timer() but omitted to
check rose_loopback_neigh->loopback.

It thus prevents *all* rose connect.

The reason is that a special rose_neigh loopback has a NULL device.

/proc/net/rose_neigh illustrates it via rose_neigh_show() function :
[...]
seq_printf(seq, "%05d %-9s %-4s %3d %3d %3s %3s %3lu %3lu",
rose_neigh->number,
(rose_neigh->loopback) ? "RSLOOP-0" : ax2asc(buf, &rose_neigh->callsign),
rose_neigh->dev ? rose_neigh->dev->name : "???",
rose_neigh->count,

/proc/net/rose_neigh displays special rose_loopback_neigh->loopback as
callsign RSLOOP-0:

addr callsign dev count use mode restart t0 tf digipeaters
00001 RSLOOP-0 ??? 1 2 DCE yes 0 0

By checking rose_loopback_neigh->loopback, rose_rx_call_request() is called
even in case rose_loopback_neigh->dev is NULL. This repairs rose connections.

Verification with rose client application FPAC:

FPAC-Node v 4.1.3 (built Aug 5 2022) for LINUX (help = h)
F6BVP-4 (Commands = ?) : u
Users - AX.25 Level 2 sessions :
Port Callsign Callsign AX.25 state ROSE state NetRom status
axudp F6BVP-5 -> F6BVP-9 Connected Connected ---------

Fixes: 3b3fd068c56e ("rose: Fix Null pointer dereference in rose_send_frame()")
Signed-off-by: Bernard Pidoux <[email protected]>
Suggested-by: Francois Romieu <[email protected]>
Cc: Thomas DL9SAU Osterried <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/rose/rose_loopback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/rose/rose_loopback.c b/net/rose/rose_loopback.c
index 11c45c8c6c164..036d92c0ad794 100644
--- a/net/rose/rose_loopback.c
+++ b/net/rose/rose_loopback.c
@@ -96,7 +96,8 @@ static void rose_loopback_timer(struct timer_list *unused)
}

if (frametype == ROSE_CALL_REQUEST) {
- if (!rose_loopback_neigh->dev) {
+ if (!rose_loopback_neigh->dev &&
+ !rose_loopback_neigh->loopback) {
kfree_skb(skb);
continue;
}
--
2.35.1



2022-08-29 12:39:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 23/86] ice: xsk: Force rings to be sized to power of 2

From: Maciej Fijalkowski <[email protected]>

[ Upstream commit 296f13ff3854535009a185aaf8e3603266d39d94 ]

With the upcoming introduction of batching to XSK data path,
performance wise it will be the best to have the ring descriptor count
to be aligned to power of 2.

Check if ring sizes that user is going to attach the XSK socket fulfill
the condition above. For Tx side, although check is being done against
the Tx queue and in the end the socket will be attached to the XDP
queue, it is fine since XDP queues get the ring->count setting from Tx
queues.

Suggested-by: Alexander Lobakin <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Acked-by: Magnus Karlsson <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_xsk.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 5733526fa245c..4bb62950d92de 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -371,6 +371,13 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
bool if_running, pool_present = !!pool;
int ret = 0, pool_failure = 0;

+ if (!is_power_of_2(vsi->rx_rings[qid]->count) ||
+ !is_power_of_2(vsi->tx_rings[qid]->count)) {
+ netdev_err(vsi->netdev, "Please align ring sizes to power of 2\n");
+ pool_failure = -EINVAL;
+ goto failure;
+ }
+
if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);

if (if_running) {
@@ -393,6 +400,7 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret);
}

+failure:
if (pool_failure) {
netdev_err(vsi->netdev, "Could not %sable buffer pool, error = %d\n",
pool_present ? "en" : "dis", pool_failure);
--
2.35.1



2022-08-29 12:40:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 32/86] netfilter: nft_payload: report ERANGE for too long offset and length

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 94254f990c07e9ddf1634e0b727fab821c3b5bf9 ]

Instead of offset and length are truncation to u8, report ERANGE.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nft_payload.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 01878c16418c2..bdb07362c0ef0 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -733,6 +733,7 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
{
enum nft_payload_bases base;
unsigned int offset, len;
+ int err;

if (tb[NFTA_PAYLOAD_BASE] == NULL ||
tb[NFTA_PAYLOAD_OFFSET] == NULL ||
@@ -758,8 +759,13 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
if (tb[NFTA_PAYLOAD_DREG] == NULL)
return ERR_PTR(-EINVAL);

- offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
- len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
+ err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U8_MAX, &offset);
+ if (err < 0)
+ return ERR_PTR(err);
+
+ err = nft_parse_u32_check(tb[NFTA_PAYLOAD_LEN], U8_MAX, &len);
+ if (err < 0)
+ return ERR_PTR(err);

if (len <= 4 && is_power_of_2(len) && IS_ALIGNED(offset, len) &&
base != NFT_PAYLOAD_LL_HEADER)
--
2.35.1



2022-08-29 12:40:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 83/86] perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU

From: Stephane Eranian <[email protected]>

commit 11745ecfe8fea4b4a4c322967a7605d2ecbd5080 upstream.

Existing code was generating bogus counts for the SNB IMC bandwidth counters:

$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000327813 1,024.03 MiB uncore_imc/data_reads/
1.000327813 20.73 MiB uncore_imc/data_writes/
2.000580153 261,120.00 MiB uncore_imc/data_reads/
2.000580153 23.28 MiB uncore_imc/data_writes/

The problem was introduced by commit:
07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")

Where the read_counter callback was replace to point to the generic
uncore_mmio_read_counter() function.

The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in
MMIO. But uncore_mmio_read_counter() is using a readq() call to read from
MMIO therefore reading 64-bit from MMIO. Although this is okay for the
uncore_perf_event_update() function because it is shifting the value based
on the actual counter width to compute a delta, it is not okay for the
uncore_pmu_event_start() which is simply reading the counter and therefore
priming the event->prev_count with a bogus value which is responsible for
causing bogus deltas in the perf stat command above.

The fix is to reintroduce the custom callback for read_counter for the SNB
IMC PMU and use readl() instead of readq(). With the change the output of
perf stat is back to normal:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000120987 296.94 MiB uncore_imc/data_reads/
1.000120987 138.42 MiB uncore_imc/data_writes/
2.000403144 175.91 MiB uncore_imc/data_reads/
2.000403144 68.50 MiB uncore_imc/data_writes/

Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/events/intel/uncore_snb.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -657,6 +657,22 @@ int snb_pci2phy_map_init(int devid)
return 0;
}

+static u64 snb_uncore_imc_read_counter(struct intel_uncore_box *box, struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ /*
+ * SNB IMC counters are 32-bit and are laid out back to back
+ * in MMIO space. Therefore we must use a 32-bit accessor function
+ * using readq() from uncore_mmio_read_counter() causes problems
+ * because it is reading 64-bit at a time. This is okay for the
+ * uncore_perf_event_update() function because it drops the upper
+ * 32-bits but not okay for plain uncore_read_counter() as invoked
+ * in uncore_pmu_event_start().
+ */
+ return (u64)readl(box->io_addr + hwc->event_base);
+}
+
static struct pmu snb_uncore_imc_pmu = {
.task_ctx_nr = perf_invalid_context,
.event_init = snb_uncore_imc_event_init,
@@ -676,7 +692,7 @@ static struct intel_uncore_ops snb_uncor
.disable_event = snb_uncore_imc_disable_event,
.enable_event = snb_uncore_imc_enable_event,
.hw_config = snb_uncore_imc_hw_config,
- .read_counter = uncore_mmio_read_counter,
+ .read_counter = snb_uncore_imc_read_counter,
};

static struct intel_uncore_type snb_uncore_imc = {


2022-08-29 12:40:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 47/86] net: Fix data-races around weight_p and dev_weight_[rt]x_bias.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit bf955b5ab8f6f7b0632cdef8e36b14e4f6e77829 ]

While reading weight_p, it can be changed concurrently. Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time. So, we
need to use READ_ONCE() and WRITE_ONCE() for its access. Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().

Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/dev.c | 2 +-
net/core/sysctl_net_core.c | 15 +++++++++------
net/sched/sch_generic.c | 2 +-
3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 637bc576fbd26..701a1afc91ff1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6371,7 +6371,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
net_rps_action_and_irq_enable(sd);
}

- napi->weight = dev_rx_weight;
+ napi->weight = READ_ONCE(dev_rx_weight);
while (again) {
struct sk_buff *skb;

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 2e0a4378e778a..0dfe9f255ab3a 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -235,14 +235,17 @@ static int set_default_qdisc(struct ctl_table *table, int write,
static int proc_do_dev_weight(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- int ret;
+ static DEFINE_MUTEX(dev_weight_mutex);
+ int ret, weight;

+ mutex_lock(&dev_weight_mutex);
ret = proc_dointvec(table, write, buffer, lenp, ppos);
- if (ret != 0)
- return ret;
-
- dev_rx_weight = weight_p * dev_weight_rx_bias;
- dev_tx_weight = weight_p * dev_weight_tx_bias;
+ if (!ret && write) {
+ weight = READ_ONCE(weight_p);
+ WRITE_ONCE(dev_rx_weight, weight * dev_weight_rx_bias);
+ WRITE_ONCE(dev_tx_weight, weight * dev_weight_tx_bias);
+ }
+ mutex_unlock(&dev_weight_mutex);

return ret;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 5d5391adb667c..68f1e89430b3b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -403,7 +403,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)

void __qdisc_run(struct Qdisc *q)
{
- int quota = dev_tx_weight;
+ int quota = READ_ONCE(dev_tx_weight);
int packets;

while (qdisc_restart(q, &packets)) {
--
2.35.1



2022-08-29 12:40:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 80/86] Documentation/ABI: Mention retbleed vulnerability info file for sysfs

From: Salvatore Bonaccorso <[email protected]>

commit 00da0cb385d05a89226e150a102eb49d8abb0359 upstream.

While reporting for the AMD retbleed vulnerability was added in

6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability")

the new sysfs file was not mentioned so far in the ABI documentation for
sysfs-devices-system-cpu. Fix that.

Fixes: 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability")
Signed-off-by: Salvatore Bonaccorso <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
1 file changed, 1 insertion(+)

--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -511,6 +511,7 @@ What: /sys/devices/system/cpu/vulnerabi
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
+ /sys/devices/system/cpu/vulnerabilities/retbleed
Date: January 2018
Contact: Linux kernel mailing list <[email protected]>
Description: Information about CPU vulnerabilities


2022-08-29 12:41:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 38/86] netfilter: nf_tables: consolidate rule verdict trace call

From: Pablo Neira Ayuso <[email protected]>

[ Upstream commit 4765473fefd4403b5eeca371637065b561522c50 ]

Add function to consolidate verdict tracing.

Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nf_tables_core.c | 39 ++++++++++++++++++++++++++++------
1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index a61b5bf5aa0fb..6dd27c8cd4253 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -67,6 +67,36 @@ static void nft_cmp_fast_eval(const struct nft_expr *expr,
regs->verdict.code = NFT_BREAK;
}

+static noinline void __nft_trace_verdict(struct nft_traceinfo *info,
+ const struct nft_chain *chain,
+ const struct nft_regs *regs)
+{
+ enum nft_trace_types type;
+
+ switch (regs->verdict.code) {
+ case NFT_CONTINUE:
+ case NFT_RETURN:
+ type = NFT_TRACETYPE_RETURN;
+ break;
+ default:
+ type = NFT_TRACETYPE_RULE;
+ break;
+ }
+
+ __nft_trace_packet(info, chain, type);
+}
+
+static inline void nft_trace_verdict(struct nft_traceinfo *info,
+ const struct nft_chain *chain,
+ const struct nft_rule *rule,
+ const struct nft_regs *regs)
+{
+ if (static_branch_unlikely(&nft_trace_enabled)) {
+ info->rule = rule;
+ __nft_trace_verdict(info, chain, regs);
+ }
+}
+
static bool nft_payload_fast_eval(const struct nft_expr *expr,
struct nft_regs *regs,
const struct nft_pktinfo *pkt)
@@ -207,13 +237,13 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
break;
}

+ nft_trace_verdict(&info, chain, rule, &regs);
+
switch (regs.verdict.code & NF_VERDICT_MASK) {
case NF_ACCEPT:
case NF_DROP:
case NF_QUEUE:
case NF_STOLEN:
- nft_trace_packet(&info, chain, rule,
- NFT_TRACETYPE_RULE);
return regs.verdict.code;
}

@@ -226,15 +256,10 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
stackptr++;
fallthrough;
case NFT_GOTO:
- nft_trace_packet(&info, chain, rule,
- NFT_TRACETYPE_RULE);
-
chain = regs.verdict.chain;
goto do_chain;
case NFT_CONTINUE:
case NFT_RETURN:
- nft_trace_packet(&info, chain, rule,
- NFT_TRACETYPE_RETURN);
break;
default:
WARN_ON(1);
--
2.35.1



2022-08-29 12:41:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 63/86] ionic: fix up issues with handling EAGAIN on FW cmds

From: Shannon Nelson <[email protected]>

[ Upstream commit 0fc4dd452d6c14828eed6369155c75c0ac15bab3 ]

In looping on FW update tests we occasionally see the
FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
waiting for the FW activate step to finsh inside the FW. The
firmware is complaining that the done bit is set when a new
dev_cmd is going to be processed.

Doing a clean on the cmd registers and doorbell before exiting
the wait-for-done and cleaning the done bit before the sleep
prevents this from occurring.

Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/pensando/ionic/ionic_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_main.c b/drivers/net/ethernet/pensando/ionic/ionic_main.c
index e14869a2e24a5..f60ffef33e0ce 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_main.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_main.c
@@ -378,8 +378,8 @@ int ionic_dev_cmd_wait(struct ionic *ionic, unsigned long max_seconds)
ionic_opcode_to_str(opcode), opcode,
ionic_error_to_str(err), err);

- msleep(1000);
iowrite32(0, &idev->dev_cmd_regs->done);
+ msleep(1000);
iowrite32(1, &idev->dev_cmd_regs->doorbell);
goto try_again;
}
@@ -392,6 +392,8 @@ int ionic_dev_cmd_wait(struct ionic *ionic, unsigned long max_seconds)
return ionic_error_to_errno(err);
}

+ ionic_dev_cmd_clean(ionic);
+
return 0;
}

--
2.35.1



2022-08-29 12:49:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 71/86] loop: Check for overflow while configuring loop

From: Siddh Raman Pant <[email protected]>

commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream.

The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typdef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
lo->lo_offset = info->lo_offset;

This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
WARN_ON_ONCE(iter->iomap.offset > iter->pos);

Thus, check for negative value during loop_set_status_from_info().

Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e

Reported-and-tested-by: [email protected]
Cc: [email protected]
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Siddh Raman Pant <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/block/loop.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1031,6 +1031,11 @@ loop_set_status_from_info(struct loop_de

lo->lo_offset = info->lo_offset;
lo->lo_sizelimit = info->lo_sizelimit;
+
+ /* loff_t vars have been assigned __u64 */
+ if (lo->lo_offset < 0 || lo->lo_sizelimit < 0)
+ return -EOVERFLOW;
+
memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
memcpy(lo->lo_crypt_name, info->lo_crypt_name, LO_NAME_SIZE);
lo->lo_file_name[LO_NAME_SIZE-1] = 0;


2022-08-29 12:57:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 65/86] btrfs: replace: drop assert for suspended replace

From: Anand Jain <[email protected]>

commit 59a3991984dbc1fc47e5651a265c5200bd85464e upstream.

If the filesystem mounts with the replace-operation in a suspended state
and try to cancel the suspended replace-operation, we hit the assert. The
assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
scrub must not be running in suspended state") that was actually not
required. So just remove it.

$ mount /dev/sda5 /btrfs

BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'

$ mount -o degraded /dev/sda5 /btrfs <-- success.

$ btrfs replace cancel /btrfs

kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
kernel: ------------[ cut here ]------------
kernel: kernel BUG at fs/btrfs/ctree.h:3750!

After the patch:

$ btrfs replace cancel /btrfs

BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Fixes: fe97e2e173af ("btrfs: dev-replace: replace's scrub must not be running in suspended state")
CC: [email protected] # 5.0+
Signed-off-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/btrfs/dev-replace.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -954,8 +954,7 @@ int btrfs_dev_replace_cancel(struct btrf
up_write(&dev_replace->rwsem);

/* Scrub for replace must not be running in suspended state */
- ret = btrfs_scrub_cancel(fs_info);
- ASSERT(ret != -ENOTCONN);
+ btrfs_scrub_cancel(fs_info);

trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {


2022-08-29 12:58:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.10 50/86] ratelimit: Fix data-races in ___ratelimit().

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 6bae8ceb90ba76cdba39496db936164fa672b9be ]

While reading rs->interval and rs->burst, they can be changed
concurrently via sysctl (e.g. net_ratelimit_state). Thus, we
need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
lib/ratelimit.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index e01a93f46f833..ce945c17980b9 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -26,10 +26,16 @@
*/
int ___ratelimit(struct ratelimit_state *rs, const char *func)
{
+ /* Paired with WRITE_ONCE() in .proc_handler().
+ * Changing two values seperately could be inconsistent
+ * and some message could be lost. (See: net_ratelimit_state).
+ */
+ int interval = READ_ONCE(rs->interval);
+ int burst = READ_ONCE(rs->burst);
unsigned long flags;
int ret;

- if (!rs->interval)
+ if (!interval)
return 1;

/*
@@ -44,7 +50,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
if (!rs->begin)
rs->begin = jiffies;

- if (time_is_before_jiffies(rs->begin + rs->interval)) {
+ if (time_is_before_jiffies(rs->begin + interval)) {
if (rs->missed) {
if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
printk_deferred(KERN_WARNING
@@ -56,7 +62,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
rs->begin = jiffies;
rs->printed = 0;
}
- if (rs->burst && rs->burst > rs->printed) {
+ if (burst && burst > rs->printed) {
rs->printed++;
ret = 1;
} else {
--
2.35.1



2022-08-29 18:56:24

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review



On 8/29/2022 3:58 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.140-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels and build tested on
BMIPS_GENERIC:

Tested-by: Florian Fainelli <[email protected]>
--
Florian

2022-08-29 19:09:50

by Slade's Kernel Patch Bot

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review

On Mon, Aug 29, 2022, at 6:58 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.

5.10.140-rc1 compiled and booted with no errors or regressions on my x86_64 test system.

Tested-by: Slade Watkins <[email protected]>

Cheers,
-srw

2022-08-29 19:17:46

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review

Hi!

> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.

CIP testing did not find any problems here:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-5.10.y

Tested-by: Pavel Machek (CIP) <[email protected]>

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Attachments:
(No filename) (661.00 B)
signature.asc (201.00 B)
Download all attachments

2022-08-29 22:20:32

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review

On 8/29/22 04:58, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.140-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

Tested-by: Shuah Khan <[email protected]>

thanks,
-- Shuah

2022-08-30 00:51:28

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review

On Mon, Aug 29, 2022 at 12:58:26PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>

Build results:
total: 163 pass: 163 fail: 0
Qemu test results:
total: 474 pass: 474 fail: 0

Tested-by: Guenter Roeck <[email protected]>

Guenter

2022-08-30 02:49:18

by Daniel Díaz

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review

Hello!

On 29/08/22 05:58, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.140-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro's test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <[email protected]>

## Build
* kernel: 5.10.140-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.10.y
* git commit: 10c6bbc07890234ed728ef39924dcdd3bd211e15
* git describe: v5.10.138-89-g10c6bbc07890
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10.138-89-g10c6bbc07890

## No test regressions (compared to v5.10.139)

## No metric regressions (compared to v5.10.139)

## No test fixes (compared to v5.10.139)

## No metric fixes (compared to v5.10.139)

## Test result summary
total: 117997, pass: 106134, fail: 761, skip: 10810, xfail: 292

## Build Summary
* arc: 20 total, 20 passed, 0 failed
* arm: 590 total, 590 passed, 0 failed
* arm64: 110 total, 107 passed, 3 failed
* i386: 92 total, 89 passed, 3 failed
* mips: 90 total, 90 passed, 0 failed
* parisc: 24 total, 24 passed, 0 failed
* powerpc: 102 total, 102 passed, 0 failed
* riscv: 54 total, 54 passed, 0 failed
* s390: 42 total, 42 passed, 0 failed
* sh: 48 total, 48 passed, 0 failed
* sparc: 24 total, 24 passed, 0 failed
* x86_64: 97 total, 94 passed, 3 failed

## Test suites summary
* fwts
* igt-gpu-tools
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-fsx
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-open-posix-tests
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-syscalls
* ltp-tracing
* network-basic-tests
* packetdrill
* rcutorture
* v4l2-compliance
* vdso


Greetings!

Daniel Díaz
[email protected]

--
Linaro LKFT
https://lkft.linaro.org

2022-08-30 10:37:26

by Jon Hunter

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review


On 29/08/2022 11:58, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.140-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


All tests passing for Tegra ...

Test results for stable-v5.10:
10 builds: 10 pass, 0 fail
28 boots: 28 pass, 0 fail
75 tests: 75 pass, 0 fail

Linux version: 5.10.140-rc1-g10c6bbc07890
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000,
tegra20-ventana, tegra210-p2371-2180,
tegra210-p3450-0000, tegra30-cardhu-a04

Tested-by: Jon Hunter <[email protected]>

Jon

--
nvpublic

2022-08-30 11:10:46

by Sudip Mukherjee

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review

Hi Greg,

On Mon, Aug 29, 2022 at 12:58:26PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.

Build test (gcc version 11.3.1 20220819):
mips: 63 configs -> no failure
arm: 104 configs -> no failure
arm64: 3 configs -> no failure
x86_64: 4 configs -> no failure
alpha allmodconfig -> no failure
powerpc allmodconfig -> no failure
riscv allmodconfig -> no failure
s390 allmodconfig -> no failure
xtensa allmodconfig -> no failure

Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). No regression. [2]

[1]. https://openqa.qa.codethink.co.uk/tests/1728
[2]. https://openqa.qa.codethink.co.uk/tests/1732


Tested-by: Sudip Mukherjee <[email protected]>

--
Regards
Sudip

2022-08-30 12:05:46

by Rudi Heitbaum

[permalink] [raw]
Subject: Re: [PATCH 5.10 00/86] 5.10.140-rc1 review

On Mon, Aug 29, 2022 at 12:58:26PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.140 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 31 Aug 2022 10:57:37 +0000.
> Anything received after that time might be too late.

Hi Greg,

5.10.140-rc1 tested.

Run tested on:
- Intel Skylake x86_64 (nuc6 i5-6260U)

In addition - build tested for:
- Allwinner A64
- Allwinner H3
- Allwinner H5
- Allwinner H6
- Rockchip RK3288
- Rockchip RK3328
- Rockchip RK3399pro

Tested-by: Rudi Heitbaum <[email protected]>
--
Rudi