2022-08-01 12:48:39

by Greg KH

Subject: [PATCH 5.18 00/88] 5.18.16-rc1 review

This is the start of the stable review cycle for the 5.18.16 release.
There are 88 patches in this series; all will be posted as responses
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.18.16-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.18.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 5.18.16-rc1

Thadeu Lima de Souza Cascardo <[email protected]>
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available

Waiman Long <[email protected]>
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter

Eiichi Tsukata <[email protected]>
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed

Sherry Sun <[email protected]>
EDAC/synopsys: Re-enable the error interrupts on v3 hw

Sherry Sun <[email protected]>
EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw

Toshi Kani <[email protected]>
EDAC/ghes: Set the DIMM label unconditionally

Florian Fainelli <[email protected]>
ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_workaround_signed_windows.

Jaewon Kim <[email protected]>
page_alloc: fix invalid watermark check on a negative value

Ralph Campbell <[email protected]>
mm/hmm: fault non-owner device private entries

Dan Carpenter <[email protected]>
stmmac: dwmac-mediatek: fix resource leak in probe

Dimitris Michailidis <[email protected]>
net/funeth: Fix fun_xdp_tx() and XDP packet reclaim

Xin Long <[email protected]>
sctp: leave the err path free in sctp_stream_init to sctp_stream_free

Alejandro Lucero <[email protected]>
sfc: disable softirqs for ptp TX

Leo Yan <[email protected]>
perf symbol: Correct address for bss symbols

Jason Wang <[email protected]>
virtio-net: fix the race between refill work and close

Geliang Tang <[email protected]>
mptcp: don't send RST for single subflow

Bart Van Assche <[email protected]>
scsi: ufs: core: Fix a race condition related to device management

Bart Van Assche <[email protected]>
scsi: ufs: Support clearing multiple commands at once

Florian Westphal <[email protected]>
netfilter: nf_queue: do not allow packet truncation below transport header offset

Sunil Goutham <[email protected]>
octeontx2-pf: cn10k: Fix egress ratelimit configuration

Duoming Zhou <[email protected]>
sctp: fix sleep in atomic context bug in timer handlers

Vladimir Oltean <[email protected]>
net: dsa: fix reference counting for LAG FDBs

Michal Maloszewski <[email protected]>
i40e: Fix interface init with MSI interrupts (no MSI-X)

Kuniyuki Iwashima <[email protected]>
ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_reflect_tos.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_[rw]mem(_offset)?.

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sk_pacing_rate.

Taehee Yoo <[email protected]>
net: mld: fix reference count leak in mld_{query | report}_work()

Jianglei Nie <[email protected]>
net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa()

Sabrina Dubroca <[email protected]>
macsec: always read MACSEC_SA_ATTR_PN as a u64

Sabrina Dubroca <[email protected]>
macsec: limit replay window size with XPN

Sabrina Dubroca <[email protected]>
macsec: fix error message in macsec_add_rxsa and _txsa

Sabrina Dubroca <[email protected]>
macsec: fix NULL deref in macsec_add_rxsa

Xin Long <[email protected]>
Documentation: fix sctp_wmem in ip-sysctl.rst

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_autocorking.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_tso_rtt_log.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_min_tso_segs.

Tom Rix <[email protected]>
mlxsw: spectrum_router: simplify list unwinding

Liang He <[email protected]>
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()

Vladimir Oltean <[email protected]>
net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii

Maxim Mikityanskiy <[email protected]>
net/tls: Remove the context from the list in tls_device_down

Ziyang Xuan <[email protected]>
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr

Kuniyuki Iwashima <[email protected]>
net: ping6: Fix memleak in ipv6_renew_options().

David Jeffery <[email protected]>
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown

Jason Yan <[email protected]>
scsi: core: Fix warning in scsi_alloc_sgtables()

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.

Eric Dumazet <[email protected]>
tcp: md5: fix IPv4-mapped support

Subbaraya Sundeep <[email protected]>
octeontx2-pf: Fix UDP/TCP src and dst port tc filters

Wei Wang <[email protected]>
Revert "tcp: change pingpong threshold to 3"

Liang He <[email protected]>
scsi: ufs: host: Hold reference returned by of_parse_phandle()

Anirudh Venkataramanan <[email protected]>
ice: Fix VSIs unable to share unicast MAC

Maciej Fijalkowski <[email protected]>
ice: do not setup vlan for loopback VSI

Maciej Fijalkowski <[email protected]>
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)

Przemyslaw Patynowski <[email protected]>
ice: Fix max VLANs available for VF

Benjamin Poirier <[email protected]>
bridge: Do not send empty IFLA_AF_SPEC attribute

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_nometrics_save.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_frto.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_adv_win_scale.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_app_win.

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_dsack.

Linus Torvalds <[email protected]>
watch_queue: Fix missing locking in add_watch_to_object()

David Howells <[email protected]>
watch_queue: Fix missing rcu annotation

Nathan Chancellor <[email protected]>
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()

Alistair Popple <[email protected]>
nouveau/svm: Fix to migrate all requested pages

Waiman Long <[email protected]>
intel_idle: Fix false positive RCU splats due to incorrect hardirqs state

Harald Freudenberger <[email protected]>
s390/archrandom: prevent CPACF trng invocations in interrupt context

Lukas Bulwahn <[email protected]>
asm-generic: remove a broken and needless ifdef conditional

Miaohe Lin <[email protected]>
hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte

Muchun Song <[email protected]>
mm: fix missing wake-up event for FSDAX pages

Josef Bacik <[email protected]>
mm: fix page leak with multiple threads mapping the same page

Mike Rapoport <[email protected]>
secretmem: fix unhandled fault in truncate

Andrei Vagin <[email protected]>
fs: sendfile handles O_NONBLOCK of out_fd

ChenXiaoSong <[email protected]>
ntfs: fix use-after-free in ntfs_ucsncmp()

Nadav Amit <[email protected]>
userfaultfd: provide properly masked address for huge-pages

Junxiao Bi <[email protected]>
Revert "ocfs2: mount shared volume without ha stack"

Linus Walleij <[email protected]>
ARM: pxa2xx: Fix GPIO descriptor tables

Michael Walle <[email protected]>
ARM: dts: lan966x: fix sys_clk frequency

Luiz Augusto von Dentz <[email protected]>
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put

Abhishek Pandit-Subedi <[email protected]>
Bluetooth: Always set event mask on suspend


-------------

Diffstat:

Documentation/admin-guide/kernel-parameters.txt | 2 +
Documentation/networking/ip-sysctl.rst | 9 +-
Makefile | 4 +-
arch/arm/boot/dts/lan966x.dtsi | 2 +-
arch/arm/include/asm/dma.h | 2 +-
arch/arm/mach-pxa/corgi.c | 2 +-
arch/arm/mach-pxa/hx4700.c | 2 +-
arch/arm/mach-pxa/icontrol.c | 4 +-
arch/arm/mach-pxa/littleton.c | 2 +-
arch/arm/mach-pxa/magician.c | 2 +-
arch/arm/mach-pxa/spitz.c | 2 +-
arch/arm/mach-pxa/z2.c | 4 +-
arch/s390/include/asm/archrandom.h | 9 +-
arch/x86/kernel/cpu/bugs.c | 1 +
drivers/edac/ghes_edac.c | 11 ++-
drivers/edac/synopsys_edac.c | 44 +++++----
drivers/gpu/drm/nouveau/nouveau_dmem.c | 6 +-
drivers/gpu/drm/tiny/simpledrm.c | 2 +-
drivers/idle/intel_idle.c | 8 +-
drivers/net/ethernet/fungible/funeth/funeth_rx.c | 5 +-
drivers/net/ethernet/fungible/funeth/funeth_tx.c | 20 ++--
drivers/net/ethernet/fungible/funeth/funeth_txrx.h | 6 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +
drivers/net/ethernet/intel/ice/ice_ethtool.c | 3 +-
drivers/net/ethernet/intel/ice/ice_main.c | 10 +-
drivers/net/ethernet/intel/ice/ice_sriov.c | 40 --------
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 3 +-
.../net/ethernet/marvell/octeontx2/nic/otx2_tc.c | 106 ++++++++++++++-------
.../net/ethernet/mellanox/mlxsw/spectrum_router.c | 20 ++--
drivers/net/ethernet/sfc/ptp.c | 22 +++++
.../net/ethernet/stmicro/stmmac/dwmac-mediatek.c | 9 +-
drivers/net/macsec.c | 33 ++++---
drivers/net/pcs/pcs-xpcs.c | 2 +-
drivers/net/sungem_phy.c | 1 +
drivers/net/virtio_net.c | 37 ++++++-
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 +
drivers/scsi/scsi_ioctl.c | 2 +-
drivers/scsi/ufs/ufshcd-pltfrm.c | 15 ++-
drivers/scsi/ufs/ufshcd.c | 98 +++++++++++++------
fs/ntfs/attrib.c | 8 +-
fs/ocfs2/ocfs2.h | 4 +-
fs/ocfs2/slot_map.c | 46 ++++-----
fs/ocfs2/super.c | 21 ----
fs/read_write.c | 3 +
fs/userfaultfd.c | 12 ++-
include/asm-generic/io.h | 2 -
include/linux/mm.h | 14 ++-
include/net/addrconf.h | 3 +
include/net/bluetooth/l2cap.h | 1 +
include/net/inet_connection_sock.h | 10 +-
include/net/sock.h | 8 +-
include/net/tcp.h | 2 +-
kernel/locking/rwsem.c | 30 ++++--
kernel/watch_queue.c | 58 ++++++-----
mm/gup.c | 6 +-
mm/hmm.c | 19 ++--
mm/hugetlb.c | 1 +
mm/memory.c | 7 +-
mm/memremap.c | 6 +-
mm/page_alloc.c | 12 ++-
mm/secretmem.c | 33 +++++--
net/bluetooth/hci_sync.c | 6 +-
net/bluetooth/l2cap_core.c | 61 +++++++++---
net/bridge/br_netlink.c | 8 +-
net/decnet/af_decnet.c | 4 +-
net/dsa/switch.c | 1 +
net/ipv4/fib_trie.c | 7 +-
net/ipv4/tcp.c | 23 +++--
net/ipv4/tcp_input.c | 41 ++++----
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv4/tcp_metrics.c | 10 +-
net/ipv4/tcp_output.c | 27 +++---
net/ipv6/mcast.c | 14 +--
net/ipv6/ping.c | 6 ++
net/ipv6/tcp_ipv6.c | 4 +-
net/mptcp/protocol.c | 8 +-
net/mptcp/subflow.c | 10 +-
net/netfilter/nfnetlink_queue.c | 7 +-
net/sctp/associola.c | 5 +-
net/sctp/stream.c | 19 +---
net/sctp/stream_sched.c | 2 +-
net/tipc/socket.c | 2 +-
net/tls/tls_device.c | 7 +-
tools/perf/util/symbol-elf.c | 45 ++++++++-
84 files changed, 727 insertions(+), 455 deletions(-)




2022-08-01 12:48:46

by Greg KH

Subject: [PATCH 5.18 31/88] scsi: ufs: host: Hold reference returned by of_parse_phandle()

From: Liang He <[email protected]>

commit a3435afba87dc6cd83f5595e7607f3c40f93ef01 upstream.

In ufshcd_populate_vreg(), we should hold the reference returned by
of_parse_phandle() and then use it to call of_node_put() for refcount
balance.

Link: https://lore.kernel.org/r/[email protected]
Fixes: aa4976130934 ("ufs: Add regulator enable support")
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Liang He <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/ufs/ufshcd-pltfrm.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

--- a/drivers/scsi/ufs/ufshcd-pltfrm.c
+++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
@@ -107,9 +107,20 @@ out:
return ret;
}

+static bool phandle_exists(const struct device_node *np,
+ const char *phandle_name, int index)
+{
+ struct device_node *parse_np = of_parse_phandle(np, phandle_name, index);
+
+ if (parse_np)
+ of_node_put(parse_np);
+
+ return parse_np != NULL;
+}
+
#define MAX_PROP_SIZE 32
static int ufshcd_populate_vreg(struct device *dev, const char *name,
- struct ufs_vreg **out_vreg)
+ struct ufs_vreg **out_vreg)
{
char prop_name[MAX_PROP_SIZE];
struct ufs_vreg *vreg = NULL;
@@ -121,7 +132,7 @@ static int ufshcd_populate_vreg(struct d
}

snprintf(prop_name, MAX_PROP_SIZE, "%s-supply", name);
- if (!of_parse_phandle(np, prop_name, 0)) {
+ if (!phandle_exists(np, prop_name, 0)) {
dev_info(dev, "%s: Unable to find %s regulator, assuming enabled\n",
__func__, prop_name);
goto out;
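
The helper added by this patch follows a general "look up, test, drop the reference" pattern. Below is a minimal userspace sketch of that pattern, not the kernel code itself: struct node_sketch, node_get_by_name(), node_put() and node_exists() are hypothetical names, and the property names used with it are only illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a reference-counted device tree node. */
struct node_sketch {
	const char *name;
	int refcount;
};

/* Like of_parse_phandle(), this lookup hands back the node with its
 * reference count raised; the caller is responsible for dropping it. */
static struct node_sketch *node_get_by_name(struct node_sketch *nodes,
					    size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(nodes[i].name, name) == 0) {
			nodes[i].refcount++;
			return &nodes[i];
		}
	}
	return NULL;
}

static void node_put(struct node_sketch *node)
{
	assert(node->refcount > 0);
	node->refcount--;
}

/* Existence test that keeps the reference count balanced. */
static bool node_exists(struct node_sketch *nodes, size_t n, const char *name)
{
	struct node_sketch *node = node_get_by_name(nodes, n, name);

	if (node)
		node_put(node);

	return node != NULL;
}
```

Calling node_exists() any number of times leaves every refcount where it started, whereas testing the return value of node_get_by_name() directly would raise the count by one on each successful lookup.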



2022-08-01 12:48:54

by Greg KH

Subject: [PATCH 5.18 56/88] net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa()

From: Jianglei Nie <[email protected]>

[ Upstream commit c7b205fbbf3cffa374721bb7623f7aa8c46074f1 ]

init_rx_sa() allocates resources for rx_sa->stats and rx_sa->key.tfm
with alloc_percpu() and macsec_alloc_tfm(). When an error occurs after
init_rx_sa() has been called in macsec_add_rxsa(), the function releases
rx_sa with kfree() without releasing rx_sa->stats and rx_sa->key.tfm,
which leads to a resource leak.

We should call macsec_rxsa_put() instead of kfree() to decrease the ref
count of rx_sa and release the relevant resource if the refcount is 0.
The same bug exists in macsec_add_txsa() for tx_sa as well. This patch
fixes the above two bugs.

Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure")
Signed-off-by: Jianglei Nie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/macsec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 95578f04f212..f354fad05714 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -1844,7 +1844,7 @@ static int macsec_add_rxsa(struct sk_buff *skb, struct genl_info *info)
return 0;

cleanup:
- kfree(rx_sa);
+ macsec_rxsa_put(rx_sa);
rtnl_unlock();
return err;
}
@@ -2087,7 +2087,7 @@ static int macsec_add_txsa(struct sk_buff *skb, struct genl_info *info)

cleanup:
secy->operational = was_operational;
- kfree(tx_sa);
+ macsec_txsa_put(tx_sa);
rtnl_unlock();
return err;
}
--
2.35.1
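
The difference between freeing the container and dropping a reference can be illustrated outside the kernel. This is a minimal sketch under stated assumptions: struct sa_sketch, sa_create(), sa_put() and the allocation counter are hypothetical and only mimic the roles of rx_sa, init_rx_sa() and macsec_rxsa_put().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a security association that owns resources. */
struct sa_sketch {
	int refcnt;
	long *stats;	/* plays the role of rx_sa->stats */
	char *key;	/* plays the role of rx_sa->key.tfm */
};

static int live_allocs;	/* outstanding allocations, for illustration only */

static void *tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Allocate the object and the resources it owns; starts with one reference. */
static struct sa_sketch *sa_create(void)
{
	struct sa_sketch *sa = tracked_alloc(sizeof(*sa));

	if (!sa)
		return NULL;

	sa->stats = tracked_alloc(sizeof(*sa->stats));
	sa->key = tracked_alloc(16);
	if (!sa->stats || !sa->key) {
		tracked_free(sa->stats);
		tracked_free(sa->key);
		tracked_free(sa);
		return NULL;
	}
	sa->refcnt = 1;
	return sa;
}

/* Drop one reference; the last put also releases the owned resources. */
static void sa_put(struct sa_sketch *sa)
{
	assert(sa->refcnt > 0);
	if (--sa->refcnt == 0) {
		tracked_free(sa->stats);
		tracked_free(sa->key);
		tracked_free(sa);
	}
}
```

In this sketch an error path that called tracked_free(sa) instead of sa_put(sa) would leave two allocations outstanding, which is the shape of the leak described above.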




2022-08-01 12:49:22

by Greg KH

Subject: [PATCH 5.18 50/88] tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 2afdbe7b8de84c28e219073a6661080e1b3ded48 ]

While reading sysctl_tcp_invalid_ratelimit, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 032ee4236954 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f3b658fa3e7b..db78197a44ff 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3581,7 +3581,8 @@ static bool __tcp_oow_rate_limited(struct net *net, int mib_idx,
if (*last_oow_ack_time) {
s32 elapsed = (s32)(tcp_jiffies32 - *last_oow_ack_time);

- if (0 <= elapsed && elapsed < net->ipv4.sysctl_tcp_invalid_ratelimit) {
+ if (0 <= elapsed &&
+ elapsed < READ_ONCE(net->ipv4.sysctl_tcp_invalid_ratelimit)) {
NET_INC_STATS(net, mib_idx);
return true; /* rate-limited: don't send yet! */
}
--
2.35.1
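
The READ_ONCE() annotation used by this and the other sysctl patches in the series can be approximated in userspace. The following is a sketch, assuming GCC or Clang for __typeof__; READ_ONCE_SKETCH, struct net_sketch and oow_rate_limited() are hypothetical names and not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for READ_ONCE(): the volatile access
 * makes the compiler load the value exactly once, so a concurrent writer
 * cannot cause the same expression to see two different values. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

struct net_sketch {
	int invalid_ratelimit;	/* may be rewritten concurrently via sysctl */
};

/* Returns true when the previous out-of-window ACK was sent fewer than
 * invalid_ratelimit ticks ago. The signed 32-bit difference keeps the
 * comparison correct across counter wraparound. */
static bool oow_rate_limited(struct net_sketch *net, uint32_t now,
			     uint32_t last)
{
	int32_t elapsed = (int32_t)(now - last);
	int limit = READ_ONCE_SKETCH(net->invalid_ratelimit);

	assert(limit >= 0);
	return 0 <= elapsed && elapsed < limit;
}
```

Reading the tunable into a local once also means every later use in the function sees the same value, even if the sysctl is changed in the meantime.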




2022-08-01 12:49:46

by Greg KH

Subject: [PATCH 5.18 76/88] sctp: leave the err path free in sctp_stream_init to sctp_stream_free

From: Xin Long <[email protected]>

[ Upstream commit 181d8d2066c000ba0a0e6940a7ad80f1a0e68e9d ]

A NULL pointer dereference was reported by Wei Chen:

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:__list_del_entry_valid+0x26/0x80
Call Trace:
<TASK>
sctp_sched_dequeue_common+0x1c/0x90
sctp_sched_prio_dequeue+0x67/0x80
__sctp_outq_teardown+0x299/0x380
sctp_outq_free+0x15/0x20
sctp_association_free+0xc3/0x440
sctp_do_sm+0x1ca7/0x2210
sctp_assoc_bh_rcv+0x1f6/0x340

This happens when calling sctp_sendmsg() without connecting to the server
first. In this case, a data chunk is already queued in the client's send
queue when the INIT_ACK from the server is processed in
sctp_process_init(), which calls sctp_stream_init() to allocate stream_in.
If allocating stream_in fails, all of stream_out is freed in
sctp_stream_init()'s err path. Freeing the asoc then crashes when
dequeuing this data chunk, because stream_out is missing.

Since stream_out can't be freed before all data has been dequeued from the
send queue, this patch fixes the crash by moving the err path's
stream_out/in freeing from sctp_stream_init() to sctp_stream_free(), which
is eventually called when the asoc is freed in sctp_association_free().
This fix also makes the code in sctp_process_init() clearer.

Note that when sctp_stream_init() fails in sctp_association_init(),
sctp_association_free() will not be called, so in that case it should
go to the 'stream_free' err path to free the stream instead of 'fail_init'.

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Wei Chen <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/sctp/associola.c | 5 ++---
net/sctp/stream.c | 19 +++----------------
2 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index be29da09cc7a..3460abceba44 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -229,9 +229,8 @@ static struct sctp_association *sctp_association_init(
if (!sctp_ulpq_init(&asoc->ulpq, asoc))
goto fail_init;

- if (sctp_stream_init(&asoc->stream, asoc->c.sinit_num_ostreams,
- 0, gfp))
- goto fail_init;
+ if (sctp_stream_init(&asoc->stream, asoc->c.sinit_num_ostreams, 0, gfp))
+ goto stream_free;

/* Initialize default path MTU. */
asoc->pathmtu = sp->pathmtu;
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index 6dc95dcc0ff4..ef9fceadef8d 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -137,7 +137,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,

ret = sctp_stream_alloc_out(stream, outcnt, gfp);
if (ret)
- goto out_err;
+ return ret;

for (i = 0; i < stream->outcnt; i++)
SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
@@ -145,22 +145,9 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
handle_in:
sctp_stream_interleave_init(stream);
if (!incnt)
- goto out;
-
- ret = sctp_stream_alloc_in(stream, incnt, gfp);
- if (ret)
- goto in_err;
-
- goto out;
+ return 0;

-in_err:
- sched->free(stream);
- genradix_free(&stream->in);
-out_err:
- genradix_free(&stream->out);
- stream->outcnt = 0;
-out:
- return ret;
+ return sctp_stream_alloc_in(stream, incnt, gfp);
}

int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid)
--
2.35.1
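
The idea of leaving partially initialised state in place and letting a single free function clean it up can be sketched in userspace. The names here (struct stream_sketch, stream_init(), stream_free(), fail_in_alloc) are hypothetical, and the sketch assumes the caller zero-initialises the struct before calling stream_init().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sctp_stream. */
struct stream_sketch {
	int *out;
	int *in;
	int outcnt;
};

static int fail_in_alloc;	/* knob to simulate an allocation failure */

/* On failure, whatever was already allocated is left in place; the
 * caller is expected to run stream_free() exactly once later on. */
static int stream_init(struct stream_sketch *s, int outcnt, int incnt)
{
	assert(outcnt > 0 && incnt >= 0);

	s->out = calloc((size_t)outcnt, sizeof(*s->out));
	if (!s->out)
		return -1;
	s->outcnt = outcnt;

	if (!incnt)
		return 0;

	s->in = fail_in_alloc ? NULL : calloc((size_t)incnt, sizeof(*s->in));
	return s->in ? 0 : -1;
}

/* Single cleanup path; safe on fully or partially initialised streams. */
static void stream_free(struct stream_sketch *s)
{
	free(s->in);
	free(s->out);
	s->in = NULL;
	s->out = NULL;
	s->outcnt = 0;
}
```

Because stream_init() no longer tears anything down, code that still holds queued data referring to the out streams can drain it first and only then call stream_free().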




2022-08-01 12:49:49

by Greg KH

Subject: [PATCH 5.18 63/88] tcp: Fix data-races around sysctl_tcp_reflect_tos.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 870e3a634b6a6cb1543b359007aca73fe6a03ac5 ]

While reading sysctl_tcp_reflect_tos, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Wei Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_ipv4.c | 4 ++--
net/ipv6/tcp_ipv6.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a57f96b86874..1db9938163c4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1007,7 +1007,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
if (skb) {
__tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);

- tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
+ tos = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos) ?
(tcp_rsk(req)->syn_tos & ~INET_ECN_MASK) |
(inet_sk(sk)->tos & INET_ECN_MASK) :
inet_sk(sk)->tos;
@@ -1527,7 +1527,7 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
/* Set ToS of the new socket based upon the value of incoming SYN.
* ECT bits are set later in tcp_init_transfer().
*/
- if (sock_net(sk)->ipv4.sysctl_tcp_reflect_tos)
+ if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos))
newinet->tos = tcp_rsk(req)->syn_tos & ~INET_ECN_MASK;

if (!dst) {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5185c11dc444..979e0d7b2119 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -546,7 +546,7 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
if (np->repflow && ireq->pktopts)
fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts));

- tclass = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
+ tclass = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos) ?
(tcp_rsk(req)->syn_tos & ~INET_ECN_MASK) |
(np->tclass & INET_ECN_MASK) :
np->tclass;
@@ -1314,7 +1314,7 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
/* Set ToS of the new socket based upon the value of incoming SYN.
* ECT bits are set later in tcp_init_transfer().
*/
- if (sock_net(sk)->ipv4.sysctl_tcp_reflect_tos)
+ if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos))
newnp->tclass = tcp_rsk(req)->syn_tos & ~INET_ECN_MASK;

/* Clone native IPv6 options from listening socket (if any)
--
2.35.1
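
The ToS expression guarded by the sysctl in these hunks combines the DSCP bits of the received SYN with the ECN bits of the local socket. A small sketch of that bit arithmetic follows; synack_tos() and ECN_MASK_SKETCH are hypothetical names, with the mask value assumed to mirror INET_ECN_MASK (the low two bits).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same value as the kernel's INET_ECN_MASK. */
#define ECN_MASK_SKETCH 0x3u

/* When reflection is enabled, take the upper six (DSCP) bits from the
 * SYN and keep the socket's own two ECN bits; otherwise use the
 * socket's ToS unchanged. */
static uint8_t synack_tos(int reflect, uint8_t syn_tos, uint8_t sk_tos)
{
	assert(reflect == 0 || reflect == 1);

	if (reflect)
		return (uint8_t)((syn_tos & ~ECN_MASK_SKETCH) |
				 (sk_tos & ECN_MASK_SKETCH));
	return sk_tos;
}
```

For example, with reflection enabled, a SYN ToS of 0xB9 and a socket ToS of 0x02 give 0xBA: 0xB8 from the SYN's upper bits plus 0x02 from the socket's ECN bits.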




2022-08-01 12:49:51

by Greg KH

Subject: [PATCH 5.18 32/88] Revert "tcp: change pingpong threshold to 3"

From: Wei Wang <[email protected]>

commit 4d8f24eeedc58d5f87b650ddda73c16e8ba56559 upstream.

This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c.

This to-be-reverted commit was meant to apply a stricter rule for the
stack to enter pingpong mode. However, the condition used to check for
interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
jiffy based and might be too coarse, which delays the stack entering
pingpong mode.
We revert this patch so that we no longer use the above condition to
determine interactive session, and also reduce pingpong threshold to 1.

Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
Reported-by: LemmyHuang <[email protected]>
Suggested-by: Neal Cardwell <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_connection_sock.h | 10 +---------
net/ipv4/tcp_output.c | 15 ++++++---------
2 files changed, 7 insertions(+), 18 deletions(-)

--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -323,7 +323,7 @@ void inet_csk_update_fastreuse(struct in

struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);

-#define TCP_PINGPONG_THRESH 3
+#define TCP_PINGPONG_THRESH 1

static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
{
@@ -340,14 +340,6 @@ static inline bool inet_csk_in_pingpong_
return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
}

-static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
-{
- struct inet_connection_sock *icsk = inet_csk(sk);
-
- if (icsk->icsk_ack.pingpong < U8_MAX)
- icsk->icsk_ack.pingpong++;
-}
-
static inline bool inet_csk_has_ulp(struct sock *sk)
{
return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops;
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -167,16 +167,13 @@ static void tcp_event_data_sent(struct t
if (tcp_packets_in_flight(tp) == 0)
tcp_ca_event(sk, CA_EVENT_TX_START);

- /* If this is the first data packet sent in response to the
- * previous received data,
- * and it is a reply for ato after last received packet,
- * increase pingpong count.
- */
- if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
- (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
- inet_csk_inc_pingpong_cnt(sk);
-
tp->lsndtime = now;
+
+ /* If it is a reply for ato after last received
+ * packet, enter pingpong mode.
+ */
+ if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
+ inet_csk_enter_pingpong_mode(sk);
}

/* Account for an ACK we sent. */
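
The restored behaviour can be sketched as a small state machine: a data send that happens within ato of the last receive puts the connection straight into pingpong mode. This is a simplified userspace illustration, not the kernel code; struct ack_state_sketch and the function names are hypothetical, and it assumes that entering pingpong mode sets the counter to the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PINGPONG_THRESH_SKETCH 1	/* mirrors the value the revert restores */

/* Hypothetical stand-in for the relevant icsk_ack and tcp_sock fields. */
struct ack_state_sketch {
	uint8_t pingpong;
	uint32_t lrcvtime;	/* time of the last received data packet */
	uint32_t ato;		/* delayed-ACK timeout */
	uint32_t lsndtime;	/* time of the last sent data packet */
};

static bool in_pingpong_mode(const struct ack_state_sketch *st)
{
	return st->pingpong >= PINGPONG_THRESH_SKETCH;
}

/* Called for every data packet sent at time 'now'. The unsigned
 * subtraction keeps the interval correct across timestamp wraparound. */
static void event_data_sent(struct ack_state_sketch *st, uint32_t now)
{
	assert(st->ato > 0);

	st->lsndtime = now;
	if ((uint32_t)(now - st->lrcvtime) < st->ato)
		st->pingpong = PINGPONG_THRESH_SKETCH;
}
```

With a threshold of 1, a single quick reply is enough to mark the session as interactive, which is the behaviour the revert returns to.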



2022-08-01 12:49:52

by Greg KH

Subject: [PATCH 5.18 48/88] tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 1330ffacd05fc9ac4159d19286ce119e22450ed2 ]

While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 78e16891f12b..f3b658fa3e7b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3058,7 +3058,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,

static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us, const int flag)
{
- u32 wlen = sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen * HZ;
+ u32 wlen = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen) * HZ;
struct tcp_sock *tp = tcp_sk(sk);

if ((flag & FLAG_ACK_MAYBE_DELAYED) && rtt_us > tcp_min_rtt(tp)) {
--
2.35.1




2022-08-01 12:49:52

by Greg KH

Subject: [PATCH 5.18 61/88] tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 22396941a7f343d704738360f9ef0e6576489d43 ]

While reading sysctl_tcp_comp_sack_slack_ns, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: a70437cc09a1 ("tcp: add hrtimer slack to sack compression")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3591a25a8631..5de396075a27 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5551,7 +5551,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
rtt * (NSEC_PER_USEC >> 3)/20);
sock_hold(sk);
hrtimer_start_range_ns(&tp->compressed_ack_timer, ns_to_ktime(delay),
- sock_net(sk)->ipv4.sysctl_tcp_comp_sack_slack_ns,
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_slack_ns),
HRTIMER_MODE_REL_PINNED_SOFT);
}

--
2.35.1




2022-08-01 12:50:03

by Greg KH

Subject: [PATCH 5.18 10/88] mm: fix page leak with multiple threads mapping the same page

From: Josef Bacik <[email protected]>

commit 3fe2895cfecd03ac74977f32102b966b6589f481 upstream.

We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size. This application started
leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges. This very quickly reproduced the problem.

The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped. One thread maps
the PMD, the other thread loses the race and then returns 0. However at
this point we already have the page, and we are no longer putting this
page into the process's address space, and so we leak the page. We
actually did the correct thing prior to f9ce0be71d1f; however, it looks
like Kirill copied what we do in the anonymous page case. In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything. Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case. This makes us drop the ref on the page properly, and now my
reproducer no longer leaks the huge pages.

[[email protected]: v2]
Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/memory.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4108,9 +4108,12 @@ vm_fault_t finish_fault(struct vm_fault
return VM_FAULT_OOM;
}

- /* See comment in handle_pte_fault() */
+ /*
+ * See comment in handle_pte_fault() for how this scenario happens, we
+ * need to return NOPAGE so that we drop this page.
+ */
if (pmd_devmap_trans_unstable(vmf->pmd))
- return 0;
+ return VM_FAULT_NOPAGE;

vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
vmf->address, &vmf->ptl);
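
Why the return code matters can be shown with a simplified model of the fault path: the caller takes a reference on the page and only drops it again if the return value says the page was not installed. The sketch below is a userspace illustration under that assumption; struct page_sketch, the enum and both functions are hypothetical and only mimic the roles of 0 and VM_FAULT_NOPAGE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes standing in for 0 and VM_FAULT_NOPAGE. */
enum fault_result_sketch { FAULT_DONE = 0, FAULT_NOPAGE = 1 };

struct page_sketch {
	int refcount;
	bool mapped;
};

/* Try to install the page. If another thread already populated the
 * range, report NOPAGE so the caller knows its reference was not
 * consumed. Returning FAULT_DONE here instead would model the bug. */
static enum fault_result_sketch finish_fault_sketch(struct page_sketch *page,
						    bool lost_race)
{
	if (lost_race)
		return FAULT_NOPAGE;

	page->mapped = true;	/* the mapping now owns the fault's reference */
	return FAULT_DONE;
}

/* Fault path: take a reference, then drop it unless it was consumed. */
static enum fault_result_sketch fault_sketch(struct page_sketch *page,
					     bool lost_race)
{
	enum fault_result_sketch ret;

	page->refcount++;
	ret = finish_fault_sketch(page, lost_race);
	if (ret == FAULT_NOPAGE) {
		assert(page->refcount > 0);
		page->refcount--;
	}
	return ret;
}
```

In this model the thread that loses the race leaves the reference count exactly where it found it, while a loser that reported success would leave one extra reference behind on every lost race.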



2022-08-01 12:50:18

by Greg KH

Subject: [PATCH 5.18 35/88] tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.

From: Kuniyuki Iwashima <[email protected]>

commit 780476488844e070580bfc9e3bc7832ec1cea883 upstream.

While reading sysctl_tcp_moderate_rcvbuf, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
net/mptcp/protocol.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -724,7 +724,7 @@ void tcp_rcv_space_adjust(struct sock *s
* <prev RTT . ><current RTT .. ><next RTT .... >
*/

- if (sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf &&
+ if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
int rcvmem, rcvbuf;
u64 rcvwin, grow;
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1882,7 +1882,7 @@ static void mptcp_rcv_space_adjust(struc
if (msk->rcvq_space.copied <= msk->rcvq_space.space)
goto new_measure;

- if (sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf &&
+ if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
int rcvmem, rcvbuf;
u64 rcvwin, grow;



2022-08-01 12:50:34

by Greg KH

Subject: [PATCH 5.18 04/88] ARM: pxa2xx: Fix GPIO descriptor tables

From: Linus Walleij <[email protected]>

commit c5cdb9286913aa5a5ebb81bcca0c17df3b0e2c79 upstream.

Laurence reports:

"Kernel >5.18 on Zaurus has a bug where the power management code can't
talk to devices, emitting the following errors:

sharpsl-pm sharpsl-pm: Error: AC check failed: voltage -22.
sharpsl-pm sharpsl-pm: Charging Error!
sharpsl-pm sharpsl-pm: Warning: Cannot read main battery!

Looking at the recent changes, I found that commit 31455bbda208 ("spi:
pxa2xx_spi: Convert to use GPIO descriptors") replaced the deprecated
SPI chip select platform device code with a gpiod lookup table. However,
this didn't seem to work until I changed the `dev_id` member from the
device name to the bus id. I'm not entirely sure why this is necessary,
but I suspect it is related to the fact that in sysfs SPI devices are
attached under /sys/devices/.../dev_name/spi_master/spiB/spiB.C, rather
than directly to the device."

After reviewing the change I conclude that the same fix is needed
for all affected boards.

Fixes: 31455bbda208 ("spi: pxa2xx_spi: Convert to use GPIO descriptors")
Reported-by: Laurence de Bruxelles <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/mach-pxa/corgi.c | 2 +-
arch/arm/mach-pxa/hx4700.c | 2 +-
arch/arm/mach-pxa/icontrol.c | 4 ++--
arch/arm/mach-pxa/littleton.c | 2 +-
arch/arm/mach-pxa/magician.c | 2 +-
arch/arm/mach-pxa/spitz.c | 2 +-
arch/arm/mach-pxa/z2.c | 4 ++--
7 files changed, 9 insertions(+), 9 deletions(-)

--- a/arch/arm/mach-pxa/corgi.c
+++ b/arch/arm/mach-pxa/corgi.c
@@ -531,7 +531,7 @@ static struct pxa2xx_spi_controller corg
};

static struct gpiod_lookup_table corgi_spi_gpio_table = {
- .dev_id = "pxa2xx-spi.1",
+ .dev_id = "spi1",
.table = {
GPIO_LOOKUP_IDX("gpio-pxa", CORGI_GPIO_ADS7846_CS, "cs", 0, GPIO_ACTIVE_LOW),
GPIO_LOOKUP_IDX("gpio-pxa", CORGI_GPIO_LCDCON_CS, "cs", 1, GPIO_ACTIVE_LOW),
--- a/arch/arm/mach-pxa/hx4700.c
+++ b/arch/arm/mach-pxa/hx4700.c
@@ -635,7 +635,7 @@ static struct pxa2xx_spi_controller pxa_
};

static struct gpiod_lookup_table pxa_ssp2_gpio_table = {
- .dev_id = "pxa2xx-spi.2",
+ .dev_id = "spi2",
.table = {
GPIO_LOOKUP_IDX("gpio-pxa", GPIO88_HX4700_TSC2046_CS, "cs", 0, GPIO_ACTIVE_LOW),
{ },
--- a/arch/arm/mach-pxa/icontrol.c
+++ b/arch/arm/mach-pxa/icontrol.c
@@ -140,7 +140,7 @@ struct platform_device pxa_spi_ssp4 = {
};

static struct gpiod_lookup_table pxa_ssp3_gpio_table = {
- .dev_id = "pxa2xx-spi.3",
+ .dev_id = "spi3",
.table = {
GPIO_LOOKUP_IDX("gpio-pxa", ICONTROL_MCP251x_nCS1, "cs", 0, GPIO_ACTIVE_LOW),
GPIO_LOOKUP_IDX("gpio-pxa", ICONTROL_MCP251x_nCS2, "cs", 1, GPIO_ACTIVE_LOW),
@@ -149,7 +149,7 @@ static struct gpiod_lookup_table pxa_ssp
};

static struct gpiod_lookup_table pxa_ssp4_gpio_table = {
- .dev_id = "pxa2xx-spi.4",
+ .dev_id = "spi4",
.table = {
GPIO_LOOKUP_IDX("gpio-pxa", ICONTROL_MCP251x_nCS3, "cs", 0, GPIO_ACTIVE_LOW),
GPIO_LOOKUP_IDX("gpio-pxa", ICONTROL_MCP251x_nCS4, "cs", 1, GPIO_ACTIVE_LOW),
--- a/arch/arm/mach-pxa/littleton.c
+++ b/arch/arm/mach-pxa/littleton.c
@@ -208,7 +208,7 @@ static struct spi_board_info littleton_s
};

static struct gpiod_lookup_table littleton_spi_gpio_table = {
- .dev_id = "pxa2xx-spi.2",
+ .dev_id = "spi2",
.table = {
GPIO_LOOKUP_IDX("gpio-pxa", LITTLETON_GPIO_LCD_CS, "cs", 0, GPIO_ACTIVE_LOW),
{ },
--- a/arch/arm/mach-pxa/magician.c
+++ b/arch/arm/mach-pxa/magician.c
@@ -946,7 +946,7 @@ static struct pxa2xx_spi_controller magi
};

static struct gpiod_lookup_table magician_spi_gpio_table = {
- .dev_id = "pxa2xx-spi.2",
+ .dev_id = "spi2",
.table = {
/* NOTICE must be GPIO, incompatibility with hw PXA SPI framing */
GPIO_LOOKUP_IDX("gpio-pxa", GPIO14_MAGICIAN_TSC2046_CS, "cs", 0, GPIO_ACTIVE_LOW),
--- a/arch/arm/mach-pxa/spitz.c
+++ b/arch/arm/mach-pxa/spitz.c
@@ -578,7 +578,7 @@ static struct pxa2xx_spi_controller spit
};

static struct gpiod_lookup_table spitz_spi_gpio_table = {
- .dev_id = "pxa2xx-spi.2",
+ .dev_id = "spi2",
.table = {
GPIO_LOOKUP_IDX("gpio-pxa", SPITZ_GPIO_ADS7846_CS, "cs", 0, GPIO_ACTIVE_LOW),
GPIO_LOOKUP_IDX("gpio-pxa", SPITZ_GPIO_LCDCON_CS, "cs", 1, GPIO_ACTIVE_LOW),
--- a/arch/arm/mach-pxa/z2.c
+++ b/arch/arm/mach-pxa/z2.c
@@ -623,7 +623,7 @@ static struct pxa2xx_spi_controller pxa_
};

static struct gpiod_lookup_table pxa_ssp1_gpio_table = {
- .dev_id = "pxa2xx-spi.1",
+ .dev_id = "spi1",
.table = {
GPIO_LOOKUP_IDX("gpio-pxa", GPIO24_ZIPITZ2_WIFI_CS, "cs", 0, GPIO_ACTIVE_LOW),
{ },
@@ -631,7 +631,7 @@ static struct gpiod_lookup_table pxa_ssp
};

static struct gpiod_lookup_table pxa_ssp2_gpio_table = {
- .dev_id = "pxa2xx-spi.2",
+ .dev_id = "spi2",
.table = {
GPIO_LOOKUP_IDX("gpio-pxa", GPIO88_ZIPITZ2_LCD_CS, "cs", 0, GPIO_ACTIVE_LOW),
{ },
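
To illustrate why the dev_id string matters, here is a minimal user-space
sketch of a lookup keyed by the consumer device's name. It assumes, as the
fix implies, that the consumer is named "spi1" rather than "pxa2xx-spi.1".
The struct and function names are hypothetical and only model the string
match, not the real gpiolib code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a GPIO lookup table. */
struct lookup_table {
	const char *dev_id;	/* name of the consumer device */
	int cs_gpio;		/* chip-select line, illustrative only */
};

/* Return the table whose dev_id equals the consumer's name, or NULL. */
static const struct lookup_table *
find_table(const struct lookup_table *tables, size_t n, const char *dev_name)
{
	assert(dev_name != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tables[i].dev_id && strcmp(tables[i].dev_id, dev_name) == 0)
			return &tables[i];
	}
	return NULL;
}

/* 1 if a consumer named "spi1" finds a table registered under dev_id. */
static int spi1_finds(const char *dev_id)
{
	const struct lookup_table t[] = { { dev_id, 24 } };

	return find_table(t, 1, "spi1") != NULL;
}
```

In this model spi1_finds("spi1") succeeds while spi1_finds("pxa2xx-spi.1")
finds nothing, which would leave the chip-select lines unresolved.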



2022-08-01 12:50:45

by Greg KH

Subject: [PATCH 5.18 26/88] bridge: Do not send empty IFLA_AF_SPEC attribute

From: Benjamin Poirier <[email protected]>

commit 9b134b1694ec8926926ba6b7b80884ea829245a0 upstream.

After commit b6c02ef54913 ("bridge: Netlink interface fix."),
br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a
bridge vlan dump is requested but an interface does not have any vlans
configured.

iproute2 ignores such an empty attribute since commit b262a9becbcb
("bridge: Fix output with empty vlan lists") but older iproute2 versions as
well as other utilities have their output changed by the cited kernel
commit, resulting in failed test cases. Regardless, emitting an empty
attribute is pointless and inefficient.

Avoid this change by canceling the attribute if no AF_SPEC data was added.

Fixes: b6c02ef54913 ("bridge: Netlink interface fix.")
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Benjamin Poirier <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bridge/br_netlink.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -589,9 +589,13 @@ static int br_fill_ifinfo(struct sk_buff
}

done:
+ if (af) {
+ if (nlmsg_get_pos(skb) - (void *)af > nla_attr_size(0))
+ nla_nest_end(skb, af);
+ else
+ nla_nest_cancel(skb, af);
+ }

- if (af)
- nla_nest_end(skb, af);
nlmsg_end(skb, nlh);
return 0;
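
The end-or-cancel decision above can be sketched in user space as follows.
This is a toy model under stated assumptions: a 4-byte attribute header, a
flat byte buffer, and hypothetical helper names (nest_start, put_u8,
nest_end_or_cancel). It is not the kernel netlink API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 4	/* size of an attribute header with no payload */

struct msg {
	uint8_t buf[64];
	size_t len;
};

/* Reserve a header for a nested attribute; return its offset. */
static size_t nest_start(struct msg *m)
{
	size_t start = m->len;

	assert(start + HDR_LEN <= sizeof(m->buf));
	memset(m->buf + start, 0, HDR_LEN);
	m->len += HDR_LEN;
	return start;
}

/* Append one payload byte inside the currently open nest. */
static void put_u8(struct msg *m, uint8_t v)
{
	assert(m->len < sizeof(m->buf));
	m->buf[m->len++] = v;
}

/* Close the nest if it has a payload, otherwise drop the bare header. */
static void nest_end_or_cancel(struct msg *m, size_t start)
{
	assert(m->len >= start + HDR_LEN);
	if (m->len - start > HDR_LEN)
		m->buf[start] = (uint8_t)(m->len - start);	/* record length */
	else
		m->len = start;					/* cancel: trim header */
}

/* Build a message with n payload bytes in the nest; return final length. */
static size_t build(unsigned int n)
{
	struct msg m = { .len = 0 };
	size_t af = nest_start(&m);

	for (unsigned int i = 0; i < n; i++)
		put_u8(&m, (uint8_t)i);
	nest_end_or_cancel(&m, af);
	return m.len;
}
```

With no payload the message ends up empty, so a reader never sees a
zero-length nested attribute; with three payload bytes the nest is kept and
the message is header plus payload long.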




2022-08-01 12:50:50

by Greg KH

Subject: [PATCH 5.18 79/88] mm/hmm: fault non-owner device private entries

From: Ralph Campbell <[email protected]>

commit 8a295dbbaf7292c582a40ce469c326f472d51f66 upstream.

If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner field is used
to determine if the device private page should not be faulted in.
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.

For example, if a page is migrated to GPU private memory and a RDMA fault
capable NIC tries to read the migrated page, without this patch it will
get an error. With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 08ddddda667b ("mm/hmm: check the device private page owner in hmm_range_fault()")
Signed-off-by: Ralph Campbell <[email protected]>
Reported-by: Felix Kuehling <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Cc: Philip Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/hmm.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -212,14 +212,6 @@ int hmm_vma_handle_pmd(struct mm_walk *w
unsigned long end, unsigned long hmm_pfns[], pmd_t pmd);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

-static inline bool hmm_is_device_private_entry(struct hmm_range *range,
- swp_entry_t entry)
-{
- return is_device_private_entry(entry) &&
- pfn_swap_entry_to_page(entry)->pgmap->owner ==
- range->dev_private_owner;
-}
-
static inline unsigned long pte_to_hmm_pfn_flags(struct hmm_range *range,
pte_t pte)
{
@@ -252,10 +244,12 @@ static int hmm_vma_handle_pte(struct mm_
swp_entry_t entry = pte_to_swp_entry(pte);

/*
- * Never fault in device private pages, but just report
- * the PFN even if not present.
+ * Don't fault in device private pages owned by the caller,
+ * just report the PFN.
*/
- if (hmm_is_device_private_entry(range, entry)) {
+ if (is_device_private_entry(entry) &&
+ pfn_swap_entry_to_page(entry)->pgmap->owner ==
+ range->dev_private_owner) {
cpu_flags = HMM_PFN_VALID;
if (is_writable_device_private_entry(entry))
cpu_flags |= HMM_PFN_WRITE;
@@ -273,6 +267,9 @@ static int hmm_vma_handle_pte(struct mm_
if (!non_swap_entry(entry))
goto fault;

+ if (is_device_private_entry(entry))
+ goto fault;
+
if (is_device_exclusive_entry(entry))
goto fault;




2022-08-01 12:51:16

by Greg KH

Subject: [PATCH 5.18 59/88] net: Fix data-races around sysctl_[rw]mem(_offset)?.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 02739545951ad4c1215160db7fbf9b7a918d3c0b ]

While reading these sysctl variables, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

- .sysctl_rmem
- .sysctl_wmem
- .sysctl_rmem_offset
- .sysctl_wmem_offset
- sysctl_tcp_rmem[1, 2]
- sysctl_tcp_wmem[1, 2]
- sysctl_decnet_rmem[1]
- sysctl_decnet_wmem[1]
- sysctl_tipc_rmem[1]

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
include/net/sock.h | 8 ++++----
net/decnet/af_decnet.c | 4 ++--
net/ipv4/tcp.c | 6 +++---
net/ipv4/tcp_input.c | 13 +++++++------
net/ipv4/tcp_output.c | 2 +-
net/mptcp/protocol.c | 6 +++---
net/tipc/socket.c | 2 +-
7 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6bef0ffb1e7b..9563a093fdfc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2834,18 +2834,18 @@ static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto)
{
/* Does this proto have per netns sysctl_wmem ? */
if (proto->sysctl_wmem_offset)
- return *(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset);
+ return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset));

- return *proto->sysctl_wmem;
+ return READ_ONCE(*proto->sysctl_wmem);
}

static inline int sk_get_rmem0(const struct sock *sk, const struct proto *proto)
{
/* Does this proto have per netns sysctl_rmem ? */
if (proto->sysctl_rmem_offset)
- return *(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset);
+ return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset));

- return *proto->sysctl_rmem;
+ return READ_ONCE(*proto->sysctl_rmem);
}

/* Default TCP Small queue budget is ~1 ms of data (1sec >> 10)
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index dc92a67baea3..7d542eb46172 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -480,8 +480,8 @@ static struct sock *dn_alloc_sock(struct net *net, struct socket *sock, gfp_t gf
sk->sk_family = PF_DECnet;
sk->sk_protocol = 0;
sk->sk_allocation = gfp;
- sk->sk_sndbuf = sysctl_decnet_wmem[1];
- sk->sk_rcvbuf = sysctl_decnet_rmem[1];
+ sk->sk_sndbuf = READ_ONCE(sysctl_decnet_wmem[1]);
+ sk->sk_rcvbuf = READ_ONCE(sysctl_decnet_rmem[1]);

/* Initialization of DECnet Session Control Port */
scp = DN_SK(sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 60b46f2a6896..91735d631a28 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -452,8 +452,8 @@ void tcp_init_sock(struct sock *sk)

icsk->icsk_sync_mss = tcp_sync_mss;

- WRITE_ONCE(sk->sk_sndbuf, sock_net(sk)->ipv4.sysctl_tcp_wmem[1]);
- WRITE_ONCE(sk->sk_rcvbuf, sock_net(sk)->ipv4.sysctl_tcp_rmem[1]);
+ WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1]));
+ WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]));

sk_sockets_allocated_inc(sk);
}
@@ -1743,7 +1743,7 @@ int tcp_set_rcvlowat(struct sock *sk, int val)
if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
cap = sk->sk_rcvbuf >> 1;
else
- cap = sock_net(sk)->ipv4.sysctl_tcp_rmem[2] >> 1;
+ cap = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1;
val = min(val, cap);
WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index de066fad7dfe..f09b1321a960 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -426,7 +426,7 @@ static void tcp_sndbuf_expand(struct sock *sk)

if (sk->sk_sndbuf < sndmem)
WRITE_ONCE(sk->sk_sndbuf,
- min(sndmem, sock_net(sk)->ipv4.sysctl_tcp_wmem[2]));
+ min(sndmem, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[2])));
}

/* 2. Tuning advertised window (window_clamp, rcv_ssthresh)
@@ -461,7 +461,7 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb,
struct tcp_sock *tp = tcp_sk(sk);
/* Optimize this! */
int truesize = tcp_win_from_space(sk, skbtruesize) >> 1;
- int window = tcp_win_from_space(sk, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1;
+ int window = tcp_win_from_space(sk, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])) >> 1;

while (tp->rcv_ssthresh <= window) {
if (truesize <= skb->len)
@@ -574,16 +574,17 @@ static void tcp_clamp_window(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
struct net *net = sock_net(sk);
+ int rmem2;

icsk->icsk_ack.quick = 0;
+ rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);

- if (sk->sk_rcvbuf < net->ipv4.sysctl_tcp_rmem[2] &&
+ if (sk->sk_rcvbuf < rmem2 &&
!(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
!tcp_under_memory_pressure(sk) &&
sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)) {
WRITE_ONCE(sk->sk_rcvbuf,
- min(atomic_read(&sk->sk_rmem_alloc),
- net->ipv4.sysctl_tcp_rmem[2]));
+ min(atomic_read(&sk->sk_rmem_alloc), rmem2));
}
if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss);
@@ -745,7 +746,7 @@ void tcp_rcv_space_adjust(struct sock *sk)

do_div(rcvwin, tp->advmss);
rcvbuf = min_t(u64, rcvwin * rcvmem,
- sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
if (rcvbuf > sk->sk_rcvbuf) {
WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 60c9f7f444e0..66836b8bd46f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -238,7 +238,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
*rcv_wscale = 0;
if (wscale_ok) {
/* Set window scaling on max possible window */
- space = max_t(u32, space, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+ space = max_t(u32, space, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
space = max_t(u32, space, sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
*rcv_wscale = clamp_t(int, ilog2(space) - 15,
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e2790a6e90fb..07b5a2044cab 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1900,7 +1900,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)

do_div(rcvwin, advmss);
rcvbuf = min_t(u64, rcvwin * rcvmem,
- sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));

if (rcvbuf > sk->sk_rcvbuf) {
u32 window_clamp;
@@ -2597,8 +2597,8 @@ static int mptcp_init_sock(struct sock *sk)
mptcp_ca_reset(sk);

sk_sockets_allocated_inc(sk);
- sk->sk_rcvbuf = sock_net(sk)->ipv4.sysctl_tcp_rmem[1];
- sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[1];
+ sk->sk_rcvbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]);
+ sk->sk_sndbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1]);

return 0;
}
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 43509c7e90fc..f1c3b8eb4b3d 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -517,7 +517,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
timer_setup(&sk->sk_timer, tipc_sk_timeout, 0);
sk->sk_shutdown = 0;
sk->sk_backlog_rcv = tipc_sk_backlog_rcv;
- sk->sk_rcvbuf = sysctl_tipc_rmem[1];
+ sk->sk_rcvbuf = READ_ONCE(sysctl_tipc_rmem[1]);
sk->sk_data_ready = tipc_data_ready;
sk->sk_write_space = tipc_write_space;
sk->sk_destruct = tipc_sock_destruct;
--
2.35.1
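
The tcp_clamp_window() hunk above reads the tunable once into a local and
reuses that copy. A minimal user-space sketch of the same pattern follows.
READ_ONCE_INT is a simplified stand-in (a volatile cast, int only) and not
the kernel macro; the variable and function names are hypothetical.

```c
#include <assert.h>

/* User-space stand-in for the kernel macro, limited to int. */
#define READ_ONCE_INT(x) (*(const volatile int *)&(x))

static int sysctl_rmem_max_demo = 6291456;	/* hypothetical tunable */

/*
 * Clamp a receive buffer the way tcp_clamp_window() does after the patch:
 * load the tunable once, then use the local copy for both the comparison
 * and the min(), so the two uses cannot observe different values.
 */
static int clamp_rcvbuf(int rcvbuf, int rmem_alloc)
{
	int rmem2 = READ_ONCE_INT(sysctl_rmem_max_demo);

	assert(rcvbuf >= 0 && rmem_alloc >= 0);
	if (rcvbuf < rmem2)
		rcvbuf = rmem_alloc < rmem2 ? rmem_alloc : rmem2;
	return rcvbuf;
}
```

Reading the shared value twice instead would let a concurrent writer change
it between the comparison and the min(), which is the inconsistency the
single snapshot avoids.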




2022-08-01 12:51:33

by Greg KH

Subject: [PATCH 5.18 19/88] watch_queue: Fix missing locking in add_watch_to_object()

From: Linus Torvalds <[email protected]>

commit e64ab2dbd882933b65cd82ff6235d705ad65dbb6 upstream.

If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.

KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.

While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.

Fix this by locking the watcher list in add_watch_to_object().

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: [email protected]
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/watch_queue.c | 58 +++++++++++++++++++++++++++++++--------------------
1 file changed, 36 insertions(+), 22 deletions(-)

--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -454,6 +454,33 @@ void init_watch(struct watch *watch, str
rcu_assign_pointer(watch->queue, wqueue);
}

+static int add_one_watch(struct watch *watch, struct watch_list *wlist, struct watch_queue *wqueue)
+{
+ const struct cred *cred;
+ struct watch *w;
+
+ hlist_for_each_entry(w, &wlist->watchers, list_node) {
+ struct watch_queue *wq = rcu_access_pointer(w->queue);
+ if (wqueue == wq && watch->id == w->id)
+ return -EBUSY;
+ }
+
+ cred = current_cred();
+ if (atomic_inc_return(&cred->user->nr_watches) > task_rlimit(current, RLIMIT_NOFILE)) {
+ atomic_dec(&cred->user->nr_watches);
+ return -EAGAIN;
+ }
+
+ watch->cred = get_cred(cred);
+ rcu_assign_pointer(watch->watch_list, wlist);
+
+ kref_get(&wqueue->usage);
+ kref_get(&watch->usage);
+ hlist_add_head(&watch->queue_node, &wqueue->watches);
+ hlist_add_head_rcu(&watch->list_node, &wlist->watchers);
+ return 0;
+}
+
/**
* add_watch_to_object - Add a watch on an object to a watch list
* @watch: The watch to add
@@ -468,34 +495,21 @@ void init_watch(struct watch *watch, str
*/
int add_watch_to_object(struct watch *watch, struct watch_list *wlist)
{
- struct watch_queue *wqueue = rcu_access_pointer(watch->queue);
- struct watch *w;
+ struct watch_queue *wqueue;
+ int ret = -ENOENT;

- hlist_for_each_entry(w, &wlist->watchers, list_node) {
- struct watch_queue *wq = rcu_access_pointer(w->queue);
- if (wqueue == wq && watch->id == w->id)
- return -EBUSY;
- }
-
- watch->cred = get_current_cred();
- rcu_assign_pointer(watch->watch_list, wlist);
-
- if (atomic_inc_return(&watch->cred->user->nr_watches) >
- task_rlimit(current, RLIMIT_NOFILE)) {
- atomic_dec(&watch->cred->user->nr_watches);
- put_cred(watch->cred);
- return -EAGAIN;
- }
+ rcu_read_lock();

+ wqueue = rcu_access_pointer(watch->queue);
if (lock_wqueue(wqueue)) {
- kref_get(&wqueue->usage);
- kref_get(&watch->usage);
- hlist_add_head(&watch->queue_node, &wqueue->watches);
+ spin_lock(&wlist->lock);
+ ret = add_one_watch(watch, wlist, wqueue);
+ spin_unlock(&wlist->lock);
unlock_wqueue(wqueue);
}

- hlist_add_head_rcu(&watch->list_node, &wlist->watchers);
- return 0;
+ rcu_read_unlock();
+ return ret;
}
EXPORT_SYMBOL(add_watch_to_object);




2022-08-01 12:51:36

by Greg KH

Subject: [PATCH 5.18 70/88] scsi: ufs: Support clearing multiple commands at once

From: Bart Van Assche <[email protected]>

[ Upstream commit d1a7644648b7cdacaf8d1013a4285001911e9bc8 ]

Modify ufshcd_clear_cmd() such that it supports clearing multiple commands
at once instead of one command at a time. This change will be used in a
later patch to reduce the time spent in the reset handler.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Stanley Chu <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/ufs/ufshcd.c | 42 ++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 452ad0612067..a34c1fab0246 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -734,17 +734,28 @@ static enum utp_ocs ufshcd_get_tr_ocs(struct ufshcd_lrb *lrbp)
}

/**
- * ufshcd_utrl_clear - Clear a bit in UTRLCLR register
+ * ufshcd_utrl_clear() - Clear requests from the controller request list.
* @hba: per adapter instance
- * @pos: position of the bit to be cleared
+ * @mask: mask with one bit set for each request to be cleared
*/
-static inline void ufshcd_utrl_clear(struct ufs_hba *hba, u32 pos)
+static inline void ufshcd_utrl_clear(struct ufs_hba *hba, u32 mask)
{
if (hba->quirks & UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR)
- ufshcd_writel(hba, (1 << pos), REG_UTP_TRANSFER_REQ_LIST_CLEAR);
- else
- ufshcd_writel(hba, ~(1 << pos),
- REG_UTP_TRANSFER_REQ_LIST_CLEAR);
+ mask = ~mask;
+ /*
+ * From the UFSHCI specification: "UTP Transfer Request List CLear
+ * Register (UTRLCLR): This field is bit significant. Each bit
+ * corresponds to a slot in the UTP Transfer Request List, where bit 0
+ * corresponds to request slot 0. A bit in this field is set to ‘0’
+ * by host software to indicate to the host controller that a transfer
+ * request slot is cleared. The host controller
+ * shall free up any resources associated to the request slot
+ * immediately, and shall set the associated bit in UTRLDBR to ‘0’. The
+ * host software indicates no change to request slots by setting the
+ * associated bits in this field to ‘1’. Bits in this field shall only
+ * be set ‘1’ or ‘0’ by host software when UTRLRSR is set to ‘1’."
+ */
+ ufshcd_writel(hba, ~mask, REG_UTP_TRANSFER_REQ_LIST_CLEAR);
}

/**
@@ -2853,16 +2864,19 @@ static int ufshcd_compose_dev_cmd(struct ufs_hba *hba,
return ufshcd_compose_devman_upiu(hba, lrbp);
}

-static int
-ufshcd_clear_cmd(struct ufs_hba *hba, int tag)
+/*
+ * Clear all the requests from the controller for which a bit has been set in
+ * @mask and wait until the controller confirms that these requests have been
+ * cleared.
+ */
+static int ufshcd_clear_cmds(struct ufs_hba *hba, u32 mask)
{
int err = 0;
unsigned long flags;
- u32 mask = 1 << tag;

/* clear outstanding transaction before retry */
spin_lock_irqsave(hba->host->host_lock, flags);
- ufshcd_utrl_clear(hba, tag);
+ ufshcd_utrl_clear(hba, mask);
spin_unlock_irqrestore(hba->host->host_lock, flags);

/*
@@ -2953,7 +2967,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
err = -ETIMEDOUT;
dev_dbg(hba->dev, "%s: dev_cmd request timedout, tag %d\n",
__func__, lrbp->task_tag);
- if (!ufshcd_clear_cmd(hba, lrbp->task_tag))
+ if (!ufshcd_clear_cmds(hba, 1U << lrbp->task_tag))
/* successfully cleared the command, retry if needed */
err = -EAGAIN;
/*
@@ -6988,7 +7002,7 @@ static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
/* clear the commands that were pending for corresponding LUN */
for_each_set_bit(pos, &hba->outstanding_reqs, hba->nutrs) {
if (hba->lrb[pos].lun == lun) {
- err = ufshcd_clear_cmd(hba, pos);
+ err = ufshcd_clear_cmds(hba, 1U << pos);
if (err)
break;
__ufshcd_transfer_req_compl(hba, 1U << pos);
@@ -7090,7 +7104,7 @@ static int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
goto out;
}

- err = ufshcd_clear_cmd(hba, tag);
+ err = ufshcd_clear_cmds(hba, 1U << tag);
if (err)
dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
__func__, tag, err);
--
2.35.1




2022-08-01 12:51:49

by Greg KH

Subject: [PATCH 5.18 62/88] tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 79f55473bfc8ac51bd6572929a679eeb4da22251 ]

While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5de396075a27..9221c8c7b9a9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5525,7 +5525,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
}

if (!tcp_is_sack(tp) ||
- tp->compressed_ack >= sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr)
+ tp->compressed_ack >= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr))
goto send_now;

if (tp->compressed_ack_rcv_nxt != tp->rcv_nxt) {
--
2.35.1




2022-08-01 12:51:51

by Greg KH

Subject: [PATCH 5.18 66/88] net: dsa: fix reference counting for LAG FDBs

From: Vladimir Oltean <[email protected]>

[ Upstream commit c7560d1203b7a1ea0b99a5c575547e95d564b2a8 ]

Due to an invalid conflict resolution on my side while working on 2
different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add()
does not store the database associated with a dsa_mac_addr structure.

So after adding an FDB entry associated with a LAG, dsa_mac_addr_find()
fails to find it while deleting it, because &a->db is zeroized memory
for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del()
returns -ENOENT rather than deleting the entry.

Fixes: c26933639b54 ("net: dsa: request drivers to perform FDB isolation")
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/dsa/switch.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index d8a80cf9742c..52f84ea349d2 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -363,6 +363,7 @@ static int dsa_switch_do_lag_fdb_add(struct dsa_switch *ds, struct dsa_lag *lag,

ether_addr_copy(a->addr, addr);
a->vid = vid;
+ a->db = db;
refcount_set(&a->refcount, 1);
list_add_tail(&a->list, &lag->fdbs);

--
2.35.1




2022-08-01 12:51:57

by Greg KH

Subject: [PATCH 5.18 78/88] stmmac: dwmac-mediatek: fix resource leak in probe

From: Dan Carpenter <[email protected]>

[ Upstream commit 4d3d3a1b244fd54629a6b7047f39a7bbc8d11910 ]

If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt()
before returning. Otherwise it is a resource leak.

Fixes: fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c
index ca8ab290013c..d42e1afb6521 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c
@@ -688,18 +688,19 @@ static int mediatek_dwmac_probe(struct platform_device *pdev)

ret = mediatek_dwmac_clks_config(priv_plat, true);
if (ret)
- return ret;
+ goto err_remove_config_dt;

ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
- if (ret) {
- stmmac_remove_config_dt(pdev, plat_dat);
+ if (ret)
goto err_drv_probe;
- }

return 0;

err_drv_probe:
mediatek_dwmac_clks_config(priv_plat, false);
+err_remove_config_dt:
+ stmmac_remove_config_dt(pdev, plat_dat);
+
return ret;
}

--
2.35.1




2022-08-01 12:52:12

by Greg KH

Subject: [PATCH 5.18 30/88] ice: Fix VSIs unable to share unicast MAC

From: Anirudh Venkataramanan <[email protected]>

commit 5c8e3c7ff3e7bd7b938659be704f75cc746b697f upstream.

The driver currently does not allow two VSIs in the same PF domain
to have the same unicast MAC address. This is incorrect in the sense
that a policy decision is being made in the driver when it must be
left to the user. This approach was causing issues when rebooting
the system with VFs spawned not being able to change their MAC addresses.
Such errors were present in dmesg:

[ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already
exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1

Fix that by removing this restriction. Doing this also allows
us to remove some additional code that's checking if a unicast MAC
filter already exists.

Fixes: 47ebc7b02485 ("ice: Check if unicast MAC exists before setting VF MAC")
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_main.c | 2 +
drivers/net/ethernet/intel/ice/ice_sriov.c | 40 -----------------------------
2 files changed, 2 insertions(+), 40 deletions(-)

--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4640,6 +4640,8 @@ ice_probe(struct pci_dev *pdev, const st
ice_set_safe_mode_caps(hw);
}

+ hw->ucast_shared = true;
+
err = ice_init_pf(pf);
if (err) {
dev_err(dev, "ice_init_pf failed: %d\n", err);
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1310,39 +1310,6 @@ out_put_vf:
}

/**
- * ice_unicast_mac_exists - check if the unicast MAC exists on the PF's switch
- * @pf: PF used to reference the switch's rules
- * @umac: unicast MAC to compare against existing switch rules
- *
- * Return true on the first/any match, else return false
- */
-static bool ice_unicast_mac_exists(struct ice_pf *pf, u8 *umac)
-{
- struct ice_sw_recipe *mac_recipe_list =
- &pf->hw.switch_info->recp_list[ICE_SW_LKUP_MAC];
- struct ice_fltr_mgmt_list_entry *list_itr;
- struct list_head *rule_head;
- struct mutex *rule_lock; /* protect MAC filter list access */
-
- rule_head = &mac_recipe_list->filt_rules;
- rule_lock = &mac_recipe_list->filt_rule_lock;
-
- mutex_lock(rule_lock);
- list_for_each_entry(list_itr, rule_head, list_entry) {
- u8 *existing_mac = &list_itr->fltr_info.l_data.mac.mac_addr[0];
-
- if (ether_addr_equal(existing_mac, umac)) {
- mutex_unlock(rule_lock);
- return true;
- }
- }
-
- mutex_unlock(rule_lock);
-
- return false;
-}
-
-/**
* ice_set_vf_mac
* @netdev: network interface device structure
* @vf_id: VF identifier
@@ -1376,13 +1343,6 @@ int ice_set_vf_mac(struct net_device *ne
if (ret)
goto out_put_vf;

- if (ice_unicast_mac_exists(pf, mac)) {
- netdev_err(netdev, "Unicast MAC %pM already exists on this PF. Preventing setting VF %u unicast MAC address to %pM\n",
- mac, vf_id, mac);
- ret = -EINVAL;
- goto out_put_vf;
- }
-
mutex_lock(&vf->cfg_lock);

/* VF is notified of its new MAC via the PF's response to the



2022-08-01 12:52:22

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 34/88] tcp: md5: fix IPv4-mapped support

From: Eric Dumazet <[email protected]>

commit e62d2e110356093c034998e093675df83057e511 upstream.

After the blamed commit, IPv4 SYN packets handled
by a dual stack IPv6 socket are dropped, even if
perfectly valid.

$ nstat | grep MD5
TcpExtTCPMD5Failure 5 0.0

For a dual stack listener, an incoming IPv4 SYN packet
would call tcp_inbound_md5_hash() with @family == AF_INET,
while tp->af_specific is pointing to tcp_sock_ipv6_specific.

Only later when an IPv4-mapped child is created, tp->af_specific
is changed to tcp_sock_ipv6_mapped_specific.

Fixes: 7bbb765b7349 ("net/tcp: Merge TCP-MD5 inbound callbacks")
Reported-by: Brian Vazquez <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Dmitry Safonov <[email protected]>
Tested-by: Leonard Crestez <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4481,9 +4481,18 @@ tcp_inbound_md5_hash(const struct sock *
return SKB_DROP_REASON_TCP_MD5UNEXPECTED;
}

- /* check the signature */
- genhash = tp->af_specific->calc_md5_hash(newhash, hash_expected,
- NULL, skb);
+ /* Check the signature.
+ * To support dual stack listeners, we need to handle
+ * IPv4-mapped case.
+ */
+ if (family == AF_INET)
+ genhash = tcp_v4_md5_hash_skb(newhash,
+ hash_expected,
+ NULL, skb);
+ else
+ genhash = tp->af_specific->calc_md5_hash(newhash,
+ hash_expected,
+ NULL, skb);

if (genhash || memcmp(hash_location, newhash, 16) != 0) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPMD5FAILURE);



2022-08-01 12:52:44

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 01/88] Bluetooth: Always set event mask on suspend

From: Abhishek Pandit-Subedi <[email protected]>

commit ef61b6ea154464fefd8a6712d7a3b43b445c3d4a upstream.

When suspending, always set the event mask once disconnects are
successful. Otherwise, if wakeup is disallowed, the event mask is not
set before suspend continues and can result in an early wakeup.

Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Cc: [email protected]
Signed-off-by: Abhishek Pandit-Subedi <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bluetooth/hci_sync.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -4942,6 +4942,9 @@ int hci_suspend_sync(struct hci_dev *hde
return err;
}

+ /* Update event mask so only the allowed event can wakeup the host */
+ hci_set_event_mask_sync(hdev);
+
/* Only configure accept list if disconnect succeeded and wake
* isn't being prevented.
*/
@@ -4953,9 +4956,6 @@ int hci_suspend_sync(struct hci_dev *hde
/* Unpause to take care of updating scanning params */
hdev->scanning_paused = false;

- /* Update event mask so only the allowed event can wakeup the host */
- hci_set_event_mask_sync(hdev);
-
/* Enable event filter for paired devices */
hci_update_event_filter_sync(hdev);




2022-08-01 12:52:49

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 47/88] tcp: Fix a data-race around sysctl_tcp_tso_rtt_log.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 2455e61b85e9c99af38cd889a7101f1d48b33cb4 ]

While reading sysctl_tcp_tso_rtt_log, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
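
As a rough userspace illustration of the pattern used by this and the
other sysctl data-race fixes in the series, the sketch below wraps the
racy load in a READ_ONCE()-style macro. The macro is a simplified
stand-in for the kernel one (it assumes GNU C __typeof__), and the
variable name, its value and the helper rtt_shifted() are hypothetical,
not taken from the kernel.

```c
#include <stdint.h>

/* Simplified userspace stand-in for the kernel macro (assumes GNU C). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the sysctl; a writer may change it at any time. */
static uint8_t tso_rtt_log = 9;

/* Hypothetical reader: shift an RTT value by the current sysctl setting. */
static uint32_t rtt_shifted(uint32_t min_rtt_us)
{
	/* One marked load into a local; the compiler may not re-read it. */
	uint8_t shift = READ_ONCE(tso_rtt_log);

	return min_rtt_us >> shift;
}
```

Reading the value once into a local means the rest of the function
works with a single consistent snapshot, even if the sysctl is written
concurrently.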

Fixes: 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 08466421e7e0..60c9f7f444e0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1971,7 +1971,7 @@ static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,

bytes = sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift);

- r = tcp_min_rtt(tcp_sk(sk)) >> sock_net(sk)->ipv4.sysctl_tcp_tso_rtt_log;
+ r = tcp_min_rtt(tcp_sk(sk)) >> READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_tso_rtt_log);
if (r < BITS_PER_TYPE(sk->sk_gso_max_size))
bytes += sk->sk_gso_max_size >> r;

--
2.35.1




2022-08-01 12:52:55

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 58/88] tcp: Fix data-races around sk_pacing_rate.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 59bf6c65a09fff74215517aecffbbdcd67df76e3 ]

While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed
concurrently. Thus, we need to add READ_ONCE() to their readers.

Fixes: 43e122b014c9 ("tcp: refine pacing rate determination")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index db78197a44ff..de066fad7dfe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -910,9 +910,9 @@ static void tcp_update_pacing_rate(struct sock *sk)
* end of slow start and should slow down.
*/
if (tcp_snd_cwnd(tp) < tp->snd_ssthresh / 2)
- rate *= sock_net(sk)->ipv4.sysctl_tcp_pacing_ss_ratio;
+ rate *= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pacing_ss_ratio);
else
- rate *= sock_net(sk)->ipv4.sysctl_tcp_pacing_ca_ratio;
+ rate *= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pacing_ca_ratio);

rate *= max(tcp_snd_cwnd(tp), tp->packets_out);

--
2.35.1




2022-08-01 12:52:55

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 75/88] sfc: disable softirqs for ptp TX

From: Alejandro Lucero <[email protected]>

[ Upstream commit 67c3b611d92fc238c43734878bc3e232ab570c79 ]

Sending a PTP packet can involve the normal TX driver datapath, but
invoked from the driver's ptp worker. The kernel generic TX code
disables softirqs and preemption before calling specific driver TX code,
but the ptp worker does not. Although current ptp driver functionality
does not require it, there are several reasons for doing so:

1) The invoked code is always executed with softirqs disabled for non
PTP packets.
2) Better if a ptp packet transmission is not interrupted by softirq
handling which could lead to high latencies.
3) netdev_xmit_more used by the TX code requires preemption to be
disabled.

Indeed a solution for dealing with kernel preemption state based on static
kernel configuration is not possible since the introduction of dynamic
preemption level configuration at boot time using the static calls
functionality.

Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper")
Signed-off-by: Alejandro Lucero <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/sfc/ptp.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c
index 4625f85acab2..10ad0b93d283 100644
--- a/drivers/net/ethernet/sfc/ptp.c
+++ b/drivers/net/ethernet/sfc/ptp.c
@@ -1100,7 +1100,29 @@ static void efx_ptp_xmit_skb_queue(struct efx_nic *efx, struct sk_buff *skb)

tx_queue = efx_channel_get_tx_queue(ptp_data->channel, type);
if (tx_queue && tx_queue->timestamping) {
+ /* This code invokes normal driver TX code which is always
+ * protected from softirqs when called from generic TX code,
+ * which in turn disables preemption. Look at __dev_queue_xmit
+ * which uses rcu_read_lock_bh disabling preemption for RCU
+ * plus disabling softirqs. We do not need RCU reader
+ * protection here.
+ *
+ * Although it is theoretically safe for current PTP TX/RX code
+ * running without disabling softirqs, there are three good
+ * reasond for doing so:
+ *
+ * 1) The code invoked is mainly implemented for non-PTP
+ * packets and it is always executed with softirqs
+ * disabled.
+ * 2) This being a single PTP packet, better to not
+ * interrupt its processing by softirqs which can lead
+ * to high latencies.
+ * 3) netdev_xmit_more checks preemption is disabled and
+ * triggers a BUG_ON if not.
+ */
+ local_bh_disable();
efx_enqueue_skb(tx_queue, skb);
+ local_bh_enable();
} else {
WARN_ONCE(1, "PTP channel has no timestamped tx queue\n");
dev_kfree_skb_any(skb);
--
2.35.1




2022-08-01 12:52:55

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 82/88] ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow

From: Florian Fainelli <[email protected]>

[ Upstream commit fb0fd3469ead5b937293c213daa1f589b4b7ce46 ]

Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis")
added a check to determine whether arm_dma_zone_size is exceeding the
amount of kernel virtual address space available between the upper 4GB
virtual address limit and PAGE_OFFSET in order to provide a suitable
definition of MAX_DMA_ADDRESS that should fit within the 32-bit virtual
address space. The quantity used for comparison was off by a missing
trailing 0, causing MAX_DMA_ADDRESS to overflow a 32-bit
quantity.

This was caught thanks to CONFIG_DEBUG_VIRTUAL on the bcm2711 platform
where we define a dma_zone_size of 1GB and we have a PAGE_OFFSET value
of 0xc000_0000 (CONFIG_VMSPLIT_3G) leading to MAX_DMA_ADDRESS being
0x1_0000_0000 which overflows the unsigned long type used throughout
__pa() and then __virt_addr_valid(). Because the virtual address passed
to __virt_addr_valid() would now be 0, the function would loudly warn
and flood the kernel log, thus making the platform unable to boot
properly.
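
The arithmetic can be reproduced in userspace. The following is only a
sketch: it models the 32-bit unsigned long with uint32_t, hard-codes
the PAGE_OFFSET value from the bcm2711 case above, and uses
hypothetical function names (max_dma_address_old/new) in place of the
macro.

```c
#include <stdint.h>

/* PAGE_OFFSET for CONFIG_VMSPLIT_3G, modelled as a 32-bit quantity. */
#define PAGE_OFFSET ((uint32_t)0xc0000000u)

/* Old check: the constant lacks a trailing 0, so the subtraction wraps. */
static uint32_t max_dma_address_old(uint64_t zone_size)
{
	/* 0x10000000 - 0xc0000000 wraps to 0x50000000 in 32-bit arithmetic. */
	uint32_t room = (uint32_t)0x10000000u - PAGE_OFFSET;

	return (zone_size && zone_size < room) ?
		(uint32_t)(PAGE_OFFSET + zone_size) : 0xffffffffu;
}

/* Fixed check: 4GB minus PAGE_OFFSET, computed in 64 bits. */
static uint32_t max_dma_address_new(uint64_t zone_size)
{
	uint64_t room = 0x100000000ULL - PAGE_OFFSET; /* 0x40000000 */

	return (zone_size && zone_size < room) ?
		(uint32_t)(PAGE_OFFSET + zone_size) : 0xffffffffu;
}
```

With a 1GB zone (0x40000000), the old check passes and the sum
truncates to 0, while the fixed check falls back to 0xffffffff.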

Fixes: 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/arm/include/asm/dma.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/dma.h b/arch/arm/include/asm/dma.h
index a81dda65c576..45180a2cc47c 100644
--- a/arch/arm/include/asm/dma.h
+++ b/arch/arm/include/asm/dma.h
@@ -10,7 +10,7 @@
#else
#define MAX_DMA_ADDRESS ({ \
extern phys_addr_t arm_dma_zone_size; \
- arm_dma_zone_size && arm_dma_zone_size < (0x10000000 - PAGE_OFFSET) ? \
+ arm_dma_zone_size && arm_dma_zone_size < (0x100000000ULL - PAGE_OFFSET) ? \
(PAGE_OFFSET + arm_dma_zone_size) : 0xffffffffUL; })
#endif

--
2.35.1




2022-08-01 12:52:56

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 60/88] tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 4866b2b0f7672b6d760c4b8ece6fb56f965dcc8a ]

While reading sysctl_tcp_comp_sack_delay_ns, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f09b1321a960..3591a25a8631 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5546,7 +5546,8 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
if (tp->srtt_us && tp->srtt_us < rtt)
rtt = tp->srtt_us;

- delay = min_t(unsigned long, sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns,
+ delay = min_t(unsigned long,
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns),
rtt * (NSEC_PER_USEC >> 3)/20);
sock_hold(sk);
hrtimer_start_range_ns(&tp->compressed_ack_timer, ns_to_ktime(delay),
--
2.35.1




2022-08-01 12:53:13

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 71/88] scsi: ufs: core: Fix a race condition related to device management

From: Bart Van Assche <[email protected]>

[ Upstream commit f5c2976e0cb0f6236013bfb479868531b04f61d4 ]

If a device management command completion happens after
wait_for_completion_timeout() times out and before ufshcd_clear_cmds() is
called, then the completion code may crash on the complete() call in
__ufshcd_transfer_req_compl().

Fix the following crash:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Call trace:
complete+0x64/0x178
__ufshcd_transfer_req_compl+0x30c/0x9c0
ufshcd_poll+0xf0/0x208
ufshcd_sl_intr+0xb8/0xf0
ufshcd_intr+0x168/0x2f4
__handle_irq_event_percpu+0xa0/0x30c
handle_irq_event+0x84/0x178
handle_fasteoi_irq+0x150/0x2e8
__handle_domain_irq+0x114/0x1e4
gic_handle_irq.31846+0x58/0x300
el1_irq+0xe4/0x1c0
efi_header_end+0x110/0x680
__irq_exit_rcu+0x108/0x124
__handle_domain_irq+0x118/0x1e4
gic_handle_irq.31846+0x58/0x300
el1_irq+0xe4/0x1c0
cpuidle_enter_state+0x3ac/0x8c4
do_idle+0x2fc/0x55c
cpu_startup_entry+0x84/0x90
kernel_init+0x0/0x310
start_kernel+0x0/0x608
start_kernel+0x4ec/0x608

Link: https://lore.kernel.org/r/[email protected]
Fixes: 5a0b0cb9bee7 ("[SCSI] ufs: Add support for sending NOP OUT UPIU")
Cc: Adrian Hunter <[email protected]>
Cc: Avri Altman <[email protected]>
Cc: Bean Huo <[email protected]>
Cc: Stanley Chu <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/ufs/ufshcd.c | 58 +++++++++++++++++++++++++++------------
1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a34c1fab0246..874490f7f5e7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2947,37 +2947,59 @@ ufshcd_dev_cmd_completion(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
struct ufshcd_lrb *lrbp, int max_timeout)
{
- int err = 0;
- unsigned long time_left;
+ unsigned long time_left = msecs_to_jiffies(max_timeout);
unsigned long flags;
+ bool pending;
+ int err;

+retry:
time_left = wait_for_completion_timeout(hba->dev_cmd.complete,
- msecs_to_jiffies(max_timeout));
+ time_left);

- spin_lock_irqsave(hba->host->host_lock, flags);
- hba->dev_cmd.complete = NULL;
if (likely(time_left)) {
+ /*
+ * The completion handler called complete() and the caller of
+ * this function still owns the @lrbp tag so the code below does
+ * not trigger any race conditions.
+ */
+ hba->dev_cmd.complete = NULL;
err = ufshcd_get_tr_ocs(lrbp);
if (!err)
err = ufshcd_dev_cmd_completion(hba, lrbp);
- }
- spin_unlock_irqrestore(hba->host->host_lock, flags);
-
- if (!time_left) {
+ } else {
err = -ETIMEDOUT;
dev_dbg(hba->dev, "%s: dev_cmd request timedout, tag %d\n",
__func__, lrbp->task_tag);
- if (!ufshcd_clear_cmds(hba, 1U << lrbp->task_tag))
+ if (ufshcd_clear_cmds(hba, 1U << lrbp->task_tag) == 0) {
/* successfully cleared the command, retry if needed */
err = -EAGAIN;
- /*
- * in case of an error, after clearing the doorbell,
- * we also need to clear the outstanding_request
- * field in hba
- */
- spin_lock_irqsave(&hba->outstanding_lock, flags);
- __clear_bit(lrbp->task_tag, &hba->outstanding_reqs);
- spin_unlock_irqrestore(&hba->outstanding_lock, flags);
+ /*
+ * Since clearing the command succeeded we also need to
+ * clear the task tag bit from the outstanding_reqs
+ * variable.
+ */
+ spin_lock_irqsave(&hba->outstanding_lock, flags);
+ pending = test_bit(lrbp->task_tag,
+ &hba->outstanding_reqs);
+ if (pending) {
+ hba->dev_cmd.complete = NULL;
+ __clear_bit(lrbp->task_tag,
+ &hba->outstanding_reqs);
+ }
+ spin_unlock_irqrestore(&hba->outstanding_lock, flags);
+
+ if (!pending) {
+ /*
+ * The completion handler ran while we tried to
+ * clear the command.
+ */
+ time_left = 1;
+ goto retry;
+ }
+ } else {
+ dev_err(hba->dev, "%s: failed to clear tag %d\n",
+ __func__, lrbp->task_tag);
+ }
}

return err;
--
2.35.1




2022-08-01 12:53:15

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 36/88] tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.

From: Kuniyuki Iwashima <[email protected]>

commit 9fb90193fbd66b4c5409ef729fd081861f8b6351 upstream.

While reading sysctl_tcp_limit_output_bytes, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 46d3ceabd8d9 ("tcp: TCP Small Queues")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2502,7 +2502,7 @@ static bool tcp_small_queue_check(struct
sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift));
if (sk->sk_pacing_status == SK_PACING_NONE)
limit = min_t(unsigned long, limit,
- sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes));
limit <<= factor;

if (static_branch_unlikely(&tcp_tx_delay_enabled) &&



2022-08-01 12:53:34

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 54/88] macsec: limit replay window size with XPN

From: Sabrina Dubroca <[email protected]>

[ Upstream commit b07a0e2044057f201d694ab474f5c42a02b6465b ]

IEEE 802.1AEbw-2013 (section 10.7.8) specifies that the maximum value
of the replay window is 2^30-1, to help with recovery of the upper
bits of the PN.

To avoid leaving the existing macsec device in an inconsistent state
if this test fails during changelink, reuse the cleanup mechanism
introduced for HW offload. This wasn't needed until now because
macsec_changelink_common could not fail during changelink, as
modifying the cipher suite was not allowed.

Finally, this must happen after handling IFLA_MACSEC_CIPHER_SUITE so
that secy->xpn is set.
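
The bound itself is simple to state in isolation. The sketch below is
a userspace approximation of the check, with a hypothetical helper
name (check_replay_window) and plain parameters standing in for the
secy fields.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Maximum replay window for XPN cipher suites: 2^30 - 1. */
#define XPN_MAX_REPLAY_WINDOW ((1u << 30) - 1)

/* Hypothetical helper: reject oversized windows only when XPN is in use. */
static int check_replay_window(bool xpn, uint32_t window)
{
	if (xpn && window > XPN_MAX_REPLAY_WINDOW)
		return -EINVAL;

	return 0;
}
```

Without XPN the full 32-bit range stays valid, which is why the check
has to run after the cipher suite has been handled.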

Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
Signed-off-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/macsec.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 634452d3ecc5..b3834e353c22 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -243,6 +243,7 @@ static struct macsec_cb *macsec_skb_cb(struct sk_buff *skb)
#define DEFAULT_SEND_SCI true
#define DEFAULT_ENCRYPT false
#define DEFAULT_ENCODING_SA 0
+#define MACSEC_XPN_MAX_REPLAY_WINDOW (((1 << 30) - 1))

static bool send_sci(const struct macsec_secy *secy)
{
@@ -3746,9 +3747,6 @@ static int macsec_changelink_common(struct net_device *dev,
secy->operational = tx_sa && tx_sa->active;
}

- if (data[IFLA_MACSEC_WINDOW])
- secy->replay_window = nla_get_u32(data[IFLA_MACSEC_WINDOW]);
-
if (data[IFLA_MACSEC_ENCRYPT])
tx_sc->encrypt = !!nla_get_u8(data[IFLA_MACSEC_ENCRYPT]);

@@ -3794,6 +3792,16 @@ static int macsec_changelink_common(struct net_device *dev,
}
}

+ if (data[IFLA_MACSEC_WINDOW]) {
+ secy->replay_window = nla_get_u32(data[IFLA_MACSEC_WINDOW]);
+
+ /* IEEE 802.1AEbw-2013 10.7.8 - maximum replay window
+ * for XPN cipher suites */
+ if (secy->xpn &&
+ secy->replay_window > MACSEC_XPN_MAX_REPLAY_WINDOW)
+ return -EINVAL;
+ }
+
return 0;
}

@@ -3823,7 +3831,7 @@ static int macsec_changelink(struct net_device *dev, struct nlattr *tb[],

ret = macsec_changelink_common(dev, data);
if (ret)
- return ret;
+ goto cleanup;

/* If h/w offloading is available, propagate to the device */
if (macsec_is_offloaded(macsec)) {
--
2.35.1




2022-08-01 12:53:35

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 03/88] ARM: dts: lan966x: fix sys_clk frequency

From: Michael Walle <[email protected]>

commit ef0324b6415db6742bd632dc0dfbb8fbc111473b upstream.

The sys_clk frequency is 165.625MHz. The register reference of the
Generic Clock controller lists the CPU clock as 600MHz, the DDR clock as
300MHz and the SYS clock as 162.5MHz. This is wrong. It was first
noticed during the fan driver development and it was measured and
verified via the CLK_MON output of the SoC which can be configured to
output sys_clk/64.

The core PLL settings (which drive the SYS clock) seem to be as
follows:
DIVF = 52
DIVQ = 3
DIVR = 1

With a reference clock of 25MHz, this means we have a post divider clock
Fpfd = Fref / (DIVR + 1) = 25MHz / (1 + 1) = 12.5MHz

The resulting VCO frequency is then
Fvco = Fpfd * (DIVF + 1) * 2 = 12.5MHz * (52 + 1) * 2 = 1325MHz

And the output frequency is
Fout = Fvco / 2^DIVQ = 1325MHz / 2^3 = 165.625MHz

This is all within the constraints of the PLL:
10MHz <= Fpfd <= 200MHz
20MHz <= Fout <= 1000MHz
1000MHz <= Fvco <= 2000MHz
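
The calculation above can be checked with a few lines of integer
arithmetic. This is only a sketch of the three formulas in Hz; the
function name pll_fout_hz is made up for illustration.

```c
#include <stdint.h>

/* Compute the PLL output frequency in Hz from the divider settings. */
static uint64_t pll_fout_hz(uint64_t fref_hz, unsigned int divr,
			    unsigned int divf, unsigned int divq)
{
	uint64_t fpfd = fref_hz / (divr + 1);   /* Fpfd = Fref / (DIVR + 1) */
	uint64_t fvco = fpfd * (divf + 1) * 2;  /* Fvco = Fpfd * (DIVF + 1) * 2 */

	return fvco >> divq;                    /* Fout = Fvco / 2^DIVQ */
}
```

Calling pll_fout_hz(25000000, 1, 52, 3) yields 165625000, matching the
corrected clock-frequency value.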

Fixes: 290deaa10c50 ("ARM: dts: add DT for lan966 SoC and 2-port board pcb8291")
Signed-off-by: Michael Walle <[email protected]>
Reviewed-by: Kavyasree Kotagiri <[email protected]>
Signed-off-by: Claudiu Beznea <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/boot/dts/lan966x.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/lan966x.dtsi b/arch/arm/boot/dts/lan966x.dtsi
index 3cb02fffe716..38e90a31d2dd 100644
--- a/arch/arm/boot/dts/lan966x.dtsi
+++ b/arch/arm/boot/dts/lan966x.dtsi
@@ -38,7 +38,7 @@ clocks {
sys_clk: sys_clk {
compatible = "fixed-clock";
#clock-cells = <0>;
- clock-frequency = <162500000>;
+ clock-frequency = <165625000>;
};

cpu_clk: cpu_clk {
--
2.37.1




2022-08-01 12:53:37

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 80/88] page_alloc: fix invalid watermark check on a negative value

From: Jaewon Kim <[email protected]>

commit 9282012fc0aa248b77a69f5eb802b67c5a16bb13 upstream.

There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.

This is a bug where zone_watermark_fast returns true even when the
number of free pages is very low. The commit f27ce0e14088 ("page_alloc:
consider highatomic reserve in watermark fast") changed the watermark
fast path to consider the highatomic reserve. But it did not handle the
negative value case, which can happen when the reserved_highatomic
pageblock is bigger than the actual free pages.

If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.

Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd do not free pages quickly.

Handle the negative case by using MIN.
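
A userspace sketch of why the negative value passes the check,
assuming (as in the zone_watermark_fast signature) that the watermark
is an unsigned long, so the signed result is converted to unsigned for
the comparison. The function names are hypothetical and
lowmem_reserve is left out for brevity.

```c
/* Old logic: a negative result turns into a huge unsigned value. */
static int watermark_ok_old(long free_pages, long reserved,
			    unsigned long mark)
{
	long fast_free = free_pages - reserved;

	/* The cast spells out the conversion C applies implicitly here. */
	return (unsigned long)fast_free > mark;
}

/* New logic: never subtract more than what is actually free. */
static int watermark_ok_new(long free_pages, long reserved,
			    unsigned long mark)
{
	long usable_free = free_pages;

	usable_free -= (usable_free < reserved) ? usable_free : reserved;

	return (unsigned long)usable_free > mark;
}
```

With 10 free pages, 50 reserved and a mark of 100, the old check
reports the watermark as met while the new one does not.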

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <[email protected]>
Reported-by: GyeongHwan Hong <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Yong-Taek Lee <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_alloc.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3953,11 +3953,15 @@ static inline bool zone_watermark_fast(s
* need to be calculated.
*/
if (!order) {
- long fast_free;
+ long usable_free;
+ long reserved;

- fast_free = free_pages;
- fast_free -= __zone_watermark_unusable_free(z, 0, alloc_flags);
- if (fast_free > mark + z->lowmem_reserve[highest_zoneidx])
+ usable_free = free_pages;
+ reserved = __zone_watermark_unusable_free(z, 0, alloc_flags);
+
+ /* reserved may over estimate high-atomic reserves. */
+ usable_free -= min(usable_free, reserved);
+ if (usable_free > mark + z->lowmem_reserve[highest_zoneidx])
return true;
}




2022-08-01 12:53:53

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 51/88] Documentation: fix sctp_wmem in ip-sysctl.rst

From: Xin Long <[email protected]>

[ Upstream commit aa709da0e032cee7c202047ecd75f437bb0126ed ]

Since commit 1033990ac5b2 ("sctp: implement memory accounting on tx path"),
SCTP has supported memory accounting on tx path where 'sctp_wmem' is used
by sk_wmem_schedule(). So we should fix the description for this option in
ip-sysctl.rst accordingly.

v1->v2:
- Improve the description as Marcelo suggested.

Fixes: 1033990ac5b2 ("sctp: implement memory accounting on tx path")
Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
Documentation/networking/ip-sysctl.rst | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 8899b474edbf..e29017d4d7a2 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -2848,7 +2848,14 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max
Default: 4K

sctp_wmem - vector of 3 INTEGERs: min, default, max
- Currently this tunable has no effect.
+ Only the first value ("min") is used, "default" and "max" are
+ ignored.
+
+ min: Minimum size of send buffer that can be used by SCTP sockets.
+ It is guaranteed to each SCTP socket (but not association) even
+ under moderate memory pressure.
+
+ Default: 4K

addr_scope_policy - INTEGER
Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
--
2.35.1




2022-08-01 12:53:59

by Greg KH

[permalink] [raw]
Subject: [PATCH 5.18 74/88] perf symbol: Correct address for bss symbols

From: Leo Yan <[email protected]>

[ Upstream commit 2d86612aacb7805f72873691a2644d7279ed0630 ]

When using 'perf mem' and 'perf c2c', the tool reports the wrong offset
for global data symbols. This is a common issue on both x86 and Arm64
platforms.

Let's see an example, for a test program, below is the disassembly for
its .bss section which is dumped with objdump:

...

Disassembly of section .bss:

0000000000004040 <completed.0>:
...

0000000000004080 <buf1>:
...

00000000000040c0 <buf2>:
...

0000000000004100 <thread>:
...

First we used 'perf mem record' to run the test program and then used
'perf --debug verbose=4 mem report' to observe what's the symbol info
for 'buf1' and 'buf2' structures.

# ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8
# ./perf --debug verbose=4 mem report
...
dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028
symbol__new: buf2 0x30a8-0x30e8
...
dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028
symbol__new: buf1 0x3068-0x30a8
...

The perf tool relies on libelf to parse symbols. In executable and
shared object files, 'st_value' holds a virtual address; 'sh_addr' is
the address at which the section's first byte should reside in memory,
and 'sh_offset' is the byte offset from the beginning of the file to the
first byte in the section. The perf tool uses the formula below to
convert a symbol's memory address to a file address:

file_address = st_value - sh_addr + sh_offset
^
` Memory address

We can see the final adjusted address ranges for buf1 and buf2 are
[0x3068-0x30a8) and [0x30a8-0x30e8) respectively. This is apparently
incorrect: in the code, the structures 'buf1' and 'buf2' are declared
with a compiler attribute specifying 64-byte alignment.

The problem happens for 'sh_offset', libelf returns it as 0x3028 which
is not 64-byte aligned, combining with disassembly, it's likely libelf
doesn't respect the alignment for .bss section, therefore, it doesn't
return the aligned value for 'sh_offset'.

As suggested by Fangrui Song, the ELF file contains a program header
table with PT_LOAD segments; the p_vaddr and p_offset fields of a
PT_LOAD segment contain the execution info. A better choice for
converting a memory address to a file address is the formula:

file_address = st_value - p_vaddr + p_offset

This patch introduces elf_read_program_header() which returns the
program header based on the passed 'st_value', then it uses the formula
above to calculate the symbol file address; and the debugging log is
updated accordingly.

After applying the change:

# ./perf --debug verbose=4 mem report
...
dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28
symbol__new: buf2 0x30c0-0x3100
...
dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28
symbol__new: buf1 0x3080-0x30c0
...
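
Both conversions have the same shape and differ only in which base
address and offset are used. The sketch below (the helper name
file_addr is hypothetical) can be used to reproduce the numbers from
the two debug logs.

```c
#include <stdint.h>

/*
 * file_address = st_value - base_vaddr + base_offset
 *
 * Old: base_vaddr = sh_addr, base_offset = sh_offset (section header).
 * New: base_vaddr = p_vaddr, base_offset = p_offset (PT_LOAD segment).
 */
static uint64_t file_addr(uint64_t st_value, uint64_t base_vaddr,
			  uint64_t base_offset)
{
	return st_value - base_vaddr + base_offset;
}
```

For buf1, file_addr(0x4080, 0x4040, 0x3028) gives 0x3068 with the
section values, and file_addr(0x4080, 0x3d28, 0x2d28) gives the
64-byte aligned 0x3080 with the program header values.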

Fixes: f17e04afaff84b5c ("perf report: Fix ELF symbol parsing")
Reported-by: Chang Rui <[email protected]>
Suggested-by: Fangrui Song <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
tools/perf/util/symbol-elf.c | 45 ++++++++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index ecd377938eea..ef6ced5c5746 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -233,6 +233,33 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
return NULL;
}

+static int elf_read_program_header(Elf *elf, u64 vaddr, GElf_Phdr *phdr)
+{
+ size_t i, phdrnum;
+ u64 sz;
+
+ if (elf_getphdrnum(elf, &phdrnum))
+ return -1;
+
+ for (i = 0; i < phdrnum; i++) {
+ if (gelf_getphdr(elf, i, phdr) == NULL)
+ return -1;
+
+ if (phdr->p_type != PT_LOAD)
+ continue;
+
+ sz = max(phdr->p_memsz, phdr->p_filesz);
+ if (!sz)
+ continue;
+
+ if (vaddr >= phdr->p_vaddr && (vaddr < phdr->p_vaddr + sz))
+ return 0;
+ }
+
+ /* Not found any valid program header */
+ return -1;
+}
+
static bool want_demangle(bool is_kernel_sym)
{
return is_kernel_sym ? symbol_conf.demangle_kernel : symbol_conf.demangle;
@@ -1209,6 +1236,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
sym.st_value);
used_opd = true;
}
+
/*
* When loading symbols in a data mapping, ABS symbols (which
* has a value of SHN_ABS in its st_shndx) failed at
@@ -1262,11 +1290,20 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
goto out_elf_end;
} else if ((used_opd && runtime_ss->adjust_symbols) ||
(!used_opd && syms_ss->adjust_symbols)) {
+ GElf_Phdr phdr;
+
+ if (elf_read_program_header(syms_ss->elf,
+ (u64)sym.st_value, &phdr)) {
+ pr_warning("%s: failed to find program header for "
+ "symbol: %s st_value: %#" PRIx64 "\n",
+ __func__, elf_name, (u64)sym.st_value);
+ continue;
+ }
pr_debug4("%s: adjusting symbol: st_value: %#" PRIx64 " "
- "sh_addr: %#" PRIx64 " sh_offset: %#" PRIx64 "\n", __func__,
- (u64)sym.st_value, (u64)shdr.sh_addr,
- (u64)shdr.sh_offset);
- sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+ "p_vaddr: %#" PRIx64 " p_offset: %#" PRIx64 "\n",
+ __func__, (u64)sym.st_value, (u64)phdr.p_vaddr,
+ (u64)phdr.p_offset);
+ sym.st_value -= phdr.p_vaddr - phdr.p_offset;
}

demangled = demangle_sym(dso, kmodule, elf_name);
--
2.35.1




2022-08-01 12:53:59

by Greg KH

Subject: [PATCH 5.18 16/88] nouveau/svm: Fix to migrate all requested pages

From: Alistair Popple <[email protected]>

commit 66cee9097e2b74ff3c8cc040ce5717c521a0c3fa upstream.

Users may request that pages from an OpenCL SVM allocation be migrated
to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
nouveau_dmem_migrate_vma() to do the migration. If the total range to be
migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
chunks of size SG_MAX_SINGLE_ALLOC. However, a typo in updating the
starting address means that only the first chunk will get migrated.

Fix the calculation so that the entire range will get migrated if
possible.
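
The chunking described above can be sketched in plain C, outside the driver. This is only an illustration under stated assumptions: chunk_end() and walk_range() are hypothetical helpers, the addresses are byte addresses, and overflow of the additions is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * End of the chunk that begins at chunk_start: at most chunk_bytes
 * long, and never past range_end.
 */
static uint64_t chunk_end(uint64_t chunk_start, uint64_t range_end,
			  uint64_t chunk_bytes)
{
	assert(chunk_bytes > 0 && chunk_start <= range_end);

	if (chunk_start + chunk_bytes > range_end)
		return range_end;

	return chunk_start + chunk_bytes;
}

/*
 * Walk [start, end) chunk by chunk. Returns the number of chunks and
 * stores the last address covered in *covered_end.
 */
static unsigned int walk_range(uint64_t start, uint64_t end,
			       uint64_t chunk_bytes, uint64_t *covered_end)
{
	unsigned int chunks = 0;
	uint64_t cur = start;

	while (cur < end) {
		/* Each chunk starts where the previous one ended. */
		cur = chunk_end(cur, end, chunk_bytes);
		chunks++;
	}

	*covered_end = cur;
	return chunks;
}
```

The point of the sketch is that the end of each chunk is derived from the start of the current chunk, not from the start of the whole range, and that the last chunk is clamped to the end of the range.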

Signed-off-by: Alistair Popple <[email protected]>
Fixes: e3d8b0890469 ("drm/nouveau/svm: map pages after migration")
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
Signed-off-by: Lyude Paul <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Cc: <[email protected]> # v5.8+
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_dmem.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -680,7 +680,11 @@ nouveau_dmem_migrate_vma(struct nouveau_
goto out_free_dma;

for (i = 0; i < npages; i += max) {
- args.end = start + (max << PAGE_SHIFT);
+ if (args.start + (max << PAGE_SHIFT) > end)
+ args.end = end;
+ else
+ args.end = args.start + (max << PAGE_SHIFT);
+
ret = migrate_vma_setup(&args);
if (ret)
goto out_free_pfns;



2022-08-01 12:54:05

by Greg KH

Subject: [PATCH 5.18 42/88] net/tls: Remove the context from the list in tls_device_down

From: Maxim Mikityanskiy <[email protected]>

commit f6336724a4d4220c89a4ec38bca84b03b178b1a3 upstream.

tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.

Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.
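
The ordering that matters here (unlink first, then free) can be illustrated with a single-threaded userspace sketch. Everything in it is hypothetical: struct node and the list_* helpers only imitate the kernel list API, the plain int refcount stands in for refcount_t, and the freed flag stands in for the call to tls_device_free_ctx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

static bool list_empty(const struct node *head)
{
	return head->next == head;
}

/* Hypothetical context: list membership, a reference count, a "freed" marker. */
struct ctx {
	struct node list;
	int refcount;
	bool freed;
};

/* Drop one reference; returns true when it was the last one. */
static bool ctx_put(struct ctx *c)
{
	assert(c->refcount > 0);
	return --c->refcount == 0;
}

/* Release the reference taken by the device-down path. */
static void device_down_release(struct ctx *c)
{
	if (ctx_put(c)) {
		list_del(&c->list);	/* unlink before the memory goes away */
		c->freed = true;	/* stands in for tls_device_free_ctx() */
	}
}
```

If the list_del() call were missing, the list head would still point at a context whose memory has been released, which is the situation the commit message describes.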

Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tls/tls_device.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -1351,8 +1351,13 @@ static int tls_device_down(struct net_de
* by tls_device_free_ctx. rx_conf and tx_conf stay in TLS_HW.
* Now release the ref taken above.
*/
- if (refcount_dec_and_test(&ctx->refcount))
+ if (refcount_dec_and_test(&ctx->refcount)) {
+ /* sk_destruct ran after tls_device_down took a ref, and
+ * it returned early. Complete the destruction here.
+ */
+ list_del(&ctx->list);
tls_device_free_ctx(ctx);
+ }
}

up_write(&device_offload_lock);



2022-08-01 12:54:08

by Greg KH

Subject: [PATCH 5.18 55/88] macsec: always read MACSEC_SA_ATTR_PN as a u64

From: Sabrina Dubroca <[email protected]>

[ Upstream commit c630d1fe6219769049c87d1a6a0e9a6de55328a1 ]

Currently, MACSEC_SA_ATTR_PN is handled inconsistently, sometimes as a
u32, sometimes forced into a u64 without checking the actual length of
the attribute. Instead, we can use nla_get_u64 everywhere, which will
read up to 64 bits into a u64, capped by the actual length of the
attribute coming from userspace.

This fixes several issues:
- the check in validate_add_rxsa doesn't work with 32-bit attributes
- the checks in validate_add_txsa and validate_upd_sa incorrectly
reject X << 32 (with X != 0)
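
The second issue is plain integer truncation and can be shown without any netlink code. In this sketch, pn_rejected_u32() and pn_rejected_u64() are hypothetical names that only model the "== 0" check on a value read as 32 bits versus 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zero check on a packet number that was read as a u32: the upper half is lost. */
static bool pn_rejected_u32(uint64_t pn)
{
	return (uint32_t)pn == 0;
}

/* Zero check on the full 64-bit packet number. */
static bool pn_rejected_u64(uint64_t pn)
{
	return pn == 0;
}
```

A packet number such as 1 << 32 has all-zero low 32 bits, so the 32-bit check rejects it as zero although it is a valid non-zero value; the 64-bit check accepts it.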

Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
Signed-off-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/macsec.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index b3834e353c22..95578f04f212 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -1698,7 +1698,7 @@ static bool validate_add_rxsa(struct nlattr **attrs)
return false;

if (attrs[MACSEC_SA_ATTR_PN] &&
- *(u64 *)nla_data(attrs[MACSEC_SA_ATTR_PN]) == 0)
+ nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0)
return false;

if (attrs[MACSEC_SA_ATTR_ACTIVE]) {
@@ -1941,7 +1941,7 @@ static bool validate_add_txsa(struct nlattr **attrs)
if (nla_get_u8(attrs[MACSEC_SA_ATTR_AN]) >= MACSEC_NUM_AN)
return false;

- if (nla_get_u32(attrs[MACSEC_SA_ATTR_PN]) == 0)
+ if (nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0)
return false;

if (attrs[MACSEC_SA_ATTR_ACTIVE]) {
@@ -2295,7 +2295,7 @@ static bool validate_upd_sa(struct nlattr **attrs)
if (nla_get_u8(attrs[MACSEC_SA_ATTR_AN]) >= MACSEC_NUM_AN)
return false;

- if (attrs[MACSEC_SA_ATTR_PN] && nla_get_u32(attrs[MACSEC_SA_ATTR_PN]) == 0)
+ if (attrs[MACSEC_SA_ATTR_PN] && nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0)
return false;

if (attrs[MACSEC_SA_ATTR_ACTIVE]) {
--
2.35.1




2022-08-01 12:54:16

by Greg KH

Subject: [PATCH 5.18 88/88] x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available

From: Thadeu Lima de Souza Cascardo <[email protected]>

commit 571c30b1a88465a1c85a6f7762609939b9085a15 upstream.

Some cloud hypervisors do not provide IBPB on very recent CPUs,
including AMD processors affected by Retbleed.

Using IBPB before firmware calls on such systems would cause a GPF at boot
like the one below. Do not enable such calls when IBPB support is not
present.

EFI Variables Facility v0.08 2004-May-17
general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: efi_rts_wq efi_call_rts
RIP: 0010:efi_call_rts
Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48
RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049
RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300
RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78
R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048
FS: 0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0
Call Trace:
<TASK>
? __wake_up
process_one_work
worker_thread
? rescuer_thread
kthread
? kthread_complete_and_exit
ret_from_fork
</TASK>
Modules linked in:

Fixes: 28a99e95f55c ("x86/amd: Use IBPB for firmware calls")
Reported-by: Dimitri John Ledkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 1 +
1 file changed, 1 insertion(+)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1513,6 +1513,7 @@ static void __init spectre_v2_select_mit
* enable IBRS around firmware calls.
*/
if (boot_cpu_has_bug(X86_BUG_RETBLEED) &&
+ boot_cpu_has(X86_FEATURE_IBPB) &&
(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) {




2022-08-01 12:55:14

by Greg KH

Subject: [PATCH 5.18 43/88] net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii

From: Vladimir Oltean <[email protected]>

[ Upstream commit 27161db0904ee48e59140aa8d0835939a666c1f1 ]

While phylink_pcs_ops :: pcs_get_state does return void, xpcs_get_state()
does check for a non-zero return code from xpcs_get_state_c37_sgmii()
and prints that as a message to the kernel log.

However, a non-zero return code from xpcs_read() is translated into
"return false" (i.e. zero as int) and the I/O error is therefore not
printed. Fix that.
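
Why "return false" hides the error from a caller that expects an int can be seen in a few lines of standalone C. The functions below are hypothetical stand-ins: fake_read() imitates a register read that returns a negative errno on failure, and the two get_state_*() variants model the behaviour before and after the fix.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical register read: negative errno on failure, a value otherwise. */
static int fake_read(int fail)
{
	return fail ? -EIO : 0x10;
}

/* Before the fix: the error is turned into "false", which is 0 as an int. */
static int get_state_before(int fail)
{
	int ret = fake_read(fail);

	if (ret < 0)
		return false;	/* the caller sees 0, i.e. success */

	return 0;
}

/* After the fix: the negative error code reaches the caller. */
static int get_state_after(int fail)
{
	int ret = fake_read(fail);

	if (ret < 0)
		return ret;

	return 0;
}
```

A caller that only logs non-zero return values never sees the failure in the first variant, while the second variant hands it -EIO.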

Fixes: b97b5331b8ab ("net: pcs: add C37 SGMII AN support for intel mGbE controller")
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/pcs/pcs-xpcs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/pcs/pcs-xpcs.c b/drivers/net/pcs/pcs-xpcs.c
index 61418d4dc0cd..8768f6e34846 100644
--- a/drivers/net/pcs/pcs-xpcs.c
+++ b/drivers/net/pcs/pcs-xpcs.c
@@ -898,7 +898,7 @@ static int xpcs_get_state_c37_sgmii(struct dw_xpcs *xpcs,
*/
ret = xpcs_read(xpcs, MDIO_MMD_VEND2, DW_VR_MII_AN_INTR_STS);
if (ret < 0)
- return false;
+ return ret;

if (ret & DW_VR_MII_C37_ANSGM_SP_LNKSTS) {
int speed_value;
--
2.35.1




2022-08-01 13:16:42

by Greg KH

Subject: [PATCH 5.18 69/88] netfilter: nf_queue: do not allow packet truncation below transport header offset

From: Florian Westphal <[email protected]>

[ Upstream commit 99a63d36cb3ed5ca3aa6fcb64cffbeaf3b0fb164 ]

Domingo Dirutigliano and Nicola Guerrera report a kernel panic when
sending an nf_queue verdict with a 1-byte nfta_payload attribute.

The IP/IPv6 stack pulls the IP(v6) header from the packet after the
input hook.

If the user truncates the packet below the header size, this skb_pull() will
result in a malformed skb (skb->len < 0).
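
The length problem can be sketched with ordinary integers. This is not the netfilter code: check_trim() and len_after_pull() are hypothetical helpers, and the header size of 20 bytes mentioned below is just an example value.

```c
#include <errno.h>

/*
 * Length that would remain if hdr_len bytes were pulled from a packet
 * that userspace truncated to new_len bytes. A negative result means
 * the pull would run past the end of the data.
 */
static long len_after_pull(unsigned int new_len, unsigned int hdr_len)
{
	return (long)new_len - (long)hdr_len;
}

/*
 * Refuse a truncation that would cut into the headers in front of the
 * transport header.
 */
static int check_trim(unsigned int new_len, unsigned int transport_offset)
{
	if (new_len < transport_offset)
		return -EINVAL;

	return 0;
}
```

For a 1-byte verdict payload and a 20-byte header, the remaining length would be -19; rejecting the request up front with -EINVAL avoids ever reaching that state.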

Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink")
Reported-by: Domingo Dirutigliano <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nfnetlink_queue.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index a364f8e5e698..87a9009d5234 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -843,11 +843,16 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
}

static int
-nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e, int diff)
+nfqnl_mangle(void *data, unsigned int data_len, struct nf_queue_entry *e, int diff)
{
struct sk_buff *nskb;

if (diff < 0) {
+ unsigned int min_len = skb_transport_offset(e->skb);
+
+ if (data_len < min_len)
+ return -EINVAL;
+
if (pskb_trim(e->skb, data_len))
return -ENOMEM;
} else if (diff > 0) {
--
2.35.1




2022-08-01 13:16:54

by Greg KH

Subject: [PATCH 5.18 64/88] ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 96b9bd8c6d125490f9adfb57d387ef81a55a103e ]

While reading sysctl_fib_notify_on_flag_change, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
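
The pattern is to read the tunable once into a local variable and base every later decision on that snapshot. The userspace sketch below assumes GCC or Clang (it uses __typeof__); READ_ONCE_SKETCH() is a simplified stand-in for the kernel macro, and notify_on_flag_change and should_notify() are hypothetical names.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's READ_ONCE(): one volatile load. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical tunable that another thread may change at any time: 0, 1 or 2. */
static unsigned char notify_on_flag_change;

/* Decide whether to send a notification, using a single snapshot. */
static bool should_notify(bool offload_failed_changed)
{
	unsigned char mode = READ_ONCE_SKETCH(notify_on_flag_change);

	/* 2 means: notify only if offload_failed was changed. */
	if (mode == 2 && !offload_failed_changed)
		return false;

	/* 0 means: never notify. */
	if (!mode)
		return false;

	return true;
}
```

Because both tests use the same local value, a concurrent write between them cannot make the function see two different settings within one call.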

Fixes: 680aea08e78c ("net: ipv4: Emit notification when fib hardware flags are changed")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/fib_trie.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 43a496272227..c1b53854047b 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1042,6 +1042,7 @@ fib_find_matching_alias(struct net *net, const struct fib_rt_info *fri)

void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri)
{
+ u8 fib_notify_on_flag_change;
struct fib_alias *fa_match;
struct sk_buff *skb;
int err;
@@ -1063,14 +1064,16 @@ void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri)
WRITE_ONCE(fa_match->offload, fri->offload);
WRITE_ONCE(fa_match->trap, fri->trap);

+ fib_notify_on_flag_change = READ_ONCE(net->ipv4.sysctl_fib_notify_on_flag_change);
+
/* 2 means send notifications only if offload_failed was changed. */
- if (net->ipv4.sysctl_fib_notify_on_flag_change == 2 &&
+ if (fib_notify_on_flag_change == 2 &&
READ_ONCE(fa_match->offload_failed) == fri->offload_failed)
goto out;

WRITE_ONCE(fa_match->offload_failed, fri->offload_failed);

- if (!net->ipv4.sysctl_fib_notify_on_flag_change)
+ if (!fib_notify_on_flag_change)
goto out;

skb = nlmsg_new(fib_nlmsg_size(fa_match->fa_info), GFP_ATOMIC);
--
2.35.1




2022-08-01 13:17:45

by Greg KH

Subject: [PATCH 5.18 13/88] asm-generic: remove a broken and needless ifdef conditional

From: Lukas Bulwahn <[email protected]>

commit e2a619ca0b38f2114347b7078b8a67d72d457a3d upstream.

Commit 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
introduces the config symbol GENERIC_LIB_DEVMEM_IS_ALLOWED, but then
falsely refers to CONFIG_GENERIC_DEVMEM_IS_ALLOWED (note the missing LIB
in the reference) in ./include/asm-generic/io.h.

Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs:

GENERIC_DEVMEM_IS_ALLOWED
Referencing files: include/asm-generic/io.h

The actual fix, though, is simply not to make this function declaration
dependent on any kernel config. For architectures that intend to use
the generic version, the arch's 'select GENERIC_LIB_DEVMEM_IS_ALLOWED' will
lead to picking the function definition, and for other architectures, this
function is simply defined elsewhere.

The wrong '#ifndef' on a non-existing config symbol also always had the
same effect (although more by mistake than by intent). So, there is no
functional change.

Remove this broken and needless ifdef conditional.
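
That an '#ifndef' on a symbol nobody defines is always taken can be checked with a tiny standalone file. CONFIG_SKETCH_MISSPELLED, DECLARATION_VISIBLE and declaration_visible() are made-up names for this illustration only.

```c
/* CONFIG_SKETCH_MISSPELLED is deliberately never defined anywhere. */
#ifndef CONFIG_SKETCH_MISSPELLED
#define DECLARATION_VISIBLE 1	/* this branch is always compiled */
#else
#define DECLARATION_VISIBLE 0	/* unreachable: the symbol does not exist */
#endif

/* Report which branch the preprocessor picked. */
static int declaration_visible(void)
{
	return DECLARATION_VISIBLE;
}
```

The guarded text is compiled in every configuration, so dropping the guard changes nothing, which is why the commit message can claim there is no functional change.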

Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/asm-generic/io.h | 2 --
1 file changed, 2 deletions(-)

--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -1125,9 +1125,7 @@ static inline void memcpy_toio(volatile
}
#endif

-#ifndef CONFIG_GENERIC_DEVMEM_IS_ALLOWED
extern int devmem_is_allowed(unsigned long pfn);
-#endif

#endif /* __KERNEL__ */




2022-08-01 13:18:45

by Greg KH

Subject: [PATCH 5.18 29/88] ice: do not setup vlan for loopback VSI

From: Maciej Fijalkowski <[email protected]>

commit cc019545a238518fa9da1e2a889f6e1bb1005a63 upstream.

Currently the loopback test is failing due to the error returned from
ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI.

Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5994,10 +5994,12 @@ int ice_vsi_cfg(struct ice_vsi *vsi)
if (vsi->netdev) {
ice_set_rx_mode(vsi->netdev);

- err = ice_vsi_vlan_setup(vsi);
+ if (vsi->type != ICE_VSI_LB) {
+ err = ice_vsi_vlan_setup(vsi);

- if (err)
- return err;
+ if (err)
+ return err;
+ }
}
ice_vsi_cfg_dcb_rings(vsi);




2022-08-01 13:18:47

by Greg KH

Subject: [PATCH 5.18 05/88] Revert "ocfs2: mount shared volume without ha stack"

From: Junxiao Bi <[email protected]>

commit c80af0c250c8f8a3c978aa5aafbe9c39b336b813 upstream.

This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144.

This commit introduced a regression that can cause mount to hang. The
changes in __ocfs2_find_empty_slot allow any node with a non-zero
node number to grab the slot that was already taken by node 0, so node 1
will access the same journal as node 0. When it tries to grab the journal
cluster lock, it will hang because the lock was already acquired by node 0.
It's very easy to reproduce this: in one cluster, mount node 0 first, then
node 1, and you will see the following call trace from node 1.

[13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
[13148.739691] Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
[13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13148.745846] task:mount.ocfs2 state:D stack: 0 pid:53045 ppid: 53044 flags:0x00004000
[13148.749354] Call Trace:
[13148.750718] <TASK>
[13148.752019] ? usleep_range+0x90/0x89
[13148.753882] __schedule+0x210/0x567
[13148.755684] schedule+0x44/0xa8
[13148.757270] schedule_timeout+0x106/0x13c
[13148.759273] ? __prepare_to_swait+0x53/0x78
[13148.761218] __wait_for_common+0xae/0x163
[13148.763144] __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
[13148.765780] ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.768312] ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.770968] ocfs2_journal_init+0x91/0x340 [ocfs2]
[13148.773202] ocfs2_check_volume+0x39/0x461 [ocfs2]
[13148.775401] ? iput+0x69/0xba
[13148.777047] ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
[13148.779646] ocfs2_fill_super+0x54b/0x853 [ocfs2]
[13148.781756] mount_bdev+0x190/0x1b7
[13148.783443] ? ocfs2_remount+0x440/0x440 [ocfs2]
[13148.785634] legacy_get_tree+0x27/0x48
[13148.787466] vfs_get_tree+0x25/0xd0
[13148.789270] do_new_mount+0x18c/0x2d9
[13148.791046] __x64_sys_mount+0x10e/0x142
[13148.792911] do_syscall_64+0x3b/0x89
[13148.794667] entry_SYSCALL_64_after_hwframe+0x170/0x0
[13148.797051] RIP: 0033:0x7f2309f6e26e
[13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
[13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
[13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
[13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
[13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
[13148.816564] </TASK>

To fix it, we could just fix __ocfs2_find_empty_slot. But the original commit
introduced the feature to mount ocfs2 locally even if it is cluster based,
which is very dangerous: it can easily cause serious data corruption, since
there is no way to stop other nodes mounting the fs and corrupting it.
Setting up ha or another cluster-aware stack is just the cost that we have to
pay for avoiding corruption; otherwise we would have to do it in the kernel.
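
The slot clash can be reproduced with a simplified model of the slot table. struct slot and the two find_empty_slot_*() functions are hypothetical; they mirror only the two conditions visible in the slot_map.c hunk of this patch, without the preferred-slot handling.

```c
#include <stdbool.h>

/* Simplified slot entry: whether it is in use, and by which node. */
struct slot {
	bool valid;
	unsigned int node_num;
};

/* Reverted logic: a slot also counts as empty when its node number is 0. */
static int find_empty_slot_reverted(const struct slot *s, int n)
{
	for (int i = 0; i < n; i++)
		if (!s[i].valid || !s[i].node_num)
			return i;

	return -1;
}

/* Restored logic: only the valid flag decides whether a slot is free. */
static int find_empty_slot_restored(const struct slot *s, int n)
{
	for (int i = 0; i < n; i++)
		if (!s[i].valid)
			return i;

	return -1;
}
```

With slot 0 held by node 0 (valid, node number 0) and slot 1 free, the reverted logic hands slot 0 to the next node, while the restored logic hands out slot 1.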

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 912f655d78c5("ocfs2: mount shared volume without ha stack")
Signed-off-by: Junxiao Bi <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/ocfs2/ocfs2.h | 4 +---
fs/ocfs2/slot_map.c | 46 +++++++++++++++++++---------------------------
fs/ocfs2/super.c | 21 ---------------------
3 files changed, 20 insertions(+), 51 deletions(-)

--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -277,7 +277,6 @@ enum ocfs2_mount_options
OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT = 1 << 15, /* Journal Async Commit */
OCFS2_MOUNT_ERRORS_CONT = 1 << 16, /* Return EIO to the calling process on error */
OCFS2_MOUNT_ERRORS_ROFS = 1 << 17, /* Change filesystem to read-only on error */
- OCFS2_MOUNT_NOCLUSTER = 1 << 18, /* No cluster aware filesystem mount */
};

#define OCFS2_OSB_SOFT_RO 0x0001
@@ -673,8 +672,7 @@ static inline int ocfs2_cluster_o2cb_glo

static inline int ocfs2_mount_local(struct ocfs2_super *osb)
{
- return ((osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT)
- || (osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER));
+ return (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT);
}

static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb)
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -252,16 +252,14 @@ static int __ocfs2_find_empty_slot(struc
int i, ret = -ENOSPC;

if ((preferred >= 0) && (preferred < si->si_num_slots)) {
- if (!si->si_slots[preferred].sl_valid ||
- !si->si_slots[preferred].sl_node_num) {
+ if (!si->si_slots[preferred].sl_valid) {
ret = preferred;
goto out;
}
}

for(i = 0; i < si->si_num_slots; i++) {
- if (!si->si_slots[i].sl_valid ||
- !si->si_slots[i].sl_node_num) {
+ if (!si->si_slots[i].sl_valid) {
ret = i;
break;
}
@@ -456,30 +454,24 @@ int ocfs2_find_slot(struct ocfs2_super *
spin_lock(&osb->osb_lock);
ocfs2_update_slot_info(si);

- if (ocfs2_mount_local(osb))
- /* use slot 0 directly in local mode */
- slot = 0;
- else {
- /* search for ourselves first and take the slot if it already
- * exists. Perhaps we need to mark this in a variable for our
- * own journal recovery? Possibly not, though we certainly
- * need to warn to the user */
- slot = __ocfs2_node_num_to_slot(si, osb->node_num);
+ /* search for ourselves first and take the slot if it already
+ * exists. Perhaps we need to mark this in a variable for our
+ * own journal recovery? Possibly not, though we certainly
+ * need to warn to the user */
+ slot = __ocfs2_node_num_to_slot(si, osb->node_num);
+ if (slot < 0) {
+ /* if no slot yet, then just take 1st available
+ * one. */
+ slot = __ocfs2_find_empty_slot(si, osb->preferred_slot);
if (slot < 0) {
- /* if no slot yet, then just take 1st available
- * one. */
- slot = __ocfs2_find_empty_slot(si, osb->preferred_slot);
- if (slot < 0) {
- spin_unlock(&osb->osb_lock);
- mlog(ML_ERROR, "no free slots available!\n");
- status = -EINVAL;
- goto bail;
- }
- } else
- printk(KERN_INFO "ocfs2: Slot %d on device (%s) was "
- "already allocated to this node!\n",
- slot, osb->dev_str);
- }
+ spin_unlock(&osb->osb_lock);
+ mlog(ML_ERROR, "no free slots available!\n");
+ status = -EINVAL;
+ goto bail;
+ }
+ } else
+ printk(KERN_INFO "ocfs2: Slot %d on device (%s) was already "
+ "allocated to this node!\n", slot, osb->dev_str);

ocfs2_set_slot(si, slot, osb->node_num);
osb->slot_num = slot;
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -172,7 +172,6 @@ enum {
Opt_dir_resv_level,
Opt_journal_async_commit,
Opt_err_cont,
- Opt_nocluster,
Opt_err,
};

@@ -206,7 +205,6 @@ static const match_table_t tokens = {
{Opt_dir_resv_level, "dir_resv_level=%u"},
{Opt_journal_async_commit, "journal_async_commit"},
{Opt_err_cont, "errors=continue"},
- {Opt_nocluster, "nocluster"},
{Opt_err, NULL}
};

@@ -618,13 +616,6 @@ static int ocfs2_remount(struct super_bl
goto out;
}

- tmp = OCFS2_MOUNT_NOCLUSTER;
- if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) {
- ret = -EINVAL;
- mlog(ML_ERROR, "Cannot change nocluster option on remount\n");
- goto out;
- }
-
tmp = OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL |
OCFS2_MOUNT_HB_NONE;
if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) {
@@ -865,7 +856,6 @@ static int ocfs2_verify_userspace_stack(
}

if (ocfs2_userspace_stack(osb) &&
- !(osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER) &&
strncmp(osb->osb_cluster_stack, mopt->cluster_stack,
OCFS2_STACK_LABEL_LEN)) {
mlog(ML_ERROR,
@@ -1144,11 +1134,6 @@ static int ocfs2_fill_super(struct super
osb->s_mount_opt & OCFS2_MOUNT_DATA_WRITEBACK ? "writeback" :
"ordered");

- if ((osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER) &&
- !(osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT))
- printk(KERN_NOTICE "ocfs2: The shared device (%s) is mounted "
- "without cluster aware mode.\n", osb->dev_str);
-
atomic_set(&osb->vol_state, VOLUME_MOUNTED);
wake_up(&osb->osb_mount_event);

@@ -1455,9 +1440,6 @@ static int ocfs2_parse_options(struct su
case Opt_journal_async_commit:
mopt->mount_opt |= OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT;
break;
- case Opt_nocluster:
- mopt->mount_opt |= OCFS2_MOUNT_NOCLUSTER;
- break;
default:
mlog(ML_ERROR,
"Unrecognized mount option \"%s\" "
@@ -1569,9 +1551,6 @@ static int ocfs2_show_options(struct seq
if (opts & OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT)
seq_printf(s, ",journal_async_commit");

- if (opts & OCFS2_MOUNT_NOCLUSTER)
- seq_printf(s, ",nocluster");
-
return 0;
}




2022-08-01 13:18:52

by Greg KH

Subject: [PATCH 5.18 14/88] s390/archrandom: prevent CPACF trng invocations in interrupt context

From: Harald Freudenberger <[email protected]>

commit 918e75f77af7d2e049bb70469ec0a2c12782d96a upstream.

This patch slightly reworks the s390 arch_get_random_seed_{int,long}
implementation: Make sure the CPACF trng instruction is never
called in any interrupt context. This is done by adding an
additional condition in_task().

Justification:

There are some constraints to satisfy for the invocation of the
arch_get_random_seed_{int,long}() functions:
- They should provide good random data during kernel initialization.
- They should not be called in interrupt context as the TRNG
instruction is relatively heavyweight and may, for example,
cause some network loads to time out.

However, it was not clear what kind of interrupt context is exactly
encountered during kernel init or network traffic eventually calling
arch_get_random_seed_long().

After some days of investigation it is clear that the s390
start_kernel function is not running in any interrupt context and
so the trng is called:

Jul 11 18:33:39 t35lp54 kernel: [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70
Jul 11 18:33:39 t35lp54 kernel: [<000000010715f246>] random_init+0xf6/0x238
Jul 11 18:33:39 t35lp54 kernel: [<000000010712545c>] start_kernel+0x4a4/0x628
Jul 11 18:33:39 t35lp54 kernel: [<000000010590402a>] startup_continue+0x2a/0x40

The condition in_task() is true and the CPACF trng provides random data
during kernel startup.

The network traffic case, however, is more difficult. A typical call stack
looks like this:

Jul 06 17:37:07 t35lp54 kernel: [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240
Jul 06 17:37:07 t35lp54 kernel: [<000000008b560136>] crng_reseed+0x36/0xd8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b5604b8>] crng_make_state+0x78/0x340
Jul 06 17:37:07 t35lp54 kernel: [<000000008b5607e0>] _get_random_bytes+0x60/0xf8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b56108a>] get_random_u32+0xda/0x248
Jul 06 17:37:07 t35lp54 kernel: [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8
Jul 06 17:37:07 t35lp54 kernel: [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8
Jul 06 17:37:07 t35lp54 kernel: [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b611eac>] kmalloc_reserve+0x44/0xa0
Jul 06 17:37:07 t35lp54 kernel: [<000000008b611f98>] __alloc_skb+0x90/0x178
Jul 06 17:37:07 t35lp54 kernel: [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118
Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680
Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f6526>] qeth_poll+0x256/0x3f8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b63dbec>] net_rx_action+0x1cc/0x408
Jul 06 17:37:07 t35lp54 kernel: [<000000008b937302>] __do_softirq+0x132/0x6b0
Jul 06 17:37:07 t35lp54 kernel: [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170
Jul 06 17:37:07 t35lp54 kernel: [<000000008abf531a>] irq_exit_rcu+0x22/0x50
Jul 06 17:37:07 t35lp54 kernel: [<000000008b922506>] do_io_irq+0xe6/0x198
Jul 06 17:37:07 t35lp54 kernel: [<000000008b935826>] io_int_handler+0xd6/0x110
Jul 06 17:37:07 t35lp54 kernel: [<000000008b9358a6>] psw_idle_exit+0x0/0xa
Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0)
Jul 06 17:37:07 t35lp54 kernel: [<000000008b933cfe>] default_idle_call+0x6e/0xd0
Jul 06 17:37:07 t35lp54 kernel: [<000000008ac59f4e>] do_idle+0xf6/0x1b0
Jul 06 17:37:07 t35lp54 kernel: [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40
Jul 06 17:37:07 t35lp54 kernel: [<000000008abb0d90>] smp_start_secondary+0x148/0x158
Jul 06 17:37:07 t35lp54 kernel: [<000000008b935b9e>] restart_int_handler+0x6e/0x90

which confirms that the call is in softirq context. So in_task() covers exactly
the cases where we want to have CPACF trng called: not in nmi, not in hard irq,
not in soft irq but in normal task context and during kernel init.

Signed-off-by: Harald Freudenberger <[email protected]>
Acked-by: Jason A. Donenfeld <[email protected]>
Reviewed-by: Juergen Christ <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: e4f74400308c ("s390/archrandom: simplify back to earlier design and initialize earlier")
[[email protected] changed desc, added Fixes and Link, removed -stable]
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/include/asm/archrandom.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/arch/s390/include/asm/archrandom.h
+++ b/arch/s390/include/asm/archrandom.h
@@ -2,7 +2,7 @@
/*
* Kernel interface for the s390 arch_random_* functions
*
- * Copyright IBM Corp. 2017, 2020
+ * Copyright IBM Corp. 2017, 2022
*
* Author: Harald Freudenberger <[email protected]>
*
@@ -14,6 +14,7 @@
#ifdef CONFIG_ARCH_RANDOM

#include <linux/static_key.h>
+#include <linux/preempt.h>
#include <linux/atomic.h>
#include <asm/cpacf.h>

@@ -32,7 +33,8 @@ static inline bool __must_check arch_get

static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
{
- if (static_branch_likely(&s390_arch_random_available)) {
+ if (static_branch_likely(&s390_arch_random_available) &&
+ in_task()) {
cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
atomic64_add(sizeof(*v), &s390_arch_random_counter);
return true;
@@ -42,7 +44,8 @@ static inline bool __must_check arch_get

static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
{
- if (static_branch_likely(&s390_arch_random_available)) {
+ if (static_branch_likely(&s390_arch_random_available) &&
+ in_task()) {
cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
atomic64_add(sizeof(*v), &s390_arch_random_counter);
return true;



2022-08-01 13:19:00

by Greg KH

Subject: [PATCH 5.18 81/88] tcp: Fix data-races around sysctl_tcp_workaround_signed_windows.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 0f1e4d06591d0a7907c71f7b6d1c79f8a4de8098 ]

While reading sysctl_tcp_workaround_signed_windows, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 15d99e02baba ("[TCP]: sysctl to allow TCP window > 32767 sans wscale")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_output.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 66836b8bd46f..a7f0a1f0c2a3 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -227,7 +227,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
* which we interpret as a sign the remote TCP is not
* misinterpreting the window field as a signed quantity.
*/
- if (sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows)
+ if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
(*rcv_wnd) = min(space, MAX_TCP_WINDOW);
else
(*rcv_wnd) = min_t(u32, space, U16_MAX);
@@ -282,7 +282,7 @@ static u16 tcp_select_window(struct sock *sk)
* scaled window.
*/
if (!tp->rx_opt.rcv_wscale &&
- sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows)
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
new_win = min(new_win, MAX_TCP_WINDOW);
else
new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
--
2.35.1
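
As a rough illustration of the pattern described in the commit message above, the sketch below reads a tunable exactly once before acting on it. It is a user-space approximation, not the kernel code: READ_ONCE_SKETCH, the global variable, and clamp_initial_window() are hypothetical names, and the volatile cast only stands in for the kernel's READ_ONCE().

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's READ_ONCE():
 * the volatile access makes the compiler emit a single load. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

#define MAX_TCP_WINDOW_SKETCH 32767U	/* same value as the kernel's MAX_TCP_WINDOW */
#define U16_MAX_SKETCH        65535U

/* Stand-in for the per-netns sysctl; assume another thread may write it. */
static int sysctl_workaround_signed_windows;

/* Clamp an initial receive window, deciding on one snapshot of the tunable. */
static uint32_t clamp_initial_window(uint32_t space)
{
	int workaround = READ_ONCE_SKETCH(sysctl_workaround_signed_windows);
	uint32_t limit = workaround ? MAX_TCP_WINDOW_SKETCH : U16_MAX_SKETCH;

	return space < limit ? space : limit;
}
```

Taking the snapshot into a local keeps the branch and the clamp consistent even if the tunable flips in between.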




2022-08-01 13:19:05

by Greg KH

Subject: [PATCH 5.18 20/88] tcp: Fix data-races around sysctl_tcp_dsack.

From: Kuniyuki Iwashima <[email protected]>

commit 58ebb1c8b35a8ef38cd6927431e0fa7b173a632d upstream.

While reading sysctl_tcp_dsack, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4426,7 +4426,7 @@ static void tcp_dsack_set(struct sock *s
{
struct tcp_sock *tp = tcp_sk(sk);

- if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) {
+ if (tcp_is_sack(tp) && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_dsack)) {
int mib_idx;

if (before(seq, tp->rcv_nxt))
@@ -4473,7 +4473,7 @@ static void tcp_send_dupack(struct sock
NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS);

- if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) {
+ if (tcp_is_sack(tp) && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_dsack)) {
u32 end_seq = TCP_SKB_CB(skb)->end_seq;

tcp_rcv_spurious_retrans(sk, skb);



2022-08-01 13:19:32

by Greg KH

Subject: [PATCH 5.18 85/88] EDAC/synopsys: Re-enable the error interrupts on v3 hw

From: Sherry Sun <[email protected]>

commit 4bcffe941758ee17becb43af3b25487f848f6512 upstream.

zynqmp_get_error_info() writes 0 to the ECC_CLR_OFST register after
an interrupt for a {un-,}correctable error is raised, which disables
the error interrupts. Then the interrupt handler will be called only
once. Therefore, re-enable the error interrupt line at the end of
intr_handler() for v3.x Synopsys EDAC DDR.

Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
Signed-off-by: Sherry Sun <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Shubhrajyoti Datta <[email protected]>
Acked-by: Michal Simek <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/edac/synopsys_edac.c | 47 ++++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 22 deletions(-)

--- a/drivers/edac/synopsys_edac.c
+++ b/drivers/edac/synopsys_edac.c
@@ -527,6 +527,28 @@ static void handle_error(struct mem_ctl_
memset(p, 0, sizeof(*p));
}

+static void enable_intr(struct synps_edac_priv *priv)
+{
+ /* Enable UE/CE Interrupts */
+ if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
+ writel(DDR_UE_MASK | DDR_CE_MASK,
+ priv->baseaddr + ECC_CLR_OFST);
+ else
+ writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
+ priv->baseaddr + DDR_QOS_IRQ_EN_OFST);
+
+}
+
+static void disable_intr(struct synps_edac_priv *priv)
+{
+ /* Disable UE/CE Interrupts */
+ if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
+ writel(0x0, priv->baseaddr + ECC_CLR_OFST);
+ else
+ writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
+ priv->baseaddr + DDR_QOS_IRQ_DB_OFST);
+}
+
/**
* intr_handler - Interrupt Handler for ECC interrupts.
* @irq: IRQ number.
@@ -568,6 +590,9 @@ static irqreturn_t intr_handler(int irq,
/* v3.0 of the controller does not have this register */
if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR))
writel(regval, priv->baseaddr + DDR_QOS_IRQ_STAT_OFST);
+ else
+ enable_intr(priv);
+
return IRQ_HANDLED;
}

@@ -850,28 +875,6 @@ static void mc_init(struct mem_ctl_info
init_csrows(mci);
}

-static void enable_intr(struct synps_edac_priv *priv)
-{
- /* Enable UE/CE Interrupts */
- if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
- writel(DDR_UE_MASK | DDR_CE_MASK,
- priv->baseaddr + ECC_CLR_OFST);
- else
- writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
- priv->baseaddr + DDR_QOS_IRQ_EN_OFST);
-
-}
-
-static void disable_intr(struct synps_edac_priv *priv)
-{
- /* Disable UE/CE Interrupts */
- if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
- writel(0x0, priv->baseaddr + ECC_CLR_OFST);
- else
- writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
- priv->baseaddr + DDR_QOS_IRQ_DB_OFST);
-}
-
static int setup_irq(struct mem_ctl_info *mci,
struct platform_device *pdev)
{
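
The behaviour described in the commit message above can be modelled in a few lines of plain C. This is only a toy simulation under stated assumptions: struct fake_ecc, the mask values, and deliver_errors() are hypothetical, and a single integer stands in for the ECC_CLR_OFST register. It shows why the handler runs only once unless it re-enables the interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

#define UE_MASK_SKETCH 0x2u	/* hypothetical bit positions */
#define CE_MASK_SKETCH 0x1u

struct fake_ecc {
	uint32_t clr_reg;	/* stands in for the ECC_CLR_OFST register */
	unsigned int handled;	/* number of times the handler ran */
};

static bool irq_enabled(const struct fake_ecc *e)
{
	return (e->clr_reg & (UE_MASK_SKETCH | CE_MASK_SKETCH)) != 0;
}

static void enable_intr(struct fake_ecc *e)
{
	e->clr_reg = UE_MASK_SKETCH | CE_MASK_SKETCH;
}

/* Models the write of 0 done while collecting the error info. */
static void get_error_info(struct fake_ecc *e)
{
	e->clr_reg = 0;
}

static void intr_handler(struct fake_ecc *e, bool reenable)
{
	get_error_info(e);
	e->handled++;
	if (reenable)
		enable_intr(e);	/* the step this patch adds for v3.x hardware */
}

/* Raise n errors; the handler only runs while the interrupt is enabled. */
static unsigned int deliver_errors(struct fake_ecc *e, unsigned int n, bool reenable)
{
	for (unsigned int i = 0; i < n; i++)
		if (irq_enabled(e))
			intr_handler(e, reenable);
	return e->handled;
}
```

Without the re-enable step the model handles one error and then goes silent; with it, every error is handled.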



2022-08-01 13:19:33

by Greg KH

Subject: [PATCH 5.18 17/88] drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()

From: Nathan Chancellor <[email protected]>

commit 0c09bc33aa8e9dc867300acaadc318c2f0d85a1e upstream.

When booting a kernel compiled with clang's CFI protection
(CONFIG_CFI_CLANG), there is a CFI failure in
drm_simple_kms_crtc_mode_valid() when trying to call
simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():

[ 0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
...
[ 0.324928] Call trace:
[ 0.324969] __ubsan_handle_cfi_check_fail+0x58/0x60
[ 0.325053] __cfi_check_fail+0x3c/0x44
[ 0.325120] __cfi_slowpath_diag+0x178/0x200
[ 0.325192] drm_simple_kms_crtc_mode_valid+0x58/0x80
[ 0.325279] __drm_helper_update_and_validate+0x31c/0x464
...

The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs'
expects a return type of 'enum drm_mode_status', not 'int'. Correct it
to fix the CFI failure.

Cc: [email protected]
Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1647
Reported-by: Tomasz Paweł Gajc <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Sami Tolvanen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/tiny/simpledrm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -627,7 +627,7 @@ static const struct drm_connector_funcs
.atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
};

-static int
+static enum drm_mode_status
simpledrm_simple_display_pipe_mode_valid(struct drm_simple_display_pipe *pipe,
const struct drm_display_mode *mode)
{
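
To make the type requirement in the commit message above concrete, here is a small stand-alone sketch. All names in it (mode_status_sketch, pipe_funcs_sketch, fixed_mode_valid) are hypothetical stand-ins, not DRM symbols, and the accepted mode is arbitrary. Plain C does not perform the CFI check itself; the point is only that the callback's return type matches the function-pointer slot exactly, which is what clang's CFI compares on an indirect call.

```c
/* Stand-in for 'enum drm_mode_status'. */
enum mode_status_sketch {
	MODE_OK_SKETCH = 0,
	MODE_BAD_SKETCH = -1,
};

/* Stand-in for 'struct drm_simple_display_pipe_funcs'. */
struct pipe_funcs_sketch {
	enum mode_status_sketch (*mode_valid)(int width, int height);
};

/* Declared with the enum return type, not 'int', so the function's
 * type is identical to the type of the slot it is stored in. */
static enum mode_status_sketch fixed_mode_valid(int width, int height)
{
	return (width == 1024 && height == 768) ? MODE_OK_SKETCH
						 : MODE_BAD_SKETCH;
}

static const struct pipe_funcs_sketch funcs_sketch = {
	.mode_valid = fixed_mode_valid,
};

/* Indirect call through the slot, as a helper layer would do. */
static enum mode_status_sketch check_mode(int width, int height)
{
	return funcs_sketch.mode_valid(width, height);
}
```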



2022-08-01 13:19:35

by Greg KH

Subject: [PATCH 5.18 08/88] fs: sendfile handles O_NONBLOCK of out_fd

From: Andrei Vagin <[email protected]>

commit bdeb77bc2c405fa9f954c20269db175a0bd2793f upstream.

sendfile has to return EAGAIN if out_fd is nonblocking and the write into
it would block.

Here is a small reproducer for the problem:

#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/sendfile.h>

#define FILE_SIZE (1UL << 30)
int main(int argc, char **argv) {
    int p[2], fd;

    if (pipe2(p, O_NONBLOCK))
        return 1;

    fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
    if (fd < 0)
        return 1;
    ftruncate(fd, FILE_SIZE);

    if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
        fprintf(stderr, "FAIL\n");
    }
    if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
        fprintf(stderr, "FAIL\n");
    }
    return 0;
}

It worked before b964bf53e540, it is stuck after b964bf53e540, and it
works again with this fix.

This regression occurred because do_splice_direct() calls pipe_write
that handles O_NONBLOCK. Here is a trace log from the reproducer:

1) | __x64_sys_sendfile64() {
1) | do_sendfile() {
1) | __fdget()
1) | rw_verify_area()
1) | __fdget()
1) | rw_verify_area()
1) | do_splice_direct() {
1) | rw_verify_area()
1) | splice_direct_to_actor() {
1) | do_splice_to() {
1) | rw_verify_area()
1) | generic_file_splice_read()
1) + 74.153 us | }
1) | direct_splice_actor() {
1) | iter_file_splice_write() {
1) | __kmalloc()
1) 0.148 us | pipe_lock();
1) 0.153 us | splice_from_pipe_next.part.0();
1) 0.162 us | page_cache_pipe_buf_confirm();
... 16 times
1) 0.159 us | page_cache_pipe_buf_confirm();
1) | vfs_iter_write() {
1) | do_iter_write() {
1) | rw_verify_area()
1) | do_iter_readv_writev() {
1) | pipe_write() {
1) | mutex_lock()
1) 0.153 us | mutex_unlock();
1) 1.368 us | }
1) 1.686 us | }
1) 5.798 us | }
1) 6.084 us | }
1) 0.174 us | kfree();
1) 0.152 us | pipe_unlock();
1) + 14.461 us | }
1) + 14.783 us | }
1) 0.164 us | page_cache_pipe_buf_release();
... 16 times
1) 0.161 us | page_cache_pipe_buf_release();
1) | touch_atime()
1) + 95.854 us | }
1) + 99.784 us | }
1) ! 107.393 us | }
1) ! 107.699 us | }

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b964bf53e540 ("teach sendfile(2) to handle send-to-pipe directly")
Signed-off-by: Andrei Vagin <[email protected]>
Cc: Al Viro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/read_write.c | 3 +++
1 file changed, 3 insertions(+)

--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1247,6 +1247,9 @@ static ssize_t do_sendfile(int out_fd, i
count, fl);
file_end_write(out.file);
} else {
+ if (out.file->f_flags & O_NONBLOCK)
+ fl |= SPLICE_F_NONBLOCK;
+
retval = splice_file_to_pipe(in.file, opipe, &pos, count, fl);
}
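
The three added lines amount to a flag translation, which can be sketched in user space as follows. This is an illustration only: splice_flags_for() is a hypothetical helper, and SPLICE_F_NONBLOCK_SKETCH is defined locally with the value 0x02, which is assumed here to match SPLICE_F_NONBLOCK.

```c
#include <fcntl.h>

/* Assumed to equal SPLICE_F_NONBLOCK; defined locally so the sketch
 * does not depend on _GNU_SOURCE. */
#define SPLICE_F_NONBLOCK_SKETCH 0x02u

/* If the output file was opened with O_NONBLOCK, carry that over into
 * the splice flags so the pipe write returns EAGAIN instead of blocking. */
static unsigned int splice_flags_for(unsigned int f_flags, unsigned int fl)
{
	if (f_flags & O_NONBLOCK)
		fl |= SPLICE_F_NONBLOCK_SKETCH;
	return fl;
}
```

Flags already present in fl are preserved; only the nonblocking bit is added.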




2022-08-01 17:44:16

by Ronald Warsow

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

Hello Greg

5.18.16-rc1

compiles, boots and runs here on x86_64
(Intel i5-11400, Fedora 36)

Thanks

Tested-by: Ronald Warsow <[email protected]>

2022-08-01 19:57:35

by Daniel Díaz

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

Hello!

On 01/08/22 06:46, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.18.16-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.18.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro's test farm.
No regressions on arm64, arm, x86_64, and i386.

## Build
* kernel: 5.18.16-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.18.y
* git commit: 7e8a7b1c98057a3014222a505c28c6bd43ed5666
* git describe: v5.18.14-248-g7e8a7b1c9805
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.18.y/build/v5.18.14-248-g7e8a7b1c9805

## No test regressions (compared to v5.18.14-159-g63d1be154edd)

## No metric regressions (compared to v5.18.14-159-g63d1be154edd)

## No test fixes (compared to v5.18.14-159-g63d1be154edd)

## No metric fixes (compared to v5.18.14-159-g63d1be154edd)

## Test result summary
total: 136635, pass: 122379, fail: 825, skip: 12686, xfail: 745

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 311 total, 308 passed, 3 failed
* arm64: 68 total, 66 passed, 2 failed
* i386: 57 total, 51 passed, 6 failed
* mips: 50 total, 47 passed, 3 failed
* parisc: 14 total, 14 passed, 0 failed
* powerpc: 65 total, 56 passed, 9 failed
* riscv: 32 total, 27 passed, 5 failed
* s390: 23 total, 20 passed, 3 failed
* sh: 26 total, 24 passed, 2 failed
* sparc: 14 total, 14 passed, 0 failed
* x86_64: 61 total, 59 passed, 2 failed

## Test suites summary
* fwts
* igt-gpu-tools
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-fsx
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-open-posix-tests
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* network-basic-tests
* packetdrill
* rcutorture
* ssuite
* v4l2-compliance
* vdso


Greetings!

Daniel Díaz
[email protected]

--
Linaro LKFT
https://lkft.linaro.org

2022-08-01 22:57:31

by Florian Fainelli

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

On 8/1/22 04:46, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.18.16-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.18.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:

Tested-by: Florian Fainelli <[email protected]>
--
Florian

2022-08-01 22:58:37

by Shuah Khan

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

On 8/1/22 5:46 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.18.16-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.18.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

Tested-by: Shuah Khan <[email protected]>

thanks,
-- Shuah

2022-08-01 23:30:02

by Zan Aziz

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

On Mon, Aug 1, 2022 at 1:37 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.18.16-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.18.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Hi Greg,

Compiled and booted on my test system (Lenovo P50s, Intel Core i7).
No emergency or critical messages in dmesg.

./perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.718 [sec]

# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes

Total time: 13.324 [sec]

13.324520 usecs/op
75049 ops/sec

Tested-by: Zan Aziz <[email protected]>

Thanks
-Zan

2022-08-02 01:03:48

by Ron Economos

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

On 8/1/22 4:46 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.18.16-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.18.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Built and booted successfully on RISC-V RV64 (HiFive Unmatched).

Tested-by: Ron Economos <[email protected]>


2022-08-02 05:33:13

by Guenter Roeck

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

On Mon, Aug 01, 2022 at 01:46:14PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>

Build results:
total: 154 pass: 154 fail: 0
Qemu test results:
total: 489 pass: 489 fail: 0

Tested-by: Guenter Roeck <[email protected]>

Guenter

2022-08-02 10:24:19

by Justin Forbes

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

On Mon, Aug 01, 2022 at 01:46:14PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.18.16-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.18.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le,
s390x, x86_64), and boot tested x86_64. No regressions noted.

Tested-by: Justin M. Forbes <[email protected]>

2022-08-02 12:38:02

by Bagas Sanjaya

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

On Mon, Aug 01, 2022 at 01:46:14PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>

Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0)
and powerpc (ps3_defconfig, GCC 12.1.0).

Tested-by: Bagas Sanjaya <[email protected]>

--
An old man doll... just what I always wanted! - Clara

2022-08-02 17:53:55

by Sudip Mukherjee

Subject: Re: [PATCH 5.18 00/88] 5.18.16-rc1 review

Hi Greg,

On Mon, Aug 01, 2022 at 01:46:14PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.18.16 release.
> There are 88 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.

Build test (gcc version 12.1.1 20220724):
mips: 59 configs -> no failure
arm: 99 configs -> no failure
arm64: 3 configs -> no failure
x86_64: 4 configs -> no failure
alpha allmodconfig -> no failure
csky allmodconfig -> no failure
powerpc allmodconfig -> no failure
riscv allmodconfig -> no failure
s390 allmodconfig -> no failure
xtensa allmodconfig -> no failure

Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). No regression. [2]
mips: Booted on ci20 board. No regression. [3]

[1]. https://openqa.qa.codethink.co.uk/tests/1607
[2]. https://openqa.qa.codethink.co.uk/tests/1608
[3]. https://openqa.qa.codethink.co.uk/tests/1611

Tested-by: Sudip Mukherjee <[email protected]>

--
Regards
Sudip