LinuxLists.cc - [PATCH 5.15 00/69] 5.15.59-rc1 review

2022-08-01 12:51:57

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 00/69] 5.15.59-rc1 review

This is the start of the stable review cycle for the 5.15.59 release.
There are 69 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.59-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 5.15.59-rc1

Thadeu Lima de Souza Cascardo <[email protected]>
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available

Waiman Long <[email protected]>
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter

Eiichi Tsukata <[email protected]>
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed

Toshi Kani <[email protected]>
EDAC/ghes: Set the DIMM label unconditionally

Florian Fainelli <[email protected]>
ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow

Jaewon Kim <[email protected]>
page_alloc: fix invalid watermark check on a negative value

Ralph Campbell <[email protected]>
mm/hmm: fault non-owner device private entries

Greg Kroah-Hartman <[email protected]>
ARM: crypto: comment out gcc warning that breaks clang builds

Xin Long <[email protected]>
sctp: leave the err path free in sctp_stream_init to sctp_stream_free

Alejandro Lucero <[email protected]>
sfc: disable softirqs for ptp TX

Leo Yan <[email protected]>
perf symbol: Correct address for bss symbols

Jason Wang <[email protected]>
virtio-net: fix the race between refill work and close

Florian Westphal <[email protected]>
netfilter: nf_queue: do not allow packet truncation below transport header offset

Sunil Goutham <[email protected]>
octeontx2-pf: cn10k: Fix egress ratelimit configuration

Duoming Zhou <[email protected]>
sctp: fix sleep in atomic context bug in timer handlers

Michal Maloszewski <[email protected]>
i40e: Fix interface init with MSI interrupts (no MSI-X)

Kuniyuki Iwashima <[email protected]>
ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_reflect_tos.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.

Kuniyuki Iwashima <[email protected]>
net: Fix data-races around sysctl_[rw]mem(_offset)?.

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sk_pacing_rate.

Taehee Yoo <[email protected]>
net: mld: fix reference count leak in mld_{query | report}_work()

Jianglei Nie <[email protected]>
net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa()

Sabrina Dubroca <[email protected]>
macsec: always read MACSEC_SA_ATTR_PN as a u64

Sabrina Dubroca <[email protected]>
macsec: limit replay window size with XPN

Sabrina Dubroca <[email protected]>
macsec: fix error message in macsec_add_rxsa and _txsa

Sabrina Dubroca <[email protected]>
macsec: fix NULL deref in macsec_add_rxsa

Xin Long <[email protected]>
Documentation: fix sctp_wmem in ip-sysctl.rst

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_autocorking.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_min_tso_segs.

Liang He <[email protected]>
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()

Vladimir Oltean <[email protected]>
net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii

Kuniyuki Iwashima <[email protected]>
igmp: Fix data-races around sysctl_igmp_qrv.

Maxim Mikityanskiy <[email protected]>
net/tls: Remove the context from the list in tls_device_down

Ziyang Xuan <[email protected]>
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr

Kuniyuki Iwashima <[email protected]>
net: ping6: Fix memleak in ipv6_renew_options().

David Jeffery <[email protected]>
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown

Jason Yan <[email protected]>
scsi: core: Fix warning in scsi_alloc_sgtables()

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.

Subbaraya Sundeep <[email protected]>
octeontx2-pf: Fix UDP/TCP src and dst port tc filters

Wei Wang <[email protected]>
Revert "tcp: change pingpong threshold to 3"

Liang He <[email protected]>
scsi: ufs: host: Hold reference returned by of_parse_phandle()

Maciej Fijalkowski <[email protected]>
ice: do not setup vlan for loopback VSI

Maciej Fijalkowski <[email protected]>
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_nometrics_save.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_frto.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_adv_win_scale.

Kuniyuki Iwashima <[email protected]>
tcp: Fix a data-race around sysctl_tcp_app_win.

Kuniyuki Iwashima <[email protected]>
tcp: Fix data-races around sysctl_tcp_dsack.

Linus Torvalds <[email protected]>
watch_queue: Fix missing locking in add_watch_to_object()

David Howells <[email protected]>
watch_queue: Fix missing rcu annotation

Nathan Chancellor <[email protected]>
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()

Alistair Popple <[email protected]>
nouveau/svm: Fix to migrate all requested pages

Harald Freudenberger <[email protected]>
s390/archrandom: prevent CPACF trng invocations in interrupt context

Lukas Bulwahn <[email protected]>
asm-generic: remove a broken and needless ifdef conditional

Miaohe Lin <[email protected]>
hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte

Josef Bacik <[email protected]>
mm: fix page leak with multiple threads mapping the same page

Mike Rapoport <[email protected]>
secretmem: fix unhandled fault in truncate

Andrei Vagin <[email protected]>
fs: sendfile handles O_NONBLOCK of out_fd

ChenXiaoSong <[email protected]>
ntfs: fix use-after-free in ntfs_ucsncmp()

Junxiao Bi <[email protected]>
Revert "ocfs2: mount shared volume without ha stack"

Luiz Augusto von Dentz <[email protected]>
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put

-------------

Diffstat:

Documentation/admin-guide/kernel-parameters.txt | 2 +
Documentation/networking/ip-sysctl.rst | 9 +-
Makefile | 4 +-
arch/arm/include/asm/dma.h | 2 +-
arch/arm/lib/xor-neon.c | 3 +-
arch/s390/include/asm/archrandom.h | 9 +-
arch/x86/kernel/cpu/bugs.c | 1 +
drivers/edac/ghes_edac.c | 11 ++-
drivers/gpu/drm/nouveau/nouveau_dmem.c | 6 +-
drivers/gpu/drm/tiny/simpledrm.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +
drivers/net/ethernet/intel/ice/ice_ethtool.c | 3 +-
drivers/net/ethernet/intel/ice/ice_main.c | 8 +-
.../net/ethernet/marvell/octeontx2/nic/otx2_tc.c | 106 ++++++++++++++-------
drivers/net/ethernet/sfc/ptp.c | 22 +++++
drivers/net/macsec.c | 33 ++++---
drivers/net/pcs/pcs-xpcs.c | 2 +-
drivers/net/sungem_phy.c | 1 +
drivers/net/virtio_net.c | 37 ++++++-
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 +
drivers/scsi/scsi_ioctl.c | 2 +-
drivers/scsi/ufs/ufshcd-pltfrm.c | 15 ++-
fs/ntfs/attrib.c | 8 +-
fs/ocfs2/ocfs2.h | 4 +-
fs/ocfs2/slot_map.c | 46 ++++-----
fs/ocfs2/super.c | 21 ----
fs/read_write.c | 3 +
include/asm-generic/io.h | 2 -
include/net/addrconf.h | 3 +
include/net/bluetooth/l2cap.h | 1 +
include/net/inet_connection_sock.h | 10 +-
include/net/sock.h | 8 +-
include/net/tcp.h | 2 +-
kernel/locking/rwsem.c | 30 ++++--
kernel/watch_queue.c | 58 ++++++-----
mm/hmm.c | 19 ++--
mm/hugetlb.c | 1 +
mm/memory.c | 7 +-
mm/page_alloc.c | 12 ++-
mm/secretmem.c | 33 +++++--
net/bluetooth/l2cap_core.c | 61 +++++++++---
net/decnet/af_decnet.c | 4 +-
net/ipv4/fib_trie.c | 7 +-
net/ipv4/igmp.c | 24 ++---
net/ipv4/tcp.c | 8 +-
net/ipv4/tcp_input.c | 41 ++++----
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv4/tcp_metrics.c | 10 +-
net/ipv4/tcp_output.c | 21 ++--
net/ipv6/mcast.c | 14 +--
net/ipv6/ping.c | 6 ++
net/ipv6/tcp_ipv6.c | 4 +-
net/mptcp/protocol.c | 8 +-
net/netfilter/nfnetlink_queue.c | 7 +-
net/sctp/associola.c | 5 +-
net/sctp/stream.c | 19 +---
net/sctp/stream_sched.c | 2 +-
net/tipc/socket.c | 2 +-
net/tls/tls_device.c | 7 +-
tools/perf/util/symbol-elf.c | 45 ++++++++-
60 files changed, 547 insertions(+), 303 deletions(-)

2022-08-01 12:52:12

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 54/69] i40e: Fix interface init with MSI interrupts (no MSI-X)

From: Michal Maloszewski <[email protected]>

[ Upstream commit 5fcbb711024aac6d4db385623e6f2fdf019f7782 ]

Fix the inability to bring an interface up on a setup with
only MSI interrupts enabled (no MSI-X).
Solution is to add a default number of QPs = 1. This is enough,
since without MSI-X support driver enables only a basic feature set.

Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use")
Signed-off-by: Dawid Lukwinski <[email protected]>
Signed-off-by: Michal Maloszewski <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index c801b128e5b2..b07d55c99317 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1908,11 +1908,15 @@ static void i40e_vsi_setup_queue_map(struct i40e_vsi *vsi,
* non-zero req_queue_pairs says that user requested a new
* queue count via ethtool's set_channels, so use this
* value for queues distribution across traffic classes
+ * We need at least one queue pair for the interface
+ * to be usable as we see in else statement.
*/
if (vsi->req_queue_pairs > 0)
vsi->num_queue_pairs = vsi->req_queue_pairs;
else if (pf->flags & I40E_FLAG_MSIX_ENABLED)
vsi->num_queue_pairs = pf->num_lan_msix;
+ else
+ vsi->num_queue_pairs = 1;
}

/* Number of queues per enabled TC */
--
2.35.1

2022-08-01 12:52:50

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 28/69] scsi: core: Fix warning in scsi_alloc_sgtables()

From: Jason Yan <[email protected]>

commit d9a434fa0c12ed5f7afe1e9dd30003ab5d059b85 upstream.

As explained in SG_IO howto[1]:

"If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of
iov_len lengths. If not, the minimum of the two is the transfer length."

When iovec_count is non-zero and dxfer_len is zero, the sg_io() just
genarated a null bio, and finally caused a warning below. To fix it, skip
generating a bio for this request if dxfer_len is zero.

[1] https://tldp.org/HOWTO/SCSI-Generic-HOWTO/x198.html

WARNING: CPU: 2 PID: 3643 at drivers/scsi/scsi_lib.c:1032 scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032
Modules linked in:

CPU: 2 PID: 3643 Comm: syz-executor397 Not tainted
5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-204/01/2014
RIP: 0010:scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032
Code: e7 fc 31 ff 44 89 f6 e8 c1 4e e7 fc 45 85 f6 0f 84 1a f5 ff ff e8
93 4c e7 fc 83 c5 01 0f b7 ed e9 0f f5 ff ff e8 83 4c e7 fc <0f> 0b 41
bc 0a 00 00 00 e9 2b fb ff ff 41 bc 09 00 00 00 e9 20 fb
RSP: 0018:ffffc90000d07558 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88801bfc96a0 RCX: 0000000000000000
RDX: ffff88801c876000 RSI: ffffffff849060bd RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff849055b9 R11: 0000000000000000 R12: ffff888012b8c000
R13: ffff88801bfc9580 R14: 0000000000000000 R15: ffff88801432c000
FS: 00007effdec8e700(0000) GS:ffff88802cc00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007effdec6d718 CR3: 00000000206d6000 CR4: 0000000000150ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
scsi_setup_scsi_cmnd drivers/scsi/scsi_lib.c:1219 [inline]
scsi_prepare_cmd drivers/scsi/scsi_lib.c:1614 [inline]
scsi_queue_rq+0x283e/0x3630 drivers/scsi/scsi_lib.c:1730
blk_mq_dispatch_rq_list+0x6ea/0x22e0 block/blk-mq.c:1851
__blk_mq_sched_dispatch_requests+0x20b/0x410 block/blk-mq-sched.c:299
blk_mq_sched_dispatch_requests+0xfb/0x180 block/blk-mq-sched.c:332
__blk_mq_run_hw_queue+0xf9/0x350 block/blk-mq.c:1968
__blk_mq_delay_run_hw_queue+0x5b6/0x6c0 block/blk-mq.c:2045
blk_mq_run_hw_queue+0x30f/0x480 block/blk-mq.c:2096
blk_mq_sched_insert_request+0x340/0x440 block/blk-mq-sched.c:451
blk_execute_rq+0xcc/0x340 block/blk-mq.c:1231
sg_io+0x67c/0x1210 drivers/scsi/scsi_ioctl.c:485
scsi_ioctl_sg_io drivers/scsi/scsi_ioctl.c:866 [inline]
scsi_ioctl+0xa66/0x1560 drivers/scsi/scsi_ioctl.c:921
sd_ioctl+0x199/0x2a0 drivers/scsi/sd.c:1576
blkdev_ioctl+0x37a/0x800 block/ioctl.c:588
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7effdecdc5d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 81 14 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007effdec8e2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007effded664c0 RCX: 00007effdecdc5d9
RDX: 0000000020002300 RSI: 0000000000002285 RDI: 0000000000000004
RBP: 00007effded34034 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007effded34054 R14: 2f30656c69662f2e R15: 00007effded664c8

Link: https://lore.kernel.org/r/[email protected]
Fixes: 25636e282fe9 ("block: fix SG_IO vector request data length handling")
Reported-by: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jason Yan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/scsi_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/scsi_ioctl.c
+++ b/drivers/scsi/scsi_ioctl.c
@@ -457,7 +457,7 @@ static int sg_io(struct scsi_device *sde
goto out_free_cdb;

ret = 0;
- if (hdr->iovec_count) {
+ if (hdr->iovec_count && hdr->dxfer_len) {
struct iov_iter i;
struct iovec *iov = NULL;

2022-08-01 12:52:51

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 57/69] netfilter: nf_queue: do not allow packet truncation below transport header offset

From: Florian Westphal <[email protected]>

[ Upstream commit 99a63d36cb3ed5ca3aa6fcb64cffbeaf3b0fb164 ]

Domingo Dirutigliano and Nicola Guerrera report kernel panic when
sending nf_queue verdict with 1-byte nfta_payload attribute.

The IP/IPv6 stack pulls the IP(v6) header from the packet after the
input hook.

If user truncates the packet below the header size, this skb_pull() will
result in a malformed skb (skb->len < 0).

Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink")
Reported-by: Domingo Dirutigliano <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/netfilter/nfnetlink_queue.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 8787d0613ad8..5329ebf19a18 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -836,11 +836,16 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
}

static int
-nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e, int diff)
+nfqnl_mangle(void *data, unsigned int data_len, struct nf_queue_entry *e, int diff)
{
struct sk_buff *nskb;

if (diff < 0) {
+ unsigned int min_len = skb_transport_offset(e->skb);
+
+ if (data_len < min_len)
+ return -EINVAL;
+
if (pskb_trim(e->skb, data_len))
return -ENOMEM;
} else if (diff > 0) {
--
2.35.1

2022-08-01 12:53:02

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 61/69] sctp: leave the err path free in sctp_stream_init to sctp_stream_free

From: Xin Long <[email protected]>

[ Upstream commit 181d8d2066c000ba0a0e6940a7ad80f1a0e68e9d ]

A NULL pointer dereference was reported by Wei Chen:

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:__list_del_entry_valid+0x26/0x80
Call Trace:
<TASK>
sctp_sched_dequeue_common+0x1c/0x90
sctp_sched_prio_dequeue+0x67/0x80
__sctp_outq_teardown+0x299/0x380
sctp_outq_free+0x15/0x20
sctp_association_free+0xc3/0x440
sctp_do_sm+0x1ca7/0x2210
sctp_assoc_bh_rcv+0x1f6/0x340

This happens when calling sctp_sendmsg without connecting to server first.
In this case, a data chunk already queues up in send queue of client side
when processing the INIT_ACK from server in sctp_process_init() where it
calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in
all stream_out will be freed in sctp_stream_init's err path. Then in the
asoc freeing it will crash when dequeuing this data chunk as stream_out
is missing.

As we can't free stream out before dequeuing all data from send queue, and
this patch is to fix it by moving the err path stream_out/in freeing in
sctp_stream_init() to sctp_stream_free() which is eventually called when
freeing the asoc in sctp_association_free(). This fix also makes the code
in sctp_process_init() more clear.

Note that in sctp_association_init() when it fails in sctp_stream_init(),
sctp_association_free() will not be called, and in that case it should
go to 'stream_free' err path to free stream instead of 'fail_init'.

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Wei Chen <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/sctp/associola.c | 5 ++---
net/sctp/stream.c | 19 +++----------------
2 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index be29da09cc7a..3460abceba44 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -229,9 +229,8 @@ static struct sctp_association *sctp_association_init(
if (!sctp_ulpq_init(&asoc->ulpq, asoc))
goto fail_init;

- if (sctp_stream_init(&asoc->stream, asoc->c.sinit_num_ostreams,
- 0, gfp))
- goto fail_init;
+ if (sctp_stream_init(&asoc->stream, asoc->c.sinit_num_ostreams, 0, gfp))
+ goto stream_free;

/* Initialize default path MTU. */
asoc->pathmtu = sp->pathmtu;
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index 6dc95dcc0ff4..ef9fceadef8d 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -137,7 +137,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,

ret = sctp_stream_alloc_out(stream, outcnt, gfp);
if (ret)
- goto out_err;
+ return ret;

for (i = 0; i < stream->outcnt; i++)
SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
@@ -145,22 +145,9 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
handle_in:
sctp_stream_interleave_init(stream);
if (!incnt)
- goto out;
-
- ret = sctp_stream_alloc_in(stream, incnt, gfp);
- if (ret)
- goto in_err;
-
- goto out;
+ return 0;

-in_err:
- sched->free(stream);
- genradix_free(&stream->in);
-out_err:
- genradix_free(&stream->out);
- stream->outcnt = 0;
-out:
- return ret;
+ return sctp_stream_alloc_in(stream, incnt, gfp);
}

int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid)
--
2.35.1

2022-08-01 12:53:11

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 15/69] tcp: Fix a data-race around sysctl_tcp_app_win.

From: Kuniyuki Iwashima <[email protected]>

commit 02ca527ac5581cf56749db9fd03d854e842253dd upstream.

While reading sysctl_tcp_app_win, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -526,7 +526,7 @@ static void tcp_grow_window(struct sock
*/
static void tcp_init_buffer_space(struct sock *sk)
{
- int tcp_app_win = sock_net(sk)->ipv4.sysctl_tcp_app_win;
+ int tcp_app_win = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_app_win);
struct tcp_sock *tp = tcp_sk(sk);
int maxwin;

2022-08-01 12:53:17

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 53/69] ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 96b9bd8c6d125490f9adfb57d387ef81a55a103e ]

While reading sysctl_fib_notify_on_flag_change, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 680aea08e78c ("net: ipv4: Emit notification when fib hardware flags are changed")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/fib_trie.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index a9cd9c2bd84e..19c6e7b93d3d 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1037,6 +1037,7 @@ fib_find_matching_alias(struct net *net, const struct fib_rt_info *fri)

void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri)
{
+ u8 fib_notify_on_flag_change;
struct fib_alias *fa_match;
struct sk_buff *skb;
int err;
@@ -1058,14 +1059,16 @@ void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri)
WRITE_ONCE(fa_match->offload, fri->offload);
WRITE_ONCE(fa_match->trap, fri->trap);

+ fib_notify_on_flag_change = READ_ONCE(net->ipv4.sysctl_fib_notify_on_flag_change);
+
/* 2 means send notifications only if offload_failed was changed. */
- if (net->ipv4.sysctl_fib_notify_on_flag_change == 2 &&
+ if (fib_notify_on_flag_change == 2 &&
READ_ONCE(fa_match->offload_failed) == fri->offload_failed)
goto out;

WRITE_ONCE(fa_match->offload_failed, fri->offload_failed);

- if (!net->ipv4.sysctl_fib_notify_on_flag_change)
+ if (!fib_notify_on_flag_change)
goto out;

skb = nlmsg_new(fib_nlmsg_size(fa_match->fa_info), GFP_ATOMIC);
--
2.35.1

2022-08-01 12:54:04

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 63/69] mm/hmm: fault non-owner device private entries

From: Ralph Campbell <[email protected]>

commit 8a295dbbaf7292c582a40ce469c326f472d51f66 upstream.

If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner page is used
to determine if the device private page should not be faulted in.
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.

For example, if a page is migrated to GPU private memory and a RDMA fault
capable NIC tries to read the migrated page, without this patch it will
get an error. With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 08ddddda667b ("mm/hmm: check the device private page owner in hmm_range_fault()")
Signed-off-by: Ralph Campbell <[email protected]>
Reported-by: Felix Kuehling <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Cc: Philip Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/hmm.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -212,14 +212,6 @@ int hmm_vma_handle_pmd(struct mm_walk *w
unsigned long end, unsigned long hmm_pfns[], pmd_t pmd);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

-static inline bool hmm_is_device_private_entry(struct hmm_range *range,
- swp_entry_t entry)
-{
- return is_device_private_entry(entry) &&
- pfn_swap_entry_to_page(entry)->pgmap->owner ==
- range->dev_private_owner;
-}
-
static inline unsigned long pte_to_hmm_pfn_flags(struct hmm_range *range,
pte_t pte)
{
@@ -252,10 +244,12 @@ static int hmm_vma_handle_pte(struct mm_
swp_entry_t entry = pte_to_swp_entry(pte);

/*
- * Never fault in device private pages, but just report
- * the PFN even if not present.
+ * Don't fault in device private pages owned by the caller,
+ * just report the PFN.
*/
- if (hmm_is_device_private_entry(range, entry)) {
+ if (is_device_private_entry(entry) &&
+ pfn_swap_entry_to_page(entry)->pgmap->owner ==
+ range->dev_private_owner) {
cpu_flags = HMM_PFN_VALID;
if (is_writable_device_private_entry(entry))
cpu_flags |= HMM_PFN_WRITE;
@@ -273,6 +267,9 @@ static int hmm_vma_handle_pte(struct mm_
if (!non_swap_entry(entry))
goto fault;

+ if (is_device_private_entry(entry))
+ goto fault;
+
if (is_device_exclusive_entry(entry))
goto fault;

2022-08-01 12:54:31

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 51/69] tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 79f55473bfc8ac51bd6572929a679eeb4da22251 ]

While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7b593865b4ae..a33e6aa42a4c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5491,7 +5491,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
}

if (!tcp_is_sack(tp) ||
- tp->compressed_ack >= sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr)
+ tp->compressed_ack >= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr))
goto send_now;

if (tp->compressed_ack_rcv_nxt != tp->rcv_nxt) {
--
2.35.1

2022-08-01 12:54:32

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 22/69] scsi: ufs: host: Hold reference returned by of_parse_phandle()

From: Liang He <[email protected]>

commit a3435afba87dc6cd83f5595e7607f3c40f93ef01 upstream.

In ufshcd_populate_vreg(), we should hold the reference returned by
of_parse_phandle() and then use it to call of_node_put() for refcount
balance.

Link: https://lore.kernel.org/r/[email protected]
Fixes: aa4976130934 ("ufs: Add regulator enable support")
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Liang He <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/scsi/ufs/ufshcd-pltfrm.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

--- a/drivers/scsi/ufs/ufshcd-pltfrm.c
+++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
@@ -107,9 +107,20 @@ out:
return ret;
}

+static bool phandle_exists(const struct device_node *np,
+ const char *phandle_name, int index)
+{
+ struct device_node *parse_np = of_parse_phandle(np, phandle_name, index);
+
+ if (parse_np)
+ of_node_put(parse_np);
+
+ return parse_np != NULL;
+}
+
#define MAX_PROP_SIZE 32
static int ufshcd_populate_vreg(struct device *dev, const char *name,
- struct ufs_vreg **out_vreg)
+ struct ufs_vreg **out_vreg)
{
char prop_name[MAX_PROP_SIZE];
struct ufs_vreg *vreg = NULL;
@@ -121,7 +132,7 @@ static int ufshcd_populate_vreg(struct d
}

snprintf(prop_name, MAX_PROP_SIZE, "%s-supply", name);
- if (!of_parse_phandle(np, prop_name, 0)) {
+ if (!phandle_exists(np, prop_name, 0)) {
dev_info(dev, "%s: Unable to find %s regulator, assuming enabled\n",
__func__, prop_name);
goto out;

2022-08-01 12:55:56

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 69/69] x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available

From: Thadeu Lima de Souza Cascardo <[email protected]>

commit 571c30b1a88465a1c85a6f7762609939b9085a15 upstream.

Some cloud hypervisors do not provide IBPB on very recent CPU processors,
including AMD processors affected by Retbleed.

Using IBPB before firmware calls on such systems would cause a GPF at boot
like the one below. Do not enable such calls when IBPB support is not
present.

EFI Variables Facility v0.08 2004-May-17
general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: efi_rts_wq efi_call_rts
RIP: 0010:efi_call_rts
Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48
RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049
RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300
RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78
R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048
FS: 0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0
Call Trace:
<TASK>
? __wake_up
process_one_work
worker_thread
? rescuer_thread
kthread
? kthread_complete_and_exit
ret_from_fork
</TASK>
Modules linked in:

Fixes: 28a99e95f55c ("x86/amd: Use IBPB for firmware calls")
Reported-by: Dimitri John Ledkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 1 +
1 file changed, 1 insertion(+)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1513,6 +1513,7 @@ static void __init spectre_v2_select_mit
* enable IBRS around firmware calls.
*/
if (boot_cpu_has_bug(X86_BUG_RETBLEED) &&
+ boot_cpu_has(X86_FEATURE_IBPB) &&
(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) {

2022-08-01 13:08:02

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 24/69] octeontx2-pf: Fix UDP/TCP src and dst port tc filters

From: Subbaraya Sundeep <[email protected]>

commit 59e1be6f83b928a04189bbf3ab683a1fc6248db3 upstream.

Check the mask for non-zero value before installing tc filters
for L4 source and destination ports. Otherwise installing a
filter for source port installs destination port too and
vice-versa.

Fixes: 1d4d9e42c240 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic")
Signed-off-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c | 30 +++++++++++--------
1 file changed, 18 insertions(+), 12 deletions(-)

--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c
@@ -571,21 +571,27 @@ static int otx2_tc_prepare_flow(struct o

flow_spec->dport = match.key->dst;
flow_mask->dport = match.mask->dst;
- if (ip_proto == IPPROTO_UDP)
- req->features |= BIT_ULL(NPC_DPORT_UDP);
- else if (ip_proto == IPPROTO_TCP)
- req->features |= BIT_ULL(NPC_DPORT_TCP);
- else if (ip_proto == IPPROTO_SCTP)
- req->features |= BIT_ULL(NPC_DPORT_SCTP);
+
+ if (flow_mask->dport) {
+ if (ip_proto == IPPROTO_UDP)
+ req->features |= BIT_ULL(NPC_DPORT_UDP);
+ else if (ip_proto == IPPROTO_TCP)
+ req->features |= BIT_ULL(NPC_DPORT_TCP);
+ else if (ip_proto == IPPROTO_SCTP)
+ req->features |= BIT_ULL(NPC_DPORT_SCTP);
+ }

flow_spec->sport = match.key->src;
flow_mask->sport = match.mask->src;
- if (ip_proto == IPPROTO_UDP)
- req->features |= BIT_ULL(NPC_SPORT_UDP);
- else if (ip_proto == IPPROTO_TCP)
- req->features |= BIT_ULL(NPC_SPORT_TCP);
- else if (ip_proto == IPPROTO_SCTP)
- req->features |= BIT_ULL(NPC_SPORT_SCTP);
+
+ if (flow_mask->sport) {
+ if (ip_proto == IPPROTO_UDP)
+ req->features |= BIT_ULL(NPC_SPORT_UDP);
+ else if (ip_proto == IPPROTO_TCP)
+ req->features |= BIT_ULL(NPC_SPORT_TCP);
+ else if (ip_proto == IPPROTO_SCTP)
+ req->features |= BIT_ULL(NPC_SPORT_SCTP);
+ }
}

return otx2_tc_parse_actions(nic, &rule->action, req, f, node);

2022-08-01 13:08:43

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 46/69] net: mld: fix reference count leak in mld_{query | report}_work()

From: Taehee Yoo <[email protected]>

[ Upstream commit 3e7d18b9dca388940a19cae30bfc1f76dccd8c28 ]

mld_{query | report}_work() processes queued events.
If there are too many events in the queue, it re-queue a work.
And then, it returns without in6_dev_put().
But if queuing is failed, it should call in6_dev_put(), but it doesn't.
So, a reference count leak would occur.

THREAD0 THREAD1
mld_report_work()
spin_lock_bh()
if (!mod_delayed_work())
in6_dev_hold();
spin_unlock_bh()
spin_lock_bh()
schedule_delayed_work()
spin_unlock_bh()

Script to reproduce(by Hangbin Liu):
ip netns add ns1
ip netns add ns2
ip netns exec ns1 sysctl -w net.ipv6.conf.all.force_mld_version=1
ip netns exec ns2 sysctl -w net.ipv6.conf.all.force_mld_version=1

ip -n ns1 link add veth0 type veth peer name veth0 netns ns2
ip -n ns1 link set veth0 up
ip -n ns2 link set veth0 up

for i in `seq 50`; do
for j in `seq 100`; do
ip -n ns1 addr add 2021:${i}::${j}/64 dev veth0
ip -n ns2 addr add 2022:${i}::${j}/64 dev veth0
done
done
modprobe -r veth
ip -a netns del

splat looks like:
unregister_netdevice: waiting for veth0 to become free. Usage count = 2
leaked reference.
ipv6_add_dev+0x324/0xec0
addrconf_notify+0x481/0xd10
raw_notifier_call_chain+0xe3/0x120
call_netdevice_notifiers+0x106/0x160
register_netdevice+0x114c/0x16b0
veth_newlink+0x48b/0xa50 [veth]
rtnl_newlink+0x11a2/0x1a40
rtnetlink_rcv_msg+0x63f/0xc00
netlink_rcv_skb+0x1df/0x3e0
netlink_unicast+0x5de/0x850
netlink_sendmsg+0x6c9/0xa90
____sys_sendmsg+0x76a/0x780
__sys_sendmsg+0x27c/0x340
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Tested-by: Hangbin Liu <[email protected]>
Fixes: f185de28d9ae ("mld: add new workqueues for process mld events")
Signed-off-by: Taehee Yoo <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv6/mcast.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 7f695c39d9a8..87c699d57b36 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1522,7 +1522,6 @@ static void mld_query_work(struct work_struct *work)

if (++cnt >= MLD_MAX_QUEUE) {
rework = true;
- schedule_delayed_work(&idev->mc_query_work, 0);
break;
}
}
@@ -1533,8 +1532,10 @@ static void mld_query_work(struct work_struct *work)
__mld_query_work(skb);
mutex_unlock(&idev->mc_lock);

- if (!rework)
- in6_dev_put(idev);
+ if (rework && queue_delayed_work(mld_wq, &idev->mc_query_work, 0))
+ return;
+
+ in6_dev_put(idev);
}

/* called with rcu_read_lock() */
@@ -1624,7 +1625,6 @@ static void mld_report_work(struct work_struct *work)

if (++cnt >= MLD_MAX_QUEUE) {
rework = true;
- schedule_delayed_work(&idev->mc_report_work, 0);
break;
}
}
@@ -1635,8 +1635,10 @@ static void mld_report_work(struct work_struct *work)
__mld_report_work(skb);
mutex_unlock(&idev->mc_lock);

- if (!rework)
- in6_dev_put(idev);
+ if (rework && queue_delayed_work(mld_wq, &idev->mc_report_work, 0))
+ return;
+
+ in6_dev_put(idev);
}

static bool is_in(struct ifmcaddr6 *pmc, struct ip6_sf_list *psf, int type,
--
2.35.1

2022-08-01 13:08:42

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 09/69] s390/archrandom: prevent CPACF trng invocations in interrupt context

From: Harald Freudenberger <[email protected]>

commit 918e75f77af7d2e049bb70469ec0a2c12782d96a upstream.

This patch slightly reworks the s390 arch_get_random_seed_{int,long}
implementation: Make sure the CPACF trng instruction is never
called in any interrupt context. This is done by adding an
additional condition in_task().

Justification:

There are some constrains to satisfy for the invocation of the
arch_get_random_seed_{int,long}() functions:
- They should provide good random data during kernel initialization.
- They should not be called in interrupt context as the TRNG
instruction is relatively heavy weight and may for example
make some network loads cause to timeout and buck.

However, it was not clear what kind of interrupt context is exactly
encountered during kernel init or network traffic eventually calling
arch_get_random_seed_long().

After some days of investigations it is clear that the s390
start_kernel function is not running in any interrupt context and
so the trng is called:

Jul 11 18:33:39 t35lp54 kernel: [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70
Jul 11 18:33:39 t35lp54 kernel: [<000000010715f246>] random_init+0xf6/0x238
Jul 11 18:33:39 t35lp54 kernel: [<000000010712545c>] start_kernel+0x4a4/0x628
Jul 11 18:33:39 t35lp54 kernel: [<000000010590402a>] startup_continue+0x2a/0x40

The condition in_task() is true and the CPACF trng provides random data
during kernel startup.

The network traffic however, is more difficult. A typical call stack
looks like this:

Jul 06 17:37:07 t35lp54 kernel: [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240
Jul 06 17:37:07 t35lp54 kernel: [<000000008b560136>] crng_reseed+0x36/0xd8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b5604b8>] crng_make_state+0x78/0x340
Jul 06 17:37:07 t35lp54 kernel: [<000000008b5607e0>] _get_random_bytes+0x60/0xf8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b56108a>] get_random_u32+0xda/0x248
Jul 06 17:37:07 t35lp54 kernel: [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8
Jul 06 17:37:07 t35lp54 kernel: [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8
Jul 06 17:37:07 t35lp54 kernel: [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b611eac>] kmalloc_reserve+0x44/0xa0
Jul 06 17:37:07 t35lp54 kernel: [<000000008b611f98>] __alloc_skb+0x90/0x178
Jul 06 17:37:07 t35lp54 kernel: [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118
Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680
Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f6526>] qeth_poll+0x256/0x3f8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8
Jul 06 17:37:07 t35lp54 kernel: [<000000008b63dbec>] net_rx_action+0x1cc/0x408
Jul 06 17:37:07 t35lp54 kernel: [<000000008b937302>] __do_softirq+0x132/0x6b0
Jul 06 17:37:07 t35lp54 kernel: [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170
Jul 06 17:37:07 t35lp54 kernel: [<000000008abf531a>] irq_exit_rcu+0x22/0x50
Jul 06 17:37:07 t35lp54 kernel: [<000000008b922506>] do_io_irq+0xe6/0x198
Jul 06 17:37:07 t35lp54 kernel: [<000000008b935826>] io_int_handler+0xd6/0x110
Jul 06 17:37:07 t35lp54 kernel: [<000000008b9358a6>] psw_idle_exit+0x0/0xa
Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0)
Jul 06 17:37:07 t35lp54 kernel: [<000000008b933cfe>] default_idle_call+0x6e/0xd0
Jul 06 17:37:07 t35lp54 kernel: [<000000008ac59f4e>] do_idle+0xf6/0x1b0
Jul 06 17:37:07 t35lp54 kernel: [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40
Jul 06 17:37:07 t35lp54 kernel: [<000000008abb0d90>] smp_start_secondary+0x148/0x158
Jul 06 17:37:07 t35lp54 kernel: [<000000008b935b9e>] restart_int_handler+0x6e/0x90

which confirms that the call is in softirq context. So in_task() covers exactly
the cases where we want to have CPACF trng called: not in nmi, not in hard irq,
not in soft irq but in normal task context and during kernel init.

Signed-off-by: Harald Freudenberger <[email protected]>
Acked-by: Jason A. Donenfeld <[email protected]>
Reviewed-by: Juergen Christ <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: e4f74400308c ("s390/archrandom: simplify back to earlier design and initialize earlier")
[[email protected] changed desc, added Fixes and Link, removed -stable]
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/include/asm/archrandom.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/arch/s390/include/asm/archrandom.h
+++ b/arch/s390/include/asm/archrandom.h
@@ -2,7 +2,7 @@
/*
* Kernel interface for the s390 arch_random_* functions
*
- * Copyright IBM Corp. 2017, 2020
+ * Copyright IBM Corp. 2017, 2022
*
* Author: Harald Freudenberger <[email protected]>
*
@@ -14,6 +14,7 @@
#ifdef CONFIG_ARCH_RANDOM

#include <linux/static_key.h>
+#include <linux/preempt.h>
#include <linux/atomic.h>
#include <asm/cpacf.h>

@@ -32,7 +33,8 @@ static inline bool __must_check arch_get

static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
{
- if (static_branch_likely(&s390_arch_random_available)) {
+ if (static_branch_likely(&s390_arch_random_available) &&
+ in_task()) {
cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
atomic64_add(sizeof(*v), &s390_arch_random_counter);
return true;
@@ -42,7 +44,8 @@ static inline bool __must_check arch_get

static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
{
- if (static_branch_likely(&s390_arch_random_available)) {
+ if (static_branch_likely(&s390_arch_random_available) &&
+ in_task()) {
cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
atomic64_add(sizeof(*v), &s390_arch_random_counter);
return true;

2022-08-01 13:15:28

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 44/69] macsec: always read MACSEC_SA_ATTR_PN as a u64

From: Sabrina Dubroca <[email protected]>

[ Upstream commit c630d1fe6219769049c87d1a6a0e9a6de55328a1 ]

Currently, MACSEC_SA_ATTR_PN is handled inconsistently, sometimes as a
u32, sometimes forced into a u64 without checking the actual length of
the attribute. Instead, we can use nla_get_u64 everywhere, which will
read up to 64 bits into a u64, capped by the actual length of the
attribute coming from userspace.

This fixes several issues:
- the check in validate_add_rxsa doesn't work with 32-bit attributes
- the checks in validate_add_txsa and validate_upd_sa incorrectly
reject X << 32 (with X != 0)

Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
Signed-off-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/macsec.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 1f2eb576533c..3e74dcc1f875 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -1696,7 +1696,7 @@ static bool validate_add_rxsa(struct nlattr **attrs)
return false;

if (attrs[MACSEC_SA_ATTR_PN] &&
- *(u64 *)nla_data(attrs[MACSEC_SA_ATTR_PN]) == 0)
+ nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0)
return false;

if (attrs[MACSEC_SA_ATTR_ACTIVE]) {
@@ -1939,7 +1939,7 @@ static bool validate_add_txsa(struct nlattr **attrs)
if (nla_get_u8(attrs[MACSEC_SA_ATTR_AN]) >= MACSEC_NUM_AN)
return false;

- if (nla_get_u32(attrs[MACSEC_SA_ATTR_PN]) == 0)
+ if (nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0)
return false;

if (attrs[MACSEC_SA_ATTR_ACTIVE]) {
@@ -2293,7 +2293,7 @@ static bool validate_upd_sa(struct nlattr **attrs)
if (nla_get_u8(attrs[MACSEC_SA_ATTR_AN]) >= MACSEC_NUM_AN)
return false;

- if (attrs[MACSEC_SA_ATTR_PN] && nla_get_u32(attrs[MACSEC_SA_ATTR_PN]) == 0)
+ if (attrs[MACSEC_SA_ATTR_PN] && nla_get_u64(attrs[MACSEC_SA_ATTR_PN]) == 0)
return false;

if (attrs[MACSEC_SA_ATTR_ACTIVE]) {
--
2.35.1

2022-08-01 13:16:19

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 68/69] locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter

From: Waiman Long <[email protected]>

commit 6eebd5fb20838f5971ba17df9f55cc4f84a31053 upstream.

With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent"), the writer that sets the handoff bit can be interrupted
out without clearing the bit if the wait queue isn't empty. This disables
reader and writer optimistic lock spinning and stealing.

Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock. This is not the
case before commit d257cc8cb8d5 as the writer that set the handoff bit
will clear it when exiting out via the out_nolock path. This is less
efficient as the busy rwsem stays in an unlock state for a longer time.

In some cases, this new behavior may cause lockups as shown in [1] and
[2].

This patch allows a non-first writer to ignore the handoff bit if it
is not originally set or initiated by the first waiter. This patch is
shown to be effective in fixing the lockup problem reported in [1].

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: John Donnelly <[email protected]>
Tested-by: Mel Gorman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
kernel/locking/rwsem.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)

--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -335,8 +335,6 @@ struct rwsem_waiter {
struct task_struct *task;
enum rwsem_waiter_type type;
unsigned long timeout;
-
- /* Writer only, not initialized in reader */
bool handoff_set;
};
#define rwsem_first_waiter(sem) \
@@ -456,10 +454,12 @@ static void rwsem_mark_wake(struct rw_se
* to give up the lock), request a HANDOFF to
* force the issue.
*/
- if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
- time_after(jiffies, waiter->timeout)) {
- adjustment -= RWSEM_FLAG_HANDOFF;
- lockevent_inc(rwsem_rlock_handoff);
+ if (time_after(jiffies, waiter->timeout)) {
+ if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
+ adjustment -= RWSEM_FLAG_HANDOFF;
+ lockevent_inc(rwsem_rlock_handoff);
+ }
+ waiter->handoff_set = true;
}

atomic_long_add(-adjustment, &sem->count);
@@ -569,7 +569,7 @@ static void rwsem_mark_wake(struct rw_se
static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
struct rwsem_waiter *waiter)
{
- bool first = rwsem_first_waiter(sem) == waiter;
+ struct rwsem_waiter *first = rwsem_first_waiter(sem);
long count, new;

lockdep_assert_held(&sem->wait_lock);
@@ -579,11 +579,20 @@ static inline bool rwsem_try_write_lock(
bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);

if (has_handoff) {
- if (!first)
+ /*
+ * Honor handoff bit and yield only when the first
+ * waiter is the one that set it. Otherwisee, we
+ * still try to acquire the rwsem.
+ */
+ if (first->handoff_set && (waiter != first))
return false;

- /* First waiter inherits a previously set handoff bit */
- waiter->handoff_set = true;
+ /*
+ * First waiter can inherit a previously set handoff
+ * bit and spin on rwsem if lock acquisition fails.
+ */
+ if (waiter == first)
+ waiter->handoff_set = true;
}

new = count;
@@ -978,6 +987,7 @@ queue:
waiter.task = current;
waiter.type = RWSEM_WAITING_FOR_READ;
waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
+ waiter.handoff_set = false;

raw_spin_lock_irq(&sem->wait_lock);
if (list_empty(&sem->wait_list)) {

2022-08-01 13:16:33

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 64/69] page_alloc: fix invalid watermark check on a negative value

From: Jaewon Kim <[email protected]>

commit 9282012fc0aa248b77a69f5eb802b67c5a16bb13 upstream.

There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.

This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit f27ce0e14088 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle a negative value case which
can be happened when reserved_highatomic pageblock is bigger than the
actual free.

If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.

Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd does not make free pages quickly.

Handle the negative case by using MIN.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <[email protected]>
Reported-by: GyeongHwan Hong <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Yong-Taek Lee <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/page_alloc.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3928,11 +3928,15 @@ static inline bool zone_watermark_fast(s
* need to be calculated.
*/
if (!order) {
- long fast_free;
+ long usable_free;
+ long reserved;

- fast_free = free_pages;
- fast_free -= __zone_watermark_unusable_free(z, 0, alloc_flags);
- if (fast_free > mark + z->lowmem_reserve[highest_zoneidx])
+ usable_free = free_pages;
+ reserved = __zone_watermark_unusable_free(z, 0, alloc_flags);
+
+ /* reserved may over estimate high-atomic reserves. */
+ usable_free -= min(usable_free, reserved);
+ if (usable_free > mark + z->lowmem_reserve[highest_zoneidx])
return true;
}

2022-08-01 13:16:45

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 58/69] virtio-net: fix the race between refill work and close

From: Jason Wang <[email protected]>

[ Upstream commit 5a159128faff151b7fe5f4eb0f310b1e0a2d56bf ]

We try using cancel_delayed_work_sync() to prevent the work from
enabling NAPI. This is insufficient since we don't disable the source
of the refill work scheduling. This means an NAPI poll callback after
cancel_delayed_work_sync() can schedule the refill work then can
re-enable the NAPI that leads to use-after-free [1].

Since the work can enable NAPI, we can't simply disable NAPI before
calling cancel_delayed_work_sync(). So fix this by introducing a
dedicated boolean to control whether or not the work could be
scheduled from NAPI.

[1]
==================================================================
BUG: KASAN: use-after-free in refill_work+0x43/0xd4
Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42

CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events refill_work
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0xbb/0x6ac
? _printk+0xad/0xde
? refill_work+0x43/0xd4
kasan_report+0xa8/0x130
? refill_work+0x43/0xd4
refill_work+0x43/0xd4
process_one_work+0x43d/0x780
worker_thread+0x2a0/0x6f0
? process_one_work+0x780/0x780
kthread+0x167/0x1a0
? kthread_exit+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
...

Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop")
Signed-off-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/virtio_net.c | 37 ++++++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -213,9 +213,15 @@ struct virtnet_info {
/* Packet virtio header size */
u8 hdr_len;

- /* Work struct for refilling if we run low on memory. */
+ /* Work struct for delayed refilling if we run low on memory. */
struct delayed_work refill;

+ /* Is delayed refill enabled? */
+ bool refill_enabled;
+
+ /* The lock to synchronize the access to refill_enabled */
+ spinlock_t refill_lock;
+
/* Work struct for config space updates */
struct work_struct config_work;

@@ -319,6 +325,20 @@ static struct page *get_a_page(struct re
return p;
}

+static void enable_delayed_refill(struct virtnet_info *vi)
+{
+ spin_lock_bh(&vi->refill_lock);
+ vi->refill_enabled = true;
+ spin_unlock_bh(&vi->refill_lock);
+}
+
+static void disable_delayed_refill(struct virtnet_info *vi)
+{
+ spin_lock_bh(&vi->refill_lock);
+ vi->refill_enabled = false;
+ spin_unlock_bh(&vi->refill_lock);
+}
+
static void virtqueue_napi_schedule(struct napi_struct *napi,
struct virtqueue *vq)
{
@@ -1454,8 +1474,12 @@ static int virtnet_receive(struct receiv
}

if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
- if (!try_fill_recv(vi, rq, GFP_ATOMIC))
- schedule_delayed_work(&vi->refill, 0);
+ if (!try_fill_recv(vi, rq, GFP_ATOMIC)) {
+ spin_lock(&vi->refill_lock);
+ if (vi->refill_enabled)
+ schedule_delayed_work(&vi->refill, 0);
+ spin_unlock(&vi->refill_lock);
+ }
}

u64_stats_update_begin(&rq->stats.syncp);
@@ -1578,6 +1602,8 @@ static int virtnet_open(struct net_devic
struct virtnet_info *vi = netdev_priv(dev);
int i, err;

+ enable_delayed_refill(vi);
+
for (i = 0; i < vi->max_queue_pairs; i++) {
if (i < vi->curr_queue_pairs)
/* Make sure we have some buffers: if oom use wq. */
@@ -1958,6 +1984,8 @@ static int virtnet_close(struct net_devi
struct virtnet_info *vi = netdev_priv(dev);
int i;

+ /* Make sure NAPI doesn't schedule refill work */
+ disable_delayed_refill(vi);
/* Make sure refill_work doesn't re-enable napi! */
cancel_delayed_work_sync(&vi->refill);

@@ -2455,6 +2483,8 @@ static int virtnet_restore_up(struct vir

virtio_device_ready(vdev);

+ enable_delayed_refill(vi);
+
if (netif_running(vi->dev)) {
err = virtnet_open(vi->dev);
if (err)
@@ -3162,6 +3192,7 @@ static int virtnet_probe(struct virtio_d
vdev->priv = vi;

INIT_WORK(&vi->config_work, virtnet_config_changed_work);
+ spin_lock_init(&vi->refill_lock);

/* If we can receive ANY GSO packets, we must allocate large ones. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||

2022-08-01 13:17:11

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 02/69] Revert "ocfs2: mount shared volume without ha stack"

From: Junxiao Bi <[email protected]>

commit c80af0c250c8f8a3c978aa5aafbe9c39b336b813 upstream.

This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144.

This commit introduced a regression that can cause mount hung. The
changes in __ocfs2_find_empty_slot causes that any node with none-zero
node number can grab the slot that was already taken by node 0, so node 1
will access the same journal with node 0, when it try to grab journal
cluster lock, it will hung because it was already acquired by node 0.
It's very easy to reproduce this, in one cluster, mount node 0 first, then
node 1, you will see the following call trace from node 1.

[13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
[13148.739691] Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
[13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13148.745846] task:mount.ocfs2 state:D stack: 0 pid:53045 ppid: 53044 flags:0x00004000
[13148.749354] Call Trace:
[13148.750718] <TASK>
[13148.752019] ? usleep_range+0x90/0x89
[13148.753882] __schedule+0x210/0x567
[13148.755684] schedule+0x44/0xa8
[13148.757270] schedule_timeout+0x106/0x13c
[13148.759273] ? __prepare_to_swait+0x53/0x78
[13148.761218] __wait_for_common+0xae/0x163
[13148.763144] __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
[13148.765780] ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.768312] ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.770968] ocfs2_journal_init+0x91/0x340 [ocfs2]
[13148.773202] ocfs2_check_volume+0x39/0x461 [ocfs2]
[13148.775401] ? iput+0x69/0xba
[13148.777047] ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
[13148.779646] ocfs2_fill_super+0x54b/0x853 [ocfs2]
[13148.781756] mount_bdev+0x190/0x1b7
[13148.783443] ? ocfs2_remount+0x440/0x440 [ocfs2]
[13148.785634] legacy_get_tree+0x27/0x48
[13148.787466] vfs_get_tree+0x25/0xd0
[13148.789270] do_new_mount+0x18c/0x2d9
[13148.791046] __x64_sys_mount+0x10e/0x142
[13148.792911] do_syscall_64+0x3b/0x89
[13148.794667] entry_SYSCALL_64_after_hwframe+0x170/0x0
[13148.797051] RIP: 0033:0x7f2309f6e26e
[13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
[13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
[13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
[13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
[13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
[13148.816564] </TASK>

To fix it, we can just fix __ocfs2_find_empty_slot. But original commit
introduced the feature to mount ocfs2 locally even it is cluster based,
that is a very dangerous, it can easily cause serious data corruption,
there is no way to stop other nodes mounting the fs and corrupting it.
Setup ha or other cluster-aware stack is just the cost that we have to
take for avoiding corruption, otherwise we have to do it in kernel.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 912f655d78c5("ocfs2: mount shared volume without ha stack")
Signed-off-by: Junxiao Bi <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/ocfs2/ocfs2.h | 4 +---
fs/ocfs2/slot_map.c | 46 +++++++++++++++++++---------------------------
fs/ocfs2/super.c | 21 ---------------------
3 files changed, 20 insertions(+), 51 deletions(-)

--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -277,7 +277,6 @@ enum ocfs2_mount_options
OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT = 1 << 15, /* Journal Async Commit */
OCFS2_MOUNT_ERRORS_CONT = 1 << 16, /* Return EIO to the calling process on error */
OCFS2_MOUNT_ERRORS_ROFS = 1 << 17, /* Change filesystem to read-only on error */
- OCFS2_MOUNT_NOCLUSTER = 1 << 18, /* No cluster aware filesystem mount */
};

#define OCFS2_OSB_SOFT_RO 0x0001
@@ -673,8 +672,7 @@ static inline int ocfs2_cluster_o2cb_glo

static inline int ocfs2_mount_local(struct ocfs2_super *osb)
{
- return ((osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT)
- || (osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER));
+ return (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT);
}

static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb)
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -252,16 +252,14 @@ static int __ocfs2_find_empty_slot(struc
int i, ret = -ENOSPC;

if ((preferred >= 0) && (preferred < si->si_num_slots)) {
- if (!si->si_slots[preferred].sl_valid ||
- !si->si_slots[preferred].sl_node_num) {
+ if (!si->si_slots[preferred].sl_valid) {
ret = preferred;
goto out;
}
}

for(i = 0; i < si->si_num_slots; i++) {
- if (!si->si_slots[i].sl_valid ||
- !si->si_slots[i].sl_node_num) {
+ if (!si->si_slots[i].sl_valid) {
ret = i;
break;
}
@@ -456,30 +454,24 @@ int ocfs2_find_slot(struct ocfs2_super *
spin_lock(&osb->osb_lock);
ocfs2_update_slot_info(si);

- if (ocfs2_mount_local(osb))
- /* use slot 0 directly in local mode */
- slot = 0;
- else {
- /* search for ourselves first and take the slot if it already
- * exists. Perhaps we need to mark this in a variable for our
- * own journal recovery? Possibly not, though we certainly
- * need to warn to the user */
- slot = __ocfs2_node_num_to_slot(si, osb->node_num);
+ /* search for ourselves first and take the slot if it already
+ * exists. Perhaps we need to mark this in a variable for our
+ * own journal recovery? Possibly not, though we certainly
+ * need to warn to the user */
+ slot = __ocfs2_node_num_to_slot(si, osb->node_num);
+ if (slot < 0) {
+ /* if no slot yet, then just take 1st available
+ * one. */
+ slot = __ocfs2_find_empty_slot(si, osb->preferred_slot);
if (slot < 0) {
- /* if no slot yet, then just take 1st available
- * one. */
- slot = __ocfs2_find_empty_slot(si, osb->preferred_slot);
- if (slot < 0) {
- spin_unlock(&osb->osb_lock);
- mlog(ML_ERROR, "no free slots available!\n");
- status = -EINVAL;
- goto bail;
- }
- } else
- printk(KERN_INFO "ocfs2: Slot %d on device (%s) was "
- "already allocated to this node!\n",
- slot, osb->dev_str);
- }
+ spin_unlock(&osb->osb_lock);
+ mlog(ML_ERROR, "no free slots available!\n");
+ status = -EINVAL;
+ goto bail;
+ }
+ } else
+ printk(KERN_INFO "ocfs2: Slot %d on device (%s) was already "
+ "allocated to this node!\n", slot, osb->dev_str);

ocfs2_set_slot(si, slot, osb->node_num);
osb->slot_num = slot;
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -173,7 +173,6 @@ enum {
Opt_dir_resv_level,
Opt_journal_async_commit,
Opt_err_cont,
- Opt_nocluster,
Opt_err,
};

@@ -207,7 +206,6 @@ static const match_table_t tokens = {
{Opt_dir_resv_level, "dir_resv_level=%u"},
{Opt_journal_async_commit, "journal_async_commit"},
{Opt_err_cont, "errors=continue"},
- {Opt_nocluster, "nocluster"},
{Opt_err, NULL}
};

@@ -619,13 +617,6 @@ static int ocfs2_remount(struct super_bl
goto out;
}

- tmp = OCFS2_MOUNT_NOCLUSTER;
- if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) {
- ret = -EINVAL;
- mlog(ML_ERROR, "Cannot change nocluster option on remount\n");
- goto out;
- }
-
tmp = OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL |
OCFS2_MOUNT_HB_NONE;
if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) {
@@ -866,7 +857,6 @@ static int ocfs2_verify_userspace_stack(
}

if (ocfs2_userspace_stack(osb) &&
- !(osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER) &&
strncmp(osb->osb_cluster_stack, mopt->cluster_stack,
OCFS2_STACK_LABEL_LEN)) {
mlog(ML_ERROR,
@@ -1145,11 +1135,6 @@ static int ocfs2_fill_super(struct super
osb->s_mount_opt & OCFS2_MOUNT_DATA_WRITEBACK ? "writeback" :
"ordered");

- if ((osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER) &&
- !(osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT))
- printk(KERN_NOTICE "ocfs2: The shared device (%s) is mounted "
- "without cluster aware mode.\n", osb->dev_str);
-
atomic_set(&osb->vol_state, VOLUME_MOUNTED);
wake_up(&osb->osb_mount_event);

@@ -1456,9 +1441,6 @@ static int ocfs2_parse_options(struct su
case Opt_journal_async_commit:
mopt->mount_opt |= OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT;
break;
- case Opt_nocluster:
- mopt->mount_opt |= OCFS2_MOUNT_NOCLUSTER;
- break;
default:
mlog(ML_ERROR,
"Unrecognized mount option \"%s\" "
@@ -1570,9 +1552,6 @@ static int ocfs2_show_options(struct seq
if (opts & OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT)
seq_printf(s, ",journal_async_commit");

- if (opts & OCFS2_MOUNT_NOCLUSTER)
- seq_printf(s, ",nocluster");
-
return 0;
}

2022-08-01 13:17:24

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 47/69] tcp: Fix data-races around sk_pacing_rate.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 59bf6c65a09fff74215517aecffbbdcd67df76e3 ]

While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed
concurrently. Thus, we need to add READ_ONCE() to their readers.

Fixes: 43e122b014c9 ("tcp: refine pacing rate determination")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 018be3f346e6..566745f527fe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -902,9 +902,9 @@ static void tcp_update_pacing_rate(struct sock *sk)
* end of slow start and should slow down.
*/
if (tcp_snd_cwnd(tp) < tp->snd_ssthresh / 2)
- rate *= sock_net(sk)->ipv4.sysctl_tcp_pacing_ss_ratio;
+ rate *= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pacing_ss_ratio);
else
- rate *= sock_net(sk)->ipv4.sysctl_tcp_pacing_ca_ratio;
+ rate *= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pacing_ca_ratio);

rate *= max(tcp_snd_cwnd(tp), tp->packets_out);

--
2.35.1

2022-08-01 13:17:29

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 60/69] sfc: disable softirqs for ptp TX

From: Alejandro Lucero <[email protected]>

[ Upstream commit 67c3b611d92fc238c43734878bc3e232ab570c79 ]

Sending a PTP packet can imply to use the normal TX driver datapath but
invoked from the driver's ptp worker. The kernel generic TX code
disables softirqs and preemption before calling specific driver TX code,
but the ptp worker does not. Although current ptp driver functionality
does not require it, there are several reasons for doing so:

1) The invoked code is always executed with softirqs disabled for non
PTP packets.
2) Better if a ptp packet transmission is not interrupted by softirq
handling which could lead to high latencies.
3) netdev_xmit_more used by the TX code requires preemption to be
disabled.

Indeed a solution for dealing with kernel preemption state based on static
kernel configuration is not possible since the introduction of dynamic
preemption level configuration at boot time using the static calls
functionality.

Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper")
Signed-off-by: Alejandro Lucero <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/net/ethernet/sfc/ptp.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c
index 725b0f38813a..a2b4e3befa59 100644
--- a/drivers/net/ethernet/sfc/ptp.c
+++ b/drivers/net/ethernet/sfc/ptp.c
@@ -1100,7 +1100,29 @@ static void efx_ptp_xmit_skb_queue(struct efx_nic *efx, struct sk_buff *skb)

tx_queue = efx_channel_get_tx_queue(ptp_data->channel, type);
if (tx_queue && tx_queue->timestamping) {
+ /* This code invokes normal driver TX code which is always
+ * protected from softirqs when called from generic TX code,
+ * which in turn disables preemption. Look at __dev_queue_xmit
+ * which uses rcu_read_lock_bh disabling preemption for RCU
+ * plus disabling softirqs. We do not need RCU reader
+ * protection here.
+ *
+ * Although it is theoretically safe for current PTP TX/RX code
+ * running without disabling softirqs, there are three good
+ * reasond for doing so:
+ *
+ * 1) The code invoked is mainly implemented for non-PTP
+ * packets and it is always executed with softirqs
+ * disabled.
+ * 2) This being a single PTP packet, better to not
+ * interrupt its processing by softirqs which can lead
+ * to high latencies.
+ * 3) netdev_xmit_more checks preemption is disabled and
+ * triggers a BUG_ON if not.
+ */
+ local_bh_disable();
efx_enqueue_skb(tx_queue, skb);
+ local_bh_enable();
} else {
WARN_ONCE(1, "PTP channel has no timestamped tx queue\n");
dev_kfree_skb_any(skb);
--
2.35.1

2022-08-01 13:17:39

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 37/69] tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 1330ffacd05fc9ac4159d19286ce119e22450ed2 ]

While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5357ebfbcc0..b925c766f1d2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3050,7 +3050,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,

static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us, const int flag)
{
- u32 wlen = sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen * HZ;
+ u32 wlen = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen) * HZ;
struct tcp_sock *tp = tcp_sk(sk);

if ((flag & FLAG_ACK_MAYBE_DELAYED) && rtt_us > tcp_min_rtt(tp)) {
--
2.35.1

2022-08-01 13:18:17

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 50/69] tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.

From: Kuniyuki Iwashima <[email protected]>

[ Upstream commit 22396941a7f343d704738360f9ef0e6576489d43 ]

While reading sysctl_tcp_comp_sack_slack_ns, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: a70437cc09a1 ("tcp: add hrtimer slack to sack compression")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 486ca1d5b436..7b593865b4ae 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5517,7 +5517,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
rtt * (NSEC_PER_USEC >> 3)/20);
sock_hold(sk);
hrtimer_start_range_ns(&tp->compressed_ack_timer, ns_to_ktime(delay),
- sock_net(sk)->ipv4.sysctl_tcp_comp_sack_slack_ns,
+ READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_slack_ns),
HRTIMER_MODE_REL_PINNED_SOFT);
}

--
2.35.1

2022-08-01 13:18:44

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 07/69] hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte

From: Miaohe Lin <[email protected]>

commit da9a298f5fad0dc615079a340da42928bc5b138e upstream.

When alloc_huge_page fails, *pagep is set to NULL without put_page first.
So the hugepage indicated by *pagep is leaked.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Muchun Song <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/hugetlb.c | 1 +
1 file changed, 1 insertion(+)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5314,6 +5314,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_s

page = alloc_huge_page(dst_vma, dst_addr, 0);
if (IS_ERR(page)) {
+ put_page(*pagep);
ret = -ENOMEM;
*pagep = NULL;
goto out;

2022-08-01 13:18:50

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 01/69] Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put

From: Luiz Augusto von Dentz <[email protected]>

commit d0be8347c623e0ac4202a1d4e0373882821f56b0 upstream.

This fixes the following trace which is caused by hci_rx_work starting up
*after* the final channel reference has been put() during sock_close() but
*before* the references to the channel have been destroyed, so instead
the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.

refcount_t: increment on 0; use-after-free.
BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705

CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W
4.14.234-00003-g1fb6d0bd49a4-dirty #28
Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
Google Inc. MSM sm8150 Flame DVT (DT)
Workqueue: hci0 hci_rx_work
Call trace:
dump_backtrace+0x0/0x378
show_stack+0x20/0x2c
dump_stack+0x124/0x148
print_address_description+0x80/0x2e8
__kasan_report+0x168/0x188
kasan_report+0x10/0x18
__asan_load4+0x84/0x8c
refcount_dec_and_test+0x20/0xd0
l2cap_chan_put+0x48/0x12c
l2cap_recv_frame+0x4770/0x6550
l2cap_recv_acldata+0x44c/0x7a4
hci_acldata_packet+0x100/0x188
hci_rx_work+0x178/0x23c
process_one_work+0x35c/0x95c
worker_thread+0x4cc/0x960
kthread+0x1a8/0x1c4
ret_from_fork+0x10/0x18

Cc: [email protected]
Reported-by: Lee Jones <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Tested-by: Lee Jones <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/bluetooth/l2cap.h | 1
net/bluetooth/l2cap_core.c | 61 +++++++++++++++++++++++++++++++++---------
2 files changed, 49 insertions(+), 13 deletions(-)

--- a/include/net/bluetooth/l2cap.h
+++ b/include/net/bluetooth/l2cap.h
@@ -847,6 +847,7 @@ enum {
};

void l2cap_chan_hold(struct l2cap_chan *c);
+struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c);
void l2cap_chan_put(struct l2cap_chan *c);

static inline void l2cap_chan_lock(struct l2cap_chan *chan)
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -111,7 +111,8 @@ static struct l2cap_chan *__l2cap_get_ch
}

/* Find channel with given SCID.
- * Returns locked channel. */
+ * Returns a reference locked channel.
+ */
static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
u16 cid)
{
@@ -119,15 +120,19 @@ static struct l2cap_chan *l2cap_get_chan

mutex_lock(&conn->chan_lock);
c = __l2cap_get_chan_by_scid(conn, cid);
- if (c)
- l2cap_chan_lock(c);
+ if (c) {
+ /* Only lock if chan reference is not 0 */
+ c = l2cap_chan_hold_unless_zero(c);
+ if (c)
+ l2cap_chan_lock(c);
+ }
mutex_unlock(&conn->chan_lock);

return c;
}

/* Find channel with given DCID.
- * Returns locked channel.
+ * Returns a reference locked channel.
*/
static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
u16 cid)
@@ -136,8 +141,12 @@ static struct l2cap_chan *l2cap_get_chan

mutex_lock(&conn->chan_lock);
c = __l2cap_get_chan_by_dcid(conn, cid);
- if (c)
- l2cap_chan_lock(c);
+ if (c) {
+ /* Only lock if chan reference is not 0 */
+ c = l2cap_chan_hold_unless_zero(c);
+ if (c)
+ l2cap_chan_lock(c);
+ }
mutex_unlock(&conn->chan_lock);

return c;
@@ -162,8 +171,12 @@ static struct l2cap_chan *l2cap_get_chan

mutex_lock(&conn->chan_lock);
c = __l2cap_get_chan_by_ident(conn, ident);
- if (c)
- l2cap_chan_lock(c);
+ if (c) {
+ /* Only lock if chan reference is not 0 */
+ c = l2cap_chan_hold_unless_zero(c);
+ if (c)
+ l2cap_chan_lock(c);
+ }
mutex_unlock(&conn->chan_lock);

return c;
@@ -497,6 +510,16 @@ void l2cap_chan_hold(struct l2cap_chan *
kref_get(&c->kref);
}

+struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c)
+{
+ BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
+
+ if (!kref_get_unless_zero(&c->kref))
+ return NULL;
+
+ return c;
+}
+
void l2cap_chan_put(struct l2cap_chan *c)
{
BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
@@ -1969,7 +1992,10 @@ static struct l2cap_chan *l2cap_global_c
src_match = !bacmp(&c->src, src);
dst_match = !bacmp(&c->dst, dst);
if (src_match && dst_match) {
- l2cap_chan_hold(c);
+ c = l2cap_chan_hold_unless_zero(c);
+ if (!c)
+ continue;
+
read_unlock(&chan_list_lock);
return c;
}
@@ -1984,7 +2010,7 @@ static struct l2cap_chan *l2cap_global_c
}

if (c1)
- l2cap_chan_hold(c1);
+ c1 = l2cap_chan_hold_unless_zero(c1);

read_unlock(&chan_list_lock);

@@ -4464,6 +4490,7 @@ static inline int l2cap_config_req(struc

unlock:
l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
return err;
}

@@ -4578,6 +4605,7 @@ static inline int l2cap_config_rsp(struc

done:
l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
return err;
}

@@ -5305,6 +5333,7 @@ send_move_response:
l2cap_send_move_chan_rsp(chan, result);

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);

return 0;
}
@@ -5397,6 +5426,7 @@ static void l2cap_move_continue(struct l
}

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
}

static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid,
@@ -5426,6 +5456,7 @@ static void l2cap_move_fail(struct l2cap
l2cap_send_move_chan_cfm(chan, L2CAP_MC_UNCONFIRMED);

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
}

static int l2cap_move_channel_rsp(struct l2cap_conn *conn,
@@ -5489,6 +5520,7 @@ static int l2cap_move_channel_confirm(st
l2cap_send_move_chan_cfm_rsp(conn, cmd->ident, icid);

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);

return 0;
}
@@ -5524,6 +5556,7 @@ static inline int l2cap_move_channel_con
}

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);

return 0;
}
@@ -5896,12 +5929,11 @@ static inline int l2cap_le_credits(struc
if (credits > max_credits) {
BT_ERR("LE credits overflow");
l2cap_send_disconn_req(chan, ECONNRESET);
- l2cap_chan_unlock(chan);

/* Return 0 so that we don't trigger an unnecessary
* command reject packet.
*/
- return 0;
+ goto unlock;
}

chan->tx_credits += credits;
@@ -5912,7 +5944,9 @@ static inline int l2cap_le_credits(struc
if (chan->tx_credits)
chan->ops->resume(chan);

+unlock:
l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);

return 0;
}
@@ -7598,6 +7632,7 @@ drop:

done:
l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
}

static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm,
@@ -8086,7 +8121,7 @@ static struct l2cap_chan *l2cap_global_f
if (src_type != c->src_type)
continue;

- l2cap_chan_hold(c);
+ c = l2cap_chan_hold_unless_zero(c);
read_unlock(&chan_list_lock);
return c;
}

2022-08-01 13:19:29

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 06/69] mm: fix page leak with multiple threads mapping the same page

From: Josef Bacik <[email protected]>

commit 3fe2895cfecd03ac74977f32102b966b6589f481 upstream.

We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size. This application started
leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges. This very quickly reproduced the problem.

The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped. One thread maps
the PMD, the other thread loses the race and then returns 0. However at
this point we already have the page, and we are no longer putting this
page into the processes address space, and so we leak the page. We
actually did the correct thing prior to f9ce0be71d1f, however it looks
like Kirill copied what we do in the anonymous page case. In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything. Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case, this makes us drop the ref on the page properly, and now my
reproducer no longer leaks the huge pages.

[[email protected]: v2]
Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
mm/memory.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4071,9 +4071,12 @@ vm_fault_t finish_fault(struct vm_fault
}
}

- /* See comment in handle_pte_fault() */
+ /*
+ * See comment in handle_pte_fault() for how this scenario happens, we
+ * need to return NOPAGE so that we drop this page.
+ */
if (pmd_devmap_trans_unstable(vmf->pmd))
- return 0;
+ return VM_FAULT_NOPAGE;

vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
vmf->address, &vmf->ptl);

2022-08-01 13:19:33

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 08/69] asm-generic: remove a broken and needless ifdef conditional

From: Lukas Bulwahn <[email protected]>

commit e2a619ca0b38f2114347b7078b8a67d72d457a3d upstream.

Commit 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
introduces the config symbol GENERIC_LIB_DEVMEM_IS_ALLOWED, but then
falsely refers to CONFIG_GENERIC_DEVMEM_IS_ALLOWED (note the missing LIB
in the reference) in ./include/asm-generic/io.h.

Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs:

GENERIC_DEVMEM_IS_ALLOWED
Referencing files: include/asm-generic/io.h

The actual fix, though, is simply to not to make this function declaration
dependent on any kernel config. For architectures that intend to use
the generic version, the arch's 'select GENERIC_LIB_DEVMEM_IS_ALLOWED' will
lead to picking the function definition, and for other architectures, this
function is simply defined elsewhere.

The wrong '#ifndef' on a non-existing config symbol also always had the
same effect (although more by mistake than by intent). So, there is no
functional change.

Remove this broken and needless ifdef conditional.

Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/asm-generic/io.h | 2 --
1 file changed, 2 deletions(-)

--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -1125,9 +1125,7 @@ static inline void memcpy_toio(volatile
}
#endif

-#ifndef CONFIG_GENERIC_DEVMEM_IS_ALLOWED
extern int devmem_is_allowed(unsigned long pfn);
-#endif

#endif /* __KERNEL__ */

2022-08-01 13:19:33

by Greg Kroah-Hartman

[permalink] [raw]

Subject: [PATCH 5.15 30/69] net: ping6: Fix memleak in ipv6_renew_options().

From: Kuniyuki Iwashima <[email protected]>

commit e27326009a3d247b831eda38878c777f6f4eb3d1 upstream.

When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy(). As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.

struct ipv6_sr_hdr *hdr;
char data[24] = {0};
int fd;

hdr = (struct ipv6_sr_hdr *)data;
hdr->hdrlen = 2;
hdr->type = IPV6_SRCRT_TYPE_4;

fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
close(fd);

To fix memory leaks, let's add a destroy function.

Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range. The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0). Thus, the local DoS does
not succeed until we change the default value. However, at least
Ubuntu/Fedora/RHEL loosen it.

$ cat /usr/lib/sysctl.d/50-default.conf
...
-net.ipv4.ping_group_range = 0 2147483647

Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.

setsockopt
IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
IPV6_HOPOPTS (inet6_sk(sk)->opt)
IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
IPV6_RTHDR (inet6_sk(sk)->opt)
IPV6_DSTOPTS (inet6_sk(sk)->opt)
IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)

getsockopt
IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)

For the record, I left a different splat with syzbot's one.

unreferenced object 0xffff888006270c60 (size 96):
comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
hex dump (first 32 bytes):
01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
[<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
[<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
[<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
[<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
[<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176

Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: [email protected]
Reported-by: Ayushman Dutta <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ping.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -22,6 +22,11 @@
#include <linux/proc_fs.h>
#include <net/ping.h>

+static void ping_v6_destroy(struct sock *sk)
+{
+ inet6_destroy_sock(sk);
+}
+
/* Compatibility glue so we can support IPv6 when it's compiled as a module */
static int dummy_ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len,
int *addr_len)
@@ -166,6 +171,7 @@ struct proto pingv6_prot = {
.owner = THIS_MODULE,
.init = ping_init_sock,
.close = ping_close,
+ .destroy = ping_v6_destroy,
.connect = ip6_datagram_connect_v6_only,
.disconnect = __udp_disconnect,
.setsockopt = ipv6_setsockopt,

2022-08-01 21:47:27

by Florian Fainelli

[permalink] [raw]

Subject: Re: [PATCH 5.15 00/69] 5.15.59-rc1 review

On 8/1/22 04:46, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.59 release.
> There are 69 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.59-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:

Tested-by: Florian Fainelli <[email protected]>
--
Florian

2022-08-01 22:23:09

by Daniel Díaz

[permalink] [raw]

Subject: Re: [PATCH 5.15 00/69] 5.15.59-rc1 review

Hello!

On 01/08/22 06:46, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.59 release.
> There are 69 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.59-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro's test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <[email protected]>

## Build
* kernel: 5.15.59-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.15.y
* git commit: f7c660e98f9bd2987c86dfe0017ad54afb4026e1
* git describe: v5.15.57-273-gf7c660e98f9b
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15.57-273-gf7c660e98f9b

## No test regressions (compared to v5.15.58)

## No metric regressions (compared to v5.15.58)

## No test fixes (compared to v5.15.58)

## No metric fixes (compared to v5.15.58)

## Test result summary
total: 138438, pass: 122071, fail: 702, skip: 14903, xfail: 762

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 311 total, 308 passed, 3 failed
* arm64: 68 total, 66 passed, 2 failed
* i386: 57 total, 51 passed, 6 failed
* mips: 50 total, 47 passed, 3 failed
* parisc: 14 total, 14 passed, 0 failed
* powerpc: 59 total, 56 passed, 3 failed
* riscv: 27 total, 27 passed, 0 failed
* s390: 26 total, 23 passed, 3 failed
* sh: 26 total, 24 passed, 2 failed
* sparc: 14 total, 14 passed, 0 failed
* x86_64: 61 total, 59 passed, 2 failed

## Test suites summary
* fwts
* igt-gpu-tools
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-fsx
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-open-posix-tests
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* network-basic-tests
* packetdrill
* rcutorture
* ssuite
* v4l2-compliance
* vdso

Greetings!

Daniel Díaz
[email protected]

--
Linaro LKFT
https://lkft.linaro.org

2022-08-01 23:00:31

[permalink] [raw]

Subject: Re: [PATCH 5.15 00/69] 5.15.59-rc1 review

On 8/1/22 5:46 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.59 release.
> There are 69 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.59-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

Tested-by: Shuah Khan <[email protected]>

thanks,
-- Shuah

2022-08-02 04:51:19

by Bagas Sanjaya

[permalink] [raw]

Subject: Re: [PATCH 5.15 00/69] 5.15.59-rc1 review

On Mon, Aug 01, 2022 at 01:46:24PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.59 release.
> There are 69 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>

Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0)
and powerpc (ps3_defconfig, GCC 12.1.0).

Tested-by: Bagas Sanjaya <[email protected]>

--
An old man doll... just what I always wanted! - Clara

2022-08-02 06:09:38

by Guenter Roeck

[permalink] [raw]

Subject: Re: [PATCH 5.15 00/69] 5.15.59-rc1 review

On Mon, Aug 01, 2022 at 01:46:24PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.59 release.
> There are 69 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>

Build results:
total: 159 pass: 159 fail: 0
Qemu test results:
total: 488 pass: 488 fail: 0

Tested-by: Guenter Roeck <[email protected]>

Guenter

2022-08-02 11:17:46

by Ron Economos

[permalink] [raw]

Subject: Re: [PATCH 5.15 00/69] 5.15.59-rc1 review

On 8/1/22 4:46 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.59 release.
> There are 69 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.59-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Built and booted successfully on RISC-V RV64 (HiFive Unmatched).

Tested-by: Ron Economos <[email protected]>

2022-08-02 17:50:41

by Sudip Mukherjee

[permalink] [raw]

Subject: Re: [PATCH 5.15 00/69] 5.15.59-rc1 review

Hi Greg,

On Mon, Aug 01, 2022 at 01:46:24PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.59 release.
> There are 69 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 03 Aug 2022 11:41:16 +0000.
> Anything received after that time might be too late.

Build test (gcc version 11.3.1 20220724):
mips: 62 configs -> no failure
arm: 99 configs -> no failure
arm64: 3 configs -> no failure
x86_64: 4 configs -> no failure
alpha allmodconfig -> no failure
csky allmodconfig -> no failure
powerpc allmodconfig -> no failure
riscv allmodconfig -> no failure
s390 allmodconfig -> no failure
xtensa allmodconfig -> no failure

Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). No regression. [2]
mips: Booted on ci20 board. No regression. [3]

[1]. https://openqa.qa.codethink.co.uk/tests/1604
[2]. https://openqa.qa.codethink.co.uk/tests/1610
[3]. https://openqa.qa.codethink.co.uk/tests/1612

Tested-by: Sudip Mukherjee <[email protected]>

--
Regards
Sudip