2024-04-07 13:31:22

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 01/13] scsi: lpfc: Move NPIV's transport unregistration to after resource clean up

From: Justin Tee <[email protected]>

[ Upstream commit 4ddf01f2f1504fa08b766e8cfeec558e9f8eef6c ]

There are cases after NPIV deletion where the fabric switch still believes
the NPIV is logged into the fabric. This occurs when a vport is
unregistered before the Remove All DA_ID CT and LOGO ELS are sent to the
fabric.

Currently fc_remove_host(), which calls dev_loss_tmo for all D_IDs including
the fabric D_ID, removes the last ndlp reference and frees the ndlp rport
object. This sometimes causes the race condition where the final DA_ID and
LOGO are skipped from being sent to the fabric switch.

Fix by moving the fc_remove_host() and scsi_remove_host() calls after DA_ID
and LOGO are sent.

Signed-off-by: Justin Tee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/lpfc/lpfc_vport.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_vport.c b/drivers/scsi/lpfc/lpfc_vport.c
index 4d171f5c213f7..6b4259894584f 100644
--- a/drivers/scsi/lpfc/lpfc_vport.c
+++ b/drivers/scsi/lpfc/lpfc_vport.c
@@ -693,10 +693,6 @@ lpfc_vport_delete(struct fc_vport *fc_vport)
lpfc_free_sysfs_attr(vport);
lpfc_debugfs_terminate(vport);

- /* Remove FC host to break driver binding. */
- fc_remove_host(shost);
- scsi_remove_host(shost);
-
/* Send the DA_ID and Fabric LOGO to cleanup Nameserver entries. */
ndlp = lpfc_findnode_did(vport, Fabric_DID);
if (!ndlp)
@@ -740,6 +736,10 @@ lpfc_vport_delete(struct fc_vport *fc_vport)

skip_logo:

+ /* Remove FC host to break driver binding. */
+ fc_remove_host(shost);
+ scsi_remove_host(shost);
+
lpfc_cleanup(vport);

/* Remove scsi host now. The nodes are cleaned up. */
--
2.43.0



2024-04-07 13:31:29

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 02/13] scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic

From: Justin Tee <[email protected]>

[ Upstream commit bb011631435c705cdeddca68d5c85fd40a4320f9 ]

Typically when an out of resource CQE status is detected, the
lpfc_ramp_down_queue_handler() logic is called to help reduce I/O load by
reducing an sdev's queue_depth.

However, the current lpfc_rampdown_queue_depth() logic does not help reduce
queue_depth. num_cmd_success is never updated and is always zero, which
means new_queue_depth will always be set to sdev->queue_depth. So,
new_queue_depth = sdev->queue_depth - new_queue_depth always sets
new_queue_depth to zero. And, scsi_change_queue_depth(sdev, 0) is
essentially a no-op.

Change the lpfc_ramp_down_queue_handler() logic to set new_queue_depth
equal to sdev->queue_depth subtracted from number of times num_rsrc_err was
incremented. If num_rsrc_err is >= sdev->queue_depth, then set
new_queue_depth equal to 1. Eventually, the frequency of Good_Status
frames will signal SCSI upper layer to auto increase the queue_depth back
to the driver default of 64 via scsi_handle_queue_ramp_up().

Signed-off-by: Justin Tee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/lpfc/lpfc.h | 1 -
drivers/scsi/lpfc/lpfc_scsi.c | 13 ++++---------
2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index dc5ac3cc70f6d..6f08fbe103cb9 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -1355,7 +1355,6 @@ struct lpfc_hba {
struct timer_list fabric_block_timer;
unsigned long bit_flags;
atomic_t num_rsrc_err;
- atomic_t num_cmd_success;
unsigned long last_rsrc_error_time;
unsigned long last_ramp_down_time;
#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 0bb7e164b525f..2a81a42de5c14 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -167,11 +167,10 @@ lpfc_ramp_down_queue_handler(struct lpfc_hba *phba)
struct Scsi_Host *shost;
struct scsi_device *sdev;
unsigned long new_queue_depth;
- unsigned long num_rsrc_err, num_cmd_success;
+ unsigned long num_rsrc_err;
int i;

num_rsrc_err = atomic_read(&phba->num_rsrc_err);
- num_cmd_success = atomic_read(&phba->num_cmd_success);

/*
* The error and success command counters are global per
@@ -186,20 +185,16 @@ lpfc_ramp_down_queue_handler(struct lpfc_hba *phba)
for (i = 0; i <= phba->max_vports && vports[i] != NULL; i++) {
shost = lpfc_shost_from_vport(vports[i]);
shost_for_each_device(sdev, shost) {
- new_queue_depth =
- sdev->queue_depth * num_rsrc_err /
- (num_rsrc_err + num_cmd_success);
- if (!new_queue_depth)
- new_queue_depth = sdev->queue_depth - 1;
+ if (num_rsrc_err >= sdev->queue_depth)
+ new_queue_depth = 1;
else
new_queue_depth = sdev->queue_depth -
- new_queue_depth;
+ num_rsrc_err;
scsi_change_queue_depth(sdev, new_queue_depth);
}
}
lpfc_destroy_vport_work_array(phba, vports);
atomic_set(&phba->num_rsrc_err, 0);
- atomic_set(&phba->num_cmd_success, 0);
}

/**
--
2.43.0


2024-04-07 13:31:38

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 05/13] gfs2: Fix invalid metadata access in punch_hole

From: Andrew Price <[email protected]>

[ Upstream commit c95346ac918c5badf51b9a7ac58a26d3bd5bb224 ]

In punch_hole(), when the offset lies in the final block for a given
height, there is no hole to punch, but the maximum size check fails to
detect that. Consequently, punch_hole() will try to punch a hole beyond
the end of the metadata and fail. Fix the maximum size check.

Signed-off-by: Andrew Price <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/gfs2/bmap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index e7537fd305dd2..9ad11e5bf14c3 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1702,7 +1702,8 @@ static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length)
struct buffer_head *dibh, *bh;
struct gfs2_holder rd_gh;
unsigned int bsize_shift = sdp->sd_sb.sb_bsize_shift;
- u64 lblock = (offset + (1 << bsize_shift) - 1) >> bsize_shift;
+ unsigned int bsize = 1 << bsize_shift;
+ u64 lblock = (offset + bsize - 1) >> bsize_shift;
__u16 start_list[GFS2_MAX_META_HEIGHT];
__u16 __end_list[GFS2_MAX_META_HEIGHT], *end_list = NULL;
unsigned int start_aligned, end_aligned;
@@ -1713,7 +1714,7 @@ static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length)
u64 prev_bnr = 0;
__be64 *start, *end;

- if (offset >= maxsize) {
+ if (offset + bsize - 1 >= maxsize) {
/*
* The starting point lies beyond the allocated meta-data;
* there are no blocks do deallocate.
--
2.43.0


2024-04-07 13:31:50

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 04/13] scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()

From: Justin Tee <[email protected]>

[ Upstream commit ded20192dff31c91cef2a04f7e20e60e9bb887d3 ]

lpfc_worker_wake_up() calls the lpfc_work_done() routine, which takes the
hbalock. Thus, lpfc_worker_wake_up() should not be called while holding the
hbalock to avoid potential deadlock.

Signed-off-by: Justin Tee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/lpfc/lpfc_els.c | 20 ++++++++++----------
drivers/scsi/lpfc/lpfc_hbadisc.c | 5 ++---
drivers/scsi/lpfc/lpfc_sli.c | 14 +++++++-------
3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 6b5ce9869e6b4..05764008f6e70 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -4384,23 +4384,23 @@ lpfc_els_retry_delay(struct timer_list *t)
unsigned long flags;
struct lpfc_work_evt *evtp = &ndlp->els_retry_evt;

+ /* Hold a node reference for outstanding queued work */
+ if (!lpfc_nlp_get(ndlp))
+ return;
+
spin_lock_irqsave(&phba->hbalock, flags);
if (!list_empty(&evtp->evt_listp)) {
spin_unlock_irqrestore(&phba->hbalock, flags);
+ lpfc_nlp_put(ndlp);
return;
}

- /* We need to hold the node by incrementing the reference
- * count until the queued work is done
- */
- evtp->evt_arg1 = lpfc_nlp_get(ndlp);
- if (evtp->evt_arg1) {
- evtp->evt = LPFC_EVT_ELS_RETRY;
- list_add_tail(&evtp->evt_listp, &phba->work_list);
- lpfc_worker_wake_up(phba);
- }
+ evtp->evt_arg1 = ndlp;
+ evtp->evt = LPFC_EVT_ELS_RETRY;
+ list_add_tail(&evtp->evt_listp, &phba->work_list);
spin_unlock_irqrestore(&phba->hbalock, flags);
- return;
+
+ lpfc_worker_wake_up(phba);
}

/**
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 549fa7d6c0f6f..aaa98a006fdcb 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -241,7 +241,9 @@ lpfc_dev_loss_tmo_callbk(struct fc_rport *rport)
if (evtp->evt_arg1) {
evtp->evt = LPFC_EVT_DEV_LOSS;
list_add_tail(&evtp->evt_listp, &phba->work_list);
+ spin_unlock_irqrestore(&phba->hbalock, iflags);
lpfc_worker_wake_up(phba);
+ return;
}
spin_unlock_irqrestore(&phba->hbalock, iflags);
} else {
@@ -259,10 +261,7 @@ lpfc_dev_loss_tmo_callbk(struct fc_rport *rport)
lpfc_disc_state_machine(vport, ndlp, NULL,
NLP_EVT_DEVICE_RM);
}
-
}
-
- return;
}

/**
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 427a6ac803e50..47b8102a7063a 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -1217,9 +1217,9 @@ lpfc_set_rrq_active(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
empty = list_empty(&phba->active_rrq_list);
list_add_tail(&rrq->list, &phba->active_rrq_list);
phba->hba_flag |= HBA_RRQ_ACTIVE;
+ spin_unlock_irqrestore(&phba->hbalock, iflags);
if (empty)
lpfc_worker_wake_up(phba);
- spin_unlock_irqrestore(&phba->hbalock, iflags);
return 0;
out:
spin_unlock_irqrestore(&phba->hbalock, iflags);
@@ -11361,18 +11361,18 @@ lpfc_sli_post_recovery_event(struct lpfc_hba *phba,
unsigned long iflags;
struct lpfc_work_evt *evtp = &ndlp->recovery_evt;

+ /* Hold a node reference for outstanding queued work */
+ if (!lpfc_nlp_get(ndlp))
+ return;
+
spin_lock_irqsave(&phba->hbalock, iflags);
if (!list_empty(&evtp->evt_listp)) {
spin_unlock_irqrestore(&phba->hbalock, iflags);
+ lpfc_nlp_put(ndlp);
return;
}

- /* Incrementing the reference count until the queued work is done. */
- evtp->evt_arg1 = lpfc_nlp_get(ndlp);
- if (!evtp->evt_arg1) {
- spin_unlock_irqrestore(&phba->hbalock, iflags);
- return;
- }
+ evtp->evt_arg1 = ndlp;
evtp->evt = LPFC_EVT_RECOVER_PORT;
list_add_tail(&evtp->evt_listp, &phba->work_list);
spin_unlock_irqrestore(&phba->hbalock, iflags);
--
2.43.0


2024-04-07 13:32:48

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 08/13] net: mark racy access on sk->sk_rcvbuf

From: linke li <[email protected]>

[ Upstream commit c2deb2e971f5d9aca941ef13ee05566979e337a4 ]

sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be
changed by other threads. Mark this as benign using READ_ONCE().

This patch is aimed at reducing the number of benign races reported by
KCSAN in order to focus future debugging effort on harmful races.

Signed-off-by: linke li <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/core/sock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index c8803b95ea0da..054fbace08e0a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -481,7 +481,7 @@ int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
unsigned long flags;
struct sk_buff_head *list = &sk->sk_receive_queue;

- if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) {
+ if (atomic_read(&sk->sk_rmem_alloc) >= READ_ONCE(sk->sk_rcvbuf)) {
atomic_inc(&sk->sk_drops);
trace_sock_rcvqueue_full(sk, skb);
return -ENOMEM;
@@ -551,7 +551,7 @@ int __sk_receive_skb(struct sock *sk, struct sk_buff *skb,

skb->dev = NULL;

- if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
+ if (sk_rcvqueues_full(sk, READ_ONCE(sk->sk_rcvbuf))) {
atomic_inc(&sk->sk_drops);
goto discard_and_relse;
}
--
2.43.0


2024-04-07 13:33:21

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 10/13] scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload

From: Saurav Kashyap <[email protected]>

[ Upstream commit c214ed2a4dda35b308b0b28eed804d7ae66401f9 ]

The session resources are used by FW and driver when session is offloaded,
once session is uploaded these resources are not used. The lock is not
required as these fields won't be used any longer. The offload and upload
calls are sequential, hence lock is not required.

This will suppress following BUG_ON():

[ 449.843143] ------------[ cut here ]------------
[ 449.848302] kernel BUG at mm/vmalloc.c:2727!
[ 449.853072] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 449.858712] CPU: 5 PID: 1996 Comm: kworker/u24:2 Not tainted 5.14.0-118.el9.x86_64 #1
Rebooting.
[ 449.867454] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.3.4 11/08/2016
[ 449.876966] Workqueue: fc_rport_eq fc_rport_work [libfc]
[ 449.882910] RIP: 0010:vunmap+0x2e/0x30
[ 449.887098] Code: 00 65 8b 05 14 a2 f0 4a a9 00 ff ff 00 75 1b 55 48 89 fd e8 34 36 79 00 48 85 ed 74 0b 48 89 ef 31 f6 5d e9 14 fc ff ff 5d c3 <0f> 0b 0f 1f 44 00 00 41 57 41 56 49 89 ce 41 55 49 89 fd 41 54 41
[ 449.908054] RSP: 0018:ffffb83d878b3d68 EFLAGS: 00010206
[ 449.913887] RAX: 0000000080000201 RBX: ffff8f4355133550 RCX: 000000000d400005
[ 449.921843] RDX: 0000000000000001 RSI: 0000000000001000 RDI: ffffb83da53f5000
[ 449.929808] RBP: ffff8f4ac6675800 R08: ffffb83d878b3d30 R09: 00000000000efbdf
[ 449.937774] R10: 0000000000000003 R11: ffff8f434573e000 R12: 0000000000001000
[ 449.945736] R13: 0000000000001000 R14: ffffb83da53f5000 R15: ffff8f43d4ea3ae0
[ 449.953701] FS: 0000000000000000(0000) GS:ffff8f529fc80000(0000) knlGS:0000000000000000
[ 449.962732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 449.969138] CR2: 00007f8cf993e150 CR3: 0000000efbe10003 CR4: 00000000003706e0
[ 449.977102] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 449.985065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 449.993028] Call Trace:
[ 449.995756] __iommu_dma_free+0x96/0x100
[ 450.000139] bnx2fc_free_session_resc+0x67/0x240 [bnx2fc]
[ 450.006171] bnx2fc_upload_session+0xce/0x100 [bnx2fc]
[ 450.011910] bnx2fc_rport_event_handler+0x9f/0x240 [bnx2fc]
[ 450.018136] fc_rport_work+0x103/0x5b0 [libfc]
[ 450.023103] process_one_work+0x1e8/0x3c0
[ 450.027581] worker_thread+0x50/0x3b0
[ 450.031669] ? rescuer_thread+0x370/0x370
[ 450.036143] kthread+0x149/0x170
[ 450.039744] ? set_kthread_struct+0x40/0x40
[ 450.044411] ret_from_fork+0x22/0x30
[ 450.048404] Modules linked in: vfat msdos fat xfs nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver dm_service_time qedf qed crc8 bnx2fc libfcoe libfc scsi_transport_fc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp dcdbas rapl intel_cstate intel_uncore mei_me pcspkr mei ipmi_ssif lpc_ich ipmi_si fuse zram ext4 mbcache jbd2 loop nfsv3 nfs_acl nfs lockd grace fscache netfs irdma ice sd_mod t10_pi sg ib_uverbs ib_core 8021q garp mrp stp llc mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mxm_wmi fb_sys_fops cec crct10dif_pclmul ahci crc32_pclmul bnx2x drm ghash_clmulni_intel libahci rfkill i40e libata megaraid_sas mdio wmi sunrpc lrw dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid6_pq libcrc32c crc32c_intel raid1 raid0 iscsi_ibft squashfs be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls
[ 450.048497] libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi edd ipmi_devintf ipmi_msghandler
[ 450.159753] ---[ end trace 712de2c57c64abc8 ]---

Reported-by: Guangwu Zhang <[email protected]>
Signed-off-by: Saurav Kashyap <[email protected]>
Signed-off-by: Nilesh Javali <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/bnx2fc/bnx2fc_tgt.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_tgt.c b/drivers/scsi/bnx2fc/bnx2fc_tgt.c
index 2c246e80c1c4d..d91659811eb3c 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_tgt.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_tgt.c
@@ -833,7 +833,6 @@ static void bnx2fc_free_session_resc(struct bnx2fc_hba *hba,

BNX2FC_TGT_DBG(tgt, "Freeing up session resources\n");

- spin_lock_bh(&tgt->cq_lock);
ctx_base_ptr = tgt->ctx_base;
tgt->ctx_base = NULL;

@@ -889,7 +888,6 @@ static void bnx2fc_free_session_resc(struct bnx2fc_hba *hba,
tgt->sq, tgt->sq_dma);
tgt->sq = NULL;
}
- spin_unlock_bh(&tgt->cq_lock);

if (ctx_base_ptr)
iounmap(ctx_base_ptr);
--
2.43.0


2024-04-07 13:33:39

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 11/13] btrfs: return accurate error code on open failure in open_fs_devices()

From: Anand Jain <[email protected]>

[ Upstream commit 2f1aeab9fca1a5f583be1add175d1ee95c213cfa ]

When attempting to exclusive open a device which has no exclusive open
permission, such as a physical device associated with the flakey dm
device, the open operation will fail, resulting in a mount failure.

In this particular scenario, we erroneously return -EINVAL instead of the
correct error code provided by the bdev_open_by_path() function, which is
-EBUSY.

Fix this, by returning error code from the bdev_open_by_path() function.
With this correction, the mount error message will align with that of
ext4 and xfs.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
fs/btrfs/volumes.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6fc2d99270c18..a9b44755fc6f0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1233,25 +1233,32 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
struct btrfs_device *device;
struct btrfs_device *latest_dev = NULL;
struct btrfs_device *tmp_device;
+ int ret = 0;

flags |= FMODE_EXCL;

list_for_each_entry_safe(device, tmp_device, &fs_devices->devices,
dev_list) {
- int ret;
+ int ret2;

- ret = btrfs_open_one_device(fs_devices, device, flags, holder);
- if (ret == 0 &&
+ ret2 = btrfs_open_one_device(fs_devices, device, flags, holder);
+ if (ret2 == 0 &&
(!latest_dev || device->generation > latest_dev->generation)) {
latest_dev = device;
- } else if (ret == -ENODATA) {
+ } else if (ret2 == -ENODATA) {
fs_devices->num_devices--;
list_del(&device->dev_list);
btrfs_free_device(device);
}
+ if (ret == 0 && ret2 != 0)
+ ret = ret2;
}
- if (fs_devices->open_devices == 0)
+
+ if (fs_devices->open_devices == 0) {
+ if (ret)
+ return ret;
return -EINVAL;
+ }

fs_devices->opened = 1;
fs_devices->latest_dev = latest_dev;
--
2.43.0


2024-04-07 13:34:01

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 12/13] bpf: Check bloom filter map value size

From: Andrei Matei <[email protected]>

[ Upstream commit a8d89feba7e54e691ca7c4efc2a6264fa83f3687 ]

This patch adds a missing check to bloom filter creating, rejecting
values above KMALLOC_MAX_SIZE. This brings the bloom map in line with
many other map types.

The lack of this protection can cause kernel crashes for value sizes
that overflow int's. Such a crash was caught by syzkaller. The next
patch adds more guard-rails at a lower level.

Signed-off-by: Andrei Matei <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
kernel/bpf/bloom_filter.c | 13 +++++++++++++
.../selftests/bpf/prog_tests/bloom_filter_map.c | 6 ++++++
2 files changed, 19 insertions(+)

diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index 48ee750849f25..78e810f49c445 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -88,6 +88,18 @@ static int bloom_map_get_next_key(struct bpf_map *map, void *key, void *next_key
return -EOPNOTSUPP;
}

+/* Called from syscall */
+static int bloom_map_alloc_check(union bpf_attr *attr)
+{
+ if (attr->value_size > KMALLOC_MAX_SIZE)
+ /* if value_size is bigger, the user space won't be able to
+ * access the elements.
+ */
+ return -E2BIG;
+
+ return 0;
+}
+
static struct bpf_map *bloom_map_alloc(union bpf_attr *attr)
{
u32 bitset_bytes, bitset_mask, nr_hash_funcs, nr_bits;
@@ -196,6 +208,7 @@ static int bloom_map_check_btf(const struct bpf_map *map,
BTF_ID_LIST_SINGLE(bpf_bloom_map_btf_ids, struct, bpf_bloom_filter)
const struct bpf_map_ops bloom_filter_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
+ .map_alloc_check = bloom_map_alloc_check,
.map_alloc = bloom_map_alloc,
.map_free = bloom_map_free,
.map_get_next_key = bloom_map_get_next_key,
diff --git a/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c b/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
index d2d9e965eba59..f79815b7e951b 100644
--- a/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
@@ -2,6 +2,7 @@
/* Copyright (c) 2021 Facebook */

#include <sys/syscall.h>
+#include <limits.h>
#include <test_progs.h>
#include "bloom_filter_map.skel.h"

@@ -21,6 +22,11 @@ static void test_fail_cases(void)
if (!ASSERT_LT(fd, 0, "bpf_map_create bloom filter invalid value size 0"))
close(fd);

+ /* Invalid value size: too big */
+ fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 0, INT32_MAX, 100, NULL);
+ if (!ASSERT_LT(fd, 0, "bpf_map_create bloom filter invalid value too large"))
+ close(fd);
+
/* Invalid max entries size */
fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 0, sizeof(value), 0, NULL);
if (!ASSERT_LT(fd, 0, "bpf_map_create bloom filter invalid max entries size"))
--
2.43.0


2024-04-07 13:34:13

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 13/13] kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries

From: "Borislav Petkov (AMD)" <[email protected]>

[ Upstream commit 54babdc0343fff2f32dfaafaaa9e42c4db278204 ]

When KCSAN and CONSTRUCTORS are enabled, one can trigger the

"Unpatched return thunk in use. This should not happen!"

catch-all warning.

Usually, when objtool runs on the .o objects, it does generate a section
return_sites which contains all offsets in the objects to the return
thunks of the functions present there. Those return thunks then get
patched at runtime by the alternatives.

KCSAN and CONSTRUCTORS add this to the object file's .text.startup
section:

-------------------
Disassembly of section .text.startup:

...

0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------

which, if it is built as a module goes through the intermediary stage of
creating a <module>.mod.c file which, when translated, receives a second
constructor:

-------------------
Disassembly of section .text.startup:

0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4

...

0000000000000030 <_sub_I_00099_0>:
30: f3 0f 1e fa endbr64
34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
35: R_X86_64_PLT32 __tsan_init-0x4
39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
3a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------

in the .ko file.

Objtool has run already so that second constructor's return thunk cannot
be added to the .return_sites section and thus the return thunk remains
unpatched and the warning rightfully fires.

Drop KCSAN flags from the mod.c generation stage as those constructors
do not contain data races one would be interested about.

Debugged together with David Kaplan <[email protected]> and Nikolay
Borisov <[email protected]>.

Reported-by: Paul Menzel <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Paul Menzel <[email protected]> # Dell XPS 13
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
scripts/Makefile.modfinal | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 3af5e5807983a..650d59388336f 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -23,7 +23,7 @@ modname = $(notdir $(@:.mod.o=))
part-of-module = y

quiet_cmd_cc_o_c = CC [M] $@
- cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV), $(c_flags)) -c -o $@ $<
+ cmd_cc_o_c = $(CC) $(filter-out $(CC_FLAGS_CFI) $(CFLAGS_GCOV) $(CFLAGS_KCSAN), $(c_flags)) -c -o $@ $<

%.mod.o: %.mod.c FORCE
$(call if_changed_dep,cc_o_c)
--
2.43.0


2024-04-07 13:37:25

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 6.1 09/13] scsi: mpi3mr: Avoid memcpy field-spanning write WARNING

From: Shin'ichiro Kawasaki <[email protected]>

[ Upstream commit 429846b4b6ce9853e0d803a2357bb2e55083adf0 ]

When the "storcli2 show" command is executed for eHBA-9600, mpi3mr driver
prints this WARNING message:

memcpy: detected field-spanning write (size 128) of single field "bsg_reply_buf->reply_buf" at drivers/scsi/mpi3mr/mpi3mr_app.c:1658 (size 1)
WARNING: CPU: 0 PID: 12760 at drivers/scsi/mpi3mr/mpi3mr_app.c:1658 mpi3mr_bsg_request+0x6b12/0x7f10 [mpi3mr]

The cause of the WARN is 128 bytes memcpy to the 1 byte size array "__u8
replay_buf[1]" in the struct mpi3mr_bsg_in_reply_buf. The array is intended
to be a flexible length array, so the WARN is a false positive.

To suppress the WARN, remove the constant number '1' from the array
declaration and clarify that it has flexible length. Also, adjust the
memory allocation size to match the change.

Suggested-by: Sathya Prakash Veerichetty <[email protected]>
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/scsi/mpi3mr/mpi3mr_app.c | 2 +-
include/uapi/scsi/scsi_bsg_mpi3mr.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_app.c b/drivers/scsi/mpi3mr/mpi3mr_app.c
index 8c662d08706f1..42600e5c457a1 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_app.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_app.c
@@ -1344,7 +1344,7 @@ static long mpi3mr_bsg_process_mpt_cmds(struct bsg_job *job, unsigned int *reply
if ((mpirep_offset != 0xFF) &&
drv_bufs[mpirep_offset].bsg_buf_len) {
drv_buf_iter = &drv_bufs[mpirep_offset];
- drv_buf_iter->kern_buf_len = (sizeof(*bsg_reply_buf) - 1 +
+ drv_buf_iter->kern_buf_len = (sizeof(*bsg_reply_buf) +
mrioc->reply_sz);
bsg_reply_buf = kzalloc(drv_buf_iter->kern_buf_len, GFP_KERNEL);

diff --git a/include/uapi/scsi/scsi_bsg_mpi3mr.h b/include/uapi/scsi/scsi_bsg_mpi3mr.h
index fdc3517f9e199..c48c5d08c0fa0 100644
--- a/include/uapi/scsi/scsi_bsg_mpi3mr.h
+++ b/include/uapi/scsi/scsi_bsg_mpi3mr.h
@@ -382,7 +382,7 @@ struct mpi3mr_bsg_in_reply_buf {
__u8 mpi_reply_type;
__u8 rsvd1;
__u16 rsvd2;
- __u8 reply_buf[1];
+ __u8 reply_buf[];
};

/**
--
2.43.0