2023-03-06 21:03:45

by Vernon Yang

[permalink] [raw]
Subject: [PATCH v2 0/4] fix call cpumask_next() if no further cpus set

Hello,

I updated the Linux kernel to commit fe15c26ee26e ("Linux 6.3-rc1")
and found that when the system boots to systemd ranom initialization,
panic, as follows:

[ 3.607299] BUG: unable to handle page fault for address: 000000000001cc43
[ 3.607558] #PF: supervisor read access in kernel mode
[ 3.607704] #PF: error_code(0x0000) - not-present page
[ 3.607704] PGD 0 P4D 0
[ 3.607704] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 3.607704] CPU: 1 PID: 1 Comm: systemd Not tainted 6.3.0-rc1 #50
[ 3.607704] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 3.607704] RIP: 0010:_raw_spin_lock+0x12/0x30
[ 3.607704] Code: 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 65 ff 05 dd de 1e 7e 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 9c 00 00 00 6
[ 3.607704] RSP: 0018:ffffc90000013d50 EFLAGS: 00000002
[ 3.607704] RAX: 0000000000000000 RBX: 0000000000000040 RCX: 0000000000000002
[ 3.607704] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 000000000001cc43
[ 3.607704] RBP: ffffc90000013dc8 R08: 00000000d6fbd601 R09: 0000000065abc912
[ 3.607704] R10: 00000000ba93b167 R11: 000000007bb5d0bf R12: 000000000001cc43
[ 3.607704] R13: 000000000001cc43 R14: 0000000000000003 R15: 0000000000000003
[ 3.607704] FS: 00007fbd4911b400(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 3.607704] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.607704] CR2: 000000000001cc43 CR3: 0000000003b42000 CR4: 00000000000006e0
[ 3.607704] Call Trace:
[ 3.607704] <TASK>
[ 3.607704] add_timer_on+0x80/0x130
[ 3.607704] try_to_generate_entropy+0x246/0x270
[ 3.607704] ? do_filp_open+0xb1/0x160
[ 3.607704] ? __pfx_entropy_timer+0x10/0x10
[ 3.607704] ? inode_security+0x1d/0x60
[ 3.607704] urandom_read_iter+0x23/0x90
[ 3.607704] vfs_read+0x203/0x2d0
[ 3.607704] ksys_read+0x5e/0xe0
[ 3.607704] do_syscall_64+0x3f/0x90
[ 3.607704] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 3.607704] RIP: 0033:0x7fbd49a25992
[ 3.607704] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 4
[ 3.607704] RSP: 002b:00007ffea3fe8318 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 3.607704] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007fbd49a25992
[ 3.607704] RDX: 0000000000000010 RSI: 00007ffea3fe83a0 RDI: 000000000000000c
[ 3.607704] RBP: 000000000000000c R08: 3983c6a57a866072 R09: c736ebfbeb917d7e
[ 3.607704] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 3.607704] R13: 0000000000000001 R14: 00007ffea3fe83a0 R15: 00005609e5454ea8
[ 3.607704] </TASK>
[ 3.607704] Modules linked in:
[ 3.607704] CR2: 000000000001cc43
[ 3.607704] ---[ end trace 0000000000000000 ]---
[ 3.607704] RIP: 0010:_raw_spin_lock+0x12/0x30
[ 3.607704] Code: 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 65 ff 05 dd de 1e 7e 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 9c 00 00 00 6
[ 3.607704] RSP: 0018:ffffc90000013d50 EFLAGS: 00000002
[ 3.607704] RAX: 0000000000000000 RBX: 0000000000000040 RCX: 0000000000000002
[ 3.607704] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 000000000001cc43
[ 3.607704] RBP: ffffc90000013dc8 R08: 00000000d6fbd601 R09: 0000000065abc912
[ 3.607704] R10: 00000000ba93b167 R11: 000000007bb5d0bf R12: 000000000001cc43
[ 3.607704] R13: 000000000001cc43 R14: 0000000000000003 R15: 0000000000000003
[ 3.607704] FS: 00007fbd4911b400(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 3.607704] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.607704] CR2: 000000000001cc43 CR3: 0000000003b42000 CR4: 00000000000006e0
[ 3.607704] note: systemd[1] exited with irqs disabled
[ 3.618556] note: systemd[1] exited with preempt_count 2
[ 3.618991] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 3.619797] Kernel Offset: disabled
[ 3.619798] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---


Analysis add_timer_on() found that the parameter cpu is equal to 64, which
feels strange, because qemu only specifies two CPUs, continues to look up,
and finds that the parameter cpu is obtained by
try_to_generate_entropy() -> cpumask_next().

Then use git bisect to find the bug, and find that it was introduced by
commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations"),
carefully analyzing the cpumask_next() modification record, I found that
nr_cpumask_bits modified to small_cpumask_bits, and when NR_CPUS <= BITS_PER_LONG,
small_cpumask_bits is a macro and before nr_cpumask_bits is a variable-sized.

look for find_next_bit() If no bits are set, returns @size, I seem to
understand the cause of the problem.

I fixed this bug by make `if (cpu == nr_cpumask_bits)` to `if (cpu >= nr_cpu_ids)`

At the same time I think about this situation, maybe there are the same errors
elsewhere, check it, sure enough, there are, quite a few.

The patch "random:xxx" has been verified, it is valid, the other three fixes
have not been verified, because I do not have an environment, but they
principle are same, so also submitted at the same time, if someone helps to
verify, thanks you very much.

If there is anything error, please tell me, thanks.


v2 changes:
- remove patch 5
- make nr_cpumask_bits to nr_cpu_ids
- add helper function to get next present cpu for lpfc_cpu_affinity_check
- update commit comment

v1: https://lore.kernel.org/all/[email protected]/

Vernon Yang (4):
random: fix try_to_generate_entropy() if no further cpus set
wireguard: fix wg_cpumask_choose_online() if no further cpus set
scsi: lpfc: fix lpfc_cpu_affinity_check() if no further cpus set
scsi: lpfc: fix lpfc_nvmet_setup_io_context() if no further cpus set

drivers/char/random.c | 2 +-
drivers/net/wireguard/queueing.h | 2 +-
drivers/scsi/lpfc/lpfc_init.c | 43 +++++++++++++++-----------------
drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
4 files changed, 23 insertions(+), 26 deletions(-)

--
2.34.1



2023-03-06 21:03:55

by Vernon Yang

[permalink] [raw]
Subject: [PATCH v2 1/4] random: fix try_to_generate_entropy() if no further cpus set

When cpumask_next() the return value is greater than or equal to
nr_cpu_ids, it indicates invalid.

Before commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations"), when cpumask_next() returned an invalid cpu, the driver
used the judgment equal to nr_cpu_ids to indicate the invalid cpu, so it
happened to work normally, but this is the wrong approach.

After commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations"), these incorrect practices actively buggy, so fix it to
correctly.

Signed-off-by: Vernon Yang <[email protected]>
---
drivers/char/random.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index ce3ccd172cc8..253f2ddb8913 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1311,7 +1311,7 @@ static void __cold try_to_generate_entropy(void)
/* Basic CPU round-robin, which avoids the current CPU. */
do {
cpu = cpumask_next(cpu, &timer_cpus);
- if (cpu == nr_cpumask_bits)
+ if (cpu >= nr_cpu_ids)
cpu = cpumask_first(&timer_cpus);
} while (cpu == smp_processor_id() && num_cpus > 1);

--
2.34.1


2023-03-06 21:03:59

by Vernon Yang

[permalink] [raw]
Subject: [PATCH v2 2/4] wireguard: fix wg_cpumask_choose_online() if no further cpus set

When cpumask_next() the return value is greater than or equal to
nr_cpu_ids, it indicates invalid.

Before commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations"), when cpumask_next() returned an invalid cpu, the driver
used the judgment equal to nr_cpu_ids to indicate the invalid cpu, so it
happened to work normally, but this is the wrong approach.

After commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations"), these incorrect practices actively buggy, so fix it to
correctly.

Signed-off-by: Vernon Yang <[email protected]>
---
drivers/net/wireguard/queueing.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h
index 583adb37ee1e..125284b346a7 100644
--- a/drivers/net/wireguard/queueing.h
+++ b/drivers/net/wireguard/queueing.h
@@ -106,7 +106,7 @@ static inline int wg_cpumask_choose_online(int *stored_cpu, unsigned int id)
{
unsigned int cpu = *stored_cpu, cpu_index, i;

- if (unlikely(cpu == nr_cpumask_bits ||
+ if (unlikely(cpu >= nr_cpu_ids ||
!cpumask_test_cpu(cpu, cpu_online_mask))) {
cpu_index = id % cpumask_weight(cpu_online_mask);
cpu = cpumask_first(cpu_online_mask);
--
2.34.1


2023-03-06 21:04:09

by Vernon Yang

[permalink] [raw]
Subject: [PATCH v2 3/4] scsi: lpfc: fix lpfc_cpu_affinity_check() if no further cpus set

When cpumask_next() the return value is greater than or equal to
nr_cpu_ids, it indicates invalid.

Before commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations"), when cpumask_next() returned an invalid cpu, the driver
used the judgment equal to nr_cpu_ids to indicate the invalid cpu, so it
happened to work normally, but this is the wrong approach.

After commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations"), these incorrect practices actively buggy, so fix it to
correctly.

Signed-off-by: Vernon Yang <[email protected]>
---
drivers/scsi/lpfc/lpfc_init.c | 43 ++++++++++++++++-------------------
1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 61958a24a43d..acfffdbe9ba1 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12473,6 +12473,16 @@ lpfc_hba_eq_hdl_array_init(struct lpfc_hba *phba)
}
}

+static inline int lpfc_next_present_cpu(int n, int first_cpu)
+{
+ n = cpumask_next(n, cpu_present_mask);
+
+ if (n >= nr_cpu_ids)
+ n = first_cpu;
+
+ return n;
+}
+
/**
* lpfc_cpu_affinity_check - Check vector CPU affinity mappings
* @phba: pointer to lpfc hba data structure.
@@ -12561,10 +12571,8 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
(new_cpup->eq != LPFC_VECTOR_MAP_EMPTY) &&
(new_cpup->phys_id == cpup->phys_id))
goto found_same;
- new_cpu = cpumask_next(
- new_cpu, cpu_present_mask);
- if (new_cpu == nr_cpumask_bits)
- new_cpu = first_cpu;
+
+ new_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
}
/* At this point, we leave the CPU as unassigned */
continue;
@@ -12576,9 +12584,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
* chance of having multiple unassigned CPU entries
* selecting the same IRQ.
*/
- start_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (start_cpu == nr_cpumask_bits)
- start_cpu = first_cpu;
+ start_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);

lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
"3337 Set Affinity: CPU %d "
@@ -12611,10 +12617,8 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
if (!(new_cpup->flag & LPFC_CPU_MAP_UNASSIGN) &&
(new_cpup->eq != LPFC_VECTOR_MAP_EMPTY))
goto found_any;
- new_cpu = cpumask_next(
- new_cpu, cpu_present_mask);
- if (new_cpu == nr_cpumask_bits)
- new_cpu = first_cpu;
+
+ new_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
}
/* We should never leave an entry unassigned */
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
@@ -12630,9 +12634,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
* chance of having multiple unassigned CPU entries
* selecting the same IRQ.
*/
- start_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (start_cpu == nr_cpumask_bits)
- start_cpu = first_cpu;
+ start_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);

lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
"3338 Set Affinity: CPU %d "
@@ -12703,9 +12705,8 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
new_cpup->core_id == cpup->core_id) {
goto found_hdwq;
}
- new_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (new_cpu == nr_cpumask_bits)
- new_cpu = first_cpu;
+
+ new_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
}

/* If we can't match both phys_id and core_id,
@@ -12718,9 +12719,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
new_cpup->phys_id == cpup->phys_id)
goto found_hdwq;

- new_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (new_cpu == nr_cpumask_bits)
- new_cpu = first_cpu;
+ new_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
}

/* Otherwise just round robin on cfg_hdw_queue */
@@ -12729,9 +12728,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
goto logit;
found_hdwq:
/* We found an available entry, copy the IRQ info */
- start_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (start_cpu == nr_cpumask_bits)
- start_cpu = first_cpu;
+ start_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
cpup->hdwq = new_cpup->hdwq;
logit:
lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
--
2.34.1


2023-03-06 21:04:30

by Vernon Yang

[permalink] [raw]
Subject: [PATCH v2 4/4] scsi: lpfc: fix lpfc_nvmet_setup_io_context() if no further cpus set

When cpumask_next() the return value is greater than or equal to
nr_cpu_ids, it indicates invalid.

Before commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations"), when cpumask_next() returned an invalid cpu, the driver
used the judgment equal to nr_cpu_ids to indicate the invalid cpu, so it
happened to work normally, but this is the wrong approach.

After commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations"), these incorrect practices actively buggy, so fix it to
correctly.

Signed-off-by: Vernon Yang <[email protected]>
---
drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 7517dd55fe91..3ae7f330f827 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -1621,7 +1621,7 @@ lpfc_nvmet_setup_io_context(struct lpfc_hba *phba)
continue;
}
cpu = cpumask_next(cpu, cpu_present_mask);
- if (cpu == nr_cpu_ids)
+ if (cpu >= nr_cpu_ids)
cpu = cpumask_first(cpu_present_mask);

}
--
2.34.1


2023-03-08 00:41:28

by Justin Tee

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] scsi: lpfc: fix lpfc_cpu_affinity_check() if no further cpus set

Hi Vernon,

Is it possible to move the new helper routine lpfc_next_present_cpu
into the lpfc.h header file around where lpfc_next_online_cpu is
defined?

Also, with slight modifications we could use the new
lpfc_next_present_cpu helper routine for the
lpfc_nvmet_setup_io_context() patch as well so that we can contain all
lpfc changes within one patch.

---
drivers/scsi/lpfc/lpfc.h | 20 ++++++++++++++++++++
drivers/scsi/lpfc/lpfc_init.c | 31 +++++++------------------------
drivers/scsi/lpfc/lpfc_nvmet.c | 5 +----
3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index cf55f8e3bd9f..f342d6bc5726 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -1737,6 +1737,26 @@ lpfc_next_online_cpu(const struct cpumask
*mask, unsigned int start)

return cpu_it;
}
+
+/**
+ * lpfc_next_present_cpu - Finds next present CPU after n
+ * @n: the cpu prior to search
+ *
+ * Note: If no next present cpu, then fallback to first present cpu.
+ *
+ **/
+static inline unsigned int lpfc_next_present_cpu(int n)
+{
+ unsigned int cpu;
+
+ cpu = cpumask_next(n, cpu_present_mask);
+
+ if (cpu >= nr_cpu_ids)
+ cpu = cpumask_first(cpu_present_mask);
+
+ return cpu;
+}
+
/**
* lpfc_sli4_mod_hba_eq_delay - update EQ delay
* @phba: Pointer to HBA context object.
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 3e1e1d17b2b4..f28af338341f 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12560,10 +12560,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba
*phba, int vectors)
(new_cpup->eq != LPFC_VECTOR_MAP_EMPTY) &&
(new_cpup->phys_id == cpup->phys_id))
goto found_same;
- new_cpu = cpumask_next(
- new_cpu, cpu_present_mask);
- if (new_cpu == nr_cpumask_bits)
- new_cpu = first_cpu;
+ new_cpu = lpfc_next_present_cpu(new_cpu);
}
/* At this point, we leave the CPU as unassigned */
continue;
@@ -12575,9 +12572,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba,
int vectors)
* chance of having multiple unassigned CPU entries
* selecting the same IRQ.
*/
- start_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (start_cpu == nr_cpumask_bits)
- start_cpu = first_cpu;
+ start_cpu = lpfc_next_present_cpu(new_cpu);

lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
"3337 Set Affinity: CPU %d "
@@ -12610,10 +12605,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba
*phba, int vectors)
if (!(new_cpup->flag & LPFC_CPU_MAP_UNASSIGN) &&
(new_cpup->eq != LPFC_VECTOR_MAP_EMPTY))
goto found_any;
- new_cpu = cpumask_next(
- new_cpu, cpu_present_mask);
- if (new_cpu == nr_cpumask_bits)
- new_cpu = first_cpu;
+ new_cpu = lpfc_next_present_cpu(new_cpu);
}
/* We should never leave an entry unassigned */
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
@@ -12629,9 +12621,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba,
int vectors)
* chance of having multiple unassigned CPU entries
* selecting the same IRQ.
*/
- start_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (start_cpu == nr_cpumask_bits)
- start_cpu = first_cpu;
+ start_cpu = lpfc_next_present_cpu(new_cpu);

lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
"3338 Set Affinity: CPU %d "
@@ -12702,9 +12692,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba,
int vectors)
new_cpup->core_id == cpup->core_id) {
goto found_hdwq;
}
- new_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (new_cpu == nr_cpumask_bits)
- new_cpu = first_cpu;
+ new_cpu = lpfc_next_present_cpu(new_cpu);
}

/* If we can't match both phys_id and core_id,
@@ -12716,10 +12704,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba
*phba, int vectors)
if (new_cpup->hdwq != LPFC_VECTOR_MAP_EMPTY &&
new_cpup->phys_id == cpup->phys_id)
goto found_hdwq;
-
- new_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (new_cpu == nr_cpumask_bits)
- new_cpu = first_cpu;
+ new_cpu = lpfc_next_present_cpu(new_cpu);
}

/* Otherwise just round robin on cfg_hdw_queue */
@@ -12728,9 +12713,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba,
int vectors)
goto logit;
found_hdwq:
/* We found an available entry, copy the IRQ info */
- start_cpu = cpumask_next(new_cpu, cpu_present_mask);
- if (start_cpu == nr_cpumask_bits)
- start_cpu = first_cpu;
+ start_cpu = lpfc_next_present_cpu(new_cpu);
cpup->hdwq = new_cpup->hdwq;
logit:
lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 7517dd55fe91..2d8ac2ceb6f3 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -1620,10 +1620,7 @@ lpfc_nvmet_setup_io_context(struct lpfc_hba *phba)
cpu = cpumask_first(cpu_present_mask);
continue;
}
- cpu = cpumask_next(cpu, cpu_present_mask);
- if (cpu == nr_cpu_ids)
- cpu = cpumask_first(cpu_present_mask);
-
+ cpu = lpfc_next_present_cpu(cpu);
}

for_each_present_cpu(i) {
--
2.38.0

Thanks,
Justin

On Mon, Mar 6, 2023 at 1:10 PM Vernon Yang <[email protected]> wrote:
>
> When cpumask_next() the return value is greater than or equal to
> nr_cpu_ids, it indicates invalid.
>
> Before commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
> optimizations"), when cpumask_next() returned an invalid cpu, the driver
> used the judgment equal to nr_cpu_ids to indicate the invalid cpu, so it
> happened to work normally, but this is the wrong approach.
>
> After commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
> optimizations"), these incorrect practices actively buggy, so fix it to
> correctly.
>
> Signed-off-by: Vernon Yang <[email protected]>
> ---
> drivers/scsi/lpfc/lpfc_init.c | 43 ++++++++++++++++-------------------
> 1 file changed, 20 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 61958a24a43d..acfffdbe9ba1 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -12473,6 +12473,16 @@ lpfc_hba_eq_hdl_array_init(struct lpfc_hba *phba)
> }
> }
>
> +static inline int lpfc_next_present_cpu(int n, int first_cpu)
> +{
> + n = cpumask_next(n, cpu_present_mask);
> +
> + if (n >= nr_cpu_ids)
> + n = first_cpu;
> +
> + return n;
> +}
> +
> /**
> * lpfc_cpu_affinity_check - Check vector CPU affinity mappings
> * @phba: pointer to lpfc hba data structure.
> @@ -12561,10 +12571,8 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
> (new_cpup->eq != LPFC_VECTOR_MAP_EMPTY) &&
> (new_cpup->phys_id == cpup->phys_id))
> goto found_same;
> - new_cpu = cpumask_next(
> - new_cpu, cpu_present_mask);
> - if (new_cpu == nr_cpumask_bits)
> - new_cpu = first_cpu;
> +
> + new_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
> }
> /* At this point, we leave the CPU as unassigned */
> continue;
> @@ -12576,9 +12584,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
> * chance of having multiple unassigned CPU entries
> * selecting the same IRQ.
> */
> - start_cpu = cpumask_next(new_cpu, cpu_present_mask);
> - if (start_cpu == nr_cpumask_bits)
> - start_cpu = first_cpu;
> + start_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
>
> lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
> "3337 Set Affinity: CPU %d "
> @@ -12611,10 +12617,8 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
> if (!(new_cpup->flag & LPFC_CPU_MAP_UNASSIGN) &&
> (new_cpup->eq != LPFC_VECTOR_MAP_EMPTY))
> goto found_any;
> - new_cpu = cpumask_next(
> - new_cpu, cpu_present_mask);
> - if (new_cpu == nr_cpumask_bits)
> - new_cpu = first_cpu;
> +
> + new_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
> }
> /* We should never leave an entry unassigned */
> lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
> @@ -12630,9 +12634,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
> * chance of having multiple unassigned CPU entries
> * selecting the same IRQ.
> */
> - start_cpu = cpumask_next(new_cpu, cpu_present_mask);
> - if (start_cpu == nr_cpumask_bits)
> - start_cpu = first_cpu;
> + start_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
>
> lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
> "3338 Set Affinity: CPU %d "
> @@ -12703,9 +12705,8 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
> new_cpup->core_id == cpup->core_id) {
> goto found_hdwq;
> }
> - new_cpu = cpumask_next(new_cpu, cpu_present_mask);
> - if (new_cpu == nr_cpumask_bits)
> - new_cpu = first_cpu;
> +
> + new_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
> }
>
> /* If we can't match both phys_id and core_id,
> @@ -12718,9 +12719,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
> new_cpup->phys_id == cpup->phys_id)
> goto found_hdwq;
>
> - new_cpu = cpumask_next(new_cpu, cpu_present_mask);
> - if (new_cpu == nr_cpumask_bits)
> - new_cpu = first_cpu;
> + new_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
> }
>
> /* Otherwise just round robin on cfg_hdw_queue */
> @@ -12729,9 +12728,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
> goto logit;
> found_hdwq:
> /* We found an available entry, copy the IRQ info */
> - start_cpu = cpumask_next(new_cpu, cpu_present_mask);
> - if (start_cpu == nr_cpumask_bits)
> - start_cpu = first_cpu;
> + start_cpu = lpfc_next_present_cpu(new_cpu, first_cpu);
> cpup->hdwq = new_cpup->hdwq;
> logit:
> lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
> --
> 2.34.1
>