Hi,
These two patches fix the IO hang issue reported by Laurence.
84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
may cause an irq vector to be assigned only offline CPUs, and then that
vector can't handle irqs any more.
The 1st patch moves the irq vector spreading into one function and
prepares for the fix done in the 2nd patch.
The 2nd patch fixes the issue by trying to make sure that online CPUs
are assigned to each irq vector.
Ming Lei (2):
genirq/affinity: move irq vectors spread into one function
genirq/affinity: try best to make sure online CPU is assigned to
vector
kernel/irq/affinity.c | 77 ++++++++++++++++++++++++++++++++++-----------------
1 file changed, 52 insertions(+), 25 deletions(-)
--
2.9.5
This patch prepares for doing the spread in two steps:
- spread vectors across non-online CPUs
- spread vectors across online CPUs
This is done to try best to avoid assigning only offline CPUs to a
single vector.
No functional change, and the code gets cleaned up too.
Cc: Thomas Gleixner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 56 +++++++++++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 22 deletions(-)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index a37a3b4b6342..99eb38a4cc83 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -94,6 +94,35 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
return nodes;
}
+/* Spread irq vectors, and the result is stored to @irqmsk. */
+static int irq_vecs_spread_affinity(struct cpumask *irqmsk,
+ int max_irqmsks,
+ int max_vecs,
+ struct cpumask *nmsk)
+{
+ int v, ncpus = cpumask_weight(nmsk);
+ int vecs_to_assign, extra_vecs;
+
+ /* How many vectors we will try to spread */
+ vecs_to_assign = min(max_vecs, ncpus);
+
+ /* Account for rounding errors */
+ extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
+
+ for (v = 0; v < min(max_irqmsks, vecs_to_assign); v++) {
+ int cpus_per_vec = ncpus / vecs_to_assign;
+
+ /* Account for extra vectors to compensate rounding errors */
+ if (extra_vecs) {
+ cpus_per_vec++;
+ --extra_vecs;
+ }
+ irq_spread_init_one(irqmsk + v, nmsk, cpus_per_vec);
+ }
+
+ return v;
+}
+
/**
* irq_create_affinity_masks - Create affinity masks for multiqueue spreading
* @nvecs: The total number of vectors
@@ -104,7 +133,7 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
struct cpumask *
irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
{
- int n, nodes, cpus_per_vec, extra_vecs, curvec;
+ int n, nodes, curvec;
int affv = nvecs - affd->pre_vectors - affd->post_vectors;
int last_affv = affv + affd->pre_vectors;
nodemask_t nodemsk = NODE_MASK_NONE;
@@ -154,33 +183,16 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
}
for_each_node_mask(n, nodemsk) {
- int ncpus, v, vecs_to_assign, vecs_per_node;
+ int vecs_per_node;
/* Spread the vectors per node */
vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
- /* Get the cpus on this node which are in the mask */
cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
- /* Calculate the number of cpus per vector */
- ncpus = cpumask_weight(nmsk);
- vecs_to_assign = min(vecs_per_node, ncpus);
-
- /* Account for rounding errors */
- extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
-
- for (v = 0; curvec < last_affv && v < vecs_to_assign;
- curvec++, v++) {
- cpus_per_vec = ncpus / vecs_to_assign;
-
- /* Account for extra vectors to compensate rounding errors */
- if (extra_vecs) {
- cpus_per_vec++;
- --extra_vecs;
- }
- irq_spread_init_one(masks + curvec, nmsk, cpus_per_vec);
- }
-
+ curvec += irq_vecs_spread_affinity(&masks[curvec],
+ last_affv - curvec,
+ vecs_per_node, nmsk);
if (curvec >= last_affv)
break;
--nodes;
--
2.9.5
84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
causes irq vector assigned to all offline CPUs, and IO hang is reported
on HPSA by Laurence.
This patch fixes this issue by trying best to make sure online CPU can be
assigned to irq vector. And take two steps to spread irq vectors:
1) spread irq vectors across offline CPUs in the node cpumask
2) spread irq vectors across online CPUs in the node cpumask
Fixes: 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
Cc: Thomas Gleixner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Reported-by: Laurence Oberman <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 99eb38a4cc83..8b716548b3db 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -103,6 +103,10 @@ static int irq_vecs_spread_affinity(struct cpumask *irqmsk,
int v, ncpus = cpumask_weight(nmsk);
int vecs_to_assign, extra_vecs;
+ /* May happen when spreading vectors across offline cpus */
+ if (!ncpus)
+ return 0;
+
/* How many vectors we will try to spread */
vecs_to_assign = min(max_vecs, ncpus);
@@ -165,13 +169,16 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
/* Stabilize the cpumasks */
get_online_cpus();
build_node_to_possible_cpumask(node_to_possible_cpumask);
- nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_possible_mask,
- &nodemsk);
/*
+ * Don't spread irq vector across offline node.
+ *
* If the number of nodes in the mask is greater than or equal the
* number of vectors we just spread the vectors across the nodes.
+ *
*/
+ nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_online_mask,
+ &nodemsk);
if (affv <= nodes) {
for_each_node_mask(n, nodemsk) {
cpumask_copy(masks + curvec,
@@ -182,14 +189,22 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
goto done;
}
+ nodes_clear(nodemsk);
+ nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_possible_mask,
+ &nodemsk);
for_each_node_mask(n, nodemsk) {
int vecs_per_node;
/* Spread the vectors per node */
vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
- cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
+ /* spread vectors across offline cpus in the node cpumask */
+ cpumask_andnot(nmsk, node_to_possible_cpumask[n], cpu_online_mask);
+ irq_vecs_spread_affinity(&masks[curvec], last_affv - curvec,
+ vecs_per_node, nmsk);
+ /* spread vectors across online cpus in the node cpumask */
+ cpumask_and(nmsk, node_to_possible_cpumask[n], cpu_online_mask);
curvec += irq_vecs_spread_affinity(&masks[curvec],
last_affv - curvec,
vecs_per_node, nmsk);
--
2.9.5
On Tue, Jan 16, 2018 at 12:03:43AM +0800, Ming Lei wrote:
> Hi,
>
> These two patches fixes IO hang issue reported by Laurence.
>
> 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
> may cause one irq vector assigned to all offline CPUs, then this vector
> can't handle irq any more.
Well, that very much was the intention of managed interrupts. Why
does the device raise an interrupt for a queue that has no online
cpu assigned to it?
On Tue, 16 Jan 2018, Ming Lei wrote:
> These two patches fixes IO hang issue reported by Laurence.
>
> 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
> may cause one irq vector assigned to all offline CPUs, then this vector
> can't handle irq any more.
>
> The 1st patch moves irq vectors spread into one function, and prepares
> for the fix done in 2nd patch.
>
> The 2nd patch fixes the issue by trying to make sure online CPUs assigned
> to irq vector.
Which means it's completely undoing the intent and mechanism of managed
interrupts. Not going to happen.
Which driver is that which abuses managed interrupts and does not keep its
queues properly sorted on cpu hotplug?
Thanks,
tglx
On Mon, 2018-01-15 at 18:43 +0100, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Ming Lei wrote:
> > These two patches fixes IO hang issue reported by Laurence.
> >
> > 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
> > may cause one irq vector assigned to all offline CPUs, then this
> > vector
> > can't handle irq any more.
> >
> > The 1st patch moves irq vectors spread into one function, and
> > prepares
> > for the fix done in 2nd patch.
> >
> > The 2nd patch fixes the issue by trying to make sure online CPUs
> > assigned
> > to irq vector.
>
> Which means it's completely undoing the intent and mechanism of
> managed
> interrupts. Not going to happen.
>
> Which driver is that which abuses managed interrupts and does not
> keep its
> queues properly sorted on cpu hotplug?
>
> Thanks,
>
> tglx
Hello Thomas
The servers I am using all boot off hpsa (SmartArray).
The system would hang on boot with the stack below.
The hang is seen when booting off the hpsa driver; it is not seen by
Mike when booting off a server that does not use hpsa.
It is also not seen when reverting the patch I called out.
Putting that patch back into Mike/Jens' combined tree and adding Ming's
patches seems to fix the issue now; I can boot.
I just did a quick sanity boot and check, not any in-depth testing
right now.
It's not code I am at all familiar with that Ming has changed to make
it work, so I defer to Ming to explain it in depth.
[ 246.751050] INFO: task systemd-udevd:411 blocked for more than 120
seconds.
[ 246.791852] Tainted: G I 4.15.0-
rc4.block.dm.4.16+ #1
[ 246.830650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables
this message.
[ 246.874637] systemd-udevd D 0 411 408 0x80000004
[ 246.904934] Call Trace:
[ 246.918191] ? __schedule+0x28d/0x870
[ 246.937643] ? _cond_resched+0x15/0x30
[ 246.958222] schedule+0x32/0x80
[ 246.975424] async_synchronize_cookie_domain+0x8b/0x140
[ 247.004452] ? remove_wait_queue+0x60/0x60
[ 247.027335] do_init_module+0xbe/0x219
[ 247.048022] load_module+0x21d6/0x2910
[ 247.069436] ? m_show+0x1c0/0x1c0
[ 247.087999] SYSC_finit_module+0x94/0xe0
[ 247.110392] entry_SYSCALL_64_fastpath+0x1a/0x7d
[ 247.136669] RIP: 0033:0x7f84049287f9
[ 247.156112] RSP: 002b:00007ffd13199ab8 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[ 247.196883] RAX: ffffffffffffffda RBX: 000055b712b59e80 RCX:
00007f84049287f9
[ 247.237989] RDX: 0000000000000000 RSI: 00007f8405245099 RDI:
0000000000000008
[ 247.279105] RBP: 00007f8404bf2760 R08: 0000000000000000 R09:
000055b712b45760
[ 247.320005] R10: 0000000000000008 R11: 0000000000000246 R12:
0000000000000020
[ 247.360625] R13: 00007f8404bf2818 R14: 0000000000000050 R15:
00007f8404bf27b8
[ 247.401062] INFO: task scsi_eh_0:471 blocked for more than 120
seconds.
[ 247.438161] Tainted: G I 4.15.0-
rc4.block.dm.4.16+ #1
[ 247.476640] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables
this message.
[ 247.520700] scsi_eh_0 D 0 471 2 0x80000000
[ 247.551339] Call Trace:
[ 247.564360] ? __schedule+0x28d/0x870
[ 247.584720] schedule+0x32/0x80
[ 247.601294] hpsa_eh_device_reset_handler+0x68c/0x700 [hpsa]
[ 247.633358] ? remove_wait_queue+0x60/0x60
[ 247.656345] scsi_try_bus_device_reset+0x27/0x40
[ 247.682424] scsi_eh_ready_devs+0x53f/0xe20
[ 247.706467] ? __pm_runtime_resume+0x55/0x70
[ 247.730327] scsi_error_handler+0x434/0x5e0
[ 247.754387] ? __schedule+0x295/0x870
[ 247.775420] kthread+0xf5/0x130
[ 247.793461] ? scsi_eh_get_sense+0x240/0x240
[ 247.818008] ? kthread_associate_blkcg+0x90/0x90
[ 247.844759] ret_from_fork+0x1f/0x30
[ 247.865440] INFO: task scsi_id:488 blocked for more than 120
seconds.
[ 247.901112] Tainted: G I 4.15.0-
rc4.block.dm.4.16+ #1
[ 247.938743] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables
this message.
[ 247.981092] scsi_id D 0 488 1 0x00000004
[ 248.010535] Call Trace:
[ 248.023567] ? __schedule+0x28d/0x870
[ 248.044236] ? __switch_to+0x1f5/0x460
[ 248.065776] schedule+0x32/0x80
[ 248.084238] schedule_timeout+0x1d4/0x2f0
[ 248.106184] wait_for_completion+0x123/0x190
[ 248.130759] ? wake_up_q+0x70/0x70
[ 248.150295] flush_work+0x119/0x1a0
[ 248.169238] ? wake_up_worker+0x30/0x30
[ 248.189670] __cancel_work_timer+0x103/0x190
[ 248.213751] ? kobj_lookup+0x10b/0x160
[ 248.235441] disk_block_events+0x6f/0x90
[ 248.257820] __blkdev_get+0x6a/0x480
[ 248.278770] ? bd_acquire+0xd0/0xd0
[ 248.298438] blkdev_get+0x1a5/0x300
[ 248.316587] ? bd_acquire+0xd0/0xd0
[ 248.334814] do_dentry_open+0x202/0x320
[ 248.354372] ? security_inode_permission+0x3c/0x50
[ 248.378818] path_openat+0x537/0x12c0
[ 248.397386] ? vm_insert_page+0x1e0/0x1f0
[ 248.417664] ? vvar_fault+0x75/0x140
[ 248.435811] do_filp_open+0x91/0x100
[ 248.454061] do_sys_open+0x126/0x210
[ 248.472462] entry_SYSCALL_64_fastpath+0x1a/0x7d
[ 248.495438] RIP: 0033:0x7f39e60e1e90
[ 248.513136] RSP: 002b:00007ffc4c906ba8 EFLAGS: 00000246 ORIG_RAX:
0000000000000002
[ 248.550726] RAX: ffffffffffffffda RBX: 00005624aead3010 RCX:
00007f39e60e1e90
[ 248.586207] RDX: 00007f39e60cc0c4 RSI: 0000000000080800 RDI:
00007ffc4c906ed0
[ 248.622411] RBP: 00007ffc4c906b60 R08: 00007f39e60cc140 R09:
00007f39e60cc140
[ 248.658704] R10: 000000000000001f R11: 0000000000000246 R12:
00007ffc4c906ed0
[ 248.695771] R13: 000000009da9d520 R14: 0000000000000000 R15:
00007ffc4c906c28
On Mon, Jan 15, 2018 at 09:40:36AM -0800, Christoph Hellwig wrote:
> On Tue, Jan 16, 2018 at 12:03:43AM +0800, Ming Lei wrote:
> > Hi,
> >
> > These two patches fixes IO hang issue reported by Laurence.
> >
> > 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
> > may cause one irq vector assigned to all offline CPUs, then this vector
> > can't handle irq any more.
>
> Well, that very much was the intention of managed interrupts. Why
> does the device raise an interrupt for a queue that has no online
> cpu assigned to it?
It is because of irq_create_affinity_masks().
Once irq vectors are spread across all possible CPUs, some of which are
offline, a vector may end up with only offline CPUs assigned to it.
For example, with HPSA on this system the device allocates 16 irq
vectors (25-40 below), and the system supports at most 32 CPUs, but only
16 (0-15) are present after booting. Each irq vector should get at least
one online CPU assigned for handling it, but:
1) before commit 84676c1f21:
irq 25, cpu list 0
irq 26, cpu list 2
irq 27, cpu list 4
irq 28, cpu list 6
irq 29, cpu list 8
irq 30, cpu list 10
irq 31, cpu list 12
irq 32, cpu list 14
irq 33, cpu list 1
irq 34, cpu list 3
irq 35, cpu list 5
irq 36, cpu list 7
irq 37, cpu list 9
irq 38, cpu list 11
irq 39, cpu list 13
irq 40, cpu list 15
2) after commit 84676c1f21:
irq 25, cpu list 0, 2
irq 26, cpu list 4, 6
irq 27, cpu list 8, 10
irq 28, cpu list 12, 14
irq 29, cpu list 16, 18
irq 30, cpu list 20, 22
irq 31, cpu list 24, 26
irq 32, cpu list 28, 30
irq 33, cpu list 1, 3
irq 34, cpu list 5, 7
irq 35, cpu list 9, 11
irq 36, cpu list 13, 15
irq 37, cpu list 17, 19
irq 38, cpu list 21, 23
irq 39, cpu list 25, 27
irq 40, cpu list 29, 31
And vectors 29-32 and 37-40 are assigned only offline CPUs.
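To make the failure mode concrete, below is a small user-space sketch (not
kernel code) that mimics the per-node round-robin spread done after
84676c1f21 for the layout above. The assumption that even-numbered CPUs sit
on node 0 and odd-numbered CPUs on node 1 is mine, chosen so the output
matches the cpu lists shown:

#include <stdio.h>
#include <stdbool.h>

#define NR_POSSIBLE_CPUS 32
#define NR_ONLINE_CPUS   16	/* CPUs 0-15 are online */
#define NR_VECTORS       16	/* irqs 25-40 */
#define NR_NODES         2

static bool cpu_online(int cpu)
{
	return cpu < NR_ONLINE_CPUS;
}

int main(void)
{
	int irq = 25;

	for (int node = 0; node < NR_NODES; node++) {
		int ncpus = NR_POSSIBLE_CPUS / NR_NODES;	/* 16 per node */
		int vecs_per_node = NR_VECTORS / NR_NODES;	/* 8 per node */
		int cpus_per_vec = ncpus / vecs_per_node;	/* 2 per vector */
		int idx = 0;

		for (int v = 0; v < vecs_per_node; v++, irq++) {
			bool any_online = false;

			printf("irq %d, cpu list", irq);
			for (int i = 0; i < cpus_per_vec; i++, idx++) {
				/* assumed layout: node 0 = even CPUs, node 1 = odd CPUs */
				int cpu = 2 * idx + node;

				printf("%s%d", i ? ", " : " ", cpu);
				if (cpu_online(cpu))
					any_online = true;
			}
			printf("%s\n", any_online ? "" : "	<- all CPUs offline");
		}
	}
	return 0;
}

Running it reproduces the lists above, with irqs 29-32 and 37-40 ending up
only on offline CPUs, which is exactly the case the 2nd patch guards
against.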
--
Ming
On Mon, Jan 15, 2018 at 06:43:47PM +0100, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Ming Lei wrote:
> > These two patches fixes IO hang issue reported by Laurence.
> >
> > 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
> > may cause one irq vector assigned to all offline CPUs, then this vector
> > can't handle irq any more.
> >
> > The 1st patch moves irq vectors spread into one function, and prepares
> > for the fix done in 2nd patch.
> >
> > The 2nd patch fixes the issue by trying to make sure online CPUs assigned
> > to irq vector.
>
> Which means it's completely undoing the intent and mechanism of managed
> interrupts. Not going to happen.
As I replied in the previous mail, after we assign vectors to all
possible CPUs, some of which are not present, some irq vectors end up
with only offline CPUs assigned to them.
>
> Which driver is that which abuses managed interrupts and does not keep its
> queues properly sorted on cpu hotplug?
It isn't related to the driver/device; besides HPSA, I can trigger this
issue easily on NVMe too.
Thanks,
Ming
On Mon, Jan 15, 2018 at 09:40:36AM -0800, Christoph Hellwig wrote:
> On Tue, Jan 16, 2018 at 12:03:43AM +0800, Ming Lei wrote:
> > Hi,
> >
> > These two patches fixes IO hang issue reported by Laurence.
> >
> > 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
> > may cause one irq vector assigned to all offline CPUs, then this vector
> > can't handle irq any more.
>
> Well, that very much was the intention of managed interrupts. Why
> does the device raise an interrupt for a queue that has no online
> cpu assigned to it?
If pci_alloc_irq_vectors() returns success, the driver may assume
everything is fine, configure the related hw queues (such as enabling
irqs on the queues), and finally an irq arrives that no CPU can handle.
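For illustration, here is a hedged sketch of that flow; only
pci_alloc_irq_vectors(), pci_irq_vector() and request_irq() are real APIs,
while the function name, the "example-queue" label and the 1/8 vector
bounds are made up:

#include <linux/pci.h>
#include <linux/interrupt.h>

/* Illustrative only: a driver bringing up one hw queue per vector. */
static int example_setup_queues(struct pci_dev *pdev, irq_handler_t handler,
				void *queue_ctx)
{
	int nvecs, i, ret;

	/* The driver asks for managed (affinity-spread) vectors. */
	nvecs = pci_alloc_irq_vectors(pdev, 1, 8,
				      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (nvecs < 0)
		return nvecs;

	for (i = 0; i < nvecs; i++) {
		/* Success here says nothing about online CPUs in the mask. */
		ret = request_irq(pci_irq_vector(pdev, i), handler, 0,
				  "example-queue", queue_ctx);
		if (ret)
			return ret;
	}

	/*
	 * Hardware queues 0..nvecs-1 get enabled now, even if some of the
	 * vectors only cover offline CPUs.
	 */
	return nvecs;
}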
Also, I don't think there are drivers which check whether the CPUs
assigned to their irq vectors are online, and that has never seemed to
be a job the driver is supposed to do.
--
Ming
On Tue, 16 Jan 2018, Ming Lei wrote:
> On Mon, Jan 15, 2018 at 09:40:36AM -0800, Christoph Hellwig wrote:
> > On Tue, Jan 16, 2018 at 12:03:43AM +0800, Ming Lei wrote:
> > > Hi,
> > >
> > > These two patches fixes IO hang issue reported by Laurence.
> > >
> > > 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
> > > may cause one irq vector assigned to all offline CPUs, then this vector
> > > can't handle irq any more.
> >
> > Well, that very much was the intention of managed interrupts. Why
> > does the device raise an interrupt for a queue that has no online
> > cpu assigned to it?
>
> It is because of irq_create_affinity_masks().
That still does not answer the question. If the interrupt for a queue is
assigned to an offline CPU, then the queue should not be used and never
raise an interrupt. That's how managed interrupts have been designed.
Thanks,
tglx
On Tue, Jan 16, 2018 at 12:25:19PM +0100, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Ming Lei wrote:
>
> > On Mon, Jan 15, 2018 at 09:40:36AM -0800, Christoph Hellwig wrote:
> > > On Tue, Jan 16, 2018 at 12:03:43AM +0800, Ming Lei wrote:
> > > > Hi,
> > > >
> > > > These two patches fixes IO hang issue reported by Laurence.
> > > >
> > > > 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
> > > > may cause one irq vector assigned to all offline CPUs, then this vector
> > > > can't handle irq any more.
> > >
> > > Well, that very much was the intention of managed interrupts. Why
> > > does the device raise an interrupt for a queue that has no online
> > > cpu assigned to it?
> >
> > It is because of irq_create_affinity_masks().
>
> That still does not answer the question. If the interrupt for a queue is
> assigned to an offline CPU, then the queue should not be used and never
> raise an interrupt. That's how managed interrupts have been designed.
Sorry for not answering it in the first place, but I addressed it later
here:
https://marc.info/?l=linux-block&m=151606896601195&w=2
Also, wrt. HPSA's queues: they don't look like usual IO queues (such as
NVMe's hw queues), which are supposed to follow a C/S model. HPSA's
reply queues look more like management queues, I guess, since HPSA is
still a single-queue HBA from the blk-mq point of view.
Cc HPSA and SCSI guys.
Thanks,
Ming
On Tue, 2018-01-16 at 12:25 +0100, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Ming Lei wrote:
>
> > On Mon, Jan 15, 2018 at 09:40:36AM -0800, Christoph Hellwig wrote:
> > > On Tue, Jan 16, 2018 at 12:03:43AM +0800, Ming Lei wrote:
> > > > Hi,
> > > >
> > > > These two patches fixes IO hang issue reported by Laurence.
> > > >
> > > > 84676c1f21 ("genirq/affinity: assign vectors to all possible
> > > > CPUs")
> > > > may cause one irq vector assigned to all offline CPUs, then
> > > > this vector
> > > > can't handle irq any more.
> > >
> > > Well, that very much was the intention of managed
> > > interrupts. Why
> > > does the device raise an interrupt for a queue that has no online
> > > cpu assigned to it?
> >
> > It is because of irq_create_affinity_masks().
>
> That still does not answer the question. If the interrupt for a queue
> is
> assigned to an offline CPU, then the queue should not be used and
> never
> raise an interrupt. That's how managed interrupts have been designed.
>
> Thanks,
>
> tglx
>
>
>
>
I captured a full boot log for this issue for Microsemi; I will send it
to Don Brace.
I enabled all the HPSA debug and here is a snippet:
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.15.0-
rc4.noming+ root=/dev/mapper/rhel_ibclient-root ro crashkernel=512M@64M
rd.lvm.lv=rhel_ibclient/root rd.lvm.lv=rhel_ibclient/swap
log_buf_len=54M console=ttyS1,115200n8 scsi_mod.use_blk_mq=y
dm_mod.use_blk_mq=y
[ 0.000000] Memory: 7834908K/1002852K available (8397K kernel code,
3012K rwdata, 3660K rodata, 2184K init, 15344K bss, 2356808K reserved,
0K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32,
Nodes=2
[ 0.000000] ftrace: allocating 33084 entries in 130 pages
[ 0.000000] Running RCU self tests
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU lockdep checking is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=8192 to
nr_cpu_ids=32.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16,
nr_cpu_ids=32
[ 0.000000] NR_IRQS: 524544, nr_irqs: 1088, preallocated irqs: 16
..
..
[    0.190147] smp: Brought up 2 nodes, 16 CPUs
[ 0.192006] smpboot: Max logical packages: 4
[ 0.193007] smpboot: Total of 16 processors activated (76776.33
BogoMIPS)
[ 0.940640] node 0 initialised, 10803218 pages in 743ms
[ 1.005449] node 1 initialised, 11812066 pages in 807ms
..
..
[ 7.440896] hpsa 0000:05:00.0: can't disable ASPM; OS doesn't have
ASPM control
[ 7.442071] hpsa 0000:05:00.0: Logical aborts not supported
[ 7.442075] hpsa 0000:05:00.0: HP SSD Smart Path aborts not
supported
[ 7.442164] hpsa 0000:05:00.0: Controller Configuration information
[ 7.442167] hpsa 0000:05:00.0: ------------------------------------
[ 7.442173] hpsa 0000:05:00.0: Signature = CISS
[ 7.442177] hpsa 0000:05:00.0: Spec Number = 3
[ 7.442182] hpsa 0000:05:00.0: Transport methods supported =
0x7a000007
[ 7.442186] hpsa 0000:05:00.0: Transport methods active = 0x3
[ 7.442190] hpsa 0000:05:00.0: Requested transport Method = 0x2
[ 7.442194] hpsa 0000:05:00.0: Coalesce Interrupt Delay = 0x0
[ 7.442198] hpsa 0000:05:00.0: Coalesce Interrupt Count = 0x1
[ 7.442202] hpsa 0000:05:00.0: Max outstanding commands = 1024
[ 7.442206] hpsa 0000:05:00.0: Bus Types = 0x200000
[ 7.442220] hpsa 0000:05:00.0: Server Name = 2M21220149
[ 7.442224] hpsa 0000:05:00.0: Heartbeat Counter = 0xd23
[ 7.442224]
[ 7.442224]
..
..
[  246.751135] INFO: task systemd-udevd:413 blocked for more than 120
seconds.
[ 246.788008] Tainted: G I 4.15.0-rc4.noming+ #1
[ 246.822380] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 246.865594] systemd-udevd D 0 413 411 0x80000004
[ 246.895519] Call Trace:
[ 246.909713] ? __schedule+0x340/0xc20
[ 246.930236] schedule+0x32/0x80
[ 246.947905] schedule_timeout+0x23d/0x450
[ 246.970047] ? find_held_lock+0x2d/0x90
[ 246.991774] ? wait_for_completion_io+0x108/0x170
[ 247.018172] io_schedule_timeout+0x19/0x40
[ 247.041208] wait_for_completion_io+0x110/0x170
[ 247.067326] ? wake_up_q+0x70/0x70
[ 247.086801] hpsa_scsi_do_simple_cmd+0xc6/0x100 [hpsa]
[ 247.114315] hpsa_scsi_do_simple_cmd_with_retry+0xb7/0x1c0 [hpsa]
[ 247.146629] hpsa_scsi_do_inquiry+0x73/0xd0 [hpsa]
[ 247.174118] hpsa_init_one+0x12cb/0x1a59 [hpsa]
[ 247.199851] ? __pm_runtime_resume+0x55/0x70
[ 247.224527] local_pci_probe+0x3f/0xa0
[ 247.246034] pci_device_probe+0x146/0x1b0
[ 247.268413] driver_probe_device+0x2b3/0x4a0
[ 247.291868] __driver_attach+0xda/0xe0
[ 247.313370] ? driver_probe_device+0x4a0/0x4a0
[ 247.338399] bus_for_each_dev+0x6a/0xb0
[ 247.359912] bus_add_driver+0x41/0x260
[ 247.380244] driver_register+0x5b/0xd0
[ 247.400811] ? 0xffffffffc016b000
[ 247.418819] hpsa_init+0x38/0x1000 [hpsa]
[ 247.440763] ? 0xffffffffc016b000
[ 247.459451] do_one_initcall+0x4d/0x19c
[ 247.480539] ? do_init_module+0x22/0x220
[ 247.502575] ? rcu_read_lock_sched_held+0x64/0x70
[ 247.529549] ? kmem_cache_alloc_trace+0x1f7/0x260
[ 247.556204] ? do_init_module+0x22/0x220
[ 247.578633] do_init_module+0x5a/0x220
[ 247.600322] load_module+0x21e8/0x2a50
[ 247.621648] ? __symbol_put+0x60/0x60
[ 247.642796] SYSC_finit_module+0x94/0xe0
[ 247.665336] entry_SYSCALL_64_fastpath+0x1f/0x96
[ 247.691751] RIP: 0033:0x7fc63d6527f9
[ 247.712308] RSP: 002b:00007ffdf1659ba8 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[ 247.755272] RAX: ffffffffffffffda RBX: 0000556b524c5f70 RCX:
00007fc63d6527f9
[ 247.795779] RDX: 0000000000000000 RSI: 00007fc63df6f099 RDI:
0000000000000008
[ 247.836413] RBP: 00007fc63df6f099 R08: 0000000000000000 R09:
0000556b524be760
[ 247.876395] R10: 0000000000000008 R11: 0000000000000246 R12:
0000000000000000
[ 247.917597] R13: 0000556b524c5f10 R14: 0000000000020000 R15:
0000000000000000
[ 247.957272]
[ 247.957272] Showing all locks held in the system:
[ 247.992019] 1 lock held by khungtaskd/118:
[ 248.015019] #0: (tasklist_lock){.+.+}, at: [<000000004ef3538d>]
debug_show_all_locks+0x39/0x1b0
[ 248.064600] 2 locks held by systemd-udevd/413:
[ 248.090031] #0: (&dev->mutex){....}, at: [<000000002a395ec8>]
__driver_attach+0x4a/0xe0
[ 248.136620] #1: (&dev->mutex){....}, at: [<00000000d9def23c>]
__driver_attach+0x58/0xe0
[ 248.183245]
[ 248.191675] =============================================
[ 248.191675]
[ 314.825134] dracut-initqueue[437]: Warning: dracut-initqueue timeout
- starting timeout scripts
[ 315.368421] dracut-initqueue[437]: Warning: dracut-initqueue timeout
- starting timeout scripts
[ 315.894373] dracut-initqueue[437]: Warning: dracut-initqueue timeout
- starting timeout scripts
[ 316.418385] dracut-initqueue[437]: Warning: dracut-initqueue timeout
- starting timeout scripts
[ 316.944461] dracut-initqueue[437]: Warning: dracut-initqueue timeout
- starting timeout scripts
[ 317.466708] dracut-initqueue[437]: Warning: dracut-initqueue timeout
- starting timeout scripts
[ 317.994380] dracut-initqueue[437]: Warning: dracut-initqueue timeout
- starti
> -----Original Message-----
> From: Laurence Oberman [mailto:[email protected]]
> Sent: Tuesday, January 16, 2018 7:29 AM
> To: Thomas Gleixner <[email protected]>; Ming Lei <[email protected]>
> Cc: Christoph Hellwig <[email protected]>; Jens Axboe <[email protected]>;
> [email protected]; [email protected]; Mike Snitzer
> <[email protected]>; Don Brace <[email protected]>
> Subject: Re: [PATCH 0/2] genirq/affinity: try to make sure online CPU is assgined
> to irq vector
>
> > > It is because of irq_create_affinity_masks().
> >
> > That still does not answer the question. If the interrupt for a queue
> > is
> > assigned to an offline CPU, then the queue should not be used and
> > never
> > raise an interrupt. That's how managed interrupts have been designed.
> >
> > Thanks,
> >
> > tglx
> >
> >
> >
> >
>
> I captured a full boot log for this issue for Microsemi, I will send it
> to Don Brace.
> I enabled all the HPSA debug and here is snippet
>
>
> ..
> ..
> ..
> 246.751135] INFO: task systemd-udevd:413 blocked for more than 120
> seconds.
> [ 246.788008] Tainted: G I 4.15.0-rc4.noming+ #1
> [ 246.822380] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 246.865594] systemd-udevd D 0 413 411 0x80000004
> [ 246.895519] Call Trace:
> [ 246.909713] ? __schedule+0x340/0xc20
> [ 246.930236] schedule+0x32/0x80
> [ 246.947905] schedule_timeout+0x23d/0x450
> [ 246.970047] ? find_held_lock+0x2d/0x90
> [ 246.991774] ? wait_for_completion_io+0x108/0x170
> [ 247.018172] io_schedule_timeout+0x19/0x40
> [ 247.041208] wait_for_completion_io+0x110/0x170
> [ 247.067326] ? wake_up_q+0x70/0x70
> [ 247.086801] hpsa_scsi_do_simple_cmd+0xc6/0x100 [hpsa]
> [ 247.114315] hpsa_scsi_do_simple_cmd_with_retry+0xb7/0x1c0 [hpsa]
> [ 247.146629] hpsa_scsi_do_inquiry+0x73/0xd0 [hpsa]
> [ 247.174118] hpsa_init_one+0x12cb/0x1a59 [hpsa]
This trace comes from internally generated discovery commands. No SCSI devices have
been presented to the SML yet.
At this point we should be running on only one CPU. These commands are
meant to use reply queue 0, which is tied to CPU 0. It's interesting
that the patch helps.
However, I was wondering if you could inspect the iLo IML logs and send the
AHS logs for inspection.
Thanks,
Don Brace
ESC - Smart Storage
Microsemi Corporation
> [ 247.199851] ? __pm_runtime_resume+0x55/0x70
> [ 247.224527] local_pci_probe+0x3f/0xa0
> [ 247.246034] pci_device_probe+0x146/0x1b0
> [ 247.268413] driver_probe_device+0x2b3/0x4a0
> [ 247.291868] __driver_attach+0xda/0xe0
> [ 247.313370] ? driver_probe_device+0x4a0/0x4a0
> [ 247.338399] bus_for_each_dev+0x6a/0xb0
> [ 247.359912] bus_add_driver+0x41/0x260
> [ 247.380244] driver_register+0x5b/0xd0
> [ 247.400811] ? 0xffffffffc016b000
> [ 247.418819] hpsa_init+0x38/0x1000 [hpsa]
> [ 247.440763] ? 0xffffffffc016b000
> [ 247.459451] do_one_initcall+0x4d/0x19c
> [ 247.480539] ? do_init_module+0x22/0x220
> [ 247.502575] ? rcu_read_lock_sched_held+0x64/0x70
> [ 247.529549] ? kmem_cache_alloc_trace+0x1f7/0x260
> [ 247.556204] ? do_init_module+0x22/0x220
> [ 247.578633] do_init_module+0x5a/0x220
> [ 247.600322] load_module+0x21e8/0x2a50
> [ 247.621648] ? __symbol_put+0x60/0x60
> [ 247.642796] SYSC_finit_module+0x94/0xe0
> [ 247.665336] entry_SYSCALL_64_fastpath+0x1f/0x96
> [ 247.691751] RIP: 0033:0x7fc63d6527f9
> [ 247.712308] RSP: 002b:00007ffdf1659ba8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [ 247.755272] RAX: ffffffffffffffda RBX: 0000556b524c5f70 RCX:
> 00007fc63d6527f9
> [ 247.795779] RDX: 0000000000000000 RSI: 00007fc63df6f099 RDI:
> 0000000000000008
> [ 247.836413] RBP: 00007fc63df6f099 R08: 0000000000000000 R09:
> 0000556b524be760
> [ 247.876395] R10: 0000000000000008 R11: 0000000000000246 R12:
> 0000000000000000
> [ 247.917597] R13: 0000556b524c5f10 R14: 0000000000020000 R15:
> 0000000000000000
> [ 247.957272]
> [ 247.957272] Showing all locks held in the system:
> [ 247.992019] 1 lock held by khungtaskd/118:
> [ 248.015019] #0: (tasklist_lock){.+.+}, at: [<000000004ef3538d>]
> debug_show_all_locks+0x39/0x1b0
> [ 248.064600] 2 locks held by systemd-udevd/413:
> [ 248.090031] #0: (&dev->mutex){....}, at: [<000000002a395ec8>]
> __driver_attach+0x4a/0xe0
> [ 248.136620] #1: (&dev->mutex){....}, at: [<00000000d9def23c>]
> __driver_attach+0x58/0xe0
> [ 248.183245]
> [ 248.191675] =============================================
> [ 248.191675]
> [ 314.825134] dracut-initqueue[437]: Warning: dracut-initqueue timeout
> - starting timeout scripts
> [ 315.368421] dracut-initqueue[437]: Warning: dracut-initqueue timeout
> - starting timeout scripts
> [ 315.894373] dracut-initqueue[437]: Warning: dracut-initqueue timeout
> - starting timeout scripts
> [ 316.418385] dracut-initqueue[437]: Warning: dracut-initqueue timeout
> - starting timeout scripts
> [ 316.944461] dracut-initqueue[437]: Warning: dracut-initqueue timeout
> - starting timeout scripts
> [ 317.466708] dracut-initqueue[437]: Warning: dracut-initqueue timeout
> - starting timeout scripts
> [ 317.994380] dracut-initqueue[437]: Warning: dracut-initqueue timeout
> - starti
On Tue, 2018-01-16 at 15:22 +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Laurence Oberman [mailto:[email protected]]
> > Sent: Tuesday, January 16, 2018 7:29 AM
> > To: Thomas Gleixner <[email protected]>; Ming Lei <ming.lei@redhat
> > .com>
> > Cc: Christoph Hellwig <[email protected]>; Jens Axboe <[email protected]
> > >;
> > [email protected]; [email protected]; Mike
> > Snitzer
> > <[email protected]>; Don Brace <[email protected]>
> > Subject: Re: [PATCH 0/2] genirq/affinity: try to make sure online
> > CPU is assgined
> > to irq vector
> >
> > > > It is because of irq_create_affinity_masks().
> > >
> > > That still does not answer the question. If the interrupt for a
> > > queue
> > > is
> > > assigned to an offline CPU, then the queue should not be used and
> > > never
> > > raise an interrupt. That's how managed interrupts have been
> > > designed.
> > >
> > > Thanks,
> > >
> > > tglx
> > >
> > >
> > >
> > >
> >
> > I captured a full boot log for this issue for Microsemi, I will
> > send it
> > to Don Brace.
> > I enabled all the HPSA debug and here is snippet
> >
> >
> > ..
> > ..
> > ..
> > 246.751135] INFO: task systemd-udevd:413 blocked for more than
> > 120
> > seconds.
> > [ 246.788008] Tainted: G I 4.15.0-rc4.noming+
> > #1
> > [ 246.822380] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [ 246.865594] systemd-udevd D 0 413 411 0x80000004
> > [ 246.895519] Call Trace:
> > [ 246.909713] ? __schedule+0x340/0xc20
> > [ 246.930236] schedule+0x32/0x80
> > [ 246.947905] schedule_timeout+0x23d/0x450
> > [ 246.970047] ? find_held_lock+0x2d/0x90
> > [ 246.991774] ? wait_for_completion_io+0x108/0x170
> > [ 247.018172] io_schedule_timeout+0x19/0x40
> > [ 247.041208] wait_for_completion_io+0x110/0x170
> > [ 247.067326] ? wake_up_q+0x70/0x70
> > [ 247.086801] hpsa_scsi_do_simple_cmd+0xc6/0x100 [hpsa]
> > [ 247.114315] hpsa_scsi_do_simple_cmd_with_retry+0xb7/0x1c0
> > [hpsa]
> > [ 247.146629] hpsa_scsi_do_inquiry+0x73/0xd0 [hpsa]
> > [ 247.174118] hpsa_init_one+0x12cb/0x1a59 [hpsa]
>
> This trace comes from internally generated discovery commands. No
> SCSI devices have
> been presented to the SML yet.
>
> At this point we should be running on only one CPU. These commands
> are meant to use
> reply queue 0 which are tied to CPU 0. It's interesting that the
> patch helps.
>
> However, I was wondering if you could inspect the iLo IML logs and
> send the
> AHS logs for inspection.
>
> Thanks,
> Don Brace
> ESC - Smart Storage
> Microsemi Corporation
Hello Don
I took two other DL380 G7s and ran the same kernel; it hangs in exactly
the same place. It's absolutely consistent here.
I doubt all three have hardware issues.
Nothing of interest is logged in the IML.
Ming will have more to share on specifically why his patches help.
I think he sent that along to you already.
Regards
Laurence
On Tue, Jan 16, 2018 at 03:22:18PM +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Laurence Oberman [mailto:[email protected]]
> > Sent: Tuesday, January 16, 2018 7:29 AM
> > To: Thomas Gleixner <[email protected]>; Ming Lei <[email protected]>
> > Cc: Christoph Hellwig <[email protected]>; Jens Axboe <[email protected]>;
> > [email protected]; [email protected]; Mike Snitzer
> > <[email protected]>; Don Brace <[email protected]>
> > Subject: Re: [PATCH 0/2] genirq/affinity: try to make sure online CPU is assgined
> > to irq vector
> >
> > > > It is because of irq_create_affinity_masks().
> > >
> > > That still does not answer the question. If the interrupt for a queue
> > > is
> > > assigned to an offline CPU, then the queue should not be used and
> > > never
> > > raise an interrupt. That's how managed interrupts have been designed.
> > >
> > > Thanks,
> > >
> > > tglx
> > >
> > >
> > >
> > >
> >
> > I captured a full boot log for this issue for Microsemi, I will send it
> > to Don Brace.
> > I enabled all the HPSA debug and here is snippet
> >
> >
> > ..
> > ..
> > ..
> > 246.751135] INFO: task systemd-udevd:413 blocked for more than 120
> > seconds.
> > [  246.788008]        Tainted: G I 4.15.0-rc4.noming+ #1
> > [  246.822380] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [  246.865594] systemd-udevd D 0 413 411 0x80000004
> > [  246.895519] Call Trace:
> > [  246.909713]  ? __schedule+0x340/0xc20
> > [  246.930236]  schedule+0x32/0x80
> > [  246.947905]  schedule_timeout+0x23d/0x450
> > [  246.970047]  ? find_held_lock+0x2d/0x90
> > [  246.991774]  ? wait_for_completion_io+0x108/0x170
> > [  247.018172]  io_schedule_timeout+0x19/0x40
> > [  247.041208]  wait_for_completion_io+0x110/0x170
> > [  247.067326]  ? wake_up_q+0x70/0x70
> > [  247.086801]  hpsa_scsi_do_simple_cmd+0xc6/0x100 [hpsa]
> > [  247.114315]  hpsa_scsi_do_simple_cmd_with_retry+0xb7/0x1c0 [hpsa]
> > [  247.146629]  hpsa_scsi_do_inquiry+0x73/0xd0 [hpsa]
> > [  247.174118]  hpsa_init_one+0x12cb/0x1a59 [hpsa]
>
> This trace comes from internally generated discovery commands. No SCSI devices have
> been presented to the SML yet.
>
> At this point we should be running on only one CPU. These commands are meant to use
> reply queue 0 which are tied to CPU 0. It's interesting that the patch helps.
In hpsa_interrupt_mode(), you pass PCI_IRQ_AFFINITY to
pci_alloc_irq_vectors(), which may end up assigning only offline CPUs
to some irq vectors. From my observation, that is the cause of the hang
reported by Laurence.
BTW, if the interrupt handler for the reply queue isn't
performance-sensitive, maybe PCI_IRQ_AFFINITY can be removed to avoid
this issue.
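As an untested sketch of that suggestion (the h->pdev/MAX_REPLY_QUEUES
names below are placeholders, not a verified hpsa change), it would just
mean calling:

	vectors = pci_alloc_irq_vectors(h->pdev, 1, MAX_REPLY_QUEUES,
					PCI_IRQ_MSIX | PCI_IRQ_MSI |
					PCI_IRQ_LEGACY);

without PCI_IRQ_AFFINITY, so the vectors keep the kernel's default
affinity (tunable from user space) instead of the managed spread across
all possible CPUs.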
But anyway, as I replied in this thread, this patchset still improves
the irq vector spread.
Thanks,
Ming
On Tue, Jan 16, 2018 at 03:22:18PM +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Laurence Oberman [mailto:[email protected]]
> > Sent: Tuesday, January 16, 2018 7:29 AM
> > To: Thomas Gleixner <[email protected]>; Ming Lei <[email protected]>
> > Cc: Christoph Hellwig <[email protected]>; Jens Axboe <[email protected]>;
> > [email protected]; [email protected]; Mike Snitzer
> > <[email protected]>; Don Brace <[email protected]>
> > Subject: Re: [PATCH 0/2] genirq/affinity: try to make sure online CPU is assgined
> > to irq vector
> >
> > > > It is because of irq_create_affinity_masks().
> > >
> > > That still does not answer the question. If the interrupt for a queue
> > > is
> > > assigned to an offline CPU, then the queue should not be used and
> > > never
> > > raise an interrupt. That's how managed interrupts have been designed.
> > >
> > > Thanks,
> > >
> > > tglx
> > >
> > >
> > >
> > >
> >
> > I captured a full boot log for this issue for Microsemi, I will send it
> > to Don Brace.
> > I enabled all the HPSA debug and here is snippet
> >
> >
> > ..
> > ..
> > ..
> > 246.751135] INFO: task systemd-udevd:413 blocked for more than 120
> > seconds.
> > [  246.788008]        Tainted: G I 4.15.0-rc4.noming+ #1
> > [  246.822380] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [  246.865594] systemd-udevd D 0 413 411 0x80000004
> > [  246.895519] Call Trace:
> > [  246.909713]  ? __schedule+0x340/0xc20
> > [  246.930236]  schedule+0x32/0x80
> > [  246.947905]  schedule_timeout+0x23d/0x450
> > [  246.970047]  ? find_held_lock+0x2d/0x90
> > [  246.991774]  ? wait_for_completion_io+0x108/0x170
> > [  247.018172]  io_schedule_timeout+0x19/0x40
> > [  247.041208]  wait_for_completion_io+0x110/0x170
> > [  247.067326]  ? wake_up_q+0x70/0x70
> > [  247.086801]  hpsa_scsi_do_simple_cmd+0xc6/0x100 [hpsa]
> > [  247.114315]  hpsa_scsi_do_simple_cmd_with_retry+0xb7/0x1c0 [hpsa]
> > [  247.146629]  hpsa_scsi_do_inquiry+0x73/0xd0 [hpsa]
> > [  247.174118]  hpsa_init_one+0x12cb/0x1a59 [hpsa]
>
> This trace comes from internally generated discovery commands. No SCSI devices have
> been presented to the SML yet.
>
> At this point we should be running on only one CPU. These commands are meant to use
> reply queue 0 which are tied to CPU 0. It's interesting that the patch helps.
>
> However, I was wondering if you could inspect the iLo IML logs and send the
> AHS logs for inspection.
Hello Don,
Now the patch has been merged to Linus' tree as:
84676c1f21e8ff54b ("genirq/affinity: assign vectors to all possible CPUs")
and it breaks Laurence's machine completely, :-(
I just took a look at HPSA's code and found that the reply queue is
chosen in the following way in most code paths:
	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
h->nreply_queues is the msix vector count returned from
pci_alloc_irq_vectors(), and now some of the vectors may be mapped only
to offline CPUs, for example when one processor isn't plugged into its
socket.
If I understand correctly, 'cp->ReplyQueue' is aligned to one irq
vector and the command is expected to be completed via that irq vector,
is that right?
If yes, I guess this way can't work any more once the number of online
CPUs is >= h->nreply_queues, and you may need to check the cpu affinity
of each vector before choosing the reply queue; block/blk-mq-pci.c may
be helpful for you.
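A rough sketch of that idea (only pci_irq_get_affinity() and
for_each_cpu() are real APIs here; the helper name and the h->reply_map
per-CPU array are hypothetical) would be to build a per-CPU map once at
init time and index it instead of using the raw modulo:

	static void example_build_reply_map(struct ctlr_info *h)
	{
		const struct cpumask *mask;
		unsigned int q, cpu;

		for (q = 0; q < h->nreply_queues; q++) {
			mask = pci_irq_get_affinity(h->pdev, q);
			if (!mask)
				continue;
			for_each_cpu(cpu, mask)
				h->reply_map[cpu] = q;
		}
	}

Submission paths would then pick h->reply_map[raw_smp_processor_id()]
instead of the modulo, so a command is only posted to a reply queue whose
vector actually covers the submitting (online) CPU.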
Thanks,
Ming
> -----Original Message-----
> From: Ming Lei [mailto:[email protected]]
> Sent: Thursday, February 01, 2018 4:37 AM
> To: Don Brace <[email protected]>
> Cc: Laurence Oberman <[email protected]>; Thomas Gleixner
> <[email protected]>; Christoph Hellwig <[email protected]>; Jens Axboe
> <[email protected]>; [email protected]; [email protected];
> Mike Snitzer <[email protected]>
> Subject: Re: [PATCH 0/2] genirq/affinity: try to make sure online CPU is assgined
> to irq vector
>
> On Tue, Jan 16, 2018 at 03:22:18PM +0000, Don Brace wrote:
> > > -----Original Message-----
> > > From: Laurence Oberman [mailto:[email protected]]
> > > Sent: Tuesday, January 16, 2018 7:29 AM
> > > To: Thomas Gleixner <[email protected]>; Ming Lei <[email protected]>
> > > Cc: Christoph Hellwig <[email protected]>; Jens Axboe <[email protected]>;
> > > [email protected]; [email protected]; Mike Snitzer
> > > <[email protected]>; Don Brace <[email protected]>
> > > Subject: Re: [PATCH 0/2] genirq/affinity: try to make sure online CPU is
> assgined
> > > to irq vector
> > >
> > > > > It is because of irq_create_affinity_masks().
> > > >
> > > > That still does not answer the question. If the interrupt for a queue
> > > > is
> > > > assigned to an offline CPU, then the queue should not be used and
> > > > never
> > > > raise an interrupt. That's how managed interrupts have been designed.
> > > >
> > > > Thanks,
> > > >
> > > > tglx
> > > >
> > > >
> > > >
> > > >
> > >
> > > I captured a full boot log for this issue for Microsemi, I will send it
> > > to Don Brace.
> > > I enabled all the HPSA debug and here is snippet
> > >
> > >
> > > ..
> > > ..
> > > ..
> > > 246.751135] INFO: task systemd-udevd:413 blocked for more than 120
> > > seconds.
> > > [  246.788008]        Tainted: G I 4.15.0-rc4.noming+ #1
> > > [  246.822380] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables this message.
> > > [  246.865594] systemd-udevd D 0 413 411 0x80000004
> > > [  246.895519] Call Trace:
> > > [  246.909713]  ? __schedule+0x340/0xc20
> > > [  246.930236]  schedule+0x32/0x80
> > > [  246.947905]  schedule_timeout+0x23d/0x450
> > > [  246.970047]  ? find_held_lock+0x2d/0x90
> > > [  246.991774]  ? wait_for_completion_io+0x108/0x170
> > > [  247.018172]  io_schedule_timeout+0x19/0x40
> > > [  247.041208]  wait_for_completion_io+0x110/0x170
> > > [  247.067326]  ? wake_up_q+0x70/0x70
> > > [  247.086801]  hpsa_scsi_do_simple_cmd+0xc6/0x100 [hpsa]
> > > [  247.114315]  hpsa_scsi_do_simple_cmd_with_retry+0xb7/0x1c0 [hpsa]
> > > [  247.146629]  hpsa_scsi_do_inquiry+0x73/0xd0 [hpsa]
> > > [  247.174118]  hpsa_init_one+0x12cb/0x1a59 [hpsa]
> >
> > This trace comes from internally generated discovery commands. No SCSI
> devices have
> > been presented to the SML yet.
> >
> > At this point we should be running on only one CPU. These commands are
> meant to use
> > reply queue 0 which are tied to CPU 0. It's interesting that the patch helps.
> >
> > However, I was wondering if you could inspect the iLo IML logs and send the
> > AHS logs for inspection.
>
> Hello Don,
>
> Now the patch has been merged to linus tree as:
>
> 84676c1f21e8ff54b ("genirq/affinity: assign vectors to all possible CPUs")
>
> and it breaks Laurence's machine completely, :-(
>
> I just take a look at HPSA's code, and found that reply queue is chosen
> in the following way in most of code path:
>
> if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
>
> h->nreply_queues is the msix vector number which is returned from
> pci_alloc_irq_vectors(), and now some of vectors may be mapped to all
> offline CPUs, for example, one processor isn't plugged to socket.
>
> If I understand correctly, 'cp->ReplyQueue' is aligned to one irq
> vector, and the command is expected by handled via that irq vector,
> is it right?
>
> If yes, now I guess this way can't work any more if number of online
> CPUs is >= h->nreply_queues, and you may need to check the cpu affinity
> of one vector before choosing the reply queue, and block/blk-mq-pci.c
> may be helpful for you.
>
> Thanks,
> Ming
Thanks Ming,
I'll start working up a patch.
On Thu, Feb 01, 2018 at 02:53:35PM +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:[email protected]]
> > Sent: Thursday, February 01, 2018 4:37 AM
> > To: Don Brace <[email protected]>
> > Cc: Laurence Oberman <[email protected]>; Thomas Gleixner
> > <[email protected]>; Christoph Hellwig <[email protected]>; Jens Axboe
> > <[email protected]>; [email protected]; [email protected];
> > Mike Snitzer <[email protected]>
> > Subject: Re: [PATCH 0/2] genirq/affinity: try to make sure online CPU is assgined
> > to irq vector
> >
> > On Tue, Jan 16, 2018 at 03:22:18PM +0000, Don Brace wrote:
> > > > -----Original Message-----
> > > > From: Laurence Oberman [mailto:[email protected]]
> > > > Sent: Tuesday, January 16, 2018 7:29 AM
> > > > To: Thomas Gleixner <[email protected]>; Ming Lei <[email protected]>
> > > > Cc: Christoph Hellwig <[email protected]>; Jens Axboe <[email protected]>;
> > > > [email protected]; [email protected]; Mike Snitzer
> > > > <[email protected]>; Don Brace <[email protected]>
> > > > Subject: Re: [PATCH 0/2] genirq/affinity: try to make sure online CPU is
> > assgined
> > > > to irq vector
> > > >
> > > > > > It is because of irq_create_affinity_masks().
> > > > >
> > > > > That still does not answer the question. If the interrupt for a queue
> > > > > is
> > > > > assigned to an offline CPU, then the queue should not be used and
> > > > > never
> > > > > raise an interrupt. That's how managed interrupts have been designed.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > tglx
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > I captured a full boot log for this issue for Microsemi, I will send it
> > > > to Don Brace.
> > > > I enabled all the HPSA debug and here is snippet
> > > >
> > > >
> > > > ..
> > > > ..
> > > > ..
> > > > 246.751135] INFO: task systemd-udevd:413 blocked for more than 120
> > > > seconds.
> > > > [  246.788008]        Tainted: G I 4.15.0-rc4.noming+ #1
> > > > [  246.822380] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > > disables this message.
> > > > [  246.865594] systemd-udevd D 0 413 411 0x80000004
> > > > [  246.895519] Call Trace:
> > > > [  246.909713]  ? __schedule+0x340/0xc20
> > > > [  246.930236]  schedule+0x32/0x80
> > > > [  246.947905]  schedule_timeout+0x23d/0x450
> > > > [  246.970047]  ? find_held_lock+0x2d/0x90
> > > > [  246.991774]  ? wait_for_completion_io+0x108/0x170
> > > > [  247.018172]  io_schedule_timeout+0x19/0x40
> > > > [  247.041208]  wait_for_completion_io+0x110/0x170
> > > > [  247.067326]  ? wake_up_q+0x70/0x70
> > > > [  247.086801]  hpsa_scsi_do_simple_cmd+0xc6/0x100 [hpsa]
> > > > [  247.114315]  hpsa_scsi_do_simple_cmd_with_retry+0xb7/0x1c0 [hpsa]
> > > > [  247.146629]  hpsa_scsi_do_inquiry+0x73/0xd0 [hpsa]
> > > > [  247.174118]  hpsa_init_one+0x12cb/0x1a59 [hpsa]
> > >
> > > This trace comes from internally generated discovery commands. No SCSI
> > devices have
> > > been presented to the SML yet.
> > >
> > > At this point we should be running on only one CPU. These commands are
> > meant to use
> > > reply queue 0 which are tied to CPU 0. It's interesting that the patch helps.
> > >
> > > However, I was wondering if you could inspect the iLo IML logs and send the
> > > AHS logs for inspection.
> >
> > Hello Don,
> >
> > Now the patch has been merged to linus tree as:
> >
> > 84676c1f21e8ff54b ("genirq/affinity: assign vectors to all possible CPUs")
> >
> > and it breaks Laurence's machine completely, :-(
> >
> > I just take a look at HPSA's code, and found that reply queue is chosen
> > in the following way in most of code path:
> >
> > if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
> >
> > h->nreply_queues is the msix vector number which is returned from
> > pci_alloc_irq_vectors(), and now some of vectors may be mapped to all
> > offline CPUs, for example, one processor isn't plugged to socket.
> >
> > If I understand correctly, 'cp->ReplyQueue' is aligned to one irq
> > vector, and the command is expected by handled via that irq vector,
> > is it right?
> >
> > If yes, now I guess this way can't work any more if number of online
> > CPUs is >= h->nreply_queues, and you may need to check the cpu affinity
> > of one vector before choosing the reply queue, and block/blk-mq-pci.c
> > may be helpful for you.
> >
> > Thanks,
> > Ming
>
> Thanks Ming,
> I start working up a patch.
Also, the reply queues may be mapped to blk-mq's hw queues directly, in
which case the conversion can be done by the blk-mq framework, but the
legacy path still needs the fix.
thanks
Ming