2016-03-16 06:48:37

by Murphy Zhou

Subject: 4.5.0+ panic when setup loop device

hi,

This panic does not happen on the 4.5 final kernel. It is still
reproducible as of commit 710d60cbf.

Bisect points to:

commit 1f12e32f4cd5243ae46d8b933181be0d022c6793
Author: Thomas Gleixner <[email protected]>
Date:   Mon Feb 22 22:19:15 2016 +0000

    x86/topology: Create logical package id

It first showed up on the next-20160302 tree; a bisect between the 0301
and 0302 trees went nowhere. It is now reproducible on Linus' tree.

steps:
fallocate -l 1G test.img
loopdev=$(losetup --find --show test.img)
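
For reference, the step that explodes boils down to a single ioctl on
/dev/loop-control: losetup --find asks for a free device via
LOOP_CTL_GET_FREE, which is the loop_control_ioctl() -> loop_add() ->
add_disk() path visible in the backtrace below. A minimal compilable
sketch (an illustration only, assuming nothing beyond the standard
<linux/loop.h> interface):

/* loop_get_free.c - sketch of the losetup --find step; illustration only */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

int main(void)
{
	/* loop-control is the control node registered by the loop module */
	int ctl = open("/dev/loop-control", O_RDWR);
	if (ctl < 0) {
		perror("open /dev/loop-control");
		return 1;
	}
	/* this request ends up in loop_control_ioctl() -> loop_add() */
	int nr = ioctl(ctl, LOOP_CTL_GET_FREE);
	if (nr < 0) {
		perror("LOOP_CTL_GET_FREE");
		return 1;
	}
	printf("/dev/loop%d\n", nr);
	return 0;
}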


Full log, bisect log and config are attached.

Thanks,
Xiong


[ 503.302458] loop: module loaded
[ 503.322815] BUG: unable to handle kernel paging request at ffffe8f8fca41a14
[ 503.363966] IP: [<ffffffff81340887>] kobject_init+0x17/0x90
[ 503.396755] PGD bbf22067 PUD bbf21067 PMD bbf20067 PTE 0
[ 503.428178] Oops: 0000 [#1] SMP
[ 503.447951] Modules linked in: loop kvm_amd kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw
gf128mul ipmi_ssif glue_helper ablk_helper sg amd64_edac_mod nfsd
pcspkr hpilo cryptd ipmi_si edac_mce_amd hpwdt sp5100_tco
ipmi_msghandler edac_core fam15h_power shpchp i2c_piix4 k10temp
acpi_power_meter acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sd_mod radeon i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_generic
pata_acpi ahci drm libahci pata_atiixp hpsa libata crc32c_intel
i2c_core nd_pmem serio_raw bnx2 scsi_transport_sas dm_mirror
dm_region_hash dm_log dm_mod
[ 503.788265] CPU: 13 PID: 12511 Comm: losetup Not tainted 4.5.0-rc6+ #78
[ 503.827285] Hardware name: HP ProLiant DL385 G7, BIOS A18 03/19/2012
[ 503.909812] task: ffff880132fb8000 ti: ffff88083a210000 task.ti: ffff88083a210000
[ 503.958992] RIP: 0010:[<ffffffff81340887>] [<ffffffff81340887>] kobject_init+0x17/0x90
[ 503.958992] RSP: 0018:ffff88083a213d28 EFLAGS: 00010282
[ 503.990539] RAX: ffffe8f8fca41900 RBX: ffffe8f8fca419d8 RCX: 0000000000000001
[ 504.033492] RDX: 0000000000000020 RSI: ffffffff81adb040 RDI: ffffe8f8fca419d8
[ 504.075306] RBP: ffff88083a213d38 R08: 0000000000000000 R09: ffffffff813401be
[ 504.115608] R10: ffff88013bb5a400 R11: ffffea0020d61a00 R12: ffffffff81adb040
[ 504.155965] R13: ffff8800ac4f85c8 R14: ffff880836ef8800 R15: ffff880836ef8880
[ 504.195840] FS: 00007ff685dbb740(0000) GS:ffff88013bb40000(0000) knlGS:0000000000000000
[ 504.241973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 504.275430] CR2: ffffe8f8fca41a14 CR3: 0000000133a1d000 CR4: 00000000000406e0
[ 504.314203] Stack:
[ 504.325542] 0000000000000001 ffff8800ac4f8000 ffff88083a213d70 ffffffff81320296
[ 504.366429] ffff8800ac4f8000 ffff880836ef8800 ffff880836ef8870 ffff8800ac4f8588
[ 504.410991] ffff880836ef8880 ffff88083a213db0 ffffffff813157d4 ffff880836ef8880
[ 504.455541] Call Trace:
[ 504.469999] [<ffffffff81320296>] blk_mq_register_disk+0xa6/0x160
[ 504.505490] [<ffffffff813157d4>] blk_register_queue+0xb4/0x160
[ 504.540521] [<ffffffff813230fd>] add_disk+0x1dd/0x4a0
[ 504.570938] [<ffffffffa0516f70>] loop_add+0x1f0/0x270 [loop]
[ 504.604979] [<ffffffffa0517222>] loop_control_ioctl+0x112/0x160 [loop]
[ 504.644763] [<ffffffff81226986>] do_vfs_ioctl+0xa6/0x5c0
[ 504.674226] [<ffffffff81226f19>] SyS_ioctl+0x79/0x90
[ 504.702882] [<ffffffff816bb9ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 504.737751] Code: e5 ff d0 5d c3 48 c7 c0 fb ff ff ff c3 0f 1f 80 00 00 00 00 55 48 85 ff 48 89 e5 41 54 53 48 89 fb 74 37 48 85 f6 49 89 f4 74 66 <f6> 47 3c 01 75 48 48 8d 43 08 c7 43 38 01 00 00 00 4c 89 63 28
[ 504.843245] RIP [<ffffffff81340887>] kobject_init+0x17/0x90
[ 504.875091] RSP <ffff88083a213d28>
[ 504.894984] CR2: ffffe8f8fca41a14
[ 504.914017] ---[ end trace 1afddf59bf08cc38 ]---


Attachments:
looppanic (151.85 kB)

2016-03-16 15:27:31

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Wed, 16 Mar 2016, Xiong Zhou wrote:
> Full log, bisect log and config are attached.

Can you please provide a full boot log and the output of 'cat /proc/cpuinfo' ?

Thanks,

tglx

2016-03-17 01:56:11

by Murphy Zhou

Subject: Re: 4.5.0+ panic when setup loop device

On Wed, Mar 16, 2016 at 11:26 PM, Thomas Gleixner <[email protected]> wrote:
> On Wed, 16 Mar 2016, Xiong Zhou wrote:
>> Full log, bisect log and config are attached.
>
> Can you please provide a full boot log and the output of 'cat /proc/cpuinfo' ?

Attached. Thank you.

>
> Thanks,
>
> tglx


Attachments:
bootlog (115.51 kB)
cpuinfo (34.49 kB)

2016-03-17 09:52:47

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 09:56:05AM +0800, Xiong Zhou wrote:
> On Wed, Mar 16, 2016 at 11:26 PM, Thomas Gleixner <[email protected]> wrote:

> > Can you please provide a full boot log and the output of 'cat /proc/cpuinfo' ?

Mar 17 17:34:30 myhost kernel: smpboot: Max logical packages: 1
Mar 17 17:34:30 myhost kernel: smpboot: APIC(20) Converting physical 1 to logical package 0
Mar 17 17:34:30 myhost kernel: smpboot: APIC(40) Package 2 exceeds logical package map

So that is busted.. it turns out AMD gets x86_max_cores wrong when there
are compute units.

Mar 17 17:34:30 myhost kernel: smpboot: CPU 1 APICId 40 disabled
Mar 17 17:34:30 myhost kernel: Switched APIC routing to physical flat.
Mar 17 17:34:30 myhost kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
Mar 17 17:34:30 myhost kernel: smpboot: CPU0: AMD Opteron(TM) Processor 6274 (family: 0x15, model: 0x1, stepping: 0x2)
Mar 17 17:34:30 myhost kernel: Performance Events: Fam15h core perfctr, Broken BIOS detected, complain to your hardware vendor.
Mar 17 17:34:30 myhost kernel: [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 430076)
Mar 17 17:34:30 myhost kernel: AMD PMU driver.
Mar 17 17:34:30 myhost kernel: ... version: 0
Mar 17 17:34:30 myhost kernel: ... bit width: 48
Mar 17 17:34:30 myhost kernel: ... generic registers: 6
Mar 17 17:34:30 myhost kernel: ... value mask: 0000ffffffffffff
Mar 17 17:34:30 myhost kernel: ... max period: 00007fffffffffff
Mar 17 17:34:30 myhost kernel: ... fixed-purpose events: 0
Mar 17 17:34:30 myhost kernel: ... event mask: 000000000000003f
Mar 17 17:34:30 myhost kernel: NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #17
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #18
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #19
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #20
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #21
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #22
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #23
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #24
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #25
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #26
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #27
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #28
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #29
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #30
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #31
Mar 17 17:34:30 myhost kernel: x86: Booted up 2 nodes, 31 CPUs

And that is one weird node mapping..


I have a similar system, which after the below patch says:

[ 0.182174] max_cores: 8, cpu_ids: 32, num_siblings: 2, coreid_bits: 5
[ 0.188712] smpboot: Max logical packages: 2
[ 0.192988] smpboot: APIC(20) Converting physical 1 to logical package 0
[ 0.199689] smpboot: APIC(40) Converting physical 2 to logical package 1
[ 0.206405] Switched APIC routing to physical flat.
[ 0.211851] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.329578] smpboot: CPU0: AMD Opteron(tm) Processor 6278 (family: 0x15, model: 0x1, stepping: 0x2)
[ 0.338705] Performance Events: Fam15h core perfctr, AMD PMU driver.
[ 0.345134] ... version: 0
[ 0.349147] ... bit width: 48
[ 0.353262] ... generic registers: 6
[ 0.357274] ... value mask: 0000ffffffffffff
[ 0.362586] ... max period: 00007fffffffffff
[ 0.367900] ... fixed-purpose events: 0
[ 0.371911] ... event mask: 000000000000003f
[ 0.378664] MCE: In-kernel MCE decoding enabled.
[ 0.383965] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.393079] x86: Booting SMP configuration:
[ 0.397262] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 0.848764] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
[ 1.364701] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
[ 1.898586] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
[ 2.413417] x86: Booted up 4 nodes, 32 CPUs

Could you please try? I'm not sure how this would explain your loop
device bug fail, but it certainly pointed towards broken.


Andreas; Borislav said to Cc you since you wrote all this.
The issue is that Linux assumes:

nr_logical_cpus = nr_cores * nr_siblings

But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
to 2 as well.
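
Spelled out with the numbers from the two boot logs in this thread
(treating the 16-core / 8-compute-unit layout of these Opteron 62xx
parts as the illustration), the topology code currently computes

  ncpus = x86_max_cores * smp_num_siblings = 16 * 2 = 32
  __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus) = DIV_ROUND_UP(32, 32) = 1

so on a two-socket, 32-CPU box only one logical package fits, the second
physical package "exceeds the logical package map" and its CPU gets
disabled, as in the broken log. With x86_max_cores divided down to the
compute unit count (16 / 2 = 8), as the patch below does,

  ncpus = 8 * 2 = 16
  __max_logical_packages = DIV_ROUND_UP(32, 16) = 2

which matches the "Max logical packages: 2" in the good log above.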

Thomas; I removed that first branch testing pkg against
__max_logical_packages because if the first pkg id is larger, then the
find_first_zero will find us logical package id 0. However, if the
second pkg id is indeed 0, we'll again claim it without testing if it
was already taken. Also, it fails to print the mapping.


---
arch/x86/kernel/cpu/amd.c | 8 ++++----
arch/x86/kernel/smpboot.c | 11 ++++++-----
2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 97c59fd..6216e80 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -310,9 +310,9 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		node_id = ecx & 7;
 
 		/* get compute unit information */
-		smp_num_siblings = ((ebx >> 8) & 3) + 1;
+		cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1;
+		c->x86_max_cores /= smp_num_siblings;
 		c->compute_unit_id = ebx & 0xff;
-		cores_per_cu += ((ebx >> 8) & 3);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
@@ -328,8 +328,8 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		u32 cus_per_node;
 
 		set_cpu_cap(c, X86_FEATURE_AMD_DCM);
-		cores_per_node = c->x86_max_cores / nodes_per_socket;
-		cus_per_node = cores_per_node / cores_per_cu;
+		cus_per_node = c->x86_max_cores / nodes_per_socket;
+		cores_per_node = cus_per_node * cores_per_cu;
 
 		/* store NodeID, use llc_shared_map to store sibling info */
 		per_cpu(cpu_llc_id, cpu) = node_id;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 643dbdc..15c5fda 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -274,11 +274,6 @@ int topology_update_package_map(unsigned int apicid, unsigned int cpu)
 	if (test_and_set_bit(pkg, physical_package_map))
 		goto found;
 
-	if (pkg < __max_logical_packages) {
-		set_bit(pkg, logical_package_map);
-		physical_to_logical_pkg[pkg] = pkg;
-		goto found;
-	}
 	new = find_first_zero_bit(logical_package_map, __max_logical_packages);
 	if (new >= __max_logical_packages) {
 		physical_to_logical_pkg[pkg] = -1;
@@ -314,6 +309,12 @@ static void __init smp_init_package_map(void)
 	unsigned int ncpus, cpu;
 	size_t size;
 
+	printk("max_cores: %d, cpu_ids: %d, num_siblings: %d, coreid_bits: %d\n",
+	       boot_cpu_data.x86_max_cores,
+	       nr_cpu_ids,
+	       smp_num_siblings,
+	       boot_cpu_data.x86_coreid_bits);
+
 	/*
 	 * Today neither Intel nor AMD support heterogenous systems. That
 	 * might change in the future....

2016-03-17 09:56:56

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 10:52:20AM +0100, Peter Zijlstra wrote:
> Mar 17 17:34:30 myhost kernel: smpboot: CPU0: AMD Opteron(TM) Processor 6274 (family: 0x15, model: 0x1, stepping: 0x2)
> Mar 17 17:34:30 myhost kernel: Performance Events: Fam15h core perfctr, Broken BIOS detected, complain to your hardware vendor.
> Mar 17 17:34:30 myhost kernel: [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 430076)

FWIW, you might want to talk to HP about that; some machines have a
magic key-combo in their BIOS screen with extra options, allowing you to
fix this.

2016-03-17 10:22:58

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Peter Zijlstra wrote:

> Could you please try? I'm not sure how this would explain your loop
> device bug fail, but it certainly pointed towards broken.

It definitely does not explain it. The wreckage that topo stuff causes is that
it disables a cpu, but that really is not a reason for block/loop to explode.

Thanks,

tglx

2016-03-17 10:26:42

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 11:21:24AM +0100, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>
> > Could you please try? I'm not sure how this would explain your loop
> > device bug fail, but it certainly pointed towards broken.
>
> It definitely does not explain it. The wreckage that topo stuff causes is that
> it disables a cpu, but that really is not a reason for block/loop to explode.

Right. Sadly I could not reproduce that error on my machine. But we can
at least start by fixing the 'obvious' problems and then maybe we get
more clues ;-)

2016-03-17 11:41:17

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Peter Zijlstra wrote:

> On Thu, Mar 17, 2016 at 11:21:24AM +0100, Thomas Gleixner wrote:
> > On Thu, 17 Mar 2016, Peter Zijlstra wrote:
> >
> > > Could you please try? I'm not sure how this would explain your loop
> > > device bug fail, but it certainly pointed towards broken.
> >
> > It definitely does not explain it. The wreckage that topo stuff causes is that
> > it disables a cpu, but that really is not a reason for block/loop to explode.
>
> Right. Sadly I could not reproduce that error on my machine. But we can
> at least start by fixing the 'obvious' problems and then maybe we get
> more clues ;-)

I'm able to reproduce by rejecting a cpu in that topology map function
forcefully.

That stuff explodes, because the block-mq code assumes that cpu_possible_mask
has no holes.

#define queue_for_each_ctx(q, ctx, i)					\
	for ((i) = 0; (i) < (q)->nr_queues &&				\
	     ({ ctx = per_cpu_ptr((q)->queue_ctx, (i)); 1; }); (i)++)

is what makes that assumption about a consecutive possible mask.

The cure for now is the patch below on top of PeterZ's patch.

But we have to clarify and document whether holes in cpu_possible_mask are not
allowed at all or if code like the above is simply broken.

Thanks,

tglx
---
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 643dbdccf4bc..f2ed8a01f870 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -345,7 +345,6 @@ static void __init smp_init_package_map(void)
 			continue;
 		pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
 		per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
-		set_cpu_possible(cpu, false);
 		set_cpu_present(cpu, false);
 	}
 }
}









2016-03-17 11:51:30

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
> But we have to clarify and document whether holes in cpu_possible_mask are not
> allowed at all or if code like the above is simply broken.

So the general rule is that cpumasks can have holes, and exempting one
just muddles the water.

Therefore I'd call the code just plain broken.

2016-03-17 11:57:33

by Borislav Petkov

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 12:51:20PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
> > But we have to clarify and document whether holes in cpu_possible_mask are not
> > allowed at all or if code like the above is simply broken.
>
> So the general rule is that cpumasks can have holes, and exempting one
> just muddles the water.
>
> Therefore I'd call the code just plain broken.

I'll say.

Can't the code simply do:

	if (!cpu_possible(i))
		continue;

?

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2016-03-17 12:03:02

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Peter Zijlstra wrote:
> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
> > But we have to clarify and document whether holes in cpu_possible_mask are not
> > allowed at all or if code like the above is simply broken.
>
> So the general rule is that cpumasks can have holes, and exempting one
> just muddles the water.
>
> Therefore I'd call the code just plain broken.

Agreed.

That macro is not really helping the readability of the code at all. So a
simple for_each_possible_cpu() loop would have avoided that wreckage.

Thanks,

tglx


2016-03-17 16:42:12

by Jens Axboe

Subject: Re: 4.5.0+ panic when setup loop device

On 03/17/2016 05:01 AM, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
>>> But we have to clarify and document whether holes in cpu_possible_mask are not
>>> allowed at all or if code like the above is simply broken.
>>
>> So the general rule is that cpumasks can have holes, and exempting one
>> just muddles the water.
>>
>> Therefore I'd call the code just plain broken.
>
> Agreed.
>
> That macro is not really helping the readability of the code at all. So a
> simple for_each_possible_cpu() loop would have avoided that wreckage.

Does the attached work? The rest of blk-mq should deal with holes just
fine, we found some of those issues on sparc. Not sure why this one
slipped through the cracks.

--
Jens Axboe


Attachments:
blk-mq-discontig.patch (1.46 kB)

2016-03-17 18:26:27

by Jens Axboe

Subject: Re: 4.5.0+ panic when setup loop device

On 03/17/2016 09:42 AM, Jens Axboe wrote:
> On 03/17/2016 05:01 AM, Thomas Gleixner wrote:
>> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>>> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
>>>> But we have to clarify and document whether holes in
>>>> cpu_possible_mask are not
>>>> allowed at all or if code like the above is simply broken.
>>>
>>> So the general rule is that cpumasks can have holes, and exempting one
>>> just muddles the water.
>>>
>>> Therefore I'd call the code just plain broken.
>>
>> Agreed.
>>
>> That macro is not really helping the readability of the code at all. So a
>> simple for_each_possible_cpu() loop would have avoided that wreckage.
>
> Does the attached work? The rest of blk-mq should deal with holes just
> fine, we found some of those issues on sparc. Not sure why this one
> slipped through the cracks.

This might be better, we need to start at -1 to not miss the first
one... Still untested.

--
Jens Axboe


Attachments:
blk-mq-discontig-v2.patch (1.46 kB)

2016-03-17 20:22:29

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Jens Axboe wrote:
> On 03/17/2016 09:42 AM, Jens Axboe wrote:
> > On 03/17/2016 05:01 AM, Thomas Gleixner wrote:
> > > On Thu, 17 Mar 2016, Peter Zijlstra wrote:
> > > > On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
> > > > > But we have to clarify and document whether holes in
> > > > > cpu_possible_mask are not
> > > > > allowed at all or if code like the above is simply broken.
> > > >
> > > > So the general rule is that cpumasks can have holes, and exempting one
> > > > just muddles the water.
> > > >
> > > > Therefore I'd call the code just plain broken.
> > >
> > > Agreed.
> > >
> > > That macro is not really helping the readability of the code at all. So a
> > > simple for_each_possible_cpu() loop would have avoided that wreckage.
> >
> > Does the attached work? The rest of blk-mq should deal with holes just

Bah. Attachments ...

> > fine, we found some of those issues on sparc. Not sure why this one
> > slipped through the cracks.
>
> This might be better, we need to start at -1 to not miss the first one...
> Still untested.

> +static inline struct blk_mq_ctx *next_ctx(struct request_queue *q, int *i)
> +{
> +	do {
> +		(*i)++;
> +		if (*i < q->nr_queues) {
> +			if (cpu_possible(*i))
> +				return per_cpu_ptr(q->queue_ctx, *i);
> +			continue;
> +		}
> +		break;
> +	} while (1);
> +
> +	return NULL;
> +}
> +
> +#define queue_for_each_ctx(q, ctx, i)				\
> +	for ((i) = -1; (ctx = next_ctx((q), &(i))) != NULL;)
> +

What's wrong with

for_each_possible_cpu(cpu) {
	ctx = per_cpu_ptr(q->queue_ctx, cpu);

	....
}

instead of hiding it behind an incomprehensible macro mess?

Thanks,

tglx

2016-03-17 20:23:46

by Jens Axboe

Subject: Re: 4.5.0+ panic when setup loop device

On 03/17/2016 01:20 PM, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Jens Axboe wrote:
>> On 03/17/2016 09:42 AM, Jens Axboe wrote:
>>> On 03/17/2016 05:01 AM, Thomas Gleixner wrote:
>>>> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>>>>> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
>>>>>> But we have to clarify and document whether holes in
>>>>>> cpu_possible_mask are not
>>>>>> allowed at all or if code like the above is simply broken.
>>>>>
>>>>> So the general rule is that cpumasks can have holes, and exempting one
>>>>> just muddles the water.
>>>>>
>>>>> Therefore I'd call the code just plain broken.
>>>>
>>>> Agreed.
>>>>
>>>> That macro is not really helping the readability of the code at all. So a
>>>> simple for_each_possible_cpu() loop would have avoided that wreckage.
>>>
>>> Does the attached work? The rest of blk-mq should deal with holes just
>
> Bah. Attachments ...

You'll live. Let's face it, all mailers suck in one way or another.

>>> fine, we found some of those issues on sparc. Not sure why this one
>>> slipped through the cracks.
>>
>> This might be better, we need to start at -1 to not miss the first one...
>> Still untested.
>
>> +static inline struct blk_mq_ctx *next_ctx(struct request_queue *q, int *i)
>> +{
>> + do {
>> + (*i)++;
>> + if (*i < q->nr_queues) {
>> + if (cpu_possible(*i))
>> + return per_cpu_ptr(q->queue_ctx, *i);
>> + continue;
>> + }
>> + break;
>> + } while (1);
>> +
>> + return NULL;
>> +}
>> +
>> +#define queue_for_each_ctx(q, ctx, i) \
>> + for ((i) = -1; (ctx = next_ctx((q), &(i))) != NULL;)
>> +
>
> What's wrong with
>
> for_each_possible_cpu(cpu) {
> ctx = per_cpu_ptr(q->queue_ctx, cpu);
>
> ....
> }
>
> instead of hiding it behind an incomprehensible macro mess?

We might not have mapped all of them.

--
Jens Axboe

2016-03-17 20:31:32

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Jens Axboe wrote:
> On 03/17/2016 01:20 PM, Thomas Gleixner wrote:
> > > This might be better, we need to start at -1 to not miss the first one...
> > > Still untested.
> >
> > > +static inline struct blk_mq_ctx *next_ctx(struct request_queue *q, int
> > > *i)
> > > +{
> > > + do {
> > > + (*i)++;
> > > + if (*i < q->nr_queues) {
> > > + if (cpu_possible(*i))
> > > + return per_cpu_ptr(q->queue_ctx, *i);
> > > + continue;
> > > + }
> > > + break;
> > > + } while (1);
> > > +
> > > + return NULL;
> > > +}
> > > +
> > > +#define queue_for_each_ctx(q, ctx, i)
> > > \
> > > + for ((i) = -1; (ctx = next_ctx((q), &(i))) != NULL;)
> > > +
> >
> > What's wrong with
> >
> > for_each_possible_cpu(cpu) {
> > ctx = per_cpu_ptr(q->queue_ctx, cpu);
> >
> > ....
> > }
> >
> > instead of hiding it behind an incomprehensible macro mess?
>
> We might not have mapped all of them.

blk_mq_init_cpu_queues() tells a different story and q->queue_ctx is a per_cpu
allocation.

Thanks,

tglx

2016-03-17 20:41:29

by Jens Axboe

Subject: Re: 4.5.0+ panic when setup loop device

On 03/17/2016 01:30 PM, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Jens Axboe wrote:
>> On 03/17/2016 01:20 PM, Thomas Gleixner wrote:
>>>> This might be better, we need to start at -1 to not miss the first one...
>>>> Still untested.
>>>
>>>> +static inline struct blk_mq_ctx *next_ctx(struct request_queue *q, int
>>>> *i)
>>>> +{
>>>> + do {
>>>> + (*i)++;
>>>> + if (*i < q->nr_queues) {
>>>> + if (cpu_possible(*i))
>>>> + return per_cpu_ptr(q->queue_ctx, *i);
>>>> + continue;
>>>> + }
>>>> + break;
>>>> + } while (1);
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +#define queue_for_each_ctx(q, ctx, i)
>>>> \
>>>> + for ((i) = -1; (ctx = next_ctx((q), &(i))) != NULL;)
>>>> +
>>>
>>> What's wrong with
>>>
>>> for_each_possible_cpu(cpu) {
>>> ctx = per_cpu_ptr(q->queue_ctx, cpu);
>>>
>>> ....
>>> }
>>>
>>> instead of hiding it behind an incomprehensible macro mess?
>>
>> We might not have mapped all of them.
>
> blk_mq_init_cpu_queues() tells a different story and q->queue_ctx is a per_cpu
> allocation.

Yeah my bad, I mistook the possible for online. So we can do the easier fix.

--
Jens Axboe

2016-03-18 02:31:27

by Murphy Zhou

Subject: Re: 4.5.0+ panic when setup loop device

Hi,

On Thu, Mar 17, 2016 at 7:39 PM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>
>> On Thu, Mar 17, 2016 at 11:21:24AM +0100, Thomas Gleixner wrote:
>> > On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>> >
>> > > Could you please try? I'm not sure how this would explain your loop
>> > > device bug fail, but it certainly pointed towards broken.
>> >
>> > It definitely does not explain it. The wreckage that topo stuff causes is that
>> > it disables a cpu, but that really is not a reason for block/loop to explode.
>>
>> Right. Sadly I could not reproduce that error on my machine. But we can
>> at least start by fixing the 'obvious' problems and then maybe we get
>> more clues ;-)
>
> I'm able to reproduce by rejecting a cpu in that topology map function
> forcefully.
>
> That stuff explodes, because the block-mq code assumes that cpu_possible_mask
> has no holes.
>
> #define queue_for_each_ctx(q, ctx, i) \
> for ((i) = 0; (i) < (q)->nr_queues && \
> ({ ctx = per_cpu_ptr((q)->queue_ctx, (i)); 1; }); (i)++)
>
> is what makes that assumption about a consecutive possible mask.
>
> The cure for now is the patch below on top of PeterZ's patch.

No panic with both Peter's patch and yours.

Thanks all.

--
Xiong

>
> But we have to clarify and document whether holes in cpu_possible_mask are not
> allowed at all or if code like the above is simply broken.
>
> Thanks,
>
> tglx
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 643dbdccf4bc..f2ed8a01f870 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -345,7 +345,6 @@ static void __init smp_init_package_map(void)
> continue;
> pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
> per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
> - set_cpu_possible(cpu, false);
> set_cpu_present(cpu, false);
> }
> }
>

2016-03-18 04:12:10

by Mike Galbraith

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:

> Andreas; Borislav said to Cc you since you wrote all this.
> The issue is that Linux assumes:
>
> > nr_logical_cpus = nr_cores * nr_siblings

It also seems to now assume that if SMT is possible, it's enabled.

Below is my 8 socket DL980 G7, which has SMT turned off for RT testing,
booting NOPREEMPT master tuned for maximum bloat ala distro and getting
confused by me telling it (as always) nr_cpus=64. Bad juju ensues.

[ 0.216180] max_cores: 8, cpu_ids: 64, num_siblings: 2, coreid_bits: 5
[ 0.226593] smpboot: Max logical packages: 4 <== not
[ 0.233742] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.244233] smpboot: APIC(20) Converting physical 1 to logical package 1
[ 0.253765] smpboot: APIC(40) Converting physical 2 to logical package 2
[ 0.264081] smpboot: APIC(60) Converting physical 3 to logical package 3
[ 0.274827] smpboot: APIC(80) Package 4 exceeds logical package map
[ 0.284705] smpboot: CPU 32 APICId 80 disabled
[ 0.292277] smpboot: APIC(a0) Package 5 exceeds logical package map
[ 0.302141] smpboot: CPU 40 APICId a0 disabled
[ 0.308607] smpboot: APIC(c0) Package 6 exceeds logical package map
[ 0.321682] smpboot: CPU 48 APICId c0 disabled
[ 0.328179] smpboot: APIC(e0) Package 7 exceeds logical package map
[ 0.337902] smpboot: CPU 56 APICId e0 disabled
[ 0.345695] DMAR: Host address width 40
[ 0.351511] DMAR: DRHD base: 0x000000b0100000 flags: 0x0
[ 0.360018] DMAR: dmar0: reg_base_addr b0100000 ver 1:0 cap c90780106f0462 ecap f0207e
[ 0.373342] DMAR: DRHD base: 0x000000a8000000 flags: 0x1
[ 0.383164] DMAR: dmar1: reg_base_addr a8000000 ver 1:0 cap c90780106f0462 ecap f0207e
[ 0.396475] DMAR: RMRR base: 0x0000007f7ee000 end: 0x0000007f7effff
[ 0.407255] DMAR: RMRR base: 0x0000007f7e7000 end: 0x0000007f7ecfff
[ 0.418136] DMAR: RMRR base: 0x0000007f62e000 end: 0x0000007f62ffff
[ 0.429787] DMAR: ATSR flags: 0x0
[ 0.434778] DMAR: ATSR flags: 0x0
[ 0.441624] DMAR-IR: IOAPIC id 10 under DRHD base 0xb0100000 IOMMU 0
[ 0.452716] DMAR-IR: IOAPIC id 8 under DRHD base 0xa8000000 IOMMU 1
[ 0.465782] DMAR-IR: IOAPIC id 0 under DRHD base 0xa8000000 IOMMU 1
[ 0.477123] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.492918] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.502549] x2apic enabled
[ 0.506678] Switched APIC routing to cluster x2apic.
[ 0.519955] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.642858] smpboot: CPU0: Intel(R) Xeon(R) CPU X7560 @ 2.27GHz (family: 0x6, model: 0x2e, stepping: 0x6)
[ 0.668111] Performance Events: PEBS fmt1+, 16-deep LBR, Nehalem events, Broken BIOS detected, complain to your hardware vendor.
[ 0.694907] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[ 0.713186] Intel PMU driver.
[ 0.719091] core: CPU erratum AAJ80 worked around
[ 0.731647] core: CPUID marked event: 'bus cycles' unavailable
[ 0.741499] ... version: 3
[ 0.747982] ... bit width: 48
[ 0.754109] ... generic registers: 4
[ 0.760980] ... value mask: 0000ffffffffffff
[ 0.769336] ... max period: 000000007fffffff
[ 0.776913] ... fixed-purpose events: 3
[ 0.783861] ... event mask: 000000070000000f
[ 0.793737] x86: Booting SMP configuration:
[ 0.800069] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #33 #34 #35 #36 #37 #38 #39 #41 #42 #43 #44 #45 #46 #47 #49 #50 #51 #5>
[ 4.717309] x86: Booted up 1 node, 60 CPUs
[ 4.724551] smpboot: Total of 60 processors activated (271280.00 BogoMIPS)
[ 5.007438] node 0 initialised, 1013474 pages in 36ms
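
The arithmetic behind the bogus "Max logical packages: 4" above, using
the numbers from this log (and assuming the firmware still enumerates
the switched-off siblings as disabled APIC IDs, so total_cpus stays at
128 while nr_cpus=64 clips nr_cpu_ids):

  ncpus = x86_max_cores * smp_num_siblings = 8 * 2 = 16
  __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus) = DIV_ROUND_UP(64, 16) = 4

so packages 4-7 "exceed the logical package map" and CPUs 32/40/48/56
are thrown out. Basing the limit on the unclipped count instead gives
DIV_ROUND_UP(128, 16) = 8, one logical package per socket, which is what
the fix further down switches to.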

2016-03-18 07:52:30

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, Mar 18, 2016 at 05:11:54AM +0100, Mike Galbraith wrote:
> On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:
>
> > Andreas; Borislav said to Cc you since you wrote all this.
> > The issue is that Linux assumes:
> >
> > > nr_logical_cpus = nr_cores * nr_siblings
>
> It also seems to now assume that if SMT is possible, it's enabled.

Urgh..

What I think, with my pre wakeup brain, happens is that the CPUID
topology muck still reports 2 siblings (it tends to do that).

But the BIOS only reports APIC-IDs for all your cores, so our
nr_cpu_ids is reduced, while the nr_siblings count is not.

And then *boom*.

This is the same old problem: it is nearly impossible to tell whether HT
is enabled or not -- complete and utter trainwreck :/

I need to go make wake-up-juice and ponder wth to do about this.

2016-03-18 10:16:34

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, Mar 18, 2016 at 05:11:54AM +0100, Mike Galbraith wrote:
> On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:
>
> > Andreas; Borislav said to Cc you since you wrote all this.
> > The issue is that Linux assumes:
> >
> > > nr_logical_cpus = nr_cores * nr_siblings
>
> It also seems to now assume that if SMT is possible, it's enabled.
>
> Below is my 8 socket DL980 G7, which has SMT turned off for RT testing,
> booting NOPREEMPT master tuned for maximum bloat ala distro and getting
> confused by me telling it (as always) nr_cpus=64. Bad juju ensues.

Ah, did you actually disable HT in the BIOS, or just skip the HT
enumeration by saying nr_cpus=64 (knowing that all the siblings are
last)?

In any case, Thomas has a clue and I'm going to test, but 4 socket
machine takes forever to boot, so might be a few minutes :/

2016-03-18 11:57:26

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, 18 Mar 2016, Mike Galbraith wrote:
> On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:
>
> > Andreas; Borislav said to Cc you since you wrote all this.
> > The issue is that Linux assumes:
> >
> > > nr_logical_cpus = nr_cores * nr_siblings
>
> It also seems to now assume that if SMT is possible, it's enabled.
>
> Below is my 8 socket DL980 G7, which has SMT turned off for RT testing,
> booting NOPREEMPT master tuned for maximum bloat ala distro and getting
> confused by me telling it (as always) nr_cpus=64. Bad juju ensues.

:)

Does the patch below fix the wreckage?

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 643dbdccf4bc..c5ac71276076 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -319,7 +319,7 @@ static void __init smp_init_package_map(void)
 	 * might change in the future....
 	 */
 	ncpus = boot_cpu_data.x86_max_cores * smp_num_siblings;
-	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
+	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 
 	/*
 	 * Possibly larger than what we need as the number of apic ids per

2016-03-18 12:39:24

by Mike Galbraith

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, 2016-03-18 at 11:15 +0100, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 05:11:54AM +0100, Mike Galbraith wrote:
> > On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:
> >
> > > Andreas; Borislav said to Cc you since you wrote all this.
> > > The issue is that Linux assumes:
> > >
> > > > nr_logical_cpus = nr_cores * nr_siblings
> >
> > It also seems to now assume that if SMT is possible, it's enabled.
> >
> > Below is my 8 socket DL980 G7, which has SMT turned off for RT
> > testing,
> > booting NOPREEMPT master tuned for maximum bloat ala distro and
> > getting
> > confused by me telling it (as always) nr_cpus=64. Bad juju ensues.
>
> Ah, did you actually disable HT in the BIOS, or just skip the HT
> enumeration by saying nr_cpus=64 (knowing that all the siblings are
> last)?

It's disabled in BIOS.

> In any case, Thomas has a clue and I'm going to test, but 4 socket
> machine takes forever to boot, so might be a few minutes :/

His one-liner made my DL980 all better.

-Mike

2016-03-18 12:39:53

by Mike Galbraith

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, 2016-03-18 at 12:55 +0100, Thomas Gleixner wrote:

> Does the patch below fix the wreckage?

Yup, all better.

> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 643dbdccf4bc..c5ac71276076 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -319,7 +319,7 @@ static void __init smp_init_package_map(void)
> * might change in the future....
> */
> ncpus = boot_cpu_data.x86_max_cores * smp_num_siblings;
> - __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
> + __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>
> /*
> * Possibly larger than what we need as the number of apic
> ids per

2016-03-18 13:32:44

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, Mar 18, 2016 at 01:39:16PM +0100, Mike Galbraith wrote:
> On Fri, 2016-03-18 at 11:15 +0100, Peter Zijlstra wrote:

> > Ah, did you actually disable HT in the BIOS, or just skip the HT
> > enumeration by saying nr_cpus=64 (knowing that all the siblings are
> > last)?
>
> It's disabled in BIOS.

OK, so I disabled HT in the BIOS too, on my IVB-EX

> > In any case, Thomas has a clue and I'm going to test, but 4 socket
> > machine takes forever to boot, so might be a few minutes :/
>
> His one-liner made my DL980 all better.

My machine is profoundly unhappy though; and since I have no nr_cpus=,
nr_cpu_ids == total_cpus and his patch wouldn't do anything anyway.

[ 0.286838] max_cores: 15, cpu_ids: 60, num_siblings: 2, coreid_bits: 5
[ 0.293463] smpboot: Max logical packages: 2
[ 0.297733] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.304346] smpboot: APIC(20) Converting physical 1 to logical package 1
[ 0.311047] smpboot: APIC(40) Package 2 exceeds logical package map
[ 0.317309] smpboot: CPU 30 APICId 40 disabled
[ 0.321757] smpboot: APIC(60) Package 3 exceeds logical package map
[ 0.328022] smpboot: CPU 45 APICId 60 disabled

This machine does exactly what I suspected yours did.
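
In numbers, from the log above (with HT off in the BIOS this firmware
apparently drops the sibling APIC IDs altogether, so total_cpus ==
nr_cpu_ids == 60):

  ncpus = x86_max_cores * smp_num_siblings = 15 * 2 = 30
  __max_logical_packages = DIV_ROUND_UP(60, 30) = 2

on a four-socket box, so packages 2 and 3 are rejected and CPUs 30 and
45 disabled; and because total_cpus is also 60 here,
DIV_ROUND_UP(total_cpus, ncpus) is still 2, which is why the
nr_cpu_ids -> total_cpus change cannot help on this machine.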


2016-03-18 14:07:54

by Mike Galbraith

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, 2016-03-18 at 14:32 +0100, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 01:39:16PM +0100, Mike Galbraith wrote:
> > On Fri, 2016-03-18 at 11:15 +0100, Peter Zijlstra wrote:
>
> > > Ah, did you actually disable HT in the BIOS, or just skip the HT
> > > enumeration by saying nr_cpus=64 (knowing that all the siblings are
> > > last)?
> >
> > It's disabled in BIOS.
>
> OK, so I disabled HT in the BIOS too, on my IVB-EX
>
> > > In any case, Thomas has a clue and I'm going to test, but 4 socket
> > > machine takes forever to boot, so might be a few minutes :/
> >
> > His one-liner made my DL980 all better.
>
> My machine is profoundly unhappy though; and since I have no nr_cpus=,
> nr_cpu_ids == total_cpus and his patch wouldn't do anything anyway.
>
> [ 0.286838] max_cores: 15, cpu_ids: 60, num_siblings: 2, coreid_bits: 5
> [ 0.293463] smpboot: Max logical packages: 2
> [ 0.297733] smpboot: APIC(0) Converting physical 0 to logical package 0
> [ 0.304346] smpboot: APIC(20) Converting physical 1 to logical package 1
> [ 0.311047] smpboot: APIC(40) Package 2 exceeds logical package map
> [ 0.317309] smpboot: CPU 30 APICId 40 disabled
> [ 0.321757] smpboot: APIC(60) Package 3 exceeds logical package map
> [ 0.328022] smpboot: CPU 45 APICId 60 disabled
>
> This machine does exactly what I suspected yours did.

Yup, that looks very familiar. I just booted without nr_cpus=64 to
make sure it's a happy camper both w/wo, and it says all is peachy.

-Mike

Subject: [tip:x86/urgent] x86/topology: Use total_cpus not nr_cpu_ids for logical packages

Commit-ID: 3e8db2246b434c6b18a6a9f09904038bddcf76c7
Gitweb: http://git.kernel.org/tip/3e8db2246b434c6b18a6a9f09904038bddcf76c7
Author: Thomas Gleixner <[email protected]>
AuthorDate: Fri, 18 Mar 2016 17:20:30 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sat, 19 Mar 2016 10:26:40 +0100

x86/topology: Use total_cpus not nr_cpu_ids for logical packages

nr_cpu_ids can be limited on the command line via nr_cpus=. That can break the
logical package management because it results in a smaller number of packages,
but the cpus to online are occupying the full package space as the hyper
threads are enumerated after the physical cores typically.

total_cpus is the real possible cpu space not limited by nr_cpus command line
and gives us the proper number of packages.

Reported-by: Mike Galbraith <[email protected]>
Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Xiong Zhou <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andreas Herrmann <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603181254330.3978@nanos
---
arch/x86/kernel/smpboot.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 81e6a43..b2c99f8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -325,9 +325,14 @@ static void __init smp_init_package_map(void)
 	 * By not including this we'll sometimes over-estimate the number of
 	 * logical packages by the amount of !present siblings, but this is
 	 * still better than MAX_LOCAL_APIC.
+	 *
+	 * We use total_cpus not nr_cpu_ids because nr_cpu_ids can be limited
+	 * on the command line leading to a similar issue as the HT disable
+	 * problem because the hyperthreads are usually enumerated after the
+	 * primary cores.
 	 */
 	ncpus = boot_cpu_data.x86_max_cores;
-	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
+	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 
 	/*
 	 * Possibly larger than what we need as the number of apic ids per

Subject: [tip:x86/urgent] x86/topology: Fix AMD core count

Commit-ID: ee6825c80e870fff1a370c718ec77022ade0889b
Gitweb: http://git.kernel.org/tip/ee6825c80e870fff1a370c718ec77022ade0889b
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 25 Mar 2016 15:52:34 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Tue, 29 Mar 2016 10:45:04 +0200

x86/topology: Fix AMD core count

It turns out AMD gets x86_max_cores wrong when there are compute
units.

The issue is that Linux assumes:

nr_logical_cpus = nr_cores * nr_siblings

But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
to 2 as well.

Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core
properly, according to the new nomenclature.

Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andreas Herrmann <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
arch/x86/include/asm/smp.h | 1 +
arch/x86/kernel/cpu/amd.c | 8 ++++----
arch/x86/ras/mce_amd_inj.c | 3 ++-
3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 20a3de5..66b0573 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -155,6 +155,7 @@ static inline int wbinvd_on_all_cpus(void)
 	wbinvd();
 	return 0;
 }
+#define smp_num_siblings	1
 #endif /* CONFIG_SMP */
 
 extern unsigned disabled_cpus;
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6e47e3a..4d0087f 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -312,9 +312,9 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		node_id = ecx & 7;
 
 		/* get compute unit information */
-		smp_num_siblings = ((ebx >> 8) & 3) + 1;
+		cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1;
+		c->x86_max_cores /= smp_num_siblings;
 		c->compute_unit_id = ebx & 0xff;
-		cores_per_cu += ((ebx >> 8) & 3);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
@@ -329,8 +329,8 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		u32 cus_per_node;
 
 		set_cpu_cap(c, X86_FEATURE_AMD_DCM);
-		cores_per_node = c->x86_max_cores / nodes_per_socket;
-		cus_per_node = cores_per_node / cores_per_cu;
+		cus_per_node = c->x86_max_cores / nodes_per_socket;
+		cores_per_node = cus_per_node * cores_per_cu;
 
 		/* store NodeID, use llc_shared_map to store sibling info */
 		per_cpu(cpu_llc_id, cpu) = node_id;
diff --git a/arch/x86/ras/mce_amd_inj.c b/arch/x86/ras/mce_amd_inj.c
index 55d38cf..9e02dca 100644
--- a/arch/x86/ras/mce_amd_inj.c
+++ b/arch/x86/ras/mce_amd_inj.c
@@ -20,6 +20,7 @@
 #include <linux/pci.h>
 
 #include <asm/mce.h>
+#include <asm/smp.h>
 #include <asm/amd_nb.h>
 #include <asm/irq_vectors.h>
 
@@ -206,7 +207,7 @@ static u32 get_nbc_for_node(int node_id)
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	u32 cores_per_node;
 
-	cores_per_node = c->x86_max_cores / amd_get_nodes_per_socket();
+	cores_per_node = (c->x86_max_cores * smp_num_siblings) / amd_get_nodes_per_socket();
 
 	return cores_per_node * node_id;
}