2016-03-16 06:48:37

by Murphy Zhou

Subject: 4.5.0+ panic when setup loop device

hi,

This panic does not happen on the 4.5 final kernel. It is still
reproducible as of commit 710d60cbf.

Bisect points to:

commit 1f12e32f4cd5243ae46d8b933181be0d022c6793
Author: Thomas Gleixner <[email protected]>
Date:   Mon Feb 22 22:19:15 2016 +0000

    x86/topology: Create logical package id

It first showed up on the next-20160302 tree; a bisect between the 0301
and 0302 trees went nowhere. It is now reproducible on Linus' tree.

steps:
fallocate -l 1G test.img
loopdev=$(losetup --find --show test.img)
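
For reference, the step that explodes boils down to a single ioctl on
/dev/loop-control: losetup --find asks for a free device via
LOOP_CTL_GET_FREE, which is the loop_control_ioctl() -> loop_add() ->
add_disk() path visible in the backtrace below. A minimal compilable
sketch (an illustration only, assuming nothing beyond the standard
<linux/loop.h> interface):

/* loop_get_free.c - sketch of the losetup --find step; illustration only */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

int main(void)
{
	/* loop-control is the control node registered by the loop module */
	int ctl = open("/dev/loop-control", O_RDWR);
	if (ctl < 0) {
		perror("open /dev/loop-control");
		return 1;
	}
	/* this request ends up in loop_control_ioctl() -> loop_add() */
	int nr = ioctl(ctl, LOOP_CTL_GET_FREE);
	if (nr < 0) {
		perror("LOOP_CTL_GET_FREE");
		return 1;
	}
	printf("/dev/loop%d\n", nr);
	return 0;
}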


Full log, bisect log and config are attached.

Thanks,
Xiong


[ 503.302458] loop: module loaded
[ 503.322815] BUG: unable to handle kernel paging request at ffffe8f8fca41a14
[ 503.363966] IP: [<ffffffff81340887>] kobject_init+0x17/0x90
[ 503.396755] PGD bbf22067 PUD bbf21067 PMD bbf20067 PTE 0
[ 503.428178] Oops: 0000 [#1] SMP
[ 503.447951] Modules linked in: loop kvm_amd kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw
gf128mul ipmi_ssif glue_helper ablk_helper sg amd64_edac_mod nfsd
pcspkr hpilo cryptd ipmi_si edac_mce_amd hpwdt sp5100_tco
ipmi_msghandler edac_core fam15h_power shpchp i2c_piix4 k10temp
acpi_power_meter acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sd_mod radeon i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_generic
pata_acpi ahci drm libahci pata_atiixp hpsa libata crc32c_intel
i2c_core nd_pmem serio_raw bnx2 scsi_transport_sas dm_mirror
dm_region_hash dm_log dm_mod
[ 503.788265] CPU: 13 PID: 12511 Comm: losetup Not tainted 4.5.0-rc6+ #78
[ 503.827285] Hardware name: HP ProLiant DL385 G7, BIOS A18 03/19/2012
[ 503.909812] task: ffff880132fb8000 ti: ffff88083a210000 task.ti: ffff88083a210000
[ 503.958992] RIP: 0010:[<ffffffff81340887>] [<ffffffff81340887>] kobject_init+0x17/0x90
[ 503.958992] RSP: 0018:ffff88083a213d28 EFLAGS: 00010282
[ 503.990539] RAX: ffffe8f8fca41900 RBX: ffffe8f8fca419d8 RCX: 0000000000000001
[ 504.033492] RDX: 0000000000000020 RSI: ffffffff81adb040 RDI: ffffe8f8fca419d8
[ 504.075306] RBP: ffff88083a213d38 R08: 0000000000000000 R09: ffffffff813401be
[ 504.115608] R10: ffff88013bb5a400 R11: ffffea0020d61a00 R12: ffffffff81adb040
[ 504.155965] R13: ffff8800ac4f85c8 R14: ffff880836ef8800 R15: ffff880836ef8880
[ 504.195840] FS: 00007ff685dbb740(0000) GS:ffff88013bb40000(0000) knlGS:0000000000000000
[ 504.241973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 504.275430] CR2: ffffe8f8fca41a14 CR3: 0000000133a1d000 CR4: 00000000000406e0
[ 504.314203] Stack:
[ 504.325542] 0000000000000001 ffff8800ac4f8000 ffff88083a213d70 ffffffff81320296
[ 504.366429] ffff8800ac4f8000 ffff880836ef8800 ffff880836ef8870 ffff8800ac4f8588
[ 504.410991] ffff880836ef8880 ffff88083a213db0 ffffffff813157d4 ffff880836ef8880
[ 504.455541] Call Trace:
[ 504.469999] [<ffffffff81320296>] blk_mq_register_disk+0xa6/0x160
[ 504.505490] [<ffffffff813157d4>] blk_register_queue+0xb4/0x160
[ 504.540521] [<ffffffff813230fd>] add_disk+0x1dd/0x4a0
[ 504.570938] [<ffffffffa0516f70>] loop_add+0x1f0/0x270 [loop]
[ 504.604979] [<ffffffffa0517222>] loop_control_ioctl+0x112/0x160 [loop]
[ 504.644763] [<ffffffff81226986>] do_vfs_ioctl+0xa6/0x5c0
[ 504.674226] [<ffffffff81226f19>] SyS_ioctl+0x79/0x90
[ 504.702882] [<ffffffff816bb9ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 504.737751] Code: e5 ff d0 5d c3 48 c7 c0 fb ff ff ff c3 0f 1f 80 00 00 00 00 55 48 85 ff 48 89 e5 41 54 53 48 89 fb 74 37 48 85 f6 49 89 f4 74 66 <f6> 47 3c 01 75 48 48 8d 43 08 c7 43 38 01 00 00 00 4c 89 63 28
[ 504.843245] RIP [<ffffffff81340887>] kobject_init+0x17/0x90
[ 504.875091] RSP <ffff88083a213d28>
[ 504.894984] CR2: ffffe8f8fca41a14
[ 504.914017] ---[ end trace 1afddf59bf08cc38 ]---


Attachments:
looppanic (151.85 kB)

2016-03-16 15:27:31

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Wed, 16 Mar 2016, Xiong Zhou wrote:
> Full log, bisect log and config are attached.

Can you please provide a full boot log and the output of 'cat /proc/cpuinfo' ?

Thanks,

tglx

2016-03-17 01:56:11

by Murphy Zhou

Subject: Re: 4.5.0+ panic when setup loop device

On Wed, Mar 16, 2016 at 11:26 PM, Thomas Gleixner <[email protected]> wrote:
> On Wed, 16 Mar 2016, Xiong Zhou wrote:
>> Full log, bisect log and config are attached.
>
> Can you please provide a full boot log and the output of 'cat /proc/cpuinfo' ?

Attached. Thank you.

>
> Thanks,
>
> tglx


Attachments:
bootlog (115.51 kB)
cpuinfo (34.49 kB)

2016-03-17 09:52:47

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 09:56:05AM +0800, Xiong Zhou wrote:
> On Wed, Mar 16, 2016 at 11:26 PM, Thomas Gleixner <[email protected]> wrote:

> > Can you please provide a full boot log and the output of 'cat /proc/cpuinfo' ?

Mar 17 17:34:30 myhost kernel: smpboot: Max logical packages: 1
Mar 17 17:34:30 myhost kernel: smpboot: APIC(20) Converting physical 1 to logical package 0
Mar 17 17:34:30 myhost kernel: smpboot: APIC(40) Package 2 exceeds logical package map

So that is busted.. it turns out AMD gets x86_max_cores wrong when there
are compute units.

Mar 17 17:34:30 myhost kernel: smpboot: CPU 1 APICId 40 disabled
Mar 17 17:34:30 myhost kernel: Switched APIC routing to physical flat.
Mar 17 17:34:30 myhost kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
Mar 17 17:34:30 myhost kernel: smpboot: CPU0: AMD Opteron(TM) Processor 6274 (family: 0x15, model: 0x1, stepping: 0x2)
Mar 17 17:34:30 myhost kernel: Performance Events: Fam15h core perfctr, Broken BIOS detected, complain to your hardware vendor.
Mar 17 17:34:30 myhost kernel: [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 430076)
Mar 17 17:34:30 myhost kernel: AMD PMU driver.
Mar 17 17:34:30 myhost kernel: ... version: 0
Mar 17 17:34:30 myhost kernel: ... bit width: 48
Mar 17 17:34:30 myhost kernel: ... generic registers: 6
Mar 17 17:34:30 myhost kernel: ... value mask: 0000ffffffffffff
Mar 17 17:34:30 myhost kernel: ... max period: 00007fffffffffff
Mar 17 17:34:30 myhost kernel: ... fixed-purpose events: 0
Mar 17 17:34:30 myhost kernel: ... event mask: 000000000000003f
Mar 17 17:34:30 myhost kernel: NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #17
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #18
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #19
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #20
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #21
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #22
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #23
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #24
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #25
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #26
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #27
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #28
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #29
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #30
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #31
Mar 17 17:34:30 myhost kernel: x86: Booted up 2 nodes, 31 CPUs

And that is one weird node mapping..


I have a similar system, which after the below patch says:

[ 0.182174] max_cores: 8, cpu_ids: 32, num_siblings: 2, coreid_bits: 5
[ 0.188712] smpboot: Max logical packages: 2
[ 0.192988] smpboot: APIC(20) Converting physical 1 to logical package 0
[ 0.199689] smpboot: APIC(40) Converting physical 2 to logical package 1
[ 0.206405] Switched APIC routing to physical flat.
[ 0.211851] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.329578] smpboot: CPU0: AMD Opteron(tm) Processor 6278 (family: 0x15, model: 0x1, stepping: 0x2)
[ 0.338705] Performance Events: Fam15h core perfctr, AMD PMU driver.
[ 0.345134] ... version: 0
[ 0.349147] ... bit width: 48
[ 0.353262] ... generic registers: 6
[ 0.357274] ... value mask: 0000ffffffffffff
[ 0.362586] ... max period: 00007fffffffffff
[ 0.367900] ... fixed-purpose events: 0
[ 0.371911] ... event mask: 000000000000003f
[ 0.378664] MCE: In-kernel MCE decoding enabled.
[ 0.383965] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.393079] x86: Booting SMP configuration:
[ 0.397262] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 0.848764] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
[ 1.364701] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
[ 1.898586] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
[ 2.413417] x86: Booted up 4 nodes, 32 CPUs

Could you please try? I'm not sure how this would explain your loop
device bug fail, but it certainly pointed towards broken.


Andreas; Borislav said to Cc you since you wrote all this.
The issue is that Linux assumes:

nr_logical_cpus = nr_cores * nr_siblings

But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
to 2 as well.
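
Spelled out with the numbers from the two boot logs in this thread
(treating the 16-core / 8-compute-unit layout of these Opteron 62xx
parts as the illustration), the topology code currently computes

  ncpus = x86_max_cores * smp_num_siblings = 16 * 2 = 32
  __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus) = DIV_ROUND_UP(32, 32) = 1

so on a two-socket, 32-CPU box only one logical package fits, the second
physical package "exceeds the logical package map" and its CPU gets
disabled, as in the broken log. With x86_max_cores divided down to the
compute unit count (16 / 2 = 8), as the patch below does,

  ncpus = 8 * 2 = 16
  __max_logical_packages = DIV_ROUND_UP(32, 16) = 2

which matches the "Max logical packages: 2" in the good log above.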

Thomas; I removed that first branch testing pkg against
__max_logical_packages because if the first pkg id is larger, then the
find_first_zero will find us logical package id 0. However, if the
second pkg id is indeed 0, we'll again claim it without testing if it
was already taken. Also, it fails to print the mapping.


---
arch/x86/kernel/cpu/amd.c | 8 ++++----
arch/x86/kernel/smpboot.c | 11 ++++++-----
2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 97c59fd..6216e80 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -310,9 +310,9 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		node_id = ecx & 7;
 
 		/* get compute unit information */
-		smp_num_siblings = ((ebx >> 8) & 3) + 1;
+		cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1;
+		c->x86_max_cores /= smp_num_siblings;
 		c->compute_unit_id = ebx & 0xff;
-		cores_per_cu += ((ebx >> 8) & 3);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
@@ -328,8 +328,8 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		u32 cus_per_node;
 
 		set_cpu_cap(c, X86_FEATURE_AMD_DCM);
-		cores_per_node = c->x86_max_cores / nodes_per_socket;
-		cus_per_node = cores_per_node / cores_per_cu;
+		cus_per_node = c->x86_max_cores / nodes_per_socket;
+		cores_per_node = cus_per_node * cores_per_cu;
 
 		/* store NodeID, use llc_shared_map to store sibling info */
 		per_cpu(cpu_llc_id, cpu) = node_id;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 643dbdc..15c5fda 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -274,11 +274,6 @@ int topology_update_package_map(unsigned int apicid, unsigned int cpu)
 	if (test_and_set_bit(pkg, physical_package_map))
 		goto found;
 
-	if (pkg < __max_logical_packages) {
-		set_bit(pkg, logical_package_map);
-		physical_to_logical_pkg[pkg] = pkg;
-		goto found;
-	}
 	new = find_first_zero_bit(logical_package_map, __max_logical_packages);
 	if (new >= __max_logical_packages) {
 		physical_to_logical_pkg[pkg] = -1;
@@ -314,6 +309,12 @@ static void __init smp_init_package_map(void)
 	unsigned int ncpus, cpu;
 	size_t size;
 
+	printk("max_cores: %d, cpu_ids: %d, num_siblings: %d, coreid_bits: %d\n",
+	       boot_cpu_data.x86_max_cores,
+	       nr_cpu_ids,
+	       smp_num_siblings,
+	       boot_cpu_data.x86_coreid_bits);
+
 	/*
 	 * Today neither Intel nor AMD support heterogenous systems. That
 	 * might change in the future....

2016-03-17 09:56:56

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 10:52:20AM +0100, Peter Zijlstra wrote:
> Mar 17 17:34:30 myhost kernel: smpboot: CPU0: AMD Opteron(TM) Processor 6274 (family: 0x15, model: 0x1, stepping: 0x2)
> Mar 17 17:34:30 myhost kernel: Performance Events: Fam15h core perfctr, Broken BIOS detected, complain to your hardware vendor.
> Mar 17 17:34:30 myhost kernel: [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 430076)

FWIW, you might want to talk to HP about that; some machines have a
magic key-combo in their BIOS screen with extra options, allowing you to
fix this.

2016-03-17 10:22:58

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Peter Zijlstra wrote:

> Could you please try? I'm not sure how this would explain your loop
> device bug fail, but it certainly pointed towards broken.

It definitely does not explain it. The wreckage that topo stuff causes is that
it disables a cpu, but that really is not a reason for block/loop to explode.

Thanks,

tglx

2016-03-17 10:26:42

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 11:21:24AM +0100, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>
> > Could you please try? I'm not sure how this would explain your loop
> > device bug fail, but it certainly pointed towards broken.
>
> It definitely does not explain it. The wreckage that topo stuff causes is that
> it disables a cpu, but that really is not a reason for block/loop to explode.

Right. Sadly I could not reproduce that error on my machine. But we can
at least start by fixing the 'obvious' problems and then maybe we get
more clues ;-)

2016-03-17 11:41:17

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Peter Zijlstra wrote:

> On Thu, Mar 17, 2016 at 11:21:24AM +0100, Thomas Gleixner wrote:
> > On Thu, 17 Mar 2016, Peter Zijlstra wrote:
> >
> > > Could you please try? I'm not sure how this would explain your loop
> > > device bug fail, but it certainly pointed towards broken.
> >
> > It definitely does not explain it. The wreckage that topo stuff causes is that
> > it disables a cpu, but that really is not a reason for block/loop to explode.
>
> Right. Sadly I could not reproduce that error on my machine. But we can
> at least start by fixing the 'obvious' problems and then maybe we get
> more clues ;-)

I'm able to reproduce by rejecting a cpu in that topology map function
forcefully.

That stuff explodes, because the block-mq code assumes that cpu_possible_mask
has no holes.

#define queue_for_each_ctx(q, ctx, i)					\
	for ((i) = 0; (i) < (q)->nr_queues &&				\
	     ({ ctx = per_cpu_ptr((q)->queue_ctx, (i)); 1; }); (i)++)

is what makes that assumption about a consecutive possible mask.

The cure for now is the patch below on top of PeterZ's patch.

But we have to clarify and document whether holes in cpu_possible_mask are not
allowed at all or if code like the above is simply broken.

Thanks,

tglx
---
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 643dbdccf4bc..f2ed8a01f870 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -345,7 +345,6 @@ static void __init smp_init_package_map(void)
 			continue;
 		pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
 		per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
-		set_cpu_possible(cpu, false);
 		set_cpu_present(cpu, false);
 	}
 }
}









2016-03-17 11:51:30

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
> But we have to clarify and document whether holes in cpu_possible_mask are not
> allowed at all or if code like the above is simply broken.

So the general rule is that cpumasks can have holes, and exempting one
just muddles the water.

Therefore I'd call the code just plain broken.

2016-03-17 11:57:33

by Borislav Petkov

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, Mar 17, 2016 at 12:51:20PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
> > But we have to clarify and document whether holes in cpu_possible_mask are not
> > allowed at all or if code like the above is simply broken.
>
> So the general rule is that cpumasks can have holes, and exempting one
> just muddles the water.
>
> Therefore I'd call the code just plain broken.

I'll say.

Can't the code simply do:

	if (!cpu_possible(i))
		continue;

?

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2016-03-17 12:03:02

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Peter Zijlstra wrote:
> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
> > But we have to clarify and document whether holes in cpu_possible_mask are not
> > allowed at all or if code like the above is simply broken.
>
> So the general rule is that cpumasks can have holes, and exempting one
> just muddles the water.
>
> Therefore I'd call the code just plain broken.

Agreed.

That macro is not really helping the readability of the code at all. So a
simple for_each_possible_cpu() loop would have avoided that wreckage.

Thanks,

tglx


2016-03-17 16:42:12

by Jens Axboe

Subject: Re: 4.5.0+ panic when setup loop device

On 03/17/2016 05:01 AM, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
>>> But we have to clarify and document whether holes in cpu_possible_mask are not
>>> allowed at all or if code like the above is simply broken.
>>
>> So the general rule is that cpumasks can have holes, and exempting one
>> just muddles the water.
>>
>> Therefore I'd call the code just plain broken.
>
> Agreed.
>
> That macro is not really helping the readability of the code at all. So a
> simple for_each_possible_cpu() loop would have avoided that wreckage.

Does the attached work? The rest of blk-mq should deal with holes just
fine, we found some of those issues on sparc. Not sure why this one
slipped through the cracks.

--
Jens Axboe


Attachments:
blk-mq-discontig.patch (1.46 kB)

2016-03-17 18:26:27

by Jens Axboe

Subject: Re: 4.5.0+ panic when setup loop device

On 03/17/2016 09:42 AM, Jens Axboe wrote:
> On 03/17/2016 05:01 AM, Thomas Gleixner wrote:
>> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>>> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
>>>> But we have to clarify and document whether holes in
>>>> cpu_possible_mask are not
>>>> allowed at all or if code like the above is simply broken.
>>>
>>> So the general rule is that cpumasks can have holes, and exempting one
>>> just muddles the water.
>>>
>>> Therefore I'd call the code just plain broken.
>>
>> Agreed.
>>
>> That macro is not really helping the readability of the code at all. So a
>> simple for_each_possible_cpu() loop would have avoided that wreckage.
>
> Does the attached work? The rest of blk-mq should deal with holes just
> fine, we found some of those issues on sparc. Not sure why this one
> slipped through the cracks.

This might be better, we need to start at -1 to not miss the first
one... Still untested.

--
Jens Axboe


Attachments:
blk-mq-discontig-v2.patch (1.46 kB)

2016-03-17 20:22:29

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Jens Axboe wrote:
> On 03/17/2016 09:42 AM, Jens Axboe wrote:
> > On 03/17/2016 05:01 AM, Thomas Gleixner wrote:
> > > On Thu, 17 Mar 2016, Peter Zijlstra wrote:
> > > > On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
> > > > > But we have to clarify and document whether holes in
> > > > > cpu_possible_mask are not
> > > > > allowed at all or if code like the above is simply broken.
> > > >
> > > > So the general rule is that cpumasks can have holes, and exempting one
> > > > just muddles the water.
> > > >
> > > > Therefore I'd call the code just plain broken.
> > >
> > > Agreed.
> > >
> > > That macro is not really helping the readability of the code at all. So a
> > > simple for_each_possible_cpu() loop would have avoided that wreckage.
> >
> > Does the attached work? The rest of blk-mq should deal with holes just

Bah. Attachments ...

> > fine, we found some of those issues on sparc. Not sure why this one
> > slipped through the cracks.
>
> This might be better, we need to start at -1 to not miss the first one...
> Still untested.

> +static inline struct blk_mq_ctx *next_ctx(struct request_queue *q, int *i)
> +{
> +	do {
> +		(*i)++;
> +		if (*i < q->nr_queues) {
> +			if (cpu_possible(*i))
> +				return per_cpu_ptr(q->queue_ctx, *i);
> +			continue;
> +		}
> +		break;
> +	} while (1);
> +
> +	return NULL;
> +}
> +
> +#define queue_for_each_ctx(q, ctx, i)				\
> +	for ((i) = -1; (ctx = next_ctx((q), &(i))) != NULL;)
> +

What's wrong with

for_each_possible_cpu(cpu) {
	ctx = per_cpu_ptr(q->queue_ctx, cpu);

	....
}

instead of hiding it behind an incomprehensible macro mess?

Thanks,

tglx

2016-03-17 20:23:46

by Jens Axboe

Subject: Re: 4.5.0+ panic when setup loop device

On 03/17/2016 01:20 PM, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Jens Axboe wrote:
>> On 03/17/2016 09:42 AM, Jens Axboe wrote:
>>> On 03/17/2016 05:01 AM, Thomas Gleixner wrote:
>>>> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>>>>> On Thu, Mar 17, 2016 at 12:39:46PM +0100, Thomas Gleixner wrote:
>>>>>> But we have to clarify and document whether holes in
>>>>>> cpu_possible_mask are not
>>>>>> allowed at all or if code like the above is simply broken.
>>>>>
>>>>> So the general rule is that cpumasks can have holes, and exempting one
>>>>> just muddles the water.
>>>>>
>>>>> Therefore I'd call the code just plain broken.
>>>>
>>>> Agreed.
>>>>
>>>> That macro is not really helping the readability of the code at all. So a
>>>> simple for_each_possible_cpu() loop would have avoided that wreckage.
>>>
>>> Does the attached work? The rest of blk-mq should deal with holes just
>
> Bah. Attachments ...

You'll live. Let's face it, all mailers suck in one way or another.

>>> fine, we found some of those issues on sparc. Not sure why this one
>>> slipped through the cracks.
>>
>> This might be better, we need to start at -1 to not miss the first one...
>> Still untested.
>
>> +static inline struct blk_mq_ctx *next_ctx(struct request_queue *q, int *i)
>> +{
>> + do {
>> + (*i)++;
>> + if (*i < q->nr_queues) {
>> + if (cpu_possible(*i))
>> + return per_cpu_ptr(q->queue_ctx, *i);
>> + continue;
>> + }
>> + break;
>> + } while (1);
>> +
>> + return NULL;
>> +}
>> +
>> +#define queue_for_each_ctx(q, ctx, i) \
>> + for ((i) = -1; (ctx = next_ctx((q), &(i))) != NULL;)
>> +
>
> What's wrong with
>
> for_each_possible_cpu(cpu) {
> ctx = per_cpu_ptr(q->queue_ctx, cpu);
>
> ....
> }
>
> instead of hiding it behind an incomprehensible macro mess?

We might not have mapped all of them.

--
Jens Axboe

2016-03-17 20:31:32

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 17 Mar 2016, Jens Axboe wrote:
> On 03/17/2016 01:20 PM, Thomas Gleixner wrote:
> > > This might be better, we need to start at -1 to not miss the first one...
> > > Still untested.
> >
> > > +static inline struct blk_mq_ctx *next_ctx(struct request_queue *q, int
> > > *i)
> > > +{
> > > + do {
> > > + (*i)++;
> > > + if (*i < q->nr_queues) {
> > > + if (cpu_possible(*i))
> > > + return per_cpu_ptr(q->queue_ctx, *i);
> > > + continue;
> > > + }
> > > + break;
> > > + } while (1);
> > > +
> > > + return NULL;
> > > +}
> > > +
> > > +#define queue_for_each_ctx(q, ctx, i)
> > > \
> > > + for ((i) = -1; (ctx = next_ctx((q), &(i))) != NULL;)
> > > +
> >
> > What's wrong with
> >
> > for_each_possible_cpu(cpu) {
> > ctx = per_cpu_ptr(q->queue_ctx, cpu);
> >
> > ....
> > }
> >
> > instead of hiding it behind an incomprehensible macro mess?
>
> We might not have mapped all of them.

blk_mq_init_cpu_queues() tells a different story and q->queue_ctx is a per_cpu
allocation.

Thanks,

tglx

2016-03-17 20:41:29

by Jens Axboe

Subject: Re: 4.5.0+ panic when setup loop device

On 03/17/2016 01:30 PM, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Jens Axboe wrote:
>> On 03/17/2016 01:20 PM, Thomas Gleixner wrote:
>>>> This might be better, we need to start at -1 to not miss the first one...
>>>> Still untested.
>>>
>>>> +static inline struct blk_mq_ctx *next_ctx(struct request_queue *q, int
>>>> *i)
>>>> +{
>>>> + do {
>>>> + (*i)++;
>>>> + if (*i < q->nr_queues) {
>>>> + if (cpu_possible(*i))
>>>> + return per_cpu_ptr(q->queue_ctx, *i);
>>>> + continue;
>>>> + }
>>>> + break;
>>>> + } while (1);
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +#define queue_for_each_ctx(q, ctx, i)
>>>> \
>>>> + for ((i) = -1; (ctx = next_ctx((q), &(i))) != NULL;)
>>>> +
>>>
>>> What's wrong with
>>>
>>> for_each_possible_cpu(cpu) {
>>> ctx = per_cpu_ptr(q->queue_ctx, cpu);
>>>
>>> ....
>>> }
>>>
>>> instead of hiding it behind an incomprehensible macro mess?
>>
>> We might not have mapped all of them.
>
> blk_mq_init_cpu_queues() tells a different story and q->queue_ctx is a per_cpu
> allocation.

Yeah my bad, I mistook the possible for online. So we can do the easier fix.

--
Jens Axboe

2016-03-18 02:31:27

by Murphy Zhou

Subject: Re: 4.5.0+ panic when setup loop device

Hi,

On Thu, Mar 17, 2016 at 7:39 PM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>
>> On Thu, Mar 17, 2016 at 11:21:24AM +0100, Thomas Gleixner wrote:
>> > On Thu, 17 Mar 2016, Peter Zijlstra wrote:
>> >
>> > > Could you please try? I'm not sure how this would explain your loop
>> > > device bug fail, but it certainly pointed towards broken.
>> >
>> > It definitely does not explain it. The wreckage that topo stuff causes is that
>> > it disables a cpu, but that really is not a reason for block/loop to explode.
>>
>> Right. Sadly I could not reproduce that error on my machine. But we can
>> at least start by fixing the 'obvious' problems and then maybe we get
>> more clues ;-)
>
> I'm able to reproduce by rejecting a cpu in that topology map function
> forcefully.
>
> That stuff explodes, because the block-mq code assumes that cpu_possible_mask
> has no holes.
>
> #define queue_for_each_ctx(q, ctx, i) \
> for ((i) = 0; (i) < (q)->nr_queues && \
> ({ ctx = per_cpu_ptr((q)->queue_ctx, (i)); 1; }); (i)++)
>
> is what makes that assumption about a consecutive possible mask.
>
> The cure for now is the patch below on top of PeterZ's patch.

No panic with both Peter's patch and yours.

Thanks all.

--
Xiong

>
> But we have to clarify and document whether holes in cpu_possible_mask are not
> allowed at all or if code like the above is simply broken.
>
> Thanks,
>
> tglx
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 643dbdccf4bc..f2ed8a01f870 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -345,7 +345,6 @@ static void __init smp_init_package_map(void)
> continue;
> pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
> per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
> - set_cpu_possible(cpu, false);
> set_cpu_present(cpu, false);
> }
> }
>

2016-03-18 04:12:10

by Mike Galbraith

Subject: Re: 4.5.0+ panic when setup loop device

On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:

> Andreas; Borislav said to Cc you since you wrote all this.
> The issue is that Linux assumes:
>
> > nr_logical_cpus = nr_cores * nr_siblings

It also seems to now assume that if SMT is possible, it's enabled.

Below is my 8 socket DL980 G7, which has SMT turned off for RT testing,
booting NOPREEMPT master tuned for maximum bloat ala distro and getting
confused by me telling it (as always) nr_cpus=64. Bad juju ensues.

[ 0.216180] max_cores: 8, cpu_ids: 64, num_siblings: 2, coreid_bits: 5
[ 0.226593] smpboot: Max logical packages: 4 <== not
[ 0.233742] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.244233] smpboot: APIC(20) Converting physical 1 to logical package 1
[ 0.253765] smpboot: APIC(40) Converting physical 2 to logical package 2
[ 0.264081] smpboot: APIC(60) Converting physical 3 to logical package 3
[ 0.274827] smpboot: APIC(80) Package 4 exceeds logical package map
[ 0.284705] smpboot: CPU 32 APICId 80 disabled
[ 0.292277] smpboot: APIC(a0) Package 5 exceeds logical package map
[ 0.302141] smpboot: CPU 40 APICId a0 disabled
[ 0.308607] smpboot: APIC(c0) Package 6 exceeds logical package map
[ 0.321682] smpboot: CPU 48 APICId c0 disabled
[ 0.328179] smpboot: APIC(e0) Package 7 exceeds logical package map
[ 0.337902] smpboot: CPU 56 APICId e0 disabled
[ 0.345695] DMAR: Host address width 40
[ 0.351511] DMAR: DRHD base: 0x000000b0100000 flags: 0x0
[ 0.360018] DMAR: dmar0: reg_base_addr b0100000 ver 1:0 cap c90780106f0462 ecap f0207e
[ 0.373342] DMAR: DRHD base: 0x000000a8000000 flags: 0x1
[ 0.383164] DMAR: dmar1: reg_base_addr a8000000 ver 1:0 cap c90780106f0462 ecap f0207e
[ 0.396475] DMAR: RMRR base: 0x0000007f7ee000 end: 0x0000007f7effff
[ 0.407255] DMAR: RMRR base: 0x0000007f7e7000 end: 0x0000007f7ecfff
[ 0.418136] DMAR: RMRR base: 0x0000007f62e000 end: 0x0000007f62ffff
[ 0.429787] DMAR: ATSR flags: 0x0
[ 0.434778] DMAR: ATSR flags: 0x0
[ 0.441624] DMAR-IR: IOAPIC id 10 under DRHD base 0xb0100000 IOMMU 0
[ 0.452716] DMAR-IR: IOAPIC id 8 under DRHD base 0xa8000000 IOMMU 1
[ 0.465782] DMAR-IR: IOAPIC id 0 under DRHD base 0xa8000000 IOMMU 1
[ 0.477123] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.492918] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.502549] x2apic enabled
[ 0.506678] Switched APIC routing to cluster x2apic.
[ 0.519955] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.642858] smpboot: CPU0: Intel(R) Xeon(R) CPU X7560 @ 2.27GHz (family: 0x6, model: 0x2e, stepping: 0x6)
[ 0.668111] Performance Events: PEBS fmt1+, 16-deep LBR, Nehalem events, Broken BIOS detected, complain to your hardware vendor.
[ 0.694907] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[ 0.713186] Intel PMU driver.
[ 0.719091] core: CPU erratum AAJ80 worked around
[ 0.731647] core: CPUID marked event: 'bus cycles' unavailable
[ 0.741499] ... version: 3
[ 0.747982] ... bit width: 48
[ 0.754109] ... generic registers: 4
[ 0.760980] ... value mask: 0000ffffffffffff
[ 0.769336] ... max period: 000000007fffffff
[ 0.776913] ... fixed-purpose events: 3
[ 0.783861] ... event mask: 000000070000000f
[ 0.793737] x86: Booting SMP configuration:
[ 0.800069] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #33 #34 #35 #36 #37 #38 #39 #41 #42 #43 #44 #45 #46 #47 #49 #50 #51 #5>
[ 4.717309] x86: Booted up 1 node, 60 CPUs
[ 4.724551] smpboot: Total of 60 processors activated (271280.00 BogoMIPS)
[ 5.007438] node 0 initialised, 1013474 pages in 36ms
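
The arithmetic behind the bogus "Max logical packages: 4" above, using
the numbers from this log (and assuming the firmware still enumerates
the switched-off siblings as disabled APIC IDs, so total_cpus stays at
128 while nr_cpus=64 clips nr_cpu_ids):

  ncpus = x86_max_cores * smp_num_siblings = 8 * 2 = 16
  __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus) = DIV_ROUND_UP(64, 16) = 4

so packages 4-7 "exceed the logical package map" and CPUs 32/40/48/56
are thrown out. Basing the limit on the unclipped count instead gives
DIV_ROUND_UP(128, 16) = 8, one logical package per socket, which is what
the fix further down switches to.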

2016-03-18 07:52:30

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, Mar 18, 2016 at 05:11:54AM +0100, Mike Galbraith wrote:
> On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:
>
> > Andreas; Borislav said to Cc you since you wrote all this.
> > The issue is that Linux assumes:
> >
> > > nr_logical_cpus = nr_cores * nr_siblings
>
> It also seems to now assume that if SMT is possible, it's enabled.

Urgh..

What I think, with my pre wakeup brain, happens is that the CPUID
topology muck still reports 2 siblings (it tends to do that).

But the BIOS only reports APIC-IDs for all your cores, so our
nr_cpu_ids is reduced, while the nr_siblings count is not.

And then *boom*.

This is the same old problem: it is nearly impossible to tell whether HT
is enabled or not -- complete and utter trainwreck :/

I need to go make wake-up-juice and ponder wth to do about this.

2016-03-18 10:16:34

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, Mar 18, 2016 at 05:11:54AM +0100, Mike Galbraith wrote:
> On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:
>
> > Andreas; Borislav said to Cc you since you wrote all this.
> > The issue is that Linux assumes:
> >
> > > nr_logical_cpus = nr_cores * nr_siblings
>
> It also seems to now assume that if SMT is possible, it's enabled.
>
> Below is my 8 socket DL980 G7, which has SMT turned off for RT testing,
> booting NOPREEMPT master tuned for maximum bloat ala distro and getting
> confused by me telling it (as always) nr_cpus=64. Bad juju ensues.

Ah, did you actually disable HT in the BIOS, or just skip the HT
enumeration by saying nr_cpus=64 (knowing that all the siblings are
last)?

In any case, Thomas has a clue and I'm going to test, but 4 socket
machine takes forever to boot, so might be a few minutes :/

2016-03-18 11:57:26

by Thomas Gleixner

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, 18 Mar 2016, Mike Galbraith wrote:
> On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:
>
> > Andreas; Borislav said to Cc you since you wrote all this.
> > The issue is that Linux assumes:
> >
> > > nr_logical_cpus = nr_cores * nr_siblings
>
> It also seems to now assume that if SMT is possible, it's enabled.
>
> Below is my 8 socket DL980 G7, which has SMT turned off for RT testing,
> booting NOPREEMPT master tuned for maximum bloat ala distro and getting
> confused by me telling it (as always) nr_cpus=64. Bad juju ensues.

:)

Does the patch below fix the wreckage?

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 643dbdccf4bc..c5ac71276076 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -319,7 +319,7 @@ static void __init smp_init_package_map(void)
 	 * might change in the future....
 	 */
 	ncpus = boot_cpu_data.x86_max_cores * smp_num_siblings;
-	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
+	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 
 	/*
 	 * Possibly larger than what we need as the number of apic ids per

2016-03-18 12:39:24

by Mike Galbraith

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, 2016-03-18 at 11:15 +0100, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 05:11:54AM +0100, Mike Galbraith wrote:
> > On Thu, 2016-03-17 at 10:52 +0100, Peter Zijlstra wrote:
> >
> > > Andreas; Borislav said to Cc you since you wrote all this.
> > > The issue is that Linux assumes:
> > >
> > > > nr_logical_cpus = nr_cores * nr_siblings
> >
> > It also seems to now assume that if SMT is possible, it's enabled.
> >
> > Below is my 8 socket DL980 G7, which has SMT turned off for RT
> > testing,
> > booting NOPREEMPT master tuned for maximum bloat ala distro and
> > getting
> > confused by me telling it (as always) nr_cpus=64. Bad juju ensues.
>
> Ah, did you actually disable HT in the BIOS, or just skip the HT
> enumeration by saying nr_cpus=64 (knowing that all the siblings are
> last)?

It's disabled in BIOS.

> In any case, Thomas has a clue and I'm going to test, but 4 socket
> machine takes forever to boot, so might be a few minutes :/

His one-liner made my DL980 all better.

-Mike

2016-03-18 12:39:53

by Mike Galbraith

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, 2016-03-18 at 12:55 +0100, Thomas Gleixner wrote:

> Does the patch below fix the wreckage?

Yup, all better.

> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 643dbdccf4bc..c5ac71276076 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -319,7 +319,7 @@ static void __init smp_init_package_map(void)
> * might change in the future....
> */
> ncpus = boot_cpu_data.x86_max_cores * smp_num_siblings;
> - __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
> + __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>
> /*
> * Possibly larger than what we need as the number of apic
> ids per

2016-03-18 13:32:44

by Peter Zijlstra

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, Mar 18, 2016 at 01:39:16PM +0100, Mike Galbraith wrote:
> On Fri, 2016-03-18 at 11:15 +0100, Peter Zijlstra wrote:

> > Ah, did you actually disable HT in the BIOS, or just skip the HT
> > enumeration by saying nr_cpus=64 (knowing that all the siblings are
> > last)?
>
> It's disabled in BIOS.

OK, so I disabled HT in the BIOS too, on my IVB-EX

> > In any case, Thomas has a clue and I'm going to test, but 4 socket
> > machine takes forever to boot, so might be a few minutes :/
>
> His one-liner made my DL980 all better.

My machine is profoundly unhappy though; and since I have no nr_cpus=,
nr_cpu_ids == total_cpus and his patch wouldn't do anything anyway.

[ 0.286838] max_cores: 15, cpu_ids: 60, num_siblings: 2, coreid_bits: 5
[ 0.293463] smpboot: Max logical packages: 2
[ 0.297733] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.304346] smpboot: APIC(20) Converting physical 1 to logical package 1
[ 0.311047] smpboot: APIC(40) Package 2 exceeds logical package map
[ 0.317309] smpboot: CPU 30 APICId 40 disabled
[ 0.321757] smpboot: APIC(60) Package 3 exceeds logical package map
[ 0.328022] smpboot: CPU 45 APICId 60 disabled

This machine does exactly what I suspected yours did.
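
In numbers, from the log above (with HT off in the BIOS this firmware
apparently drops the sibling APIC IDs altogether, so total_cpus ==
nr_cpu_ids == 60):

  ncpus = x86_max_cores * smp_num_siblings = 15 * 2 = 30
  __max_logical_packages = DIV_ROUND_UP(60, 30) = 2

on a four-socket box, so packages 2 and 3 are rejected and CPUs 30 and
45 disabled; and because total_cpus is also 60 here,
DIV_ROUND_UP(total_cpus, ncpus) is still 2, which is why the
nr_cpu_ids -> total_cpus change cannot help on this machine.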


2016-03-18 14:07:54

by Mike Galbraith

Subject: Re: 4.5.0+ panic when setup loop device

On Fri, 2016-03-18 at 14:32 +0100, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 01:39:16PM +0100, Mike Galbraith wrote:
> > On Fri, 2016-03-18 at 11:15 +0100, Peter Zijlstra wrote:
>
> > > Ah, did you actually disable HT in the BIOS, or just skip the HT
> > > enumeration by saying nr_cpus=64 (knowing that all the siblings are
> > > last)?
> >
> > It's disabled in BIOS.
>
> OK, so I disabled HT in the BIOS too, on my IVB-EX
>
> > > In any case, Thomas has a clue and I'm going to test, but 4 socket
> > > machine takes forever to boot, so might be a few minutes :/
> >
> > His one-liner made my DL980 all better.
>
> My machine is profoundly unhappy though; and since I have no nr_cpus=,
> nr_cpu_ids == total_cpus and his patch wouldn't do anything anyway.
>
> [ 0.286838] max_cores: 15, cpu_ids: 60, num_siblings: 2, coreid_bits: 5
> [ 0.293463] smpboot: Max logical packages: 2
> [ 0.297733] smpboot: APIC(0) Converting physical 0 to logical package 0
> [ 0.304346] smpboot: APIC(20) Converting physical 1 to logical package 1
> [ 0.311047] smpboot: APIC(40) Package 2 exceeds logical package map
> [ 0.317309] smpboot: CPU 30 APICId 40 disabled
> [ 0.321757] smpboot: APIC(60) Package 3 exceeds logical package map
> [ 0.328022] smpboot: CPU 45 APICId 60 disabled
>
> This machine does exactly what I suspected yours did.

Yup, that looks very familiar. I just booted without nr_cpus=64 to
make sure it's a happy camper both w/wo, and it says all is peachy.

-Mike

Subject: [tip:x86/urgent] x86/topology: Use total_cpus not nr_cpu_ids for logical packages

Commit-ID: 3e8db2246b434c6b18a6a9f09904038bddcf76c7
Gitweb: http://git.kernel.org/tip/3e8db2246b434c6b18a6a9f09904038bddcf76c7
Author: Thomas Gleixner <[email protected]>
AuthorDate: Fri, 18 Mar 2016 17:20:30 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sat, 19 Mar 2016 10:26:40 +0100

x86/topology: Use total_cpus not nr_cpu_ids for logical packages

nr_cpu_ids can be limited on the command line via nr_cpus=. That can break the
logical package management because it results in a smaller number of packages,
but the cpus to online are occupying the full package space as the hyper
threads are enumerated after the physical cores typically.

total_cpus is the real possible cpu space not limited by nr_cpus command line
and gives us the proper number of packages.

Reported-by: Mike Galbraith <[email protected]>
Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Xiong Zhou <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andreas Herrmann <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603181254330.3978@nanos
---
arch/x86/kernel/smpboot.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 81e6a43..b2c99f8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -325,9 +325,14 @@ static void __init smp_init_package_map(void)
 	 * By not including this we'll sometimes over-estimate the number of
 	 * logical packages by the amount of !present siblings, but this is
 	 * still better than MAX_LOCAL_APIC.
+	 *
+	 * We use total_cpus not nr_cpu_ids because nr_cpu_ids can be limited
+	 * on the command line leading to a similar issue as the HT disable
+	 * problem because the hyperthreads are usually enumerated after the
+	 * primary cores.
 	 */
 	ncpus = boot_cpu_data.x86_max_cores;
-	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
+	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 
 	/*
 	 * Possibly larger than what we need as the number of apic ids per

Subject: [tip:x86/urgent] x86/topology: Fix AMD core count

Commit-ID: ee6825c80e870fff1a370c718ec77022ade0889b
Gitweb: http://git.kernel.org/tip/ee6825c80e870fff1a370c718ec77022ade0889b
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 25 Mar 2016 15:52:34 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Tue, 29 Mar 2016 10:45:04 +0200

x86/topology: Fix AMD core count

It turns out AMD gets x86_max_cores wrong when there are compute
units.

The issue is that Linux assumes:

nr_logical_cpus = nr_cores * nr_siblings

But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
to 2 as well.

Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core
properly, according to the new nomenclature.

Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andreas Herrmann <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
arch/x86/include/asm/smp.h | 1 +
arch/x86/kernel/cpu/amd.c | 8 ++++----
arch/x86/ras/mce_amd_inj.c | 3 ++-
3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 20a3de5..66b0573 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -155,6 +155,7 @@ static inline int wbinvd_on_all_cpus(void)
 	wbinvd();
 	return 0;
 }
+#define smp_num_siblings	1
 #endif /* CONFIG_SMP */
 
 extern unsigned disabled_cpus;
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6e47e3a..4d0087f 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -312,9 +312,9 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		node_id = ecx & 7;
 
 		/* get compute unit information */
-		smp_num_siblings = ((ebx >> 8) & 3) + 1;
+		cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1;
+		c->x86_max_cores /= smp_num_siblings;
 		c->compute_unit_id = ebx & 0xff;
-		cores_per_cu += ((ebx >> 8) & 3);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
@@ -329,8 +329,8 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		u32 cus_per_node;
 
 		set_cpu_cap(c, X86_FEATURE_AMD_DCM);
-		cores_per_node = c->x86_max_cores / nodes_per_socket;
-		cus_per_node = cores_per_node / cores_per_cu;
+		cus_per_node = c->x86_max_cores / nodes_per_socket;
+		cores_per_node = cus_per_node * cores_per_cu;
 
 		/* store NodeID, use llc_shared_map to store sibling info */
 		per_cpu(cpu_llc_id, cpu) = node_id;
diff --git a/arch/x86/ras/mce_amd_inj.c b/arch/x86/ras/mce_amd_inj.c
index 55d38cf..9e02dca 100644
--- a/arch/x86/ras/mce_amd_inj.c
+++ b/arch/x86/ras/mce_amd_inj.c
@@ -20,6 +20,7 @@
 #include <linux/pci.h>
 
 #include <asm/mce.h>
+#include <asm/smp.h>
 #include <asm/amd_nb.h>
 #include <asm/irq_vectors.h>
 
@@ -206,7 +207,7 @@ static u32 get_nbc_for_node(int node_id)
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	u32 cores_per_node;
 
-	cores_per_node = c->x86_max_cores / amd_get_nodes_per_socket();
+	cores_per_node = (c->x86_max_cores * smp_num_siblings) / amd_get_nodes_per_socket();
 
 	return cores_per_node * node_id;
}