2015-07-23 22:45:21

by Philip Müller

[permalink] [raw]
Subject: [linux41] Kernel panic at i686

Hi all,

I started to test linux 4.1 series with rc6. However, I was never able
to boot that kernel in i686 architecture. Trying it again with
VirtualBox gave me more conclusions. Using one core it simply boots up.
Using more than one CPU core it crashes with:

Failed to access perfctr msr (MSR c0010007 is 0)

task: f58e0000 ti: f58e8000 task.ti: f58e800
EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
EIP is at free_cache_attributes+0x83/0xd0
EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0

In more rich detail you can find that problem on my bug-tracker for
Manjaro Linux:

https://github.com/manjaro/packages-core/issues/14

I just want to know if you are aware of it. With current 4.1.3 release I
still face that issue ...

kind regards
Philip Müller
--------------------------
Manjaro Project-Lead


2015-07-26 06:18:39

by Philip Müller

[permalink] [raw]
Subject: [linux41] regression with 'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' on AMD i686

Hi Guenter, Sudeep,

It now came down to 'x86/cacheinfo: Move cacheinfo sysfs code to generic
infrastructure'[1] as you can see here[2].

The facts are:

- You can't boot on i686 only with more than one CPU core on AMD
hardware (x86_64 however works)
- Using Ubuntu config[3] it boots but creates an kernel >= 1 GB in size
on Manjaro
- By reverting 0d55ba4[4] the kernel boots.

So we have to find out what is causing this issue[5].

kind regards
Philip Müller
--------------------------
Manjaro Project-Lead

[1]
https://github.com/torvalds/linux/commit/0d55ba46bfbee64fd2b492b87bfe2ec172e7b056
[2]
https://raw.githubusercontent.com/philmmanjaro/linux41/master/git-bisect.txt
[3]
https://raw.githubusercontent.com/philmmanjaro/linux41/master/config.4.1.3-040103-generic
[4]
https://github.com/manjaro/packages-core/commit/f7b77f3e84295a6313a9181d520fb48e60453b64
[5] https://github.com/manjaro/packages-core/issues/14


On 24.07.2015 00:23, Philip Müller wrote:
> Hi all,
>
> I started to test linux 4.1 series with rc6. However, I was never able
> to boot that kernel in i686 architecture. Trying it again with
> VirtualBox gave me more conclusions. Using one core it simply boots up.
> Using more than one CPU core it crashes with:
>
> Failed to access perfctr msr (MSR c0010007 is 0)
>
> task: f58e0000 ti: f58e8000 task.ti: f58e800
> EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
> EIP is at free_cache_attributes+0x83/0xd0
> EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
> ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0
>
> In more rich detail you can find that problem on my bug-tracker for
> Manjaro Linux:
>
> https://github.com/manjaro/packages-core/issues/14
>
> I just want to know if you are aware of it. With current 4.1.3 release I
> still face that issue ...
>
> kind regards
> Philip Müller
> --------------------------
> Manjaro Project-Lead
>

2015-07-26 08:14:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [linux41] regression with 'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' on AMD i686

On Sun, 26 Jul 2015, Philip Müller wrote:
> > task: f58e0000 ti: f58e8000 task.ti: f58e800
> > EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
> > EIP is at free_cache_attributes+0x83/0xd0
> > EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
> > ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
> > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0

That's a trivial NULL pointer dereference in the error/cleanup
path. Patch below should fix it.

Thanks,

tglx
---
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 764280a91776..f09b106d8b81 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -159,6 +159,9 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

static void free_cache_attributes(unsigned int cpu)
{
+ if (!per_cpu_cacheinfo(cpu))
+ return;
+
cache_shared_cpu_map_remove(cpu);

kfree(per_cpu_cacheinfo(cpu));
@@ -514,8 +517,7 @@ static int cacheinfo_cpu_callback(struct notifier_block *nfb,
break;
case CPU_DEAD:
cache_remove_dev(cpu);
- if (per_cpu_cacheinfo(cpu))
- free_cache_attributes(cpu);
+ free_cache_attributes(cpu);
break;
}
return notifier_from_errno(rc);

2015-07-26 08:42:05

by Borislav Petkov

[permalink] [raw]
Subject: Re: [linux41] regression with 'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' on AMD i686

On Sun, Jul 26, 2015 at 10:13:45AM +0200, Thomas Gleixner wrote:
> On Sun, 26 Jul 2015, Philip Müller wrote:
> > > task: f58e0000 ti: f58e8000 task.ti: f58e800
> > > EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
> > > EIP is at free_cache_attributes+0x83/0xd0
> > > EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
> > > ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
> > > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > > CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0
>
> That's a trivial NULL pointer dereference in the error/cleanup
> path. Patch below should fix it.

Well, I got a bit different, and of course totally untested possible
solution:

cache_shared_cpu_map_setup() does check sib_cpu_ci->info_list before
setting cpumask bits while cache_shared_cpu_map_remove() doesn't. Ballancing
this out would mean:

---
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 764280a91776..8a4546dc25e3 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -148,7 +148,11 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

if (sibling == cpu) /* skip itself */
continue;
+
sib_cpu_ci = get_cpu_cacheinfo(sibling);
+ if (!sib_cpu_ci->info_list)
+ continue;
+
sib_leaf = sib_cpu_ci->info_list + index;
cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
---

Now Philip can have some more fun testing :-)

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-07-26 10:55:01

by Philip Müller

[permalink] [raw]
Subject: Re: [linux41] regression with 'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' on AMD i686

Hi Borislav,

I can confirm your patch working. However, it might be good to use yours
and Thomas' in combination to solve this properly.

kind regards
Philip

On 26.07.2015 10:41, Borislav Petkov wrote:
> On Sun, Jul 26, 2015 at 10:13:45AM +0200, Thomas Gleixner wrote:
>> On Sun, 26 Jul 2015, Philip Müller wrote:
>>>> task: f58e0000 ti: f58e8000 task.ti: f58e800
>>>> EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
>>>> EIP is at free_cache_attributes+0x83/0xd0
>>>> EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
>>>> ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
>>>> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>>>> CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0
>>
>> That's a trivial NULL pointer dereference in the error/cleanup
>> path. Patch below should fix it.
>
> Well, I got a bit different, and of course totally untested possible
> solution:
>
> cache_shared_cpu_map_setup() does check sib_cpu_ci->info_list before
> setting cpumask bits while cache_shared_cpu_map_remove() doesn't. Ballancing
> this out would mean:
>
> ---
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index 764280a91776..8a4546dc25e3 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -148,7 +148,11 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
>
> if (sibling == cpu) /* skip itself */
> continue;
> +
> sib_cpu_ci = get_cpu_cacheinfo(sibling);
> + if (!sib_cpu_ci->info_list)
> + continue;
> +
> sib_leaf = sib_cpu_ci->info_list + index;
> cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
> cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
> ---
>
> Now Philip can have some more fun testing :-)
>

2015-07-26 14:42:51

by Borislav Petkov

[permalink] [raw]
Subject: Re: [linux41] regression with 'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' on AMD i686

On Sun, Jul 26, 2015 at 12:54:55PM +0200, Philip Müller wrote:
> I can confirm your patch working. However, it might be good to use yours
> and Thomas' in combination to solve this properly.

Please do not top-post.

We could use Thomas' too although from looking at it,
detect_cache_attributes() allocates a per-CPU per_cpu_cacheinfo thing
for each CPU. By the time we hit cache_shared_cpu_map_remove() in
free_cache_attributes(), those per_cpu_cacheinfo(cpu) things are still
allocated. We kfree them in the next step only.

But I like the moving of the check from the CPU hotplug callback to
free_cache_attributes().

So I'll merge the two patches and write up a proper commit message,
unless someone objects.

I'll add your Tested-by too.

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-07-26 15:59:24

by Philip Müller

[permalink] [raw]
Subject: Re: [linux41] regression with 'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' on AMD i686

Hi Borislav,

I'm fine with that decision. I tested your patch alone and the
combination with Thomas' changes. Both work to solve this problem.

Do whatever suits best for this matter. Thx to you too for providing
solutions so fast.

kind regards
Philip

p.s. what do you mean by top-post?

Am 26.07.2015 um 16:42 schrieb Borislav Petkov:
> On Sun, Jul 26, 2015 at 12:54:55PM +0200, Philip Müller wrote:
>> I can confirm your patch working. However, it might be good to use yours
>> and Thomas' in combination to solve this properly.
>
> Please do not top-post.
>
> We could use Thomas' too although from looking at it,
> detect_cache_attributes() allocates a per-CPU per_cpu_cacheinfo thing
> for each CPU. By the time we hit cache_shared_cpu_map_remove() in
> free_cache_attributes(), those per_cpu_cacheinfo(cpu) things are still
> allocated. We kfree them in the next step only.
>
> But I like the moving of the check from the CPU hotplug callback to
> free_cache_attributes().
>
> So I'll merge the two patches and write up a proper commit message,
> unless someone objects.
>
> I'll add your Tested-by too.
>
> Thanks.
>

2015-07-26 16:11:13

by Guenter Roeck

[permalink] [raw]
Subject: Re: [linux41] regression with 'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' on AMD i686

On 07/26/2015 08:59 AM, Philip Müller wrote:
> Hi Borislav,
>
> I'm fine with that decision. I tested your patch alone and the
> combination with Thomas' changes. Both work to solve this problem.
>
> Do whatever suits best for this matter. Thx to you too for providing
> solutions so fast.
>
> kind regards
> Philip
>
> p.s. what do you mean by top-post?
>

What you just did ;-).

http://ck.wikia.com/wiki/TopPosting

Guenter

> Am 26.07.2015 um 16:42 schrieb Borislav Petkov:
>> On Sun, Jul 26, 2015 at 12:54:55PM +0200, Philip Müller wrote:
>>> I can confirm your patch working. However, it might be good to use yours
>>> and Thomas' in combination to solve this properly.
>>
>> Please do not top-post.
>>
>> We could use Thomas' too although from looking at it,
>> detect_cache_attributes() allocates a per-CPU per_cpu_cacheinfo thing
>> for each CPU. By the time we hit cache_shared_cpu_map_remove() in
>> free_cache_attributes(), those per_cpu_cacheinfo(cpu) things are still
>> allocated. We kfree them in the next step only.
>>
>> But I like the moving of the check from the CPU hotplug callback to
>> free_cache_attributes().
>>
>> So I'll merge the two patches and write up a proper commit message,
>> unless someone objects.
>>
>> I'll add your Tested-by too.
>>
>> Thanks.
>>
>
>

2015-07-27 07:58:13

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH] cpu/cacheinfo: Fix teardown path

From: Borislav Petkov <[email protected]>
Date: Mon, 27 Jul 2015 08:36:27 +0200
Subject: [PATCH] cpu/cacheinfo: Fix teardown path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Philip Müller reported a hang when booting 32-bit 4.1 kernel on an AMD
box. A fragment of the splat was enough to pinpoint the issue:

task: f58e0000 ti: f58e8000 task.ti: f58e800
EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
EIP is at free_cache_attributes+0x83/0xd0
EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0

cache_shared_cpu_map_setup() did check sibling CPUs cacheinfo descriptor
while the respective teardown path cache_shared_cpu_map_remove() didn't.
Fix that.

>From tglx's version: to be on the safe side, move the cacheinfo
descriptor check to free_cache_attributes(), thus cleaning up the
hotplug path a little and making this even more robust.

Reported-by: Philip Müller <[email protected]>
Cc: <[email protected]> # 4.1
Cc: Andre Przywara <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Philip Müller <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---

Moin Thomas,

I've merged both patches and tagged it for stable. Which means,
tip-urgent.

Thanks.

drivers/base/cacheinfo.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 764280a91776..e9fd32e91668 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -148,7 +148,11 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

if (sibling == cpu) /* skip itself */
continue;
+
sib_cpu_ci = get_cpu_cacheinfo(sibling);
+ if (!sib_cpu_ci->info_list)
+ continue;
+
sib_leaf = sib_cpu_ci->info_list + index;
cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
@@ -159,6 +163,9 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

static void free_cache_attributes(unsigned int cpu)
{
+ if (!per_cpu_cacheinfo(cpu))
+ return;
+
cache_shared_cpu_map_remove(cpu);

kfree(per_cpu_cacheinfo(cpu));
@@ -514,8 +521,7 @@ static int cacheinfo_cpu_callback(struct notifier_block *nfb,
break;
case CPU_DEAD:
cache_remove_dev(cpu);
- if (per_cpu_cacheinfo(cpu))
- free_cache_attributes(cpu);
+ free_cache_attributes(cpu);
break;
}
return notifier_from_errno(rc);
--
2.5.0.rc2.28.g6003e7f

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-07-27 08:56:23

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH] cpu/cacheinfo: Fix teardown path



On 27/07/15 08:58, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
> Date: Mon, 27 Jul 2015 08:36:27 +0200
> Subject: [PATCH] cpu/cacheinfo: Fix teardown path
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Philip Müller reported a hang when booting 32-bit 4.1 kernel on an AMD
> box. A fragment of the splat was enough to pinpoint the issue:
>
> task: f58e0000 ti: f58e8000 task.ti: f58e800
> EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
> EIP is at free_cache_attributes+0x83/0xd0
> EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
> ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0
>
> cache_shared_cpu_map_setup() did check sibling CPUs cacheinfo descriptor
> while the respective teardown path cache_shared_cpu_map_remove() didn't.
> Fix that.
>
> From tglx's version: to be on the safe side, move the cacheinfo
> descriptor check to free_cache_attributes(), thus cleaning up the
> hotplug path a little and making this even more robust.
>
> Reported-by: Philip Müller <[email protected]>
> Cc: <[email protected]> # 4.1
> Cc: Andre Przywara <[email protected]>
> Cc: Guenter Roeck <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: Philip Müller <[email protected]>
> Cc: Sudeep Holla <[email protected]>

Looks good to me. If not too late
Acked-by: Sudeep Holla <[email protected]>

Regards,
Sudeep

2015-07-27 11:11:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] cpu/cacheinfo: Fix teardown path



On Mon, 27 Jul 2015, Borislav Petkov wrote:

> From: Borislav Petkov <[email protected]>
> Date: Mon, 27 Jul 2015 08:36:27 +0200
> Subject: [PATCH] cpu/cacheinfo: Fix teardown path
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Philip Müller reported a hang when booting 32-bit 4.1 kernel on an AMD
> box. A fragment of the splat was enough to pinpoint the issue:
>
> task: f58e0000 ti: f58e8000 task.ti: f58e800
> EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
> EIP is at free_cache_attributes+0x83/0xd0
> EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
> ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0
>
> cache_shared_cpu_map_setup() did check sibling CPUs cacheinfo descriptor
> while the respective teardown path cache_shared_cpu_map_remove() didn't.
> Fix that.
>
> >From tglx's version: to be on the safe side, move the cacheinfo
> descriptor check to free_cache_attributes(), thus cleaning up the
> hotplug path a little and making this even more robust.
>
> Reported-by: Philip Müller <[email protected]>
> Cc: <[email protected]> # 4.1
> Cc: Andre Przywara <[email protected]>
> Cc: Guenter Roeck <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: Philip Müller <[email protected]>
> Cc: Sudeep Holla <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
>
> Moin Thomas,
>
> I've merged both patches and tagged it for stable. Which means,
> tip-urgent.

Reviewed-by: Thomas Gleixner <[email protected]>

2015-07-27 18:49:35

by Philip Müller

[permalink] [raw]
Subject: Re: [PATCH] cpu/cacheinfo: Fix teardown path

Am 27.07.2015 um 13:10 schrieb Thomas Gleixner:
>> ---
>>
>> Moin Thomas,
>>
>> I've merged both patches and tagged it for stable. Which means,
>> tip-urgent.
>
> Reviewed-by: Thomas Gleixner <[email protected]>
>

Hi Borislav,

I also reviewed your new code and also tested it.

Acked-by: Philip Müller <[email protected]>

Subject: [tip:x86/urgent] x86/cpu/cacheinfo: Fix teardown path

Commit-ID: 680ac028240f8747f31c03986fbcf18b2b521e93
Gitweb: http://git.kernel.org/tip/680ac028240f8747f31c03986fbcf18b2b521e93
Author: Borislav Petkov <[email protected]>
AuthorDate: Mon, 27 Jul 2015 09:58:05 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 5 Aug 2015 10:08:17 +0200

x86/cpu/cacheinfo: Fix teardown path

Philip Müller reported a hang when booting 32-bit 4.1 kernel on
an AMD box. A fragment of the splat was enough to pinpoint the
issue:

task: f58e0000 ti: f58e8000 task.ti: f58e800
EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
EIP is at free_cache_attributes+0x83/0xd0
EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0

cache_shared_cpu_map_setup() did check sibling CPUs cacheinfo
descriptor while the respective teardown path
cache_shared_cpu_map_remove() didn't. Fix that.

>From tglx's version: to be on the safe side, move the cacheinfo
descriptor check to free_cache_attributes(), thus cleaning up
the hotplug path a little and making this even more robust.

Reported-by: Philip Müller <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]> # v4.1+
Cc: Andre Przywara <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/base/cacheinfo.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 764280a..e9fd32e 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -148,7 +148,11 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

if (sibling == cpu) /* skip itself */
continue;
+
sib_cpu_ci = get_cpu_cacheinfo(sibling);
+ if (!sib_cpu_ci->info_list)
+ continue;
+
sib_leaf = sib_cpu_ci->info_list + index;
cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
@@ -159,6 +163,9 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

static void free_cache_attributes(unsigned int cpu)
{
+ if (!per_cpu_cacheinfo(cpu))
+ return;
+
cache_shared_cpu_map_remove(cpu);

kfree(per_cpu_cacheinfo(cpu));
@@ -514,8 +521,7 @@ static int cacheinfo_cpu_callback(struct notifier_block *nfb,
break;
case CPU_DEAD:
cache_remove_dev(cpu);
- if (per_cpu_cacheinfo(cpu))
- free_cache_attributes(cpu);
+ free_cache_attributes(cpu);
break;
}
return notifier_from_errno(rc);

2015-08-08 08:46:08

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] cpu/cacheinfo: Fix teardown path

On Mon, Jul 27, 2015 at 09:58:05AM +0200, Borislav Petkov wrote:
> Philip Müller reported a hang when booting 32-bit 4.1 kernel on an AMD
> box. A fragment of the splat was enough to pinpoint the issue:

Bah, this goes to Greg and not to tip. Anyway, here's a version with
updated tags.

Greg, please queue for 4.2 as it fixes a hang.

Thanks.

---
From: Borislav Petkov <[email protected]>
Date: Mon, 27 Jul 2015 08:36:27 +0200
Subject: [PATCH] cpu/cacheinfo: Fix teardown path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Philip Müller reported a hang when booting 32-bit 4.1 kernel on an AMD
box. A fragment of the splat was enough to pinpoint the issue:

task: f58e0000 ti: f58e8000 task.ti: f58e800
EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
EIP is at free_cache_attributes+0x83/0xd0
EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0

cache_shared_cpu_map_setup() did check sibling CPUs cacheinfo descriptor
while the respective teardown path cache_shared_cpu_map_remove() didn't.
Fix that.

>From tglx's version: to be on the safe side, move the cacheinfo
descriptor check to free_cache_attributes(), thus cleaning up the
hotplug path a little and making this even more robust.

Reported-and-tested-by: Philip Müller <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Sudeep Holla <[email protected]>
Cc: <[email protected]> # 4.1
Cc: Andre Przywara <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Philip Müller <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
---
drivers/base/cacheinfo.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 764280a91776..e9fd32e91668 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -148,7 +148,11 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

if (sibling == cpu) /* skip itself */
continue;
+
sib_cpu_ci = get_cpu_cacheinfo(sibling);
+ if (!sib_cpu_ci->info_list)
+ continue;
+
sib_leaf = sib_cpu_ci->info_list + index;
cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
@@ -159,6 +163,9 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

static void free_cache_attributes(unsigned int cpu)
{
+ if (!per_cpu_cacheinfo(cpu))
+ return;
+
cache_shared_cpu_map_remove(cpu);

kfree(per_cpu_cacheinfo(cpu));
@@ -514,8 +521,7 @@ static int cacheinfo_cpu_callback(struct notifier_block *nfb,
break;
case CPU_DEAD:
cache_remove_dev(cpu);
- if (per_cpu_cacheinfo(cpu))
- free_cache_attributes(cpu);
+ free_cache_attributes(cpu);
break;
}
return notifier_from_errno(rc);
--
2.5.0.rc2.28.g6003e7f

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-08-08 15:41:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] cpu/cacheinfo: Fix teardown path

On Sat, Aug 08, 2015 at 10:46:02AM +0200, Borislav Petkov wrote:
> On Mon, Jul 27, 2015 at 09:58:05AM +0200, Borislav Petkov wrote:
> > Philip M?ller reported a hang when booting 32-bit 4.1 kernel on an AMD
> > box. A fragment of the splat was enough to pinpoint the issue:
>
> Bah, this goes to Greg and not to tip. Anyway, here's a version with
> updated tags.
>
> Greg, please queue for 4.2 as it fixes a hang.

What commit caused this issue?

And it's a bit late for 4.2, as you say 4.1 is also affected, I'll wait
for 4.3-rc1 to give this a chance to get some testing.

thanks,

greg k-h

2015-08-08 18:23:59

by Philip Müller

[permalink] [raw]
Subject: Re: [PATCH] cpu/cacheinfo: Fix teardown path

Hi Greg,

I bi-sected it to following commit:

0d55ba46bfbee64fd2b492b87bfe2ec172e7b056 is the first bad commit
commit 0d55ba46bfbee64fd2b492b87bfe2ec172e7b056
Author: Sudeep Holla <[email protected]>
Date: Wed Mar 4 12:00:16 2015 +0000

x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure

You can follow it on my github repo in rich detail:

https://github.com/philmmanjaro/linux41/blob/master/git-bisect.txt

kind regards
Philip M?ller

Am 08.08.2015 um 17:41 schrieb Greg KH:
> On Sat, Aug 08, 2015 at 10:46:02AM +0200, Borislav Petkov wrote:
>> On Mon, Jul 27, 2015 at 09:58:05AM +0200, Borislav Petkov wrote:
>>> Philip M?ller reported a hang when booting 32-bit 4.1 kernel on an AMD
>>> box. A fragment of the splat was enough to pinpoint the issue:
>>
>> Bah, this goes to Greg and not to tip. Anyway, here's a version with
>> updated tags.
>>
>> Greg, please queue for 4.2 as it fixes a hang.
>
> What commit caused this issue?
>
> And it's a bit late for 4.2, as you say 4.1 is also affected, I'll wait
> for 4.3-rc1 to give this a chance to get some testing.
>
> thanks,
>
> greg k-h
>

2015-08-08 19:53:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] cpu/cacheinfo: Fix teardown path

On Sat, Aug 08, 2015 at 08:23:49PM +0200, Philip Müller wrote:
> Hi Greg,
>
> I bi-sected it to following commit:
>
> 0d55ba46bfbee64fd2b492b87bfe2ec172e7b056 is the first bad commit
> commit 0d55ba46bfbee64fd2b492b87bfe2ec172e7b056
> Author: Sudeep Holla <[email protected]>
> Date: Wed Mar 4 12:00:16 2015 +0000
>
> x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure
>
> You can follow it on my github repo in rich detail:
>
> https://github.com/philmmanjaro/linux41/blob/master/git-bisect.txt

Philip,

what is with you and top-posting? How hard is it not to do it?!

Please stop with the top-posting already.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-08-08 19:54:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] cpu/cacheinfo: Fix teardown path

On Sat, Aug 08, 2015 at 08:41:56AM -0700, Greg KH wrote:
> What commit caused this issue?

Apparently

0d55ba46bfbe ("x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure")

Looks like moving x86 to the generic cacheinfo stuff uncovered this
shortcoming there...

> And it's a bit late for 4.2, as you say 4.1 is also affected, I'll wait
> for 4.3-rc1 to give this a chance to get some testing.

Right, I guess that's fine too as it'll trickle to stable eventually...

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--