2010-06-17 16:19:10

by Cliff Wickman

[permalink] [raw]
Subject: per_cpu_ptr_to_phys() failure on UV x86_64


kdump is failing on an SGI UV system because it depends on
/sys/devices/system/cpu/cpuN/crash_notes.
And these files contain bad addresses for cpus beyond cpu 0.

This occurs using 2.6.35-rc3 code. But the same problem looks
present in 2.6.33.

The problem traces to per_cpu_ptr_to_phys() -> pcpu_addr_to_page() ->
vmalloc_to_page() for per-cpu addresses not in the first per-cpu 'chunk',
but not in the VMALLOC_START/VMALLOC_END range.

I wonder why this shows up on UV but not other x86_64's?
I've included a patch that solves this for me. But I defer to the
authors for a proper solution.

This is where per_cpu_ptr_to_phys() is called for this /sys file:
static ssize_t show_crash_notes(struct sys_device *dev, struct ...
...
addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpunum));
rc = sprintf(buf, "%Lx\n", addr);
return rc;
}

The problem, without the below patch: (but a couple of printk's)

uv3-sys:/tmp/cpw # cat /sys/devices/system/cpu/cpu0/crash_notes
1c1b040
uv3-sys:/tmp/cpw # cat /sys/devices/system/cpu/cpu1/crash_notes
db74000000000000
uv3-sys:/tmp/cpw # cat /sys/devices/system/cpu/cpu2/crash_notes
db74000000000000
uv3-sys:/tmp/cpw # dmesg | tail -n 6
[ 133.883009] cpw: per_cpu_ptr_to_phys addr ffff880001c1b040
[ 133.883012] cpw: per_cpu_ptr_to_phys returning 0x1c1b040
[ 136.910178] cpw: per_cpu_ptr_to_phys addr ffff880001c3b040
[ 136.910181] cpw: per_cpu_ptr_to_phys returning 0xdb74000000000000
[ 140.304825] cpw: per_cpu_ptr_to_phys addr ffff880001c5b040
[ 140.304828] cpw: per_cpu_ptr_to_phys returning 0xdb74000000000000


With the below patch: (plus a couple of printk's)

uv3-sys: # cat /sys/devices/system/cpu/cpu0/crash_notes
1c1b040
uv3-sys: # cat /sys/devices/system/cpu/cpu1/crash_notes
1c3b040
uv3-sys: # cat /sys/devices/system/cpu/cpu2/crash_notes
1c5b040
uv3-sys: # dmesg | tail -n 6
[ 130.411358] cpw: per_cpu_ptr_to_phys addr ffff880001c1b040
[ 130.411361] cpw: per_cpu_ptr_to_phys returning 0x1c1b040
[ 135.420702] cpw: per_cpu_ptr_to_phys addr ffff880001c3b040
[ 135.420705] cpw: per_cpu_ptr_to_phys returning 0x1c3b040
[ 139.514014] cpw: per_cpu_ptr_to_phys addr ffff880001c5b040
[ 139.514016] cpw: per_cpu_ptr_to_phys returning 0x1c5b040

Diffed against 2.6.35-rc5

Signed-off-by: Cliff Wickman <[email protected]>
---
mm/percpu.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

Index: linux-2.6.35-rc3/mm/percpu.c
===================================================================
--- linux-2.6.35-rc3.orig/mm/percpu.c
+++ linux-2.6.35-rc3/mm/percpu.c
@@ -978,12 +978,11 @@ bool is_kernel_percpu_address(unsigned l
*/
phys_addr_t per_cpu_ptr_to_phys(void *addr)
{
+ if ((unsigned long)addr < VMALLOC_START ||
+ (unsigned long)addr >= VMALLOC_END)
+ return __pa(addr);
if (pcpu_addr_in_first_chunk(addr)) {
- if ((unsigned long)addr < VMALLOC_START ||
- (unsigned long)addr >= VMALLOC_END)
- return __pa(addr);
- else
- return page_to_phys(vmalloc_to_page(addr));
+ return page_to_phys(vmalloc_to_page(addr));
} else
return page_to_phys(pcpu_addr_to_page(addr));
}


2010-06-17 17:08:15

by Tejun Heo

[permalink] [raw]
Subject: Re: per_cpu_ptr_to_phys() failure on UV x86_64

Hello,

On 06/17/2010 06:20 PM, Cliff Wickman wrote:
> phys_addr_t per_cpu_ptr_to_phys(void *addr)
> {
> + if ((unsigned long)addr < VMALLOC_START ||
> + (unsigned long)addr >= VMALLOC_END)
> + return __pa(addr);
> if (pcpu_addr_in_first_chunk(addr)) {
> - if ((unsigned long)addr < VMALLOC_START ||
> - (unsigned long)addr >= VMALLOC_END)
> - return __pa(addr);
> - else
> - return page_to_phys(vmalloc_to_page(addr));
> + return page_to_phys(vmalloc_to_page(addr));
> } else
> return page_to_phys(pcpu_addr_to_page(addr));
> }

(scratching head...) So, that means it's given an address for which
!pcpu_addr_in_first_chunk() but outside of vmalloc area. Strange.
I'll find out what's going on.

Thanks.

--
tejun

2010-06-17 17:35:21

by Tejun Heo

[permalink] [raw]
Subject: Re: per_cpu_ptr_to_phys() failure on UV x86_64

On 06/17/2010 07:08 PM, Tejun Heo wrote:
> (scratching head...) So, that means it's given an address for which
> !pcpu_addr_in_first_chunk() but outside of vmalloc area. Strange.
> I'll find out what's going on.

Does the following patch work? The original patch assumed that @addr
would be the address of the base cpu which isn't true. I only compile
tested the patch so it might be broken (sorry, I gotta go somewhere
now) but this should be the right direction.

Thanks.

diff --git a/mm/percpu.c b/mm/percpu.c
index 46485e1..8956155 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -978,14 +978,23 @@ bool is_kernel_percpu_address(unsigned long addr)
*/
phys_addr_t per_cpu_ptr_to_phys(void *addr)
{
- if (pcpu_addr_in_first_chunk(addr)) {
- if ((unsigned long)addr < VMALLOC_START ||
- (unsigned long)addr >= VMALLOC_END)
- return __pa(addr);
- else
- return page_to_phys(vmalloc_to_page(addr));
- } else
- return page_to_phys(pcpu_addr_to_page(addr));
+ void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
+ unsigned int cpu;
+
+ for_each_possible_cpu(cpu) {
+ void *start = per_cpu_ptr(base, cpu);
+
+ if (addr >= start && addr < start + pcpu_unit_size) {
+ /* in the first chunk */
+ if ((unsigned long)addr < VMALLOC_START ||
+ (unsigned long)addr >= VMALLOC_END)
+ return __pa(addr);
+ else
+ return page_to_phys(vmalloc_to_page(addr));
+ }
+ }
+ /* in one of the other chunks */
+ return page_to_phys(pcpu_addr_to_page(addr));
}

static inline size_t pcpu_calc_fc_sizes(size_t static_size,


--
tejun

2010-06-17 18:23:33

by Cliff Wickman

[permalink] [raw]
Subject: Re: per_cpu_ptr_to_phys() failure on UV x86_64


On Thu, Jun 17, 2010 at 07:35:16PM +0200, Tejun Heo wrote:
> On 06/17/2010 07:08 PM, Tejun Heo wrote:
> > (scratching head...) So, that means it's given an address for which
> > !pcpu_addr_in_first_chunk() but outside of vmalloc area. Strange.
> > I'll find out what's going on.
>
> Does the following patch work? The original patch assumed that @addr
> would be the address of the base cpu which isn't true. I only compile
> tested the patch so it might be broken (sorry, I gotta go somewhere
> now) but this should be the right direction.

Yes, your patch works. I tested it on a 32p UV system.

-Cliff

> diff --git a/mm/percpu.c b/mm/percpu.c
> index 46485e1..8956155 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -978,14 +978,23 @@ bool is_kernel_percpu_address(unsigned long addr)
> */
> phys_addr_t per_cpu_ptr_to_phys(void *addr)
> {
> - if (pcpu_addr_in_first_chunk(addr)) {
> - if ((unsigned long)addr < VMALLOC_START ||
> - (unsigned long)addr >= VMALLOC_END)
> - return __pa(addr);
> - else
> - return page_to_phys(vmalloc_to_page(addr));
> - } else
> - return page_to_phys(pcpu_addr_to_page(addr));
> + void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
> + unsigned int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + void *start = per_cpu_ptr(base, cpu);
> +
> + if (addr >= start && addr < start + pcpu_unit_size) {
> + /* in the first chunk */
> + if ((unsigned long)addr < VMALLOC_START ||
> + (unsigned long)addr >= VMALLOC_END)
> + return __pa(addr);
> + else
> + return page_to_phys(vmalloc_to_page(addr));
> + }
> + }
> + /* in one of the other chunks */
> + return page_to_phys(pcpu_addr_to_page(addr));
> }
>
> static inline size_t pcpu_calc_fc_sizes(size_t static_size,
>
>
> --
> tejun

--
Cliff Wickman
SGI
[email protected]
(651) 683-3824

2010-06-18 09:56:21

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] percpu: fix first chunk match in per_cpu_ptr_to_phys()

per_cpu_ptr_to_phys() determines whether the passed in @addr belongs
to the first_chunk or not by just matching the address against the
address range of the base unit (unit0, used by cpu0). When an adress
from another cpu was passed in, it will always determine that the
address doesn't belong to the first chunk even when it does. This
makes the function return a bogus physical address which may lead to
crash.

This problem was discovered by Cliff Wickman while investigating a
crash during kdump on a SGI UV system.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Cliff Wickman <[email protected]>
---
Can you please verify this one? I added a small optimization so that
it doesn't suck too bad on large machines and it works fine here but
it would be great to have your Tested-by:.

Thanks.

mm/percpu.c | 31 ++++++++++++++++++++++++++++---
1 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 46485e1..6470e77 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -229,8 +229,8 @@ static int __maybe_unused pcpu_page_idx(unsigned int cpu, int page_idx)
return pcpu_unit_map[cpu] * pcpu_unit_pages + page_idx;
}

-static unsigned long __maybe_unused pcpu_chunk_addr(struct pcpu_chunk *chunk,
- unsigned int cpu, int page_idx)
+static unsigned long pcpu_chunk_addr(struct pcpu_chunk *chunk,
+ unsigned int cpu, int page_idx)
{
return (unsigned long)chunk->base_addr + pcpu_unit_offsets[cpu] +
(page_idx << PAGE_SHIFT);
@@ -978,7 +978,32 @@ bool is_kernel_percpu_address(unsigned long addr)
*/
phys_addr_t per_cpu_ptr_to_phys(void *addr)
{
- if (pcpu_addr_in_first_chunk(addr)) {
+ void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
+ bool in_first_chunk = false;
+ unsigned long first_start, first_end;
+ unsigned int cpu;
+
+ /*
+ * The following test on first_start/end isn't strictly
+ * necessary but will speed up lookups of addresses which
+ * aren't in the first chunk.
+ */
+ first_start = pcpu_chunk_addr(pcpu_first_chunk, pcpu_first_unit_cpu, 0);
+ first_end = pcpu_chunk_addr(pcpu_first_chunk, pcpu_last_unit_cpu,
+ pcpu_unit_pages);
+ if ((unsigned long)addr >= first_start &&
+ (unsigned long)addr < first_end) {
+ for_each_possible_cpu(cpu) {
+ void *start = per_cpu_ptr(base, cpu);
+
+ if (addr >= start && addr < start + pcpu_unit_size) {
+ in_first_chunk = true;
+ break;
+ }
+ }
+ }
+
+ if (in_first_chunk) {
if ((unsigned long)addr < VMALLOC_START ||
(unsigned long)addr >= VMALLOC_END)
return __pa(addr);
--
1.6.4.2

2010-06-18 12:28:42

by Cliff Wickman

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix first chunk match in per_cpu_ptr_to_phys()


On Fri, Jun 18, 2010 at 11:56:16AM +0200, Tejun Heo wrote:
> per_cpu_ptr_to_phys() determines whether the passed in @addr belongs
> to the first_chunk or not by just matching the address against the
> address range of the base unit (unit0, used by cpu0). When an adress
> from another cpu was passed in, it will always determine that the
> address doesn't belong to the first chunk even when it does. This
> makes the function return a bogus physical address which may lead to
> crash.
>
> This problem was discovered by Cliff Wickman while investigating a
> crash during kdump on a SGI UV system.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Cliff Wickman <[email protected]>
> ---
> Can you please verify this one? I added a small optimization so that
> it doesn't suck too bad on large machines and it works fine here but
> it would be great to have your Tested-by:.

Yep. Works fine on 32p UV.

Tested-by: Cliff Wickman <[email protected]>

>
> Thanks.
>
> mm/percpu.c | 31 ++++++++++++++++++++++++++++---
> 1 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 46485e1..6470e77 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -229,8 +229,8 @@ static int __maybe_unused pcpu_page_idx(unsigned int cpu, int page_idx)
> return pcpu_unit_map[cpu] * pcpu_unit_pages + page_idx;
> }
>
> -static unsigned long __maybe_unused pcpu_chunk_addr(struct pcpu_chunk *chunk,
> - unsigned int cpu, int page_idx)
> +static unsigned long pcpu_chunk_addr(struct pcpu_chunk *chunk,
> + unsigned int cpu, int page_idx)
> {
> return (unsigned long)chunk->base_addr + pcpu_unit_offsets[cpu] +
> (page_idx << PAGE_SHIFT);
> @@ -978,7 +978,32 @@ bool is_kernel_percpu_address(unsigned long addr)
> */
> phys_addr_t per_cpu_ptr_to_phys(void *addr)
> {
> - if (pcpu_addr_in_first_chunk(addr)) {
> + void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
> + bool in_first_chunk = false;
> + unsigned long first_start, first_end;
> + unsigned int cpu;
> +
> + /*
> + * The following test on first_start/end isn't strictly
> + * necessary but will speed up lookups of addresses which
> + * aren't in the first chunk.
> + */
> + first_start = pcpu_chunk_addr(pcpu_first_chunk, pcpu_first_unit_cpu, 0);
> + first_end = pcpu_chunk_addr(pcpu_first_chunk, pcpu_last_unit_cpu,
> + pcpu_unit_pages);
> + if ((unsigned long)addr >= first_start &&
> + (unsigned long)addr < first_end) {
> + for_each_possible_cpu(cpu) {
> + void *start = per_cpu_ptr(base, cpu);
> +
> + if (addr >= start && addr < start + pcpu_unit_size) {
> + in_first_chunk = true;
> + break;
> + }
> + }
> + }
> +
> + if (in_first_chunk) {
> if ((unsigned long)addr < VMALLOC_START ||
> (unsigned long)addr >= VMALLOC_END)
> return __pa(addr);
> --
> 1.6.4.2

--
Cliff Wickman
SGI
[email protected]
(651) 683-3824

2010-06-18 13:07:17

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] percpu: fix first chunk match in per_cpu_ptr_to_phys()

On 06/18/2010 02:30 PM, Cliff Wickman wrote:
> On Fri, Jun 18, 2010 at 11:56:16AM +0200, Tejun Heo wrote:
>> per_cpu_ptr_to_phys() determines whether the passed in @addr belongs
>> to the first_chunk or not by just matching the address against the
>> address range of the base unit (unit0, used by cpu0). When an adress
>> from another cpu was passed in, it will always determine that the
>> address doesn't belong to the first chunk even when it does. This
>> makes the function return a bogus physical address which may lead to
>> crash.
>>
>> This problem was discovered by Cliff Wickman while investigating a
>> crash during kdump on a SGI UV system.
>>
>> Signed-off-by: Tejun Heo <[email protected]>
>> Reported-by: Cliff Wickman <[email protected]>
>> ---
>> Can you please verify this one? I added a small optimization so that
>> it doesn't suck too bad on large machines and it works fine here but
>> it would be great to have your Tested-by:.
>
> Yep. Works fine on 32p UV.
>
> Tested-by: Cliff Wickman <[email protected]>

Great, thanks. Will push it to mainline and -stable.

--
tejun