2006-01-23 11:22:15

by pravin shelar

[permalink] [raw]
Subject: [PATCH] garbage values in file /proc/net/sockstat

In 2.6.16-rc1-mm1, (for x86_64 arch) cpu_possible_map is not same
as NR_CPUS (prefill_possible_map()). Therefore per cpu areas are allocated
for cpu_possible cpus only (setup_per_cpu_areas()). This causes sockstat
to return garbage value on x84_64 arch.

So these per_cpu accesses are geting relocated (RELOC_HIDE) using
boot_cpu_pda[]->data_offset which is not initialized.

There are other instances of same bug where per_cpu() macro is used
without cpu_possible() check. e.g. net/core/utils.c ::
net_random_reseed(), net/core/dev.c :: net_dev_init(), etc.

This patch fixes these bugs.

Regards,
Pravin.

---

Signed-off by: Pravin B. Shelar <[email protected]>
Signed-off-by: Ravikiran Thirumalai <[email protected]>
Signed-off-by: Shai Fultheim <[email protected]>

Index: linux-2.6.15.1/net/core/dev.c
===================================================================
--- linux-2.6.15.1.orig/net/core/dev.c 2006-01-23 02:31:13.000000000 -0800
+++ linux-2.6.15.1/net/core/dev.c 2006-01-23 02:32:12.000000000 -0800
@@ -3240,7 +3240,7 @@ static int __init net_dev_init(void)
* Initialise the packet receive queues.
*/

- for (i = 0; i < NR_CPUS; i++) {
+ for_each_cpu (i) {
struct softnet_data *queue;

queue = &per_cpu(softnet_data, i);
Index: linux-2.6.15.1/net/core/utils.c
===================================================================
--- linux-2.6.15.1.orig/net/core/utils.c 2006-01-23 02:31:13.000000000 -0800
+++ linux-2.6.15.1/net/core/utils.c 2006-01-23 02:32:12.000000000 -0800
@@ -133,7 +133,7 @@ static int net_random_reseed(void)
unsigned long seed[NR_CPUS];

get_random_bytes(seed, sizeof(seed));
- for (i = 0; i < NR_CPUS; i++) {
+ for_each_cpu(i) {
struct nrnd_state *state = &per_cpu(net_rand_state,i);
__net_srandom(state, seed[i]);
}
Index: linux-2.6.15.1/net/socket.c
===================================================================
--- linux-2.6.15.1.orig/net/socket.c 2006-01-23 02:31:13.000000000 -0800
+++ linux-2.6.15.1/net/socket.c 2006-01-23 02:32:12.000000000 -0800
@@ -2079,7 +2079,7 @@ void socket_seq_show(struct seq_file *se
int cpu;
int counter = 0;

- for (cpu = 0; cpu < NR_CPUS; cpu++)
+ for_each_cpu (cpu)
counter += per_cpu(sockets_in_use, cpu);

/* It can be negative, by the way. 8) */


2006-01-23 11:24:34

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

On Monday 23 January 2006 12:21, pravin shelar wrote:
> In 2.6.16-rc1-mm1, (for x86_64 arch) cpu_possible_map is not same
> as NR_CPUS (prefill_possible_map()). Therefore per cpu areas are allocated
> for cpu_possible cpus only (setup_per_cpu_areas()). This causes sockstat
> to return garbage value on x84_64 arch.
>
> So these per_cpu accesses are geting relocated (RELOC_HIDE) using
> boot_cpu_pda[]->data_offset which is not initialized.
>
> There are other instances of same bug where per_cpu() macro is used
> without cpu_possible() check. e.g. net/core/utils.c ::
> net_random_reseed(), net/core/dev.c :: net_dev_init(), etc.
>
> This patch fixes these bugs.

Thanks. Patches Look good. Dave, can you push them for 2.6.16 still please?

-Andi

2006-01-23 13:29:51

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

Andi Kleen a ?crit :
> On Monday 23 January 2006 12:21, pravin shelar wrote:
>> In 2.6.16-rc1-mm1, (for x86_64 arch) cpu_possible_map is not same
>> as NR_CPUS (prefill_possible_map()). Therefore per cpu areas are allocated
>> for cpu_possible cpus only (setup_per_cpu_areas()). This causes sockstat
>> to return garbage value on x84_64 arch.
>>
>> So these per_cpu accesses are geting relocated (RELOC_HIDE) using
>> boot_cpu_pda[]->data_offset which is not initialized.
>>
>> There are other instances of same bug where per_cpu() macro is used
>> without cpu_possible() check. e.g. net/core/utils.c ::
>> net_random_reseed(), net/core/dev.c :: net_dev_init(), etc.
>>
>> This patch fixes these bugs.
>
> Thanks. Patches Look good. Dave, can you push them for 2.6.16 still please?
>

Shouldnt we force a page fault for not possible cpus in cpu_data
to catch all access to per_cpu(some_object, some_not_possible_cpu) ?

We can use a red zone big enough to hold the whole per_cpu data.

Something like :

file include/asm-x86_64/pgtable.h

#define CPUDATA_RED_ZONE 0xffff808000000000UL /* start of percpu catcher */

file arch/x86_64/kernel/setup64.c

setup_per_cpu_areas(void)
{
...
for (i = 0 ; i < NR_CPUS ; i++) {
if (!cpu_possible(cpu))
cpu_pda(i)->data_offset = CPUDATA_RED_ZONE - __per_cpu_start ;
}
}

Eric

2006-01-23 15:11:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

On Monday 23 January 2006 14:28, Eric Dumazet wrote:

> Shouldnt we force a page fault for not possible cpus in cpu_data
> to catch all access to per_cpu(some_object, some_not_possible_cpu) ?
>
> We can use a red zone big enough to hold the whole per_cpu data.

Good idea. Can you please send me a tested patch?

-Andi

2006-01-23 16:29:44

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

--- linux-2.6.16-rc1/Documentation/x86_64/mm.txt 2006-01-17 08:44:47.000000000 +0100
+++ linux-2.6.16-rc1-mm2-ed/Documentation/x86_64/mm.txt 2006-01-23 16:54:46.000000000 +0100
@@ -5,7 +5,8 @@

0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm
hole caused by [48:63] sign extension
-ffff800000000000 - ffff80ffffffffff (=40bits) guard hole
+ffff800000000000 - ffff807fffffffff (=39bits) guard hole
+ffff808000000000 - ffff80ffffffffff (=39bits) not possible cpus percpudata hole
ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of all phys. memory
ffffc10000000000 - ffffc1ffffffffff (=40bits) hole
ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space
--- linux-2.6.16-rc1/include/asm-x86_64/pgtable.h 2006-01-17 08:44:47.000000000 +0100
+++ linux-2.6.16-rc1-mm2-ed/include/asm-x86_64/pgtable.h 2006-01-23 16:54:46.000000000 +0100
@@ -136,6 +136,7 @@

#ifndef __ASSEMBLY__
#define MAXMEM 0x3fffffffffffUL
+#define CPUDATA_RED_ZONE 0xffff808000000000UL
#define VMALLOC_START 0xffffc20000000000UL
#define VMALLOC_END 0xffffe1ffffffffffUL
#define MODULES_VADDR 0xffffffff88000000UL
--- linux-2.6.16-rc1/arch/x86_64/kernel/setup64.c 2006-01-23 16:36:38.000000000 +0100
+++ linux-2.6.16-rc1-mm2-ed/arch/x86_64/kernel/setup64.c 2006-01-23 16:58:30.000000000 +0100
@@ -99,9 +99,13 @@
size = PERCPU_ENOUGH_ROOM;
#endif

- for_each_cpu_mask (i, cpu_possible_map) {
+ for (i = 0 ; i < NR_CPUS ; i++) {
char *ptr;

+ cpu_pda(i)->data_offset = (char *)CPUDATA_RED_ZONE - __per_cpu_start;
+ if (!cpu_possible(i))
+ continue;
+
if (!NODE_DATA(cpu_to_node(i))) {
printk("cpu with no node %d, num_online_nodes %d\n",
i, num_online_nodes());


Attachments:
cpudata_red_zone.patch (1.69 kB)

2006-01-23 16:47:35

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

--- linux-2.6.16-rc1/Documentation/x86_64/mm.txt 2006-01-17 08:44:47.000000000 +0100
+++ linux-2.6.16-rc1-mm2-ed/Documentation/x86_64/mm.txt 2006-01-23 16:54:46.000000000 +0100
@@ -5,7 +5,8 @@

0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm
hole caused by [48:63] sign extension
-ffff800000000000 - ffff80ffffffffff (=40bits) guard hole
+ffff800000000000 - ffff807fffffffff (=39bits) guard hole
+ffff808000000000 - ffff80ffffffffff (=39bits) not possible cpus percpudata hole
ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of all phys. memory
ffffc10000000000 - ffffc1ffffffffff (=40bits) hole
ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space
--- linux-2.6.16-rc1/include/asm-x86_64/pgtable.h 2006-01-17 08:44:47.000000000 +0100
+++ linux-2.6.16-rc1-mm2-ed/include/asm-x86_64/pgtable.h 2006-01-23 16:54:46.000000000 +0100
@@ -136,6 +136,7 @@

#ifndef __ASSEMBLY__
#define MAXMEM 0x3fffffffffffUL
+#define CPUDATA_RED_ZONE 0xffff808000000000UL
#define VMALLOC_START 0xffffc20000000000UL
#define VMALLOC_END 0xffffe1ffffffffffUL
#define MODULES_VADDR 0xffffffff88000000UL
--- linux-2.6.16-rc1/arch/x86_64/kernel/setup64.c 2006-01-23 16:36:38.000000000 +0100
+++ linux-2.6.16-rc1-mm2-ed/arch/x86_64/kernel/setup64.c 2006-01-23 17:40:54.000000000 +0100
@@ -99,9 +99,14 @@
size = PERCPU_ENOUGH_ROOM;
#endif

- for_each_cpu_mask (i, cpu_possible_map) {
+ for (i = 0 ; i < NR_CPUS ; i++) {
char *ptr;

+ if (!cpu_possible(i)) {
+ cpu_pda(i)->data_offset = (char *)CPUDATA_RED_ZONE - __per_cpu_start;
+ continue;
+ }
+
if (!NODE_DATA(cpu_to_node(i))) {
printk("cpu with no node %d, num_online_nodes %d\n",
i, num_online_nodes());


Attachments:
cpudata_red_zone.patch (1.70 kB)

2006-01-25 13:31:52

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

On Monday 23 January 2006 17:46, Eric Dumazet wrote:
>
> Sorry for the first version of he patch. I did one change, in order to do the
> initialization only for !possible cpu
>
> [PATCH] x86_64 : Use a special CPUDATA_RED_ZONE to catch accesses to
> per_cpu(some_object, some_not_possible_cpu)
>
> Because cpu_data(cpu)->data_offset may contain garbage, some buggy code may do
> random things without notice. If we initialize data_offset so that the
> per_cpu() data sits in an unmapped memory area, we should get page faults and
> stack traces should help us find the bugs.
>
> Signed-off-by: Eric Dumazet <[email protected]>

With that patch sched_init gives an early exception. I don't think
we can fix up all cases before 2.6.16, but it's dangerous
to let it reference freed memory.

I think the best course of action for this now for 2.6.16 is:

- mark percpu init data not __init
(this way it will still reference valid memory, although shared between
all impossible CPUs)
- keep the impossible CPUs per cpu data to point to the original reference
version (== offset 0)

For 2.6.17 all the occurrences of NR_CPUS should be audited
and fixed up like you started in your earlier patch.

Like the attached patch.

-Andi


Attachments:
(No filename) (1.21 kB)
impossible-per-cpu-data-workaround (1.99 kB)
Download all attachments

2006-01-25 19:59:45

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

On Wed, Jan 25, 2006 at 02:31:15PM +0100, Andi Kleen wrote:
> On Monday 23 January 2006 17:46, Eric Dumazet wrote:
>
> I think the best course of action for this now for 2.6.16 is:
>
> - mark percpu init data not __init
> (this way it will still reference valid memory, although shared between
> all impossible CPUs)
> - keep the impossible CPUs per cpu data to point to the original reference
> version (== offset 0)
>

How about doing the above using a debug config option? So that when the
config option is turned on, all per-cpu area references to not possible
cpus crash? and leave that option default on on -mm :). That way we can
quickly catch all references. We can probably change the arch independent
setup_per_cpu_areas also to do allocations for cpu_possible cpus only while
we are at it?

Kiran

2006-01-25 20:47:17

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

On Wed, Jan 25, 2006 at 11:59:46AM -0800, Ravikiran G Thirumalai wrote:
>
> How about doing the above using a debug config option? So that when the
> config option is turned on, all per-cpu area references to not possible
> cpus crash? and leave that option default on on -mm :). That way we can
> quickly catch all references. We can probably change the arch independent
> setup_per_cpu_areas also to do allocations for cpu_possible cpus only while
> we are at it?

Ahh! just realised mm3 is already out with the Eric Dumazet's patch.

2006-01-25 21:45:49

by be-news06

[permalink] [raw]
Subject: Red zones (was: [PATCH] garbage values in file /proc/net/sockstat)

Eric Dumazet <[email protected]> wrote:
> We can use a red zone big enough to hold the whole per_cpu data.

I am trying to learn a bit here: why is it required to have a speciel red
zone for this case? Wouldnt it make more sence to have a single red zone
which can be used by all locations in the kernel for unused structures? That
would reduce the number of wasted segements in the page table, or?

Gruss
Bernd

2006-01-26 00:35:30

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] garbage values in file /proc/net/sockstat

On Wednesday 25 January 2006 20:59, Ravikiran G Thirumalai wrote:
> On Wed, Jan 25, 2006 at 02:31:15PM +0100, Andi Kleen wrote:
> > On Monday 23 January 2006 17:46, Eric Dumazet wrote:
> >
> > I think the best course of action for this now for 2.6.16 is:
> >
> > - mark percpu init data not __init
> > (this way it will still reference valid memory, although shared between
> > all impossible CPUs)
> > - keep the impossible CPUs per cpu data to point to the original reference
> > version (== offset 0)
> >
>
> How about doing the above using a debug config option? So that when the
> config option is turned on, all per-cpu area references to not possible
> cpus crash? and leave that option default on on -mm :)

In -mm* we could just apply Eric's patch and then someone should just
grep the tree for NR_CPUS and audit all users - that should
catch basically all occurrences. I can put it onto my todo list,
but I don't know when I'll get to it so it would be nice if someone
else could do this.

For 2.6.16 I think it's best to go forward with my hack.

> . That way we can
> quickly catch all references. We can probably change the arch independent
> setup_per_cpu_areas also to do allocations for cpu_possible cpus only while
> we are at it?

Eric did that already.

-Andi

2006-01-26 05:28:27

by Eric Dumazet

[permalink] [raw]
Subject: Re: Red zones

Bernd Eckenfels a ?crit :
> Eric Dumazet <[email protected]> wrote:
>> We can use a red zone big enough to hold the whole per_cpu data.
>
> I am trying to learn a bit here: why is it required to have a speciel red
> zone for this case? Wouldnt it make more sence to have a single red zone
> which can be used by all locations in the kernel for unused structures? That
> would reduce the number of wasted segements in the page table, or?
>

On x86_64, available virtual space is huge, so having different red zones can
spot the fault more easily : If the target of the fault is in the PER_CPU
redzone given range, we can instantly knows there is still a per_cpu() user
accessing a non possible cpu area. As the red zone is not mapped at all, no
page table is setup.


On 32 bits platforms, this is completely different : space is scarse (typical
User/Kernel split of 3GB/1GB), so we should avoid to reserve even a 32 KB
redzone. We could do it in DEBUG mode for example. Current interim patch in
2.6.16-rc1-mm3 is using NULL pointer but this is not a perfect solution since
the underlying current user process can perfectly map something in this 'zone'.

Eric


2006-01-26 10:07:27

by be-news06

[permalink] [raw]
Subject: Re: Red zones

Eric Dumazet <[email protected]> wrote:
> On x86_64, available virtual space is huge, so having different red zones can
> spot the fault more easily : If the target of the fault is in the PER_CPU
> redzone given range, we can instantly knows there is still a per_cpu() user
> accessing a non possible cpu area. As the red zone is not mapped at all, no
> page table is setup.

Ok, however you can also tell from the stack trace who accessed the red zone, right?

Gruss
Bernd