2006-01-29 06:26:46

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] i386: Add a temporary to make put_user more type safe.


In some code I am developing I had occasion to change the type of a
variable. This made the value put_user was putting to user space
wrong. But the code continued to build cleanly without errors.

Introducing a temporary fixes this problem and at least with gcc-3.3.5
does not cause gcc any problems with optimizing out the temporary.
gcc-4.x using SSA internally ought to be even better at optimizing out
temporaries, so I don't expect a temporary to become a problem.
Especially because in all correct cases the types on both sides of the
assignment to the temporary are the same.

Signed-off-by: Eric W. Biederman <[email protected]>


---

include/asm-i386/uaccess.h | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)

dd1215e541bfe2ed94a902f7fb263ca44b7e8afa
diff --git a/include/asm-i386/uaccess.h b/include/asm-i386/uaccess.h
index 3f1337c..1641613 100644
--- a/include/asm-i386/uaccess.h
+++ b/include/asm-i386/uaccess.h
@@ -198,12 +198,13 @@ extern void __put_user_8(void);
#define put_user(x,ptr) \
({ int __ret_pu; \
__chk_user_ptr(ptr); \
+ __typeof__(*(ptr)) __pu_val = x; \
switch(sizeof(*(ptr))) { \
- case 1: __put_user_1(x, ptr); break; \
- case 2: __put_user_2(x, ptr); break; \
- case 4: __put_user_4(x, ptr); break; \
- case 8: __put_user_8(x, ptr); break; \
- default:__put_user_X(x, ptr); break; \
+ case 1: __put_user_1(__pu_val, ptr); break; \
+ case 2: __put_user_2(__pu_val, ptr); break; \
+ case 4: __put_user_4(__pu_val, ptr); break; \
+ case 8: __put_user_8(__pu_val, ptr); break; \
+ default:__put_user_X(__pu_val, ptr); break; \
} \
__ret_pu; \
})
--
1.1.5.g3480


2006-01-29 06:39:36

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] i386: Add a temporary to make put_user more type safe.

[email protected] (Eric W. Biederman) wrote:
>
>
> In some code I am developing I had occasion to change the type of a
> variable. This made the value put_user was putting to user space
> wrong. But the code continued to build cleanly without errors.

What do you mean by "wrong"? What combinations of types do you think
should be disallowed here? put_user(char, int*), for example, or what?

We had one instance recently where code was doing put_user() of an
eight-byte struct and that sailed through x86 testing but made the sparc64
compiler ICE. Certainly we should stamp out tricks like that at compile
time, but how far should we go?

There's probably a good case for ensuring that typeof(x)==typeof(*ptr), but
I suspect that'd create a lot of noise.

> Introducing a temporary fixes this problem and at least with gcc-3.3.5
> does not cause gcc any problems with optimizing out the temporary.
> gcc-4.x using SSA internally ought to be even better at optimizing out
> temporaries, so I don't expect a temporary to become a problem.
> Especially because in all correct cases the types on both sides of the
> assignment to the temporary are the same.
>

Sounds sane. We could make it warn if typeof(x)!=typeof(*ptr) by adding
another temporary for the pointer, give it type typeof(x)*, but I haven't
tried it.


>
>
> ---
>
> include/asm-i386/uaccess.h | 11 ++++++-----
> 1 files changed, 6 insertions(+), 5 deletions(-)
>
> dd1215e541bfe2ed94a902f7fb263ca44b7e8afa
> diff --git a/include/asm-i386/uaccess.h b/include/asm-i386/uaccess.h
> index 3f1337c..1641613 100644
> --- a/include/asm-i386/uaccess.h
> +++ b/include/asm-i386/uaccess.h
> @@ -198,12 +198,13 @@ extern void __put_user_8(void);
> #define put_user(x,ptr) \
> ({ int __ret_pu; \
> __chk_user_ptr(ptr); \
> + __typeof__(*(ptr)) __pu_val = x; \
> switch(sizeof(*(ptr))) { \
> - case 1: __put_user_1(x, ptr); break; \
> - case 2: __put_user_2(x, ptr); break; \
> - case 4: __put_user_4(x, ptr); break; \
> - case 8: __put_user_8(x, ptr); break; \
> - default:__put_user_X(x, ptr); break; \
> + case 1: __put_user_1(__pu_val, ptr); break; \
> + case 2: __put_user_2(__pu_val, ptr); break; \
> + case 4: __put_user_4(__pu_val, ptr); break; \
> + case 8: __put_user_8(__pu_val, ptr); break; \
> + default:__put_user_X(__pu_val, ptr); break; \
> } \
> __ret_pu; \
> })
> --
> 1.1.5.g3480

2006-01-29 06:50:03

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] i386: Add a temporary to make put_user more type safe.

Andrew Morton <[email protected]> writes:

> [email protected] (Eric W. Biederman) wrote:
>>
>>
>> In some code I am developing I had occasion to change the type of a
>> variable. This made the value put_user was putting to user space
>> wrong. But the code continued to build cleanly without errors.
>
> What do you mean by "wrong"? What combinations of types do you think
> should be disallowed here? put_user(char, int*), for example, or what?
>
> We had one instance recently where code was doing put_user() of an
> eight-byte struct and that sailed through x86 testing but made the sparc64
> compiler ICE. Certainly we should stamp out tricks like that at compile
> time, but how far should we go?
>
> There's probably a good case for ensuring that typeof(x)==typeof(*ptr), but
> I suspect that'd create a lot of noise.

All I do is ensure that typeof(x) is assignment compatible with typeof(*ptr).
That is what the assignment to the temporary does. We actually do
this already on x86 if you compile for an i386 which people very rarely
do anymore.

I have performed sever kernel compiles with it applied against the stable
tree and saw no errors other than when I tried to assign a pointer to an
integer using put_user.

So your eight-byte struct case would probably be caught but the character
case would get type promoted and be fine.

>> Introducing a temporary fixes this problem and at least with gcc-3.3.5
>> does not cause gcc any problems with optimizing out the temporary.
>> gcc-4.x using SSA internally ought to be even better at optimizing out
>> temporaries, so I don't expect a temporary to become a problem.
>> Especially because in all correct cases the types on both sides of the
>> assignment to the temporary are the same.
>>
>
> Sounds sane. We could make it warn if typeof(x)!=typeof(*ptr) by adding
> another temporary for the pointer, give it type typeof(x)*, but I haven't
> tried it.

I guess we could do that. However if we don't use the value we will probably
get an unused variable warning.

Mostly I am just after the normal C assignment warnings/errors. Although if
we can do better that would be great.

Eric

2006-01-29 07:51:35

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] i386: Add a temporary to make put_user more type safe.

[email protected] (Eric W. Biederman) wrote:
>
> > Sounds sane. We could make it warn if typeof(x)!=typeof(*ptr) by adding
> > another temporary for the pointer, give it type typeof(x)*, but I haven't
> > tried it.
>
> I guess we could do that. However if we don't use the value we will probably
> get an unused variable warning.

No, that's OK, we can use the temporary.

Something like this:

--- devel/include/asm-i386/uaccess.h~x86-tighten-uaccess-type-checking 2006-01-28 23:45:16.000000000 -0800
+++ devel-akpm/include/asm-i386/uaccess.h 2006-01-28 23:48:31.000000000 -0800
@@ -149,14 +149,16 @@ extern void __get_user_4(void);
#define get_user(x,ptr) \
({ int __ret_gu; \
unsigned long __val_gu; \
+ __typeof__(x)*_p_; \
+ _p_ = ptr; \
__chk_user_ptr(ptr); \
- switch(sizeof (*(ptr))) { \
- case 1: __get_user_x(1,__ret_gu,__val_gu,ptr); break; \
- case 2: __get_user_x(2,__ret_gu,__val_gu,ptr); break; \
- case 4: __get_user_x(4,__ret_gu,__val_gu,ptr); break; \
- default: __get_user_x(X,__ret_gu,__val_gu,ptr); break; \
+ switch(sizeof (*(_p_))) { \
+ case 1: __get_user_x(1, __ret_gu, __val_gu, _p_); break; \
+ case 2: __get_user_x(2, __ret_gu, __val_gu, _p_); break; \
+ case 4: __get_user_x(4, __ret_gu, __val_gu, _p_); break; \
+ default: __get_user_x(X, __ret_gu, __val_gu, _p_); break; \
} \
- (x) = (__typeof__(*(ptr)))__val_gu; \
+ (x) = __val_gu; \
__ret_gu; \
})

@@ -198,13 +200,17 @@ extern void __put_user_8(void);
#define put_user(x,ptr) \
({ int __ret_pu; \
__chk_user_ptr(ptr); \
- __typeof__(*(ptr)) __pu_val = x; \
+ __typeof__(x)*_p_; \
+ __typeof__(x)__pu_val; \
+ \
+ _p_ = ptr; \
+ __pu_val = x; \
switch(sizeof(*(ptr))) { \
- case 1: __put_user_1(__pu_val, ptr); break; \
- case 2: __put_user_2(__pu_val, ptr); break; \
- case 4: __put_user_4(__pu_val, ptr); break; \
- case 8: __put_user_8(__pu_val, ptr); break; \
- default:__put_user_X(__pu_val, ptr); break; \
+ case 1: __put_user_1(__pu_val, _p_); break; \
+ case 2: __put_user_2(__pu_val, _p_); break; \
+ case 4: __put_user_4(__pu_val, _p_); break; \
+ case 8: __put_user_8(__pu_val, _p_); break; \
+ default:__put_user_X(__pu_val, _p_); break; \
} \
__ret_pu; \
})
_


It gives ~100 warnings on my usual test config. It's a bit awkward because
get_user/put_user/etc aren't allowed to evaluate their args multiple times
- code likes to do

get_user(v, p++);

I guess fixing those 100-odd warnings might find some warts -
compiler-caused truncation or sign-extension during uaccess copies is a bit
of a worry.

But I have enough things to be going on with :(

2006-01-29 19:33:35

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] i386: Add a temporary to make put_user more type safe.

Ingo Oeser <[email protected]> wrote:
>
> > #define put_user(x,ptr) \
> > ({ int __ret_pu; \
> > __chk_user_ptr(ptr); \
> > - __typeof__(*(ptr)) __pu_val = x; \
> > + __typeof__(x)*_p_; \
> > + __typeof__(x)__pu_val; \
> > + \
> > + _p_ = ptr; \
> > + __pu_val = x; \
> > switch(sizeof(*(ptr))) { \
>
> - switch(sizeof(*(ptr))) { \
> + switch(sizeof(*(_p_))) { \
>
> > - case 1: __put_user_1(__pu_val, ptr); break; \
> > - case 2: __put_user_2(__pu_val, ptr); break; \
> > - case 4: __put_user_4(__pu_val, ptr); break; \
> > - case 8: __put_user_8(__pu_val, ptr); break; \
> > - default:__put_user_X(__pu_val, ptr); break; \
> > + case 1: __put_user_1(__pu_val, _p_); break; \
> > + case 2: __put_user_2(__pu_val, _p_); break; \
> > + case 4: __put_user_4(__pu_val, _p_); break; \
> > + case 8: __put_user_8(__pu_val, _p_); break; \
> > + default:__put_user_X(__pu_val, _p_); break; \
> > } \
> > __ret_pu; \
> > })
>
> Does this now give less warnings?

No, it won't do. All the warnings were legitimate anyway.

2006-01-29 20:04:45

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault

--- linux-2.6.16-rc1-mm3/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100
+++ linux-2.6.16-rc1-mm3-ed/arch/i386/mm/init.c 2006-01-29 21:46:39.000000000 +0100
@@ -750,11 +750,12 @@
for (addr = begin; addr < end; addr += PAGE_SIZE) {
ClearPageReserved(virt_to_page(addr));
set_page_count(virt_to_page(addr), 1);
- memset((void *)addr, 0xcc, PAGE_SIZE);
+ change_page_attr(virt_to_page(addr), 1, __pgprot(0));
free_page(addr);
totalram_pages++;
}
printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
+ global_flush_tlb();
}

void free_initmem(void)


Attachments:
i386_mm_init.patch (609.00 B)

2006-01-29 20:09:28

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault

On Sun, Jan 29, 2006 at 09:04:44PM +0100, Eric Dumazet wrote:
> Chasing some invalid accesses to .init zone, I found that free_init_pages()
> was properly freeing the pages but virtual was still usable.

This change will break the large table entries up, resulting in more TLB
pressure and reducing performance, and so should only be activated as a
debug option.

-ben
--
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here
and they've asked us to stop the party." Don't Email: <[email protected]>.

2006-01-29 20:28:34

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault

Benjamin LaHaise a ?crit :
> On Sun, Jan 29, 2006 at 09:04:44PM +0100, Eric Dumazet wrote:
>> Chasing some invalid accesses to .init zone, I found that free_init_pages()
>> was properly freeing the pages but virtual was still usable.
>
> This change will break the large table entries up, resulting in more TLB
> pressure and reducing performance, and so should only be activated as a
> debug option.

Hum... yet another CONFIG option ?

Could we 'just' move rodata (because of CONFIG_DEBUG_RODATA) and .init section
in their own 2MB/4MB page, playing with vmlinux.lds.S ?

It would be possible to have a full hugepage readonly for rodata, and a full
NOPROT for .init ?

Eric

2006-01-29 20:56:53

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

--- a/arch/i386/Kconfig.debug 2006-01-29 22:30:10.000000000 +0100
+++ b/arch/i386/Kconfig.debug 2006-01-29 22:35:54.000000000 +0100
@@ -61,6 +61,18 @@
portion of the kernel code won't be covered by a 2MB TLB anymore.
If in doubt, say "N".

+config DEBUG_INITDATA
+ bool "Read/Write protect kernel init data structures"
+ depends on DEBUG_KERNEL
+ help
+ The init data is normally freed when kernel has booted.
+ Some code may still try to read or write to data in this area.
+ If you say Y here, the kernel will mark this zone as not readable
+ or writeable at all. Buggy code will then fault.
+ This option may have a slight performance impact because a
+ portion of the kernel code won't be covered by a 2MB TLB anymore.
+ If in doubt, say "N".
+
config 4KSTACKS
bool "Use 4Kb + 4Kb for kernel stacks instead of 8Kb" if DEBUG_KERNEL
default y
--- a/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100
+++ b/arch/i386/mm/init.c 2006-01-29 22:38:53.000000000 +0100
@@ -750,11 +750,18 @@
for (addr = begin; addr < end; addr += PAGE_SIZE) {
ClearPageReserved(virt_to_page(addr));
set_page_count(virt_to_page(addr), 1);
+#ifdef CONFIG_DEBUG_INITDATA
+ change_page_attr(virt_to_page(addr), 1, __pgprot(0));
+#else
memset((void *)addr, 0xcc, PAGE_SIZE);
+#endif
free_page(addr);
totalram_pages++;
}
printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
+#ifdef CONFIG_DEBUG_INITDATA
+ global_flush_tlb();
+#endif
}

void free_initmem(void)


Attachments:
i386_mm_init.patch (1.48 kB)

2006-01-30 09:03:47

by Eric Dumazet

[permalink] [raw]
Subject: Questions about alloc_large_system_hash() and TLB entries

alloc_large_system_hash() is used to allocate large hash tables at boot time.

Example on a 2 nodes NUMA machine :

Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
IP route cache hash table entries: 2097152 (order: 12, 16777216 bytes)
TCP established hash table entries: 2097152 (order: 12, 16777216 bytes)

Memory is taken from :
bootmem if (flags & HASH_EARLY)
__vmalloc() if (hashdist is set) (NUMA knob)
__get_free_pages(GFP_ATOMIC, order);


What would be the needed changes in the code to get both :

- Allocate ram equally from all the nodes of the machine

- Use large pages (2MB) to lower TLB stress

Thank you
Eric

2006-01-30 09:23:07

by David Miller

[permalink] [raw]
Subject: Re: Questions about alloc_large_system_hash() and TLB entries

From: Eric Dumazet <[email protected]>
Date: Mon, 30 Jan 2006 10:03:40 +0100

> What would be the needed changes in the code to get both :
>
> - Allocate ram equally from all the nodes of the machine
>
> - Use large pages (2MB) to lower TLB stress

These two desires are mutually exclusive, I think.

If you want an 8MB hash table, for example, with 2MB mappings
you could use memory from a maximum of 4 nodes since the
2MB chunks have to be physically 2MB aligned and 2MB contiguous.

2006-01-30 10:22:28

by Eric Dumazet

[permalink] [raw]
Subject: Re: Questions about alloc_large_system_hash() and TLB entries

David S. Miller a ?crit :
> From: Eric Dumazet <[email protected]>
> Date: Mon, 30 Jan 2006 10:03:40 +0100
>
>> What would be the needed changes in the code to get both :
>>
>> - Allocate ram equally from all the nodes of the machine
>>
>> - Use large pages (2MB) to lower TLB stress
>
> These two desires are mutually exclusive, I think.
>
> If you want an 8MB hash table, for example, with 2MB mappings
> you could use memory from a maximum of 4 nodes since the
> 2MB chunks have to be physically 2MB aligned and 2MB contiguous.

Yes of course, but as those hash tables are very big (their size is bigger
than 2MB * number_of_nodes if you have at least 4GB per node), this could be
done ?

Eric

2006-02-04 22:41:44

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

Eric Dumazet <[email protected]> wrote:
>
>
> Chasing some invalid accesses to .init zone, I found that free_init_pages()
> was properly freeing the pages but virtual was still usable.
>
> A poisoning (memset(page, 0xcc, PAGE_SIZE)) was done but this is not reliable.
>
> A new config option DEBUG_INITDATA is introduced to mark this initdata as not
> present at all so that buggy code can trigger a fault.
>
> This option is not meant for production machines because it may split one or
> two huge page (2MB or 4MB) into small pages and thus slow down kernel a bit.
>
> (After that we could map non possible cpu percpu data to the initial
> percpudata that is included in .init and discarded in free_initmem())
>
> ...
>
> --- a/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100
> +++ b/arch/i386/mm/init.c 2006-01-29 22:38:53.000000000 +0100
> @@ -750,11 +750,18 @@
> for (addr = begin; addr < end; addr += PAGE_SIZE) {
> ClearPageReserved(virt_to_page(addr));
> set_page_count(virt_to_page(addr), 1);
> +#ifdef CONFIG_DEBUG_INITDATA
> + change_page_attr(virt_to_page(addr), 1, __pgprot(0));
> +#else
> memset((void *)addr, 0xcc, PAGE_SIZE);
> +#endif
> free_page(addr);
> totalram_pages++;
> }
> printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
> +#ifdef CONFIG_DEBUG_INITDATA
> + global_flush_tlb();
> +#endif
> }
>

This doesn't seem very pointful.

We unmap the page, then return it to the page allocator. Then someone
reallocates the page, tries to use it and goes oops.

If CONFIG_DEBUG_PAGEALLOC is also set, the kernel will remap the page when
it's allocated and everything works OK. So this patch requires
CONFIG_DEBUG_PAGEALLOC.

But if CONFIG_DEBUG_PAGEALLOC is set, we'll have unmapped that page in
free_page() _anyway_, so why bother using this patch?

The only enhancement I can think of here is to not free the page, so it's
permanently leaked and permanently unmapped.

--- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800
+++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800
@@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne
ClearPageReserved(virt_to_page(addr));
set_page_count(virt_to_page(addr), 1);
#ifdef CONFIG_DEBUG_INITDATA
+ /*
+ * Unmap the page, and leak it. So any further accesses will
+ * oops.
+ */
change_page_attr(virt_to_page(addr), 1, __pgprot(0));
#else
memset((void *)addr, 0xcc, PAGE_SIZE);
-#endif
free_page(addr);
+#endif
totalram_pages++;
}
printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
_

But is there much point in doing this? Does it offer much more than
CONFIG_DEBUG_PAGEALLOC?

2006-02-05 17:04:06

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

Andrew Morton a ?crit :
> Eric Dumazet <[email protected]> wrote:
>>
>> Chasing some invalid accesses to .init zone, I found that free_init_pages()
>> was properly freeing the pages but virtual was still usable.
>>
>> A poisoning (memset(page, 0xcc, PAGE_SIZE)) was done but this is not reliable.
>>
>> A new config option DEBUG_INITDATA is introduced to mark this initdata as not
>> present at all so that buggy code can trigger a fault.
>>
>> This option is not meant for production machines because it may split one or
>> two huge page (2MB or 4MB) into small pages and thus slow down kernel a bit.
>>
>> (After that we could map non possible cpu percpu data to the initial
>> percpudata that is included in .init and discarded in free_initmem())
>>
>> ...
>>
>> --- a/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100
>> +++ b/arch/i386/mm/init.c 2006-01-29 22:38:53.000000000 +0100
>> @@ -750,11 +750,18 @@
>> for (addr = begin; addr < end; addr += PAGE_SIZE) {
>> ClearPageReserved(virt_to_page(addr));
>> set_page_count(virt_to_page(addr), 1);
>> +#ifdef CONFIG_DEBUG_INITDATA
>> + change_page_attr(virt_to_page(addr), 1, __pgprot(0));
>> +#else
>> memset((void *)addr, 0xcc, PAGE_SIZE);
>> +#endif
>> free_page(addr);
>> totalram_pages++;
>> }
>> printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
>> +#ifdef CONFIG_DEBUG_INITDATA
>> + global_flush_tlb();
>> +#endif
>> }
>>
>
> This doesn't seem very pointful.
>
> We unmap the page, then return it to the page allocator. Then someone
> reallocates the page, tries to use it and goes oops.
>
> If CONFIG_DEBUG_PAGEALLOC is also set, the kernel will remap the page when
> it's allocated and everything works OK. So this patch requires
> CONFIG_DEBUG_PAGEALLOC.
>
> But if CONFIG_DEBUG_PAGEALLOC is set, we'll have unmapped that page in
> free_page() _anyway_, so why bother using this patch?
>
> The only enhancement I can think of here is to not free the page, so it's
> permanently leaked and permanently unmapped.
>
> --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800
> +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800
> @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne
> ClearPageReserved(virt_to_page(addr));
> set_page_count(virt_to_page(addr), 1);
> #ifdef CONFIG_DEBUG_INITDATA
> + /*
> + * Unmap the page, and leak it. So any further accesses will
> + * oops.
> + */
> change_page_attr(virt_to_page(addr), 1, __pgprot(0));
> #else
> memset((void *)addr, 0xcc, PAGE_SIZE);
> -#endif
> free_page(addr);
> +#endif
> totalram_pages++;
> }
> printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
> _
>
> But is there much point in doing this? Does it offer much more than
> CONFIG_DEBUG_PAGEALLOC?
>
>

1) CONFIG_DEBUG_PAGEALLOC is very generic (and expensive), while
CONFIG_DEBUG_INITDATA only makes sure init data becomes unreadable.

2) If CONFIG_DEBUG_INITDATA is on, the redzone is in action in the virtual
memory that hold the initdata, while the physical pages that contained the
initdata where freed and might be reused for other needs.

I think we have two different things here : Virtual mem redzoning (my patch),
and physical ram poisoning (your patch).

CONFIG_DEBUG_INITDATA uses only a virtual mem redzoning (no underlying memory
cost, apart of the page tables), while your solution doesnt free the pages,
and the poisoining wont catch further accesses, just make some results funny
or false.

The only bad effect of my patch is about the TLB cost, because of the
hugepage(s) that should revert to 512 normal 4K pages.

Eric

2006-02-05 19:43:09

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

Eric Dumazet <[email protected]> wrote:
>
> > --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800
> > +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800
> > @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne
> > ClearPageReserved(virt_to_page(addr));
> > set_page_count(virt_to_page(addr), 1);
> > #ifdef CONFIG_DEBUG_INITDATA
> > + /*
> > + * Unmap the page, and leak it. So any further accesses will
> > + * oops.
> > + */
> > change_page_attr(virt_to_page(addr), 1, __pgprot(0));
> > #else
> > memset((void *)addr, 0xcc, PAGE_SIZE);
> > -#endif
> > free_page(addr);
> > +#endif
> > totalram_pages++;
> > }
> > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
> > _
> >
> > But is there much point in doing this? Does it offer much more than
> > CONFIG_DEBUG_PAGEALLOC?
> >
> >
>
> 1) CONFIG_DEBUG_PAGEALLOC is very generic (and expensive), while
> CONFIG_DEBUG_INITDATA only makes sure init data becomes unreadable.
>
> 2) If CONFIG_DEBUG_INITDATA is on, the redzone is in action in the virtual
> memory that hold the initdata, while the physical pages that contained the
> initdata where freed and might be reused for other needs.
>
> I think we have two different things here : Virtual mem redzoning (my patch),
> and physical ram poisoning (your patch).
>
> CONFIG_DEBUG_INITDATA uses only a virtual mem redzoning (no underlying memory
> cost, apart of the page tables), while your solution doesnt free the pages,
> and the poisoining wont catch further accesses, just make some results funny
> or false.
>
> The only bad effect of my patch is about the TLB cost, because of the
> hugepage(s) that should revert to 512 normal 4K pages.

Your patch made my kernel oops! The oops was prevented by either enabling
CONFIG_DEBUG_PAGEALLOC or by the above patch.

2006-02-06 08:53:14

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

Andrew Morton a ?crit :
> Eric Dumazet <[email protected]> wrote:
>>> --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800
>> > +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800
>> > @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne
>> > ClearPageReserved(virt_to_page(addr));
>> > set_page_count(virt_to_page(addr), 1);
>> > #ifdef CONFIG_DEBUG_INITDATA
>> > + /*
>> > + * Unmap the page, and leak it. So any further accesses will
>> > + * oops.
>> > + */
>> > change_page_attr(virt_to_page(addr), 1, __pgprot(0));
>> > #else
>> > memset((void *)addr, 0xcc, PAGE_SIZE);
>> > -#endif
>> > free_page(addr);
>> > +#endif
>> > totalram_pages++;
>> > }
>> > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
>> > _
>> >
>> > But is there much point in doing this? Does it offer much more than
>> > CONFIG_DEBUG_PAGEALLOC?
>> >
>> >
>>
>> 1) CONFIG_DEBUG_PAGEALLOC is very generic (and expensive), while
>> CONFIG_DEBUG_INITDATA only makes sure init data becomes unreadable.
>>
>> 2) If CONFIG_DEBUG_INITDATA is on, the redzone is in action in the virtual
>> memory that hold the initdata, while the physical pages that contained the
>> initdata where freed and might be reused for other needs.
>>
>> I think we have two different things here : Virtual mem redzoning (my patch),
>> and physical ram poisoning (your patch).
>>
>> CONFIG_DEBUG_INITDATA uses only a virtual mem redzoning (no underlying memory
>> cost, apart of the page tables), while your solution doesnt free the pages,
>> and the poisoining wont catch further accesses, just make some results funny
>> or false.
>>
>> The only bad effect of my patch is about the TLB cost, because of the
>> hugepage(s) that should revert to 512 normal 4K pages.
>
> Your patch made my kernel oops! The oops was prevented by either enabling
> CONFIG_DEBUG_PAGEALLOC or by the above patch.

Oh I got it now ! Sorry for being stupid :(

If pages are freed, they indeed can be reused by kmalloc() and we get page
faults...
I wrongly assumed these pages would be mapped to different virtual addresses.

So your patch is necessary, unless we can free the pages and let them be used
only for User context for example.

Thank you
Eric

2006-02-06 09:02:11

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

Andrew Morton a ?crit :
> Eric Dumazet <[email protected]> wrote:
>>
>> Chasing some invalid accesses to .init zone, I found that free_init_pages()
>> was properly freeing the pages but virtual was still usable.
>>
>> A poisoning (memset(page, 0xcc, PAGE_SIZE)) was done but this is not reliable.
>>
>> A new config option DEBUG_INITDATA is introduced to mark this initdata as not
>> present at all so that buggy code can trigger a fault.
>>
>> This option is not meant for production machines because it may split one or
>> two huge page (2MB or 4MB) into small pages and thus slow down kernel a bit.
>>
>> (After that we could map non possible cpu percpu data to the initial
>> percpudata that is included in .init and discarded in free_initmem())
>>
>> ...
>>
>> --- a/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100
>> +++ b/arch/i386/mm/init.c 2006-01-29 22:38:53.000000000 +0100
>> @@ -750,11 +750,18 @@
>> for (addr = begin; addr < end; addr += PAGE_SIZE) {
>> ClearPageReserved(virt_to_page(addr));
>> set_page_count(virt_to_page(addr), 1);
>> +#ifdef CONFIG_DEBUG_INITDATA
>> + change_page_attr(virt_to_page(addr), 1, __pgprot(0));
>> +#else
>> memset((void *)addr, 0xcc, PAGE_SIZE);
>> +#endif
>> free_page(addr);
>> totalram_pages++;
>> }
>> printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
>> +#ifdef CONFIG_DEBUG_INITDATA
>> + global_flush_tlb();
>> +#endif
>> }
>>
>
> This doesn't seem very pointful.
>
> We unmap the page, then return it to the page allocator. Then someone
> reallocates the page, tries to use it and goes oops.
>
> If CONFIG_DEBUG_PAGEALLOC is also set, the kernel will remap the page when
> it's allocated and everything works OK. So this patch requires
> CONFIG_DEBUG_PAGEALLOC.
>
> But if CONFIG_DEBUG_PAGEALLOC is set, we'll have unmapped that page in
> free_page() _anyway_, so why bother using this patch?
>
> The only enhancement I can think of here is to not free the page, so it's
> permanently leaked and permanently unmapped.
>
> --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800
> +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800
> @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne
> ClearPageReserved(virt_to_page(addr));
> set_page_count(virt_to_page(addr), 1);
> #ifdef CONFIG_DEBUG_INITDATA
> + /*
> + * Unmap the page, and leak it. So any further accesses will
> + * oops.
> + */
> change_page_attr(virt_to_page(addr), 1, __pgprot(0));
> #else
> memset((void *)addr, 0xcc, PAGE_SIZE);
> -#endif
> free_page(addr);
> +#endif
> totalram_pages++;
> }
> printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);

I wonder if you dont have to move the 'totalram_pages++;' next to the
free_page(addr) call (ie inside the #else/#endif block)

Eric

2006-02-06 09:28:41

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

Eric Dumazet <[email protected]> wrote:
>
> > #ifdef CONFIG_DEBUG_INITDATA
> > + /*
> > + * Unmap the page, and leak it. So any further accesses will
> > + * oops.
> > + */
> > change_page_attr(virt_to_page(addr), 1, __pgprot(0));
> > #else
> > memset((void *)addr, 0xcc, PAGE_SIZE);
> > -#endif
> > free_page(addr);
> > +#endif
> > totalram_pages++;
> > }
> > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
>
> I wonder if you dont have to move the 'totalram_pages++;' next to the
> free_page(addr) call (ie inside the #else/#endif block)
>

yup, thanks.

But I'm inclined to drop the whole patch - I don't see how it can detect
any bugs which CONFIG_DEBUG_PAGEALLOC won't find.

2006-02-06 10:07:51

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

Andrew Morton a ?crit :

> But I'm inclined to drop the whole patch - I don't see how it can detect
> any bugs which CONFIG_DEBUG_PAGEALLOC won't find.
>

If CONFIG_DEBUG_PAGEALLOC is selected, does a page freed by free_page(addr);
guaranted not being reused later ?

Anyway, the CONFIG_DEBUG_INITDATA was a temporary patch in order to track the
accesses to non possible cpus percpu data. So if you feel all such accesses
were cleaned, we can drop the patch...

Eric

2006-02-06 10:17:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault

Eric Dumazet <[email protected]> wrote:
>
> Andrew Morton a ?crit :
>
> > But I'm inclined to drop the whole patch - I don't see how it can detect
> > any bugs which CONFIG_DEBUG_PAGEALLOC won't find.
> >
>
> If CONFIG_DEBUG_PAGEALLOC is selected, does a page freed by free_page(addr);
> guaranted not being reused later ?

No.

So with your patch, any access to the freed page will oops. With
CONFIG_DEBUG_PAGEALLOC it'll only oops if that page is presently free.

So if enough people run with CONFIG_DEBUG_PAGEALLOC for enough time, we'll
find the same bugs.

> Anyway, the CONFIG_DEBUG_INITDATA was a temporary patch in order to track the
> accesses to non possible cpus percpu data. So if you feel all such accesses
> were cleaned, we can drop the patch...

I think so..