Hi,
I'm trying revive an old debate here[1], though with a simpler approach than
was previously tried. This patch series implements a new option to sanitize
freed pages, a (very) small subset of what is done in PaX/grsecurity[3],
inspired by a previous submission [4].
There are a few different uses that this can cover:
- some cases of use-after-free could be detected (crashes), although this not
as efficient as KAsan/kmemcheck
- it can help with long-term memory consumption in an environment with
multiple VMs and Kernel Same-page Merging on the host. [2]
- finally, it can reduce infoleaks, although this is hard to measure.
The approach is voluntarily kept as simple as possible. A single configuration
option, no command line option, no sysctl nob. It can of course be changed,
although I'd be wary of runtime-configuration options that could be used for
races.
I haven't been able to measure a meaningful performance difference when
compiling a (in-cache) kernel; I'd be interested to see what difference it
makes with your particular workload/hardware (I suspect mine is CPU-bound on
this small laptop).
First patch fixes the hibernate use case which will load all the pages of the
restored kernel, and then jump into it, leaving the loader kernel pages hanging
around unclean. We use the free pages bitmap to know which pages should be
cleaned after restore.
Third patch is debug code that can be used to find issues if this feature fails
on your system. It shouldn't necessarily be merged.
Changes since v3:
- drop original first patch, it has been queued by Andrew in mmotm
- fix issue raised by Pavel Machek in hibernate patch
- checkpatch issue in third patch
Changes since v2:
- reorder patches to fix hibernate first
- update debug patch to use memchr_inv
- cc linux-pm and maintainers
Changes since v1:
- fix some issues raised by David Rientjes, Andi Kleen and PaX Team.
- add hibernate fix (third patch)
- add debug code, this is "just in case" someone has an issue with this
feature. Not sure if it should be merged.
[1] https://lwn.net/Articles/334747/
[2] https://staff.aist.go.jp/k.suzaki/EuroSec12-SUZAKI-revised2.pdf
[3] http://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Sanitize_all_freed_memory
[4] http://article.gmane.org/gmane.linux.kernel.mm/34398
Anisse Astier (3):
PM / Hibernate: prepare for SANITIZE_FREED_PAGES
mm/page_alloc.c: add config option to sanitize freed pages
mm: Add debug code for SANITIZE_FREED_PAGES
kernel/power/hibernate.c | 4 +++-
kernel/power/power.h | 2 ++
kernel/power/snapshot.c | 26 ++++++++++++++++++++++++++
mm/Kconfig | 22 ++++++++++++++++++++++
mm/page_alloc.c | 30 ++++++++++++++++++++++++++++++
5 files changed, 83 insertions(+), 1 deletion(-)
--
1.9.3
SANITIZE_FREED_PAGES feature relies on having all pages going through
the free_pages_prepare path in order to be cleared before being used. In
the hibernate use case, free pages will automagically appear in the
system without being cleared, left there by the loading kernel.
This patch will make sure free pages are cleared on resume; when we'll
enable SANITIZE_FREED_PAGES. We free the pages just after resume because
we can't do it later: going through any device resume code might
allocate some memory and invalidate the free pages bitmap.
Signed-off-by: Anisse Astier <[email protected]>
---
kernel/power/hibernate.c | 4 +++-
kernel/power/power.h | 2 ++
kernel/power/snapshot.c | 22 ++++++++++++++++++++++
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 2329daa..0a73126 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -305,9 +305,11 @@ static int create_image(int platform_mode)
error);
/* Restore control flow magically appears here */
restore_processor_state();
- if (!in_suspend)
+ if (!in_suspend) {
events_check_enabled = false;
+ clear_free_pages();
+ }
platform_leave(platform_mode);
Power_up:
diff --git a/kernel/power/power.h b/kernel/power/power.h
index ce9b832..6d2d7bf 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -92,6 +92,8 @@ extern int create_basic_memory_bitmaps(void);
extern void free_basic_memory_bitmaps(void);
extern int hibernate_preallocate_memory(void);
+extern void clear_free_pages(void);
+
/**
* Auxiliary structure used for reading the snapshot image data and
* metadata from and writing them to the list of page backup entries
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 5235dd4..2335130 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1032,6 +1032,28 @@ void free_basic_memory_bitmaps(void)
pr_debug("PM: Basic memory bitmaps freed\n");
}
+void clear_free_pages(void)
+{
+#ifdef CONFIG_SANITIZE_FREED_PAGES
+ struct memory_bitmap *bm = free_pages_map;
+ unsigned long pfn;
+
+ if (WARN_ON(!(free_pages_map)))
+ return;
+
+ memory_bm_position_reset(bm);
+ pfn = memory_bm_next_pfn(bm);
+ while (pfn != BM_END_OF_MAP) {
+ if (pfn_valid(pfn))
+ clear_highpage(pfn_to_page(pfn));
+
+ pfn = memory_bm_next_pfn(bm);
+ }
+ memory_bm_position_reset(bm);
+ printk(KERN_INFO "PM: free pages cleared after restore\n");
+#endif /* SANITIZE_FREED_PAGES */
+}
+
/**
* snapshot_additional_pages - estimate the number of additional pages
* be needed for setting up the suspend image data structures for given
--
1.9.3
This new config option will sanitize all freed pages. This is a pretty
low-level change useful to track some cases of use-after-free, help
kernel same-page merging in VM environments, and counter a few info
leaks.
Signed-off-by: Anisse Astier <[email protected]>
---
mm/Kconfig | 12 ++++++++++++
mm/page_alloc.c | 12 ++++++++++++
2 files changed, 24 insertions(+)
diff --git a/mm/Kconfig b/mm/Kconfig
index 390214d..e9fb3bd 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -635,3 +635,15 @@ config MAX_STACK_SIZE_MB
changed to a smaller value in which case that is used.
A sane initial value is 80 MB.
+
+config SANITIZE_FREED_PAGES
+ bool "Sanitize memory pages after free"
+ default n
+ help
+ This option is used to make sure all pages freed are zeroed. This is
+ quite low-level and doesn't handle your slab buffers.
+ It has various applications, from preventing some info leaks to
+ helping kernel same-page merging in virtualised environments.
+ Depending on your workload it will greatly reduce performance.
+
+ If unsure, say N.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4d5ce6e..c29e3a0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -795,6 +795,12 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
debug_check_no_obj_freed(page_address(page),
PAGE_SIZE << order);
}
+
+#ifdef CONFIG_SANITIZE_FREED_PAGES
+ for (i = 0; i < (1 << order); i++)
+ clear_highpage(page + i);
+#endif
+
arch_free_page(page, order);
kernel_map_pages(page, 1 << order, 0);
@@ -960,9 +966,15 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
kernel_map_pages(page, 1 << order, 1);
kasan_alloc_pages(page, order);
+#ifndef CONFIG_SANITIZE_FREED_PAGES
+ /* SANITIZE_FREED_PAGES relies implicitly on the fact that pages are
+ * cleared before use, so we don't need gfp zero in the default case
+ * because all pages go through the free_pages_prepare code path when
+ * switching from bootmem to the default allocator */
if (gfp_flags & __GFP_ZERO)
for (i = 0; i < (1 << order); i++)
clear_highpage(page + i);
+#endif
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
--
1.9.3
Add debug code for sanitize freed pages to print status and verify pages
at alloc to make sure they're clean. It can be useful if you have
crashes when using SANITIZE_FREED_PAGES.
Signed-off-by: Anisse Astier <[email protected]>
---
kernel/power/snapshot.c | 8 ++++++--
mm/Kconfig | 10 ++++++++++
mm/page_alloc.c | 18 ++++++++++++++++++
3 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 2335130..e10e736 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1044,9 +1044,13 @@ void clear_free_pages(void)
memory_bm_position_reset(bm);
pfn = memory_bm_next_pfn(bm);
while (pfn != BM_END_OF_MAP) {
- if (pfn_valid(pfn))
+ if (pfn_valid(pfn)) {
+#ifdef CONFIG_SANITIZE_FREED_PAGES_DEBUG
+ printk(KERN_INFO "Clearing page %p\n",
+ page_address(pfn_to_page(pfn)));
+#endif
clear_highpage(pfn_to_page(pfn));
-
+ }
pfn = memory_bm_next_pfn(bm);
}
memory_bm_position_reset(bm);
diff --git a/mm/Kconfig b/mm/Kconfig
index e9fb3bd..95364f2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -647,3 +647,13 @@ config SANITIZE_FREED_PAGES
Depending on your workload it will greatly reduce performance.
If unsure, say N.
+
+config SANITIZE_FREED_PAGES_DEBUG
+ bool "Debug sanitize pages feature"
+ default n
+ depends on SANITIZE_FREED_PAGES && DEBUG_KERNEL
+ help
+ This option adds some debugging code for the SANITIZE_FREED_PAGES
+ option, as well as verification code to ensure pages are really
+ zeroed. Don't enable unless you want to debug this feature.
+ If unsure, say N.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c29e3a0..d76325ad 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -975,6 +975,24 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
for (i = 0; i < (1 << order); i++)
clear_highpage(page + i);
#endif
+#ifdef CONFIG_SANITIZE_FREED_PAGES_DEBUG
+ for (i = 0; i < (1 << order); i++) {
+ struct page *p = page + i;
+ void *kaddr = kmap_atomic(p);
+ void *found = memchr_inv(kaddr, 0, PAGE_SIZE);
+
+ kunmap_atomic(kaddr);
+
+ if (found) {
+ pr_err("page %p is not zero on alloc! %s\n",
+ page_address(p), (gfp_flags &
+ __GFP_ZERO) ?
+ "fixing." : "");
+ if (gfp_flags & __GFP_ZERO)
+ clear_highpage(p);
+ }
+ }
+#endif
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
--
1.9.3
On Thursday, May 14, 2015 04:19:46 PM Anisse Astier wrote:
> SANITIZE_FREED_PAGES feature relies on having all pages going through
> the free_pages_prepare path in order to be cleared before being used. In
> the hibernate use case, free pages will automagically appear in the
> system without being cleared, left there by the loading kernel.
>
> This patch will make sure free pages are cleared on resume; when we'll
> enable SANITIZE_FREED_PAGES. We free the pages just after resume because
> we can't do it later: going through any device resume code might
> allocate some memory and invalidate the free pages bitmap.
>
> Signed-off-by: Anisse Astier <[email protected]>
> ---
> kernel/power/hibernate.c | 4 +++-
> kernel/power/power.h | 2 ++
> kernel/power/snapshot.c | 22 ++++++++++++++++++++++
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 2329daa..0a73126 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -305,9 +305,11 @@ static int create_image(int platform_mode)
> error);
> /* Restore control flow magically appears here */
> restore_processor_state();
> - if (!in_suspend)
> + if (!in_suspend) {
> events_check_enabled = false;
>
> + clear_free_pages();
Again, why don't you do that at the swsusp_free() time?
> + }
> platform_leave(platform_mode);
>
> Power_up:
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index ce9b832..6d2d7bf 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -92,6 +92,8 @@ extern int create_basic_memory_bitmaps(void);
> extern void free_basic_memory_bitmaps(void);
> extern int hibernate_preallocate_memory(void);
>
> +extern void clear_free_pages(void);
> +
> /**
> * Auxiliary structure used for reading the snapshot image data and
> * metadata from and writing them to the list of page backup entries
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 5235dd4..2335130 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1032,6 +1032,28 @@ void free_basic_memory_bitmaps(void)
> pr_debug("PM: Basic memory bitmaps freed\n");
> }
>
> +void clear_free_pages(void)
> +{
> +#ifdef CONFIG_SANITIZE_FREED_PAGES
> + struct memory_bitmap *bm = free_pages_map;
> + unsigned long pfn;
> +
> + if (WARN_ON(!(free_pages_map)))
One paren too many.
> + return;
> +
> + memory_bm_position_reset(bm);
> + pfn = memory_bm_next_pfn(bm);
> + while (pfn != BM_END_OF_MAP) {
> + if (pfn_valid(pfn))
> + clear_highpage(pfn_to_page(pfn));
Is clear_highpage() also fine for non-highmem pages?
> +
> + pfn = memory_bm_next_pfn(bm);
> + }
> + memory_bm_position_reset(bm);
> + printk(KERN_INFO "PM: free pages cleared after restore\n");
> +#endif /* SANITIZE_FREED_PAGES */
> +}
> +
> /**
> * snapshot_additional_pages - estimate the number of additional pages
> * be needed for setting up the suspend image data structures for given
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Hi Rafael,
Thanks for taking the time to review this.
On Sat, May 16, 2015 at 2:28 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Thursday, May 14, 2015 04:19:46 PM Anisse Astier wrote:
>> SANITIZE_FREED_PAGES feature relies on having all pages going through
>> the free_pages_prepare path in order to be cleared before being used. In
>> the hibernate use case, free pages will automagically appear in the
>> system without being cleared, left there by the loading kernel.
>>
>> This patch will make sure free pages are cleared on resume; when we'll
>> enable SANITIZE_FREED_PAGES. We free the pages just after resume because
>> we can't do it later: going through any device resume code might
>> allocate some memory and invalidate the free pages bitmap.
>>
>> Signed-off-by: Anisse Astier <[email protected]>
>> ---
>> kernel/power/hibernate.c | 4 +++-
>> kernel/power/power.h | 2 ++
>> kernel/power/snapshot.c | 22 ++++++++++++++++++++++
>> 3 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
>> index 2329daa..0a73126 100644
>> --- a/kernel/power/hibernate.c
>> +++ b/kernel/power/hibernate.c
>> @@ -305,9 +305,11 @@ static int create_image(int platform_mode)
>> error);
>> /* Restore control flow magically appears here */
>> restore_processor_state();
>> - if (!in_suspend)
>> + if (!in_suspend) {
>> events_check_enabled = false;
>>
>> + clear_free_pages();
>
> Again, why don't you do that at the swsusp_free() time?
Because it's too late, the kernel has already been through device
resume code, and the free pages bitmap isn't valid anymore; device
resume code might allocate memory, and we'd be clearing those pages as
well.
>> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
>> index 5235dd4..2335130 100644
>> --- a/kernel/power/snapshot.c
>> +++ b/kernel/power/snapshot.c
>> @@ -1032,6 +1032,28 @@ void free_basic_memory_bitmaps(void)
>> pr_debug("PM: Basic memory bitmaps freed\n");
>> }
>>
>> +void clear_free_pages(void)
>> +{
>> +#ifdef CONFIG_SANITIZE_FREED_PAGES
>> + struct memory_bitmap *bm = free_pages_map;
>> + unsigned long pfn;
>> +
>> + if (WARN_ON(!(free_pages_map)))
>
> One paren too many.
Thanks, will be fixed.
>
>> + return;
>> +
>> + memory_bm_position_reset(bm);
>> + pfn = memory_bm_next_pfn(bm);
>> + while (pfn != BM_END_OF_MAP) {
>> + if (pfn_valid(pfn))
>> + clear_highpage(pfn_to_page(pfn));
>
> Is clear_highpage() also fine for non-highmem pages?
>
Yes, it works fine for low memory too because kmap_atomic will just
return the page address if it's already mapped.
Regards,
Anisse
On Thu 2015-05-14 16:19:47, Anisse Astier wrote:
> This new config option will sanitize all freed pages. This is a pretty
> low-level change useful to track some cases of use-after-free, help
> kernel same-page merging in VM environments, and counter a few info
> leaks.
Could you document the "few info leaks"? We may want to fix them for
!SANTIZE_FREED_PAGES case, too...
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Mon, May 18, 2015 at 1:21 PM, Pavel Machek <[email protected]> wrote:
> On Thu 2015-05-14 16:19:47, Anisse Astier wrote:
>> This new config option will sanitize all freed pages. This is a pretty
>> low-level change useful to track some cases of use-after-free, help
>> kernel same-page merging in VM environments, and counter a few info
>> leaks.
>
> Could you document the "few info leaks"? We may want to fix them for
> !SANTIZE_FREED_PAGES case, too...
>
I wish I could; I'd be sending patches for those info leaks, too.
What I meant is that this feature can also be used as a general
protection mechanism against a certain class of info leaks; for
example, some drivers allocating pages that were previously used by
other subsystems, and then sending structures to userspace that
contain padding or uninitialized fields, leaking kernel pointers.
Having all pages cleared unconditionally can help a bit in some cases
(hence "a few"), but it's of course not an end-all solution.
I'll edit the commit and kconfig messages to be more precise.
Regards,
Anisse
On Mon 2015-05-18 14:41:19, Anisse Astier wrote:
> On Mon, May 18, 2015 at 1:21 PM, Pavel Machek <[email protected]> wrote:
> > On Thu 2015-05-14 16:19:47, Anisse Astier wrote:
> >> This new config option will sanitize all freed pages. This is a pretty
> >> low-level change useful to track some cases of use-after-free, help
> >> kernel same-page merging in VM environments, and counter a few info
> >> leaks.
> >
> > Could you document the "few info leaks"? We may want to fix them for
> > !SANTIZE_FREED_PAGES case, too...
> >
>
> I wish I could; I'd be sending patches for those info leaks, too.
>
> What I meant is that this feature can also be used as a general
> protection mechanism against a certain class of info leaks; for
> example, some drivers allocating pages that were previously used by
> other subsystems, and then sending structures to userspace that
> contain padding or uninitialized fields, leaking kernel pointers.
> Having all pages cleared unconditionally can help a bit in some cases
> (hence "a few"), but it's of course not an end-all solution.
Ok. So there is class of errors where this helps, but you are not
aware of any such errors in kernel, so you can't fix them... Right?
> I'll edit the commit and kconfig messages to be more precise.
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Mon, May 18, 2015 at 3:02 PM, Pavel Machek <[email protected]> wrote:
>
> Ok. So there is class of errors where this helps, but you are not
> aware of any such errors in kernel, so you can't fix them... Right?
Correct.
2015-05-18 21:04 GMT+08:00 Anisse Astier <[email protected]>:
> On Mon, May 18, 2015 at 3:02 PM, Pavel Machek <[email protected]> wrote:
>>
>> Ok. So there is class of errors where this helps, but you are not
>> aware of any such errors in kernel, so you can't fix them... Right?
>
> Correct.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
i feel your patch is the same as CONFIG_DEBUG_PAGEALLOC ,
the difference is that CONFIG_DEBUG_PAGEALLOC will clear
page to a magic number, while your patch will
clear to zero,
do we really need this duplicated function ?
Thanks
On Thu, May 14, 2015 at 04:19:45PM +0200, Anisse Astier wrote:
> Hi,
>
> I'm trying revive an old debate here[1], though with a simpler approach than
> was previously tried. This patch series implements a new option to sanitize
> freed pages, a (very) small subset of what is done in PaX/grsecurity[3],
> inspired by a previous submission [4].
>
> There are a few different uses that this can cover:
> - some cases of use-after-free could be detected (crashes), although this not
> as efficient as KAsan/kmemcheck
They're not detected, they're hidden. I'm currently seeing problems with
a glibc update in userspace where applications are crashing because glibc
returns buffers with uninitialised data that would previously have been
zero. In every case so far, they were application bugs that need fixing.
Having the kernel crash due to uninitialised memory use is bad but hiding
it is not better.
> - it can help with long-term memory consumption in an environment with
> multiple VMs and Kernel Same-page Merging on the host. [2]
This is not quantified but a better way of dealing with that problem would
be for a guest to signal to the host when a page is really free. I vaguely
recall that s390 has some hinting of this nature. While I accept there
may be some benefits in some cases, I think it's a weak justification for
always zeroing pages on free.
> - finally, it can reduce infoleaks, although this is hard to measure.
>
It obscures them.
That is leaving aside the fact that this has to be enabled at kconfig time
which is unlikely to happen on a distribution config. Not many workloads
depend on the freed path as such but streaming readers are one once the
files are larger than memory.
> The approach is voluntarily kept as simple as possible. A single configuration
> option, no command line option, no sysctl nob. It can of course be changed,
> although I'd be wary of runtime-configuration options that could be used for
> races.
>
> I haven't been able to measure a meaningful performance difference when
> compiling a (in-cache) kernel; I'd be interested to see what difference it
> makes with your particular workload/hardware (I suspect mine is CPU-bound on
> this small laptop).
>
What did you use to determine this and did you check if it was hitting
the free paths heavily while it's running? It can be very easy to hide
the cost of something like this if all the frees happen at exit.
Overall, I'm not a big fan of this series. I think it would have made more
sense to use non-temporal cleaning on pages if they are freed by kswapd or
on a non-critical path like exit and then track if the page was already
freed during allocation. Then add a runtime sysctl to make that strict
and force all zeroing on all frees.
As it is, I think it'll have very few users because of the need to
enable it at kernel build time and then incur an unavoidable penalty.
--
Mel Gorman
SUSE Labs
> may be some benefits in some cases, I think it's a weak justification for
> always zeroing pages on free.
There are much better reasons for zero on free, including the improved
latency when pages are faulted in. For virtualisation there are two
interfaces that would probably make more sense
1. 'This page is of no further interest, you may fault it back in
as random data'
2. 'This page is discardable, if I touch it *and* you have
discarded it then please serve me an exception, if you've not discarded
it them give it me back"
If I remember my 390 bits the S/390 goes further including the ability to
say "if I think this page is in memory but in fact the hypervisor is
going to page it off disc then throw me an exception so I can do clever
things with the delay time"
> > - finally, it can reduce infoleaks, although this is hard to measure.
> >
> It obscures them.
Actually not. If you are doing debug work you zero on free and check for
mysterious non zeroing before reusing the page. Without that its a win in
the sense it wipes material (but crypto does that anyway), but it
replaces that with the risk of a zeroed page being scibbled upon by the
kernel and leaking kernel scribbles into allocated user pages.
Alan
On Tue, May 19, 2015 at 02:35:40PM +0100, One Thousand Gnomes wrote:
> > may be some benefits in some cases, I think it's a weak justification for
> > always zeroing pages on free.
>
> There are much better reasons for zero on free, including the improved
> latency when pages are faulted in.
Not necessarily. Not all pages are currently zero'd on allocation so we're
trading sometimes zeroing a page at with always zeroing a page on freee. It
might look good on a benchmark that measures the fault cost and not the
free but it's not a universal win.
> For virtualisation there are two
> interfaces that would probably make more sense
>
> 1. 'This page is of no further interest, you may fault it back in
> as random data'
>
> 2. 'This page is discardable, if I touch it *and* you have
> discarded it then please serve me an exception, if you've not discarded
> it them give it me back"
>
> If I remember my 390 bits the S/390 goes further including the ability to
> say "if I think this page is in memory but in fact the hypervisor is
> going to page it off disc then throw me an exception so I can do clever
> things with the delay time"
>
I think it's also used to grant another VM the page while it's not in
use. Overall though, there are better ideas for shrinking VM memory usage
than zeroing everything and depending on KSM to detect it.
> > > - finally, it can reduce infoleaks, although this is hard to measure.
> > >
> > It obscures them.
>
> Actually not. If you are doing debug work you zero on free and check for
> mysterious non zeroing before reusing the page. Without that its a win in
> the sense it wipes material (but crypto does that anyway), but it
> replaces that with the risk of a zeroed page being scibbled upon by the
> kernel and leaking kernel scribbles into allocated user pages.
>
Ok, I see now that we just crash differently. Previously the zero
on allocation would prevent a leak. With this applied the __GFP_ZERO is
ignored and so different classes of bugs can be detected. Not necessarily
better but I accept my point was wrong.
--
Mel Gorman
SUSE Labs
On 19 May 2015 at 13:46, Mel Gorman wrote:
> On Thu, May 14, 2015 at 04:19:45PM +0200, Anisse Astier wrote:
> > Hi,
> >
> > I'm trying revive an old debate here[1], though with a simpler approach than
> > was previously tried. This patch series implements a new option to sanitize
> > freed pages, a (very) small subset of what is done in PaX/grsecurity[3],
> > inspired by a previous submission [4].
> >
> > There are a few different uses that this can cover:
> > - some cases of use-after-free could be detected (crashes), although this not
> > as efficient as KAsan/kmemcheck
>
> They're not detected, they're hidden.
this is a qualitative argument that cuts both ways. namely, you could say
that uninitialized memory does *not* trigger any bad behaviour exactly
because the previous content acts as valid data (say, a valid pointer)
whereas a null dereference would pretty much always crash (both in userland
and the kernel). not to mention that a kernel null deref is no longer an
exploitable bug in many/most situations which can't be said of arbitrary
uninitialized (read: potentially attacker controlled) values.
that said, i always considered this aspect of page sanitization as a
(potentially useful) side effect, not the design goal.
> > - finally, it can reduce infoleaks, although this is hard to measure.
> >
>
> It obscures them.
i don't understand, what is being obscured exactly? maybe the term 'infoleaks'
is ambiguous, in case of page sanitization it refers to the reduction of data
lifetime (mostly userland anonymous memory, as per the original design). if
you were thinking of kernel->userland kind of leaks then i'd say that page
sanitization has little effect there because all the bugs i can think of were
not leaking from freed memory (where sanitization would have prevented the
leak).
On Monday, May 18, 2015 12:23:00 PM Anisse Astier wrote:
> Hi Rafael,
>
> Thanks for taking the time to review this.
>
> On Sat, May 16, 2015 at 2:28 AM, Rafael J. Wysocki <[email protected]> wrote:
> > On Thursday, May 14, 2015 04:19:46 PM Anisse Astier wrote:
> >> SANITIZE_FREED_PAGES feature relies on having all pages going through
> >> the free_pages_prepare path in order to be cleared before being used. In
> >> the hibernate use case, free pages will automagically appear in the
> >> system without being cleared, left there by the loading kernel.
> >>
> >> This patch will make sure free pages are cleared on resume; when we'll
> >> enable SANITIZE_FREED_PAGES. We free the pages just after resume because
> >> we can't do it later: going through any device resume code might
> >> allocate some memory and invalidate the free pages bitmap.
> >>
> >> Signed-off-by: Anisse Astier <[email protected]>
> >> ---
> >> kernel/power/hibernate.c | 4 +++-
> >> kernel/power/power.h | 2 ++
> >> kernel/power/snapshot.c | 22 ++++++++++++++++++++++
> >> 3 files changed, 27 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> >> index 2329daa..0a73126 100644
> >> --- a/kernel/power/hibernate.c
> >> +++ b/kernel/power/hibernate.c
> >> @@ -305,9 +305,11 @@ static int create_image(int platform_mode)
> >> error);
> >> /* Restore control flow magically appears here */
> >> restore_processor_state();
> >> - if (!in_suspend)
> >> + if (!in_suspend) {
> >> events_check_enabled = false;
> >>
> >> + clear_free_pages();
> >
> > Again, why don't you do that at the swsusp_free() time?
>
> Because it's too late, the kernel has already been through device
> resume code, and the free pages bitmap isn't valid anymore; device
> resume code might allocate memory, and we'd be clearing those pages as
> well.
Are we both talking about the same thing?
swsusp_free() is *the* function that, well, frees all the pages allocated
by the hibernate core, so how isn't the free pages bitmap valid when it is
called?
Why don't you add the clearing in there, right at the spot when the pages
are actually freed?
Moreover, why is the resume code path the only one where freed pages need to
be sanitized?
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
On 20 May 2015 at 1:46, Rafael J. Wysocki wrote:
> swsusp_free() is *the* function that, well, frees all the pages allocated
> by the hibernate core, so how isn't the free pages bitmap valid when it is
> called?
>
> Why don't you add the clearing in there, right at the spot when the pages
> are actually freed?
actually swsusp_free uses __free_page which in turn will go through the
page sanitization logic so there's no need for extra sanitization. that
said ...
> Moreover, why is the resume code path the only one where freed pages need to
> be sanitized?
... i had a bug report before (http://marc.info/?l=linux-pm&m=132871433416256)
which is why i asked Anisse to figure this out before upstreaming the feature.
i've also asked him already to explain why his approach is the proper fix for
the problem (which should include the description of the root cause as a start)
but he hasn't answered that yet.
anyway, the big question is how there can be free memory pages after resume
which are not sanitized. now i have no idea about the hibernation logic but
i assume that it doesn't save/restore free pages so the question is how the
kernel gets to learn about these free pages during resume and whether there's
a path where __free_page() or some other wrapper around free_pages_prepare()
doesn't get called at all.
On Wed, May 20, 2015 at 1:46 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Monday, May 18, 2015 12:23:00 PM Anisse Astier wrote:
>> Hi Rafael,
>>
>> Thanks for taking the time to review this.
>>
>> On Sat, May 16, 2015 at 2:28 AM, Rafael J. Wysocki <[email protected]> wrote:
>> > On Thursday, May 14, 2015 04:19:46 PM Anisse Astier wrote:
>> >> SANITIZE_FREED_PAGES feature relies on having all pages going through
>> >> the free_pages_prepare path in order to be cleared before being used. In
>> >> the hibernate use case, free pages will automagically appear in the
>> >> system without being cleared, left there by the loading kernel.
>> >>
>> >> This patch will make sure free pages are cleared on resume; when we'll
>> >> enable SANITIZE_FREED_PAGES. We free the pages just after resume because
>> >> we can't do it later: going through any device resume code might
>> >> allocate some memory and invalidate the free pages bitmap.
>> >>
>> >> Signed-off-by: Anisse Astier <[email protected]>
>> >> ---
>> >> kernel/power/hibernate.c | 4 +++-
>> >> kernel/power/power.h | 2 ++
>> >> kernel/power/snapshot.c | 22 ++++++++++++++++++++++
>> >> 3 files changed, 27 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
>> >> index 2329daa..0a73126 100644
>> >> --- a/kernel/power/hibernate.c
>> >> +++ b/kernel/power/hibernate.c
>> >> @@ -305,9 +305,11 @@ static int create_image(int platform_mode)
>> >> error);
>> >> /* Restore control flow magically appears here */
>> >> restore_processor_state();
>> >> - if (!in_suspend)
>> >> + if (!in_suspend) {
>> >> events_check_enabled = false;
>> >>
>> >> + clear_free_pages();
>> >
>> > Again, why don't you do that at the swsusp_free() time?
>>
>> Because it's too late, the kernel has already been through device
>> resume code, and the free pages bitmap isn't valid anymore; device
>> resume code might allocate memory, and we'd be clearing those pages as
>> well.
>
> Are we both talking about the same thing?
I think we aren't talking about the same thing. The free_pages_map is
used for all free pages, plus the memory used for suspend (when it
intersects with forbidden_page_map).
We don't need to clear the memory in swsusp_free, because it already
calls the __free_page code path that clears free pages. What we need,
is to clear pages left by the loader kernel before it jumped into the
resumed kernel.
>
> swsusp_free() is *the* function that, well, frees all the pages allocated
> by the hibernate core, so how isn't the free pages bitmap valid when it is
> called?
Because swsusp_free will only free the intersection of free_pages_map
and forbidden_pages_map. This intersection is the set of pages used
for suspend, and they are in fact allocated, not free. The rest of
free_pages_map are the real free pages, but as I said, once we resume,
it will also include the pages left hanging by the loading kernel.
In addition, the free_pages_map contains a reference to all the free
pages at the moment of suspend, but this reference isn't valid by the
time we reach swsusp_free(), because we've been through many drivers'
resume by then, and those can allocate memory; they might even want
already zeroed memory, which they won't get because of the __GFP_ZERO
optimization in the next patch; we just expect all free pages to be
already cleared, but pages from the loading kernel aren't.
>
> Why don't you add the clearing in there, right at the spot when the pages
> are actually freed?
>
> Moreover, why is the resume code path the only one where freed pages need to
> be sanitized?
Because all pages in the system go through the page freeing path
(which clears them) when leaving the (no)bootmem allocator.
It's a bit long, so I hope that I'm clear in my explanation.
Regards,
Anisse
On Wed, May 20, 2015 at 1:45 PM, PaX Team <[email protected]> wrote:
>
>> Moreover, why is the resume code path the only one where freed pages need to
>> be sanitized?
>
> ... i had a bug report before (http://marc.info/?l=linux-pm&m=132871433416256)
> which is why i asked Anisse to figure this out before upstreaming the feature.
> i've also asked him already to explain why his approach is the proper fix for
> the problem (which should include the description of the root cause as a start)
> but he hasn't answered that yet.
>
> anyway, the big question is how there can be free memory pages after resume
> which are not sanitized. now i have no idea about the hibernation logic but
> i assume that it doesn't save/restore free pages so the question is how the
> kernel gets to learn about these free pages during resume and whether there's
> a path where __free_page() or some other wrapper around free_pages_prepare()
> doesn't get called at all.
In my opinion the free pages left are those used by the loading kernel.
If I understand correctly, a suspend (hibernate) image contains *all*
the memory necessary for the OS to work; so when you restore it, you
restore it all, page tables, and kernel code section included. So when
the kernel does a hibernate restoration, it loads it all the pages
into memory, then architecture-specific code will jump into the new
"resumed" kernel by restoring page table entries and CPU context. When
it does that, it leaves the "loader" kernel memory hanging; this
memory is seen as free pages by the resumed kernel, but it isn't
cleared.
Rafael, am I getting something wrong on the hibernation resume process
? What do you think of this analysis ?
Regards,
Anisse
On Tue, May 19, 2015 at 2:46 PM, Mel Gorman <[email protected]> wrote:
> On Thu, May 14, 2015 at 04:19:45PM +0200, Anisse Astier wrote:
>> Hi,
>>
>> - it can help with long-term memory consumption in an environment with
>> multiple VMs and Kernel Same-page Merging on the host. [2]
>
> This is not quantified but a better way of dealing with that problem would
> be for a guest to signal to the host when a page is really free. I vaguely
> recall that s390 has some hinting of this nature. While I accept there
> may be some benefits in some cases, I think it's a weak justification for
> always zeroing pages on free.
Sure, there's always a better way, like virtio's ballooning. This
approach has the merit of being much simpler to use.
>> I haven't been able to measure a meaningful performance difference when
>> compiling a (in-cache) kernel; I'd be interested to see what difference it
>> makes with your particular workload/hardware (I suspect mine is CPU-bound on
>> this small laptop).
>>
>
> What did you use to determine this and did you check if it was hitting
> the free paths heavily while it's running? It can be very easy to hide
> the cost of something like this if all the frees happen at exit.
I'll admit that it's lacking numbers; I've chosen the simplest
benchmark available (kernel compiles), and couldn't measure a
difference in overall time, but I didn't go as far as using perf to
find where the hot path is.
Another way of thinking about this is just moving the clearing from
allocation to freeing. Userland memory allocated through anonymous
mapping is already cleared on alloc, so this will make allocation
faster. It's a different kind of tradeoff.
Regards,
Anisse
On Tue, May 19, 2015 at 3:58 AM, yalin wang <[email protected]> wrote:
> 2015-05-18 21:04 GMT+08:00 Anisse Astier <[email protected]>:
>> On Mon, May 18, 2015 at 3:02 PM, Pavel Machek <[email protected]> wrote:
>>>
>>> Ok. So there is class of errors where this helps, but you are not
>>> aware of any such errors in kernel, so you can't fix them... Right?
>>
>> Correct.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
> i feel your patch is the same as CONFIG_DEBUG_PAGEALLOC ,
> the difference is that CONFIG_DEBUG_PAGEALLOC will clear
> page to a magic number, while your patch will
> clear to zero,
> do we really need this duplicated function ?
It's different because DEBUG_PAGEALLOC will only use page poisoning on
certain architectures, and clearing a page to a magic number doesn't
allow to optimize alloc with _GFP_ZERO.
Regards,
Anisse
On Wednesday, May 20, 2015 02:07:36 PM Anisse Astier wrote:
> On Wed, May 20, 2015 at 1:45 PM, PaX Team <[email protected]> wrote:
> >
> >> Moreover, why is the resume code path the only one where freed pages need to
> >> be sanitized?
> >
> > ... i had a bug report before (http://marc.info/?l=linux-pm&m=132871433416256)
> > which is why i asked Anisse to figure this out before upstreaming the feature.
> > i've also asked him already to explain why his approach is the proper fix for
> > the problem (which should include the description of the root cause as a start)
> > but he hasn't answered that yet.
> >
> > anyway, the big question is how there can be free memory pages after resume
> > which are not sanitized. now i have no idea about the hibernation logic but
> > i assume that it doesn't save/restore free pages so the question is how the
> > kernel gets to learn about these free pages during resume and whether there's
> > a path where __free_page() or some other wrapper around free_pages_prepare()
> > doesn't get called at all.
>
> In my opinion the free pages left are those used by the loading kernel.
Well, that is not a matter of opinion really, but it's actually correct.
> If I understand correctly, a suspend (hibernate) image contains *all*
> the memory necessary for the OS to work; so when you restore it, you
> restore it all, page tables, and kernel code section included. So when
> the kernel does a hibernate restoration, it loads it all the pages
> into memory, then architecture-specific code will jump into the new
> "resumed" kernel by restoring page table entries and CPU context. When
> it does that, it leaves the "loader" kernel memory hanging; this
> memory is seen as free pages by the resumed kernel, but it isn't
> cleared.
Correct, except that some of the boot kernel's memory will be overwritten
by the image kernel and all.
> Rafael, am I getting something wrong on the hibernation resume process
> ? What do you think of this analysis ?
That's more-or-less what's happening. IOW, after hibernation and resume you
may see stuff in pages that were previously all zeros as long as they are
regarded as free by the image kernel. The stuff in there is all garbage from
its perspective though.
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.