2015-11-13 02:23:59

by Joonsoo Kim

Subject: [PATCH 1/3] mm/page_isolation: return last tested pfn rather than failure indicator

This is a preparation step to report the pfn at which the isolation
test failed in a new tracepoint, to analyze the CMA allocation failure
problem. There is no functional change in this patch.
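
For reference, a condensed sketch of the scan this patch changes,
reconstructed from the partial diff context below (the exact page
checks in the real kernel function are assumptions, not verbatim
code):

static unsigned long
__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
				  bool skip_hwpoisoned_pages)
{
	struct page *page;

	while (pfn < end_pfn) {
		page = pfn_to_page(pfn);
		if (PageBuddy(page))
			/* a free buddy page covers 2^order pfns */
			pfn += 1 << page_order(page);
		else if (skip_hwpoisoned_pages && PageHWPoison(page))
			/* hwpoisoned pages are tolerated when asked */
			pfn++;
		else
			/* neither free nor tolerated: isolation fails here */
			break;
	}

	/* now returns the last tested pfn; pfn == end_pfn means success */
	return pfn;
}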

Acked-by: David Rientjes <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_isolation.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4568fd5..029a171 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -212,7 +212,7 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
*
* Returns 1 if all pages in the range are isolated.
*/
-static int
+static unsigned long
__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
bool skip_hwpoisoned_pages)
{
@@ -237,9 +237,8 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
else
break;
}
- if (pfn < end_pfn)
- return 0;
- return 1;
+
+ return pfn;
}

int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
@@ -248,7 +247,6 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
unsigned long pfn, flags;
struct page *page;
struct zone *zone;
- int ret;

/*
* Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
@@ -266,10 +264,11 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
/* Check all pages are free or marked as ISOLATED */
zone = page_zone(page);
spin_lock_irqsave(&zone->lock, flags);
- ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
+ pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
skip_hwpoisoned_pages);
spin_unlock_irqrestore(&zone->lock, flags);
- return ret ? 0 : -EBUSY;
+
+ return pfn < end_pfn ? -EBUSY : 0;
}

struct page *alloc_migrate_target(struct page *page, unsigned long private,
--
1.9.1


2015-11-13 02:24:04

by Joonsoo Kim

Subject: [PATCH 2/3] mm/page_isolation: add new tracepoint, test_pages_isolated

CMA allocation should be guaranteed to succeed, but sometimes it
can fail in the current implementation. To track down the problem,
we need to know which page is problematic, and this new tracepoint
will report it.
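
Taken together with the previous patch, the call site ends up looking
roughly like this (condensed from the two diffs; locals and the zone
lookup are elided):

	spin_lock_irqsave(&zone->lock, flags);
	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
						skip_hwpoisoned_pages);
	spin_unlock_irqrestore(&zone->lock, flags);

	/* fires on success and failure alike; fin_pfn == end_pfn in the
	 * trace output distinguishes the two */
	trace_test_pages_isolated(start_pfn, end_pfn, pfn);

	return pfn < end_pfn ? -EBUSY : 0;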

Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/trace/events/page_isolation.h | 38 +++++++++++++++++++++++++++++++++++
mm/page_isolation.c | 5 +++++
2 files changed, 43 insertions(+)
create mode 100644 include/trace/events/page_isolation.h

diff --git a/include/trace/events/page_isolation.h b/include/trace/events/page_isolation.h
new file mode 100644
index 0000000..6fb6440
--- /dev/null
+++ b/include/trace/events/page_isolation.h
@@ -0,0 +1,38 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM page_isolation
+
+#if !defined(_TRACE_PAGE_ISOLATION_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGE_ISOLATION_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(test_pages_isolated,
+
+ TP_PROTO(
+ unsigned long start_pfn,
+ unsigned long end_pfn,
+ unsigned long fin_pfn),
+
+ TP_ARGS(start_pfn, end_pfn, fin_pfn),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, start_pfn)
+ __field(unsigned long, end_pfn)
+ __field(unsigned long, fin_pfn)
+ ),
+
+ TP_fast_assign(
+ __entry->start_pfn = start_pfn;
+ __entry->end_pfn = end_pfn;
+ __entry->fin_pfn = fin_pfn;
+ ),
+
+ TP_printk("start_pfn=0x%lx end_pfn=0x%lx fin_pfn=0x%lx ret=%s",
+ __entry->start_pfn, __entry->end_pfn, __entry->fin_pfn,
+ __entry->end_pfn == __entry->fin_pfn ? "success" : "fail")
+);
+
+#endif /* _TRACE_PAGE_ISOLATION_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 029a171..f484b93 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -9,6 +9,9 @@
#include <linux/hugetlb.h>
#include "internal.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/page_isolation.h>
+
static int set_migratetype_isolate(struct page *page,
bool skip_hwpoisoned_pages)
{
@@ -268,6 +271,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
skip_hwpoisoned_pages);
spin_unlock_irqrestore(&zone->lock, flags);

+ trace_test_pages_isolated(start_pfn, end_pfn, pfn);
+
return pfn < end_pfn ? -EBUSY : 0;
}

--
1.9.1

2015-11-13 02:24:07

by Joonsoo Kim

Subject: [PATCH 3/3] mm/cma: always check which page causes allocation failure

Now we have a tracepoint in test_pages_isolated() to report the
pfn that cannot be isolated. But in alloc_contig_range(), some
error paths don't call test_pages_isolated(), so it's still hard
to know the exact pfn that causes the allocation failure.

This patch changes that by calling test_pages_isolated() in almost
every error path. This adds some overhead in the allocation failure
case, but allocation failure is a really rare event, so it does not
matter.

In the fatal-signal-pending case we don't call test_pages_isolated(),
because that failure is intentional.
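
Condensed control-flow sketch of alloc_contig_range() after this
patch (taken from the diff below, with the surrounding code elided):

	ret = __alloc_contig_migrate_range(&cc, start, end);
	if (ret && ret != -EBUSY)
		goto done;	/* e.g. fatal signal: an intentional failure */

	/* ... drain free pages, then look for a buddy page enclosing start ... */
	outer_start = start;
	while (!PageBuddy(pfn_to_page(outer_start))) {
		if (++order >= MAX_ORDER) {
			/* no enclosing buddy: test from start itself so the
			 * tracepoint still reports a meaningful pfn */
			outer_start = start;
			break;
		}
		outer_start &= ~0UL << order;
	}

	/* reached even on -EBUSY now; its tracepoint names the busy pfn */
	if (test_pages_isolated(outer_start, end, false)) {
		pr_info("%s: [%lx, %lx) PFNs busy\n",
			__func__, outer_start, end);
		ret = -EBUSY;
		goto done;
	}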

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d89960d..e78d78f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6756,8 +6756,12 @@ int alloc_contig_range(unsigned long start, unsigned long end,
if (ret)
return ret;

+ /*
+ * In case of -EBUSY, we'd like to know which page causes problem.
+ * So, just fall through. We will check it in test_pages_isolated().
+ */
ret = __alloc_contig_migrate_range(&cc, start, end);
- if (ret)
+ if (ret && ret != -EBUSY)
goto done;

/*
@@ -6784,8 +6788,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
outer_start = start;
while (!PageBuddy(pfn_to_page(outer_start))) {
if (++order >= MAX_ORDER) {
- ret = -EBUSY;
- goto done;
+ outer_start = start;
+ break;
}
outer_start &= ~0UL << order;
}
--
1.9.1

2015-11-13 22:51:31

by David Rientjes

Subject: Re: [PATCH 2/3] mm/page_isolation: add new tracepoint, test_pages_isolated

On Fri, 13 Nov 2015, Joonsoo Kim wrote:

> CMA allocation should be guaranteed to succeed, but sometimes it
> can fail in the current implementation. To track down the problem,
> we need to know which page is problematic, and this new tracepoint
> will report it.
>
> Acked-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: David Rientjes <[email protected]>

Thanks for generalizing this!

2015-11-19 23:34:13

by Andrew Morton

Subject: Re: [PATCH 2/3] mm/page_isolation: add new tracepoint, test_pages_isolated

On Fri, 13 Nov 2015 11:23:47 +0900 Joonsoo Kim <[email protected]> wrote:

> CMA allocation should be guaranteed to succeed, but sometimes it
> can fail in the current implementation. To track down the problem,
> we need to know which page is problematic, and this new tracepoint
> will report it.

akpm3:/usr/src/25> size mm/page_isolation.o
text data bss dec hex filename
2972 112 1096 4180 1054 mm/page_isolation.o-before
4608 570 1840 7018 1b6a mm/page_isolation.o-after

This seems an excessive amount of bloat for one little tracepoint. Is
this expected and normal (and acceptable)?

2015-11-20 06:21:44

by Joonsoo Kim

Subject: Re: [PATCH 2/3] mm/page_isolation: add new tracepoint, test_pages_isolated

On Thu, Nov 19, 2015 at 03:34:11PM -0800, Andrew Morton wrote:
> On Fri, 13 Nov 2015 11:23:47 +0900 Joonsoo Kim <[email protected]> wrote:
>
> > CMA allocation should be guaranteed to succeed, but sometimes it
> > can fail in the current implementation. To track down the problem,
> > we need to know which page is problematic, and this new tracepoint
> > will report it.
>
> akpm3:/usr/src/25> size mm/page_isolation.o
> text data bss dec hex filename
> 2972 112 1096 4180 1054 mm/page_isolation.o-before
> 4608 570 1840 7018 1b6a mm/page_isolation.o-after
>
> This seems an excessive amount of bloat for one little tracepoint. Is
> this expected and normal (and acceptable)?

Hello,

I checked the bloat from other tracepoints and found that this is
normal. It takes about 1KB more per tracepoint.

Thanks.

2015-11-24 14:57:11

by Vlastimil Babka

Subject: Re: [PATCH 1/3] mm/page_isolation: return last tested pfn rather than failure indicator

On 11/13/2015 03:23 AM, Joonsoo Kim wrote:
> This is a preparation step to report the pfn at which the isolation
> test failed in a new tracepoint, to analyze the CMA allocation failure
> problem. There is no functional change in this patch.
>
> Acked-by: David Rientjes <[email protected]>
> Acked-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

2015-11-24 14:57:24

by Vlastimil Babka

Subject: Re: [PATCH 2/3] mm/page_isolation: add new tracepoint, test_pages_isolated

On 11/13/2015 03:23 AM, Joonsoo Kim wrote:
> CMA allocation should be guaranteed to succeed, but sometimes it
> can fail in the current implementation. To track down the problem,
> we need to know which page is problematic, and this new tracepoint
> will report it.
>
> Acked-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

2015-11-24 15:28:00

by Vlastimil Babka

Subject: Re: [PATCH 3/3] mm/cma: always check which page causes allocation failure

On 11/13/2015 03:23 AM, Joonsoo Kim wrote:
> Now we have a tracepoint in test_pages_isolated() to report the
> pfn that cannot be isolated. But in alloc_contig_range(), some
> error paths don't call test_pages_isolated(), so it's still hard
> to know the exact pfn that causes the allocation failure.
>
> This patch changes that by calling test_pages_isolated() in almost
> every error path. This adds some overhead in the allocation failure
> case, but allocation failure is a really rare event, so it does not
> matter.
>
> In the fatal-signal-pending case we don't call test_pages_isolated(),
> because that failure is intentional.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> mm/page_alloc.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d89960d..e78d78f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6756,8 +6756,12 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> if (ret)
> return ret;
>
> + /*
> + * In case of -EBUSY, we'd like to know which page causes problem.
> + * So, just fall through. We will check it in test_pages_isolated().
> + */
> ret = __alloc_contig_migrate_range(&cc, start, end);
> - if (ret)
> + if (ret && ret != -EBUSY)
> goto done;
>
> /*
> @@ -6784,8 +6788,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> outer_start = start;
> while (!PageBuddy(pfn_to_page(outer_start))) {
> if (++order >= MAX_ORDER) {
> - ret = -EBUSY;
> - goto done;
> + outer_start = start;
> + break;
> }
> outer_start &= ~0UL << order;
> }

Ugh, isn't this crazy loop broken? Shouldn't it test that the buddy it
finds has a high enough order? e.g.:
buddy = pfn_to_page(outer_start)
outer_start + (1UL << page_order(buddy)) > start

Otherwise you might end up with something like:
- at "start" there's a page that CMA failed to free
- at "start-1" there's another non-buddy page
- at "start-3" there's an order-1 buddy, so you set outer_start to start-3
- test_pages_isolated() will complain (via the new tracepoint) about the pfn
of "start-1", but actually you would like it to complain about the pfn of "start"

So the loop was broken before your patch too, but it didn't matter; it
just potentially wasted some time by picking a bogus outer_start. But now
your tracepoint will give you weird results.
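
In code, the missing check would look something like this (a sketch of
the suggestion above; the variable name and exact placement are
assumptions, and page_order() is only stable under zone->lock while the
page is still a buddy):

	buddy = pfn_to_page(outer_start);
	if (outer_start + (1UL << page_order(buddy)) <= start)
		/* the buddy block ends before start, so it cannot contain
		 * the page that blocked isolation; test from start instead */
		outer_start = start;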

2015-11-25 02:38:51

by Joonsoo Kim

Subject: Re: [PATCH 3/3] mm/cma: always check which page causes allocation failure

On Tue, Nov 24, 2015 at 04:27:56PM +0100, Vlastimil Babka wrote:
> On 11/13/2015 03:23 AM, Joonsoo Kim wrote:
> >Now we have a tracepoint in test_pages_isolated() to report the
> >pfn that cannot be isolated. But in alloc_contig_range(), some
> >error paths don't call test_pages_isolated(), so it's still hard
> >to know the exact pfn that causes the allocation failure.
> >
> >This patch changes that by calling test_pages_isolated() in almost
> >every error path. This adds some overhead in the allocation failure
> >case, but allocation failure is a really rare event, so it does not
> >matter.
> >
> >In the fatal-signal-pending case we don't call test_pages_isolated(),
> >because that failure is intentional.
> >
> >Signed-off-by: Joonsoo Kim <[email protected]>
> >---
> > mm/page_alloc.c | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index d89960d..e78d78f 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -6756,8 +6756,12 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> > if (ret)
> > return ret;
> >
> >+ /*
> >+ * In case of -EBUSY, we'd like to know which page causes problem.
> >+ * So, just fall through. We will check it in test_pages_isolated().
> >+ */
> > ret = __alloc_contig_migrate_range(&cc, start, end);
> >- if (ret)
> >+ if (ret && ret != -EBUSY)
> > goto done;
> >
> > /*
> >@@ -6784,8 +6788,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> > outer_start = start;
> > while (!PageBuddy(pfn_to_page(outer_start))) {
> > if (++order >= MAX_ORDER) {
> >- ret = -EBUSY;
> >- goto done;
> >+ outer_start = start;
> >+ break;
> > }
> > outer_start &= ~0UL << order;
> > }
>
> Ugh, isn't this crazy loop broken? Shouldn't it test that the buddy
> it finds has a high enough order? e.g.:
> buddy = pfn_to_page(outer_start)
> outer_start + (1UL << page_order(buddy)) > start
>
> Otherwise you might end up with something like:
> - at "start" there's a page that CMA failed to free
> - at "start-1" there's another non-buddy page
> - at "start-3" there's an order-1 buddy, so you set outer_start to start-3
> - test_pages_isolated() will complain (via the new tracepoint) about
> the pfn of "start-1", but actually you would like it to complain about
> the pfn of "start"
>
> So the loop was broken before your patch too, but it didn't matter; it
> just potentially wasted some time by picking a bogus outer_start. But
> now your tracepoint will give you weird results.

Good catch. I will fix it.

Thanks.

2015-11-25 05:32:56

by Joonsoo Kim

Subject: [PATCH v2] mm/cma: always check which page causes allocation failure

Now we have a tracepoint in test_pages_isolated() to report the
pfn that cannot be isolated. But in alloc_contig_range(), some
error paths don't call test_pages_isolated(), so it's still hard
to know the exact pfn that causes the allocation failure.

This patch changes that by calling test_pages_isolated() in almost
every error path. This adds some overhead in the allocation failure
case, but allocation failure is a really rare event, so it does not
matter.

In the fatal-signal-pending case we don't call test_pages_isolated(),
because that failure is intentional.

There was also a bogus outer_start problem due to an unchecked buddy
order, and this patch fixes it as well. Before this patch it didn't
matter, because the end result was the same. But after this patch the
tracepoint reports the failed pfn, so it needs to be accurate.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d0499ff..21e9172 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6748,8 +6748,12 @@ int alloc_contig_range(unsigned long start, unsigned long end,
if (ret)
return ret;

+ /*
+ * In case of -EBUSY, we'd like to know which page causes problem.
+ * So, just fall through. We will check it in test_pages_isolated().
+ */
ret = __alloc_contig_migrate_range(&cc, start, end);
- if (ret)
+ if (ret && ret != -EBUSY)
goto done;

/*
@@ -6776,12 +6780,25 @@ int alloc_contig_range(unsigned long start, unsigned long end,
outer_start = start;
while (!PageBuddy(pfn_to_page(outer_start))) {
if (++order >= MAX_ORDER) {
- ret = -EBUSY;
- goto done;
+ outer_start = start;
+ break;
}
outer_start &= ~0UL << order;
}

+ if (outer_start != start) {
+ order = page_order(pfn_to_page(outer_start));
+
+ /*
+ * outer_start page could be small order buddy page and
+ * it doesn't include start page. Adjust outer_start
+ * in this case to report failed page properly
+ * on tracepoint in test_pages_isolated()
+ */
+ if (outer_start + (1UL << order) <= start)
+ outer_start = start;
+ }
+
/* Make sure the range is really isolated. */
if (test_pages_isolated(outer_start, end, false)) {
pr_info("%s: [%lx, %lx) PFNs busy\n",
--
1.9.1

2015-11-25 10:46:04

by Vlastimil Babka

Subject: Re: [PATCH v2] mm/cma: always check which page causes allocation failure

On 11/25/2015 06:32 AM, Joonsoo Kim wrote:
> Now we have a tracepoint in test_pages_isolated() to report the
> pfn that cannot be isolated. But in alloc_contig_range(), some
> error paths don't call test_pages_isolated(), so it's still hard
> to know the exact pfn that causes the allocation failure.
>
> This patch changes that by calling test_pages_isolated() in almost
> every error path. This adds some overhead in the allocation failure
> case, but allocation failure is a really rare event, so it does not
> matter.
>
> In the fatal-signal-pending case we don't call test_pages_isolated(),
> because that failure is intentional.
>
> There was also a bogus outer_start problem due to an unchecked buddy
> order, and this patch fixes it as well. Before this patch it didn't
> matter, because the end result was the same. But after this patch the
> tracepoint reports the failed pfn, so it needs to be accurate.
>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>