A large free buddy block causes the scan loop to continue many times,
once per page, so if the page is free, skip the whole buddy block
instead of one page.
Signed-off-by: Xishi Qiu <[email protected]>
---
mm/compaction.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 05ccb4c..874bae1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -520,9 +520,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
goto next_pageblock;
/* Skip if free */
- if (PageBuddy(page))
+ if (PageBuddy(page)) {
+ low_pfn += (1 << page_order(page)) - 1;
continue;
-
+ }
/*
* For async migration, also only scan in MOVABLE blocks. Async
* migration is optimistic to see if the minimum amount of work
--
1.7.1
Hello,
On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> A large free buddy block causes the scan loop to continue many times,
> once per page, so if the page is free, skip the whole buddy block
> instead of one page.
>
> Signed-off-by: Xishi Qiu <[email protected]>
One nitpick: it could change nr_scanned's result so that COMPACTMIGRATE_SCANNED
in vmstat could be smaller than before. That means compaction efficiency would
appear better than it really is, and if something in userspace depends on it,
it would break. But I don't know of such a usecase, so I will pass the
decision to others. Anyway, I support this patch.
If it's a real concern, we can fix it by increasing nr_scanned by page_order,
as sketched below.
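A minimal sketch of that accounting fix, assuming it sits in the same
PageBuddy branch as the patch above (nr_scanned is the scan counter
isolate_migratepages_range already keeps; the sketch inherits the unlocked
page_order problem raised later in this thread):

	/* Skip if free, accounting every skipped page so that
	 * COMPACTMIGRATE_SCANNED stays comparable with the old
	 * behaviour (sketch only, not the posted patch). */
	if (PageBuddy(page)) {
		unsigned long skipped = (1 << page_order(page)) - 1;

		low_pfn += skipped;
		nr_scanned += skipped;
		continue;
	}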
Thanks.
Acked-by: Minchan Kim <[email protected]>
> ---
> mm/compaction.c | 5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 05ccb4c..874bae1 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -520,9 +520,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> goto next_pageblock;
>
> /* Skip if free */
> - if (PageBuddy(page))
> + if (PageBuddy(page)) {
> + low_pfn += (1 << page_order(page)) - 1;
> continue;
> -
> + }
> /*
> * For async migration, also only scan in MOVABLE blocks. Async
> * migration is optimistic to see if the minimum amount of work
> --
> 1.7.1
>
--
Kind regards,
Minchan Kim
On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> A large free buddy block causes the scan loop to continue many times,
> once per page, so if the page is free, skip the whole buddy block
> instead of one page.
>
> Signed-off-by: Xishi Qiu <[email protected]>
page_order cannot be used unless zone->lock is held which is not held in
this path. Acquiring the lock would prevent parallel allocations from the
buddy allocator (per-cpu allocator would be ok except for refills). I expect
it would not be a good tradeoff to acquire the lock just to use page_order.
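For reference, a minimal sketch of that locked variant (not proposed in
the thread; it assumes the zone pointer already available in
isolate_migratepages_range):

	if (PageBuddy(page)) {
		unsigned long flags;

		/* Taking zone->lock makes page_order() stable, but it
		 * serialises against every allocation and free in the
		 * zone - the cost described above. */
		spin_lock_irqsave(&zone->lock, flags);
		if (PageBuddy(page))	/* re-check under the lock */
			low_pfn += (1 << page_order(page)) - 1;
		spin_unlock_irqrestore(&zone->lock, flags);
		continue;
	}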
Nak.
--
Mel Gorman
SUSE Labs
On 2013/8/14 16:57, Mel Gorman wrote:
> On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
>> A large free buddy block causes the scan loop to continue many times,
>> once per page, so if the page is free, skip the whole buddy block
>> instead of one page.
>>
>> Signed-off-by: Xishi Qiu <[email protected]>
>
> page_order cannot be used unless zone->lock is held which is not held in
> this path. Acquiring the lock would prevent parallel allocations from the
> buddy allocator (per-cpu allocator would be ok except for refills). I expect
> it would not be a good tradeoff to acquire the lock just to use page_order.
>
> Nak.
>
Oh, you are right, we must hold zone->lock first.
Thanks,
Xishi Qiu
Hi Mel,
On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > A large free buddy block causes the scan loop to continue many times,
> > once per page, so if the page is free, skip the whole buddy block
> > instead of one page.
> >
> > Signed-off-by: Xishi Qiu <[email protected]>
>
> page_order cannot be used unless zone->lock is held which is not held in
> this path. Acquiring the lock would prevent parallel allocations from the
Argh, I missed that. And it seems you already pointed it out a long time
ago when someone tried to do the same thing, if I remember correctly. :(
But let's think about it more.
It's not always a problem, though, because CMA and memory-hotplug have
already isolated the free pages in the range to MIGRATE_ISOLATE right
before starting migration, so we can use page_order safely in those
contexts even without holding zone->lock.
In addition, the CMA case is likely to have many free pages because CMA
makes MIGRATE_CMA a fallback of MIGRATE_MOVABLE to minimize the number of
migrations. Even if the CMA area was full, it can have many free pages
again once the driver that owns the CMA area releases it. So the bigger
the CMA space is, the bigger the patch's benefit is. And it could help
memory-hotplug, too.
The only problem is normal compaction. The worst case is just skipping
pageblock_nr_pages, for instance 4M (of course, it depends on the
configuration), and we can make the race window very small by
double-checking PageBuddy. The race is still theoretically possible, but
I think it's really, really unlikely, and we still cut compaction
overhead without holding the lock. Even if the race happens, normal
compaction's customers (e.g. THP) don't suffer a critical result and just
fall back. So I think it isn't a bad tradeoff.
From a8172e2197b80e183f2425f4ce8dd060d7d80f1a Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Thu, 15 Aug 2013 00:37:21 +0900
Subject: [PATCH] compaction: skip buddy page during compaction
When isolate_migratepages_range meets a free page, it skips just one
page instead of the whole free block. That adds unnecessary overhead to
compaction, so we might want to use page_order to skip the whole free
block, but that isn't safe without zone->lock.
On further thought, it's not always a problem, because CMA and
memory-hotplug have already isolated the free pages in the range to
MIGRATE_ISOLATE right before starting migration, so we can use
page_order safely in those contexts even without holding zone->lock.
In addition, the CMA case is likely to have many free pages because CMA
makes MIGRATE_CMA a fallback of MIGRATE_MOVABLE to minimize the number
of migrations. Even if the CMA area was full, it can have many free
pages again once the driver that owns the CMA area releases it.
So the bigger the CMA space is, the bigger the patch's benefit is.
And it could help memory-hotplug, too.
The only problem is normal compaction. The worst case is just skipping
pageblock_nr_pages, for instance 4M (of course, it depends on the
configuration), and we can make the race window very small by
double-checking PageBuddy. The race is still theoretically possible,
but I think it's really, really unlikely, and we still cut compaction
overhead without holding the lock.
Even if the race happens, normal compaction's customers, which want a
higher-order allocation, don't suffer a critical result from the
failure and can fall back.
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 05ccb4c..2341d52 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -520,8 +520,18 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
goto next_pageblock;
/* Skip if free */
- if (PageBuddy(page))
+ if (PageBuddy(page)) {
+ /*
+			 * page_order is racy without zone->lock, but the worst
+			 * case of the race is just skipping pageblock_nr_pages,
+			 * and even that is really unlikely thanks to the
+			 * double check of PageBuddy.
+ */
+ unsigned long order = page_order(page);
+ if (PageBuddy(page))
+ low_pfn += (1 << order) - 1;
continue;
+ }
/*
* For async migration, also only scan in MOVABLE blocks. Async
--
1.8.3.2
> buddy allocator (per-cpu allocator would be ok except for refills). I expect
> it would not be a good tradeoff to acquire the lock just to use page_order.
>
> Nak.
>
> --
> Mel Gorman
> SUSE Labs
--
Kind regards,
Minchan Kim
On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote:
> Hi Mel,
>
> On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > A large free buddy block causes the scan loop to continue many times,
> > > once per page, so if the page is free, skip the whole buddy block
> > > instead of one page.
> > >
> > > Signed-off-by: Xishi Qiu <[email protected]>
> >
> > page_order cannot be used unless zone->lock is held which is not held in
> > this path. Acquiring the lock would prevent parallel allocations from the
>
> Argh, I missed that. And it seems you already pointed it out a long time
> ago when someone tried to do the same thing, if I remember correctly. :(
It feels familiar but I do not remember why.
> But let's think about it more.
>
> It's not always a problem, though, because CMA and memory-hotplug have
> already isolated the free pages in the range to MIGRATE_ISOLATE right
> before starting migration, so we can use page_order safely in those
> contexts even without holding zone->lock.
>
Both of those are corner cases. Neither operation happens frequently
in comparison to something like THP allocations for example. I think an
optimisation along those lines is marginal at best.
> In addition, it's likely to have many free pages in case of CMA because CMA
> makes MIGRATE_CMA fallback of MIGRATE_MOVABLE to minimize number of migrations.
> Even CMA area was full, it could have many free pages once driver who is
> CMA area's owner releases the CMA area. So, the bigger CMA space is,
> the bigger patch's benefit is. And it could help memory-hotplug, too.
>
> The only problem is normal compaction. The worst case is just skipping
> pageblock_nr_pages, for instance 4M (of course, it depends on the
> configuration), and we can make the race window very small by
> double-checking PageBuddy. The race is still theoretically possible,
> but I think it's really, really unlikely, and we still cut compaction
> overhead without holding the lock. Even if the race happens, normal
> compaction's customers (e.g. THP) don't suffer a critical result and
> just fall back. So I think it isn't a bad tradeoff.
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 05ccb4c..2341d52 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -520,8 +520,18 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> goto next_pageblock;
>
> /* Skip if free */
> - if (PageBuddy(page))
> + if (PageBuddy(page)) {
> + /*
> + * page_order is racy without zone->lock but worst case
> + * by the racing is just skipping pageblock_nr_pages.
> + * but even the race is really unlikely by double
> + * check of PageBuddy.
> + */
> + unsigned long order = page_order(page);
> + if (PageBuddy(page))
> + low_pfn += (1 << order) - 1;
> continue;
> + }
>
Even if the page is still page buddy, there is no guarantee that it's
the same page order as the first read. It could currently be
merging with adjacent buddies for example. There is also a really
small race that a page was freed, allocated with some number stuffed
into page->private and freed again before the second PageBuddy check.
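As an illustration of that interleaving (a hypothetical timeline, not
from the thread):

	/* compaction scanner             allocator / free path
	 *
	 * PageBuddy(page) -> true
	 *                                page allocated, PG_buddy cleared
	 *                                page->private = X (e.g. buffer_heads)
	 * order = page_order(page)       reads X, which is not an order
	 *                                page freed, PG_buddy set again
	 * PageBuddy(page) -> true        double check passes anyway
	 * low_pfn += (1 << X) - 1        skips an arbitrary distance
	 */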
It's a bit of a hand grenade. How much of a performance benefit is there
from this patch?
--
Mel Gorman
SUSE Labs
On Wed, Aug 14, 2013 at 05:16:42PM +0100, Mel Gorman wrote:
> On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote:
> > Hi Mel,
> >
> > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > > A large free buddy block causes the scan loop to continue many times,
> > > > once per page, so if the page is free, skip the whole buddy block
> > > > instead of one page.
> > > >
> > > > Signed-off-by: Xishi Qiu <[email protected]>
> > >
> > > page_order cannot be used unless zone->lock is held which is not held in
> > > this path. Acquiring the lock would prevent parallel allocations from the
> >
> > Argh, I missed that. And it seems you already pointed it out a long time
> > ago when someone tried to do the same thing, if I remember correctly. :(
>
> It feels familiar but I do not remember why.
>
> > But let's think about it more.
> >
> > It's not always a problem, though, because CMA and memory-hotplug have
> > already isolated the free pages in the range to MIGRATE_ISOLATE right
> > before starting migration, so we can use page_order safely in those
> > contexts even without holding zone->lock.
> >
>
> Both of those are corner cases. Neither operation happens frequently
> in comparison to something like THP allocations for example. I think an
> optimisation along those lines is marginal at best.
On the embedded side, we don't use THP yet but we do use CMA and
memory-hotplug, so your claim isn't the case for the embedded world.
And as I said, the CMA area is the last fallback, so it's likely to have
many free pages; the bigger the CMA area is, the bigger the benefit
(CPU and power) is, I guess.
>
> > In addition, it's likely to have many free pages in case of CMA because CMA
> > makes MIGRATE_CMA fallback of MIGRATE_MOVABLE to minimize number of migrations.
> > Even CMA area was full, it could have many free pages once driver who is
> > CMA area's owner releases the CMA area. So, the bigger CMA space is,
> > the bigger patch's benefit is. And it could help memory-hotplug, too.
> >
> > The only problem is normal compaction. The worst case is just skipping
> > pageblock_nr_pages, for instance 4M (of course, it depends on the
> > configuration), and we can make the race window very small by
> > double-checking PageBuddy. The race is still theoretically possible,
> > but I think it's really, really unlikely, and we still cut compaction
> > overhead without holding the lock. Even if the race happens, normal
> > compaction's customers (e.g. THP) don't suffer a critical result and
> > just fall back. So I think it isn't a bad tradeoff.
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 05ccb4c..2341d52 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -520,8 +520,18 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > goto next_pageblock;
> >
> > /* Skip if free */
> > - if (PageBuddy(page))
> > + if (PageBuddy(page)) {
> > + /*
> > + * page_order is racy without zone->lock but worst case
> > + * by the racing is just skipping pageblock_nr_pages.
> > + * but even the race is really unlikely by double
> > + * check of PageBuddy.
> > + */
> > + unsigned long order = page_order(page);
> > + if (PageBuddy(page))
> > + low_pfn += (1 << order) - 1;
> > continue;
> > + }
> >
>
> Even if the page is still page buddy, there is no guarantee that it's
> the same page order as the first read. It could currently be
> merging with adjacent buddies for example. There is also a really
> small race that a page was freed, allocated with some number stuffed
> into page->private and freed again before the second PageBuddy check.
> It's a bit of a hand grenade. How much of a performance benefit is there
1. The worst case is just skipping pageblock_nr_pages.
2. The race is really small.
3. Higher-order page allocation customers always have a graceful fallback.
If you really have a concern about that, we can add the following.
From 97d918d6dd9766e6be26512359153400df5c2035 Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Thu, 15 Aug 2013 00:37:21 +0900
Subject: [PATCH] compaction: skip buddy page during compaction
When isolate_migratepages_range meets a free page, it skips just one
page instead of the whole free block. That adds unnecessary overhead to
compaction, so we might want to use page_order to skip the whole free
block, but that isn't safe without zone->lock.
On further thought, it's not always a problem, because CMA and
memory-hotplug have already isolated the free pages in the range to
MIGRATE_ISOLATE right before starting migration, so we can use
page_order safely in those contexts even without holding zone->lock.
In addition, the CMA case is likely to have many free pages because CMA
makes MIGRATE_CMA a fallback of MIGRATE_MOVABLE to minimize the number
of migrations. Even if the CMA area was full, it can have many free
pages again once the driver that owns the CMA area releases it.
So the bigger the CMA space is, the bigger the patch's benefit is.
And it could help memory-hotplug, too.
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 05ccb4c..aed6c0b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -520,8 +520,18 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
goto next_pageblock;
/* Skip if free */
- if (PageBuddy(page))
+ if (PageBuddy(page)) {
+#ifdef CONFIG_MEMORY_ISOLATION
+ /*
+ * memory-hotplug and CMA already move free pages to
+ * MIGRATE_ISOLATE so we can use page_order safely
+ * without zone->lock.
+ */
+ if (PageBuddy(page))
+ low_pfn += (1 << page_order(page)) - 1;
+#endif
continue;
+ }
/*
* For async migration, also only scan in MOVABLE blocks. Async
--
1.8.3.2
> from this patch?
I have no data now but if we need it, I will try it with CMA or memory-hotplug
after return back to the office. Maybe some weeks later.
Thanks for the review, Mel!
>
> --
> Mel Gorman
> SUSE Labs
>
--
Kind regards,
Minchan Kim
On Thu, Aug 15, 2013 at 01:39:21AM +0900, Minchan Kim wrote:
> On Wed, Aug 14, 2013 at 05:16:42PM +0100, Mel Gorman wrote:
> > On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote:
> > > Hi Mel,
> > >
> > > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > > > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > > > A large free buddy block causes the scan loop to continue many times,
> > > > > once per page, so if the page is free, skip the whole buddy block
> > > > > instead of one page.
> > > > >
> > > > > Signed-off-by: Xishi Qiu <[email protected]>
> > > >
> > > > page_order cannot be used unless zone->lock is held which is not held in
> > > > this path. Acquiring the lock would prevent parallel allocations from the
> > >
> > > Argh, I missed that. And it seems you already pointed it out a long time
> > > ago when someone tried to do the same thing, if I remember correctly. :(
> >
> > It feels familiar but I do not remember why.
> >
> > > But let's think about it more.
> > >
> > > It's not always a problem, though, because CMA and memory-hotplug have
> > > already isolated the free pages in the range to MIGRATE_ISOLATE right
> > > before starting migration, so we can use page_order safely in those
> > > contexts even without holding zone->lock.
> > >
> >
> > Both of those are corner cases. Neither operation happens frequently
> > in comparison to something like THP allocations for example. I think an
> > optimisation along those lines is marginal at best.
>
> On the embedded side, we don't use THP yet but we do use CMA and
> memory-hotplug, so your claim isn't the case for the embedded world.
I thought CMA was only used when certain devices require large
contiguous ranges of memory. The expectation was that such allocations
are relatively rare and the cost savings from this patch would not be
measurable. Considering how heavy the memory hotplug operation is I also
severely doubt that not being able to skip over PageBuddy pages in
compaction is noticeable.
> > > /* Skip if free */
> > > - if (PageBuddy(page))
> > > + if (PageBuddy(page)) {
> > > + /*
> > > + * page_order is racy without zone->lock but worst case
> > > + * by the racing is just skipping pageblock_nr_pages.
> > > + * but even the race is really unlikely by double
> > > + * check of PageBuddy.
> > > + */
> > > + unsigned long order = page_order(page);
> > > + if (PageBuddy(page))
> > > + low_pfn += (1 << order) - 1;
> > > continue;
> > > + }
> > >
> >
> > Even if the page is still page buddy, there is no guarantee that it's
> > the same page order as the first read. It could currently be
> > merging with adjacent buddies for example. There is also a really
> > small race that a page was freed, allocated with some number stuffed
> > into page->private and freed again before the second PageBuddy check.
> > It's a bit of a hand grenade. How much of a performance benefit is there
>
> 1. The worst case is just skipping pageblock_nr_pages.
No, the worst case is that page_order returns a number that is
completely garbage and low_pfn goes off the end of the zone
> 2. The race is really small.
> 3. Higher-order page allocation customers always have a graceful fallback.
>
> If you really have a concern about that, we can add following.
>
> - if (PageBuddy(page))
> + if (PageBuddy(page)) {
> +#ifdef CONFIG_MEMORY_ISOLATION
> + /*
> + * memory-hotplug and CMA already move free pages to
> + * MIGRATE_ISOLATE so we can use page_order safely
> + * without zone->lock.
> + */
> + if (PageBuddy(page))
> + low_pfn += (1 << page_order(page)) - 1;
> +#endif
> continue;
> + }
How does that help anything? MEMORY_ISOLATION is set in distribution
configs and there is no guarantee at all and the same race exists. If it
was going to be anything it would be something like this untested hack
/*
* It is not safe to call page_order(page) for a PageBuddy page without
* holding the zone lock as the order can change or the page allocated.
* Check PageBuddy after reading page_order to minimise the race. As
* the value could still be stale, make sure that we do not accidentally
* skip past the end of the largest page the buddy allocator handles.
*/
if (PageBuddy(page)) {
int nr_pages = (1 << page_order(page)) - 1;
if (PageBuddy(page)) {
nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
low_pfn += nr_pages;
continue;
}
}
It's still race-prone meaning that it really should be backed by some
performance data justifying it.
--
Mel Gorman
SUSE Labs
On Wed, Aug 14, 2013 at 07:00:12PM +0100, Mel Gorman wrote:
> On Thu, Aug 15, 2013 at 01:39:21AM +0900, Minchan Kim wrote:
> > On Wed, Aug 14, 2013 at 05:16:42PM +0100, Mel Gorman wrote:
> > > On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote:
> > > > Hi Mel,
> > > >
> > > > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > > > > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > > > > A large free buddy block causes the scan loop to continue many times,
> > > > > > once per page, so if the page is free, skip the whole buddy block
> > > > > > instead of one page.
> > > > > >
> > > > > > Signed-off-by: Xishi Qiu <[email protected]>
> > > > >
> > > > > page_order cannot be used unless zone->lock is held which is not held in
> > > > > this path. Acquiring the lock would prevent parallel allocations from the
> > > >
> > > > Argh, I missed that. And it seems you already pointed it out a long time
> > > > ago when someone tried to do the same thing, if I remember correctly. :(
> > >
> > > It feels familiar but I do not remember why.
> > >
> > > > But let's think about it more.
> > > >
> > > > It's not always a problem, though, because CMA and memory-hotplug have
> > > > already isolated the free pages in the range to MIGRATE_ISOLATE right
> > > > before starting migration, so we can use page_order safely in those
> > > > contexts even without holding zone->lock.
> > > >
> > >
> > > Both of those are corner cases. Neither operation happens frequently
> > > in comparison to something like THP allocations for example. I think an
> > > optimisation along those lines is marginal at best.
> >
> > On the embedded side, we don't use THP yet but we do use CMA and
> > memory-hotplug, so your claim isn't the case for the embedded world.
>
> I thought CMA was only used when certain devices require large
> contiguous ranges of memory. The expectation was that such allocations
> are relatively rare and the cost savings from this patch would not be
That's not true. It depends on user activity.
For example, if the user opens an app like the camera frequently, CMA
allocations can happen frequently if we implement the camera's image
buffer as CMA.
> measurable. Considering how heavy the memory hotplug operation is I also
> severely doubt that not being able to skip over PageBuddy pages in
> compaction is noticeable.
>
> > > > /* Skip if free */
> > > > - if (PageBuddy(page))
> > > > + if (PageBuddy(page)) {
> > > > + /*
> > > > + * page_order is racy without zone->lock but worst case
> > > > + * by the racing is just skipping pageblock_nr_pages.
> > > > + * but even the race is really unlikely by double
> > > > + * check of PageBuddy.
> > > > + */
> > > > + unsigned long order = page_order(page);
> > > > + if (PageBuddy(page))
> > > > + low_pfn += (1 << order) - 1;
> > > > continue;
> > > > + }
> > > >
> > >
> > > Even if the page is still page buddy, there is no guarantee that it's
> > > the same page order as the first read. It could currently be
> > > merging with adjacent buddies for example. There is also a really
> > > small race that a page was freed, allocated with some number stuffed
> > > into page->private and freed again before the second PageBuddy check.
> > > It's a bit of a hand grenade. How much of a performance benefit is there
> >
> > 1. The worst case is just skipping pageblock_nr_pages.
>
> No, the worst case is that page_order returns a number that is
> completely garbage and low_pfn goes off the end of the zone
Sigh, I missed that low_pfn is the return value, so you're right, but we
can simply fix it with min(low_pfn, end_pfn).
>
> > 2. The race is really small.
> > 3. Higher-order page allocation customers always have a graceful fallback.
> >
> > If you really have a concern about that, we can add following.
> >
> > - if (PageBuddy(page))
> > + if (PageBuddy(page)) {
> > +#ifdef CONFIG_MEMORY_ISOLATION
> > + /*
> > + * memory-hotplug and CMA already move free pages to
> > + * MIGRATE_ISOLATE so we can use page_order safely
> > + * without zone->lock.
> > + */
> > + if (PageBuddy(page))
> > + low_pfn += (1 << page_order(page)) - 1;
> > +#endif
> > continue;
> > + }
>
> How does that help anything? MEMORY_ISOLATION is set in distribution
> configs and there is no guarantee at all and the same race exists. If it
> was going to be anything it would be something like this untested hack
Argh, I sent the wrong version of the patch. It's 4:06 am here, so maybe I need some sleep. :(
>
> /*
> * It is not safe to call page_order(page) for a PageBuddy page without
> * holding the zone lock as the order can change or the page allocated.
> * Check PageBuddy after reading page_order to minimise the race. As
> * the value could still be stale, make sure that we do not accidentally
> * skip past the end of the largest page the buddy allocator handles.
> */
> if (PageBuddy(page)) {
> int nr_pages = (1 << page_order(page)) - 1;
> if (PageBuddy(page)) {
> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> low_pfn += nr_pages;
> continue;
> }
> }
My version is as follows:
if (PageBuddy(page)) {
#ifdef CONFIG_MEMORY_ISOLATION
unsigned long order = page_order(page);
if (PageBuddy(page)) {
low_pfn += (1 << order) - 1;
low_pfn = min(low_pfn, end_pfn);
}
#endif
continue;
}
>
> It's still race-prone meaning that it really should be backed by some
> performance data justifying it.
Okay.
Thanks for the input, Mel!
>
> --
> Mel Gorman
> SUSE Labs
>
--
Kind regards,
Minchan Kim
On Thu, 15 Aug 2013 00:52:29 +0900 Minchan Kim <[email protected]> wrote:
> On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > A large free buddy block causes the scan loop to continue many times,
> > > once per page, so if the page is free, skip the whole buddy block
> > > instead of one page.
> > >
> > > Signed-off-by: Xishi Qiu <[email protected]>
> >
> > page_order cannot be used unless zone->lock is held which is not held in
> > this path. Acquiring the lock would prevent parallel allocations from the
>
> Argh, I missed that.
I missed it as well. And so did Xishi Qiu.
Mel, we have a problem. What can we do to make this code more
maintainable?
On Wed, Aug 14, 2013 at 01:26:02PM -0700, Andrew Morton wrote:
> On Thu, 15 Aug 2013 00:52:29 +0900 Minchan Kim <[email protected]> wrote:
>
> > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > > A large free buddy block causes the scan loop to continue many times,
> > > > once per page, so if the page is free, skip the whole buddy block
> > > > instead of one page.
> > > >
> > > > Signed-off-by: Xishi Qiu <[email protected]>
> > >
> > > page_order cannot be used unless zone->lock is held which is not held in
> > > this path. Acquiring the lock would prevent parallel allocations from the
> >
> > Argh, I missed that.
>
> I missed it as well. And so did Xishi Qiu.
>
> Mel, we have a problem. What can we do to make this code more
> maintainable?
I sit in the bad man corner until I write a comment patch :/
page_order already has a comment but obviously the call site in compaction.c
could do with a hint. As I think the consequences of this race can be
dealt with I'm hoping Xishi Qiu will take the example I posted, fix it
if it needs fixing, turn it into a real patch and run it through whatever
test case led him to find this problem in the first place (HINT HINT). If
that happens, great! If not, I might do it myself and failing that, I'll
post a patch adding a comment explaining why page_order is not used there.
--
Mel Gorman
SUSE Labs
On 2013/8/15 2:00, Mel Gorman wrote:
>>> Even if the page is still page buddy, there is no guarantee that it's
>>> the same page order as the first read. It could currently be
>>> merging with adjacent buddies for example. There is also a really
>>> small race that a page was freed, allocated with some number stuffed
>>> into page->private and freed again before the second PageBuddy check.
>>> It's a bit of a hand grenade. How much of a performance benefit is there
>>
>> 1. The worst case is just skipping pageblock_nr_pages.
>
> No, the worst case is that page_order returns a number that is
> completely garbage and low_pfn goes off the end of the zone
>
>> 2. The race is really small.
>> 3. Higher-order page allocation customers always have a graceful fallback.
>>
Hi Minchan,
I think in this case, we may get the wrong value from page_order(page).
1. page is in page buddy
> if (PageBuddy(page)) {
2. someone allocated the page, and set page->private to another value
> int nr_pages = (1 << page_order(page)) - 1;
3. someone freed the page
> if (PageBuddy(page)) {
4. we will skip the wrong number of pages
> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> low_pfn += nr_pages;
> continue;
> }
> }
>
> It's still race-prone meaning that it really should be backed by some
> performance data justifying it.
>
Hi Xishi,
On Thu, Aug 15, 2013 at 10:32:50AM +0800, Xishi Qiu wrote:
> On 2013/8/15 2:00, Mel Gorman wrote:
>
> >>> Even if the page is still page buddy, there is no guarantee that it's
> >>> the same page order as the first read. It could currently be
> >>> merging with adjacent buddies for example. There is also a really
> >>> small race that a page was freed, allocated with some number stuffed
> >>> into page->private and freed again before the second PageBuddy check.
> >>> It's a bit of a hand grenade. How much of a performance benefit is there
> >>
> >> 1. The worst case is just skipping pageblock_nr_pages.
> >
> > No, the worst case is that page_order returns a number that is
> > completely garbage and low_pfn goes off the end of the zone
> >
> >> 2. The race is really small.
> >> 3. Higher-order page allocation customers always have a graceful fallback.
> >>
>
> Hi Minchan,
> I think in this case, we may get the wrong value from page_order(page).
>
> 1. page is in page buddy
>
> > if (PageBuddy(page)) {
>
> 2. someone allocated the page, and set page->private to another value
>
> > int nr_pages = (1 << page_order(page)) - 1;
>
> 3. someone freed the page
>
> > if (PageBuddy(page)) {
>
> 4. we will skip wrong pages
So, what's the result of that?
As I said, it's just skipping (pageblock_nr_pages - 1) in the worst case,
and the case you mention is academically right; Mel and I already
pointed it out. But how often could that happen in real practice?
I believe it's REALLY REALLY rare.
So, as Mel said, if you have some workloads that show the benefit
of this patch, I think we could accept it.
Could you try and respin with the numbers?
I guess a big contiguous memory range, or memory-hotplug, full of free
pages on an embedded CPU, which is rather slower than the server or
desktop side, could show a benefit.
Thanks.
>
> > nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> > low_pfn += nr_pages;
> > continue;
> > }
> > }
> >
> > It's still race-prone meaning that it really should be backed by some
> > performance data justifying it.
> >
>
>
>
--
Kind regards,
Minchan Kim
On 2013/8/15 10:44, Minchan Kim wrote:
> Hi Xishi,
>
> On Thu, Aug 15, 2013 at 10:32:50AM +0800, Xishi Qiu wrote:
>> On 2013/8/15 2:00, Mel Gorman wrote:
>>
>>>>> Even if the page is still page buddy, there is no guarantee that it's
>>>>> the same page order as the first read. It could currently be
>>>>> merging with adjacent buddies for example. There is also a really
>>>>> small race that a page was freed, allocated with some number stuffed
>>>>> into page->private and freed again before the second PageBuddy check.
>>>>> It's a bit of a hand grenade. How much of a performance benefit is there
>>>>
>>>> 1. The worst case is just skipping pageblock_nr_pages.
>>>
>>> No, the worst case is that page_order returns a number that is
>>> completely garbage and low_pfn goes off the end of the zone
>>>
>>>> 2. The race is really small.
>>>> 3. Higher-order page allocation customers always have a graceful fallback.
>>>>
>>
>> Hi Minchan,
>> I think in this case, we may get the wrong value from page_order(page).
>>
>> 1. page is in page buddy
>>
>>> if (PageBuddy(page)) {
>>
>> 2. someone allocated the page, and set page->private to another value
>>
>>> int nr_pages = (1 << page_order(page)) - 1;
>>
>> 3. someone freed the page
>>
>>> if (PageBuddy(page)) {
>>
>> 4. we will skip wrong pages
>
> So, what's the result by that?
> As I said, it's just skipping (pageblock_nr_pages -1) at worst case
Hi Minchan,
I mean that if private is set to a large number, it will skip 2^private
pages, not (pageblock_nr_pages - 1). I found that other code uses
page->private, such as the fs code. Here is the comment about private:
/* Mapping-private opaque data:
* usually used for buffer_heads
* if PagePrivate set; used for
* swp_entry_t if PageSwapCache;
* indicates order in the buddy
* system if PG_buddy is set.
*/
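For reference, page_order() itself is just a raw read of that field (its
mm/internal.h definition, whose comment Mel later updates at the end of
this thread), which is why a stale read can return whatever an
intermediate user stored in private:

	static inline unsigned long page_order(struct page *page)
	{
		/* PageBuddy() must be set for private to mean an order */
		return page_private(page);
	}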
Thanks,
Xishi Qiu
> and the case you mentioned is right academically and I and Mel
> already pointed out that. But how often could that happen in real
> practice? I believe such is REALLY REALLY rare.
> So, as Mel said, if you have some workloads to see the benefit
> from this patch, I think we could accept the patch.
> Could you try and respin with the number?
> I guess a big contiguous memory range or memory-hotplug which are
> full of free pages in embedded CPU which is rather slower than server
> or desktop side could have benefit.
>
> Thanks.
>
>>
>>> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
>>> low_pfn += nr_pages;
>>> continue;
>>> }
>>> }
>>>
>>> It's still race-prone meaning that it really should be backed by some
>>> performance data justifying it.
>>>
>>
>>
>>
>
Hello,
On Thu, Aug 15, 2013 at 11:46:07AM +0800, Xishi Qiu wrote:
> On 2013/8/15 10:44, Minchan Kim wrote:
>
> > Hi Xishi,
> >
> > On Thu, Aug 15, 2013 at 10:32:50AM +0800, Xishi Qiu wrote:
> >> On 2013/8/15 2:00, Mel Gorman wrote:
> >>
> >>>>> Even if the page is still page buddy, there is no guarantee that it's
> >>>>> the same page order as the first read. It could currently be
> >>>>> merging with adjacent buddies for example. There is also a really
> >>>>> small race that a page was freed, allocated with some number stuffed
> >>>>> into page->private and freed again before the second PageBuddy check.
> >>>>> It's a bit of a hand grenade. How much of a performance benefit is there
> >>>>
> > >>>> 1. The worst case is just skipping pageblock_nr_pages.
> >>>
> >>> No, the worst case is that page_order returns a number that is
> >>> completely garbage and low_pfn goes off the end of the zone
> >>>
> > >>>> 2. The race is really small.
> > >>>> 3. Higher-order page allocation customers always have a graceful fallback.
> >>>>
> >>
> >> Hi Minchan,
> >> I think in this case, we may get the wrong value from page_order(page).
> >>
> >> 1. page is in page buddy
> >>
> >>> if (PageBuddy(page)) {
> >>
> >> 2. someone allocated the page, and set page->private to another value
> >>
> >>> int nr_pages = (1 << page_order(page)) - 1;
> >>
> >> 3. someone freed the page
> >>
> >>> if (PageBuddy(page)) {
> >>
> >> 4. we will skip wrong pages
> >
> > So, what's the result by that?
> > As I said, it's just skipping (pageblock_nr_pages -1) at worst case
>
> Hi Minchan,
> I mean if the private is set to a large number, it will skip 2^private
> pages, not (pageblock_nr_pages -1). I find somewhere will use page->private,
> such as the fs code. Here is the comment about private:
> /* Mapping-private opaque data:
> * usually used for buffer_heads
> * if PagePrivate set; used for
> * swp_entry_t if PageSwapCache;
> * indicates order in the buddy
> * system if PG_buddy is set.
> */
Please read the full thread in detail.
Mel suggested the following:
if (PageBuddy(page)) {
int nr_pages = (1 << page_order(page)) - 1;
if (PageBuddy(page)) {
nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
low_pfn += nr_pages;
continue;
}
}
The min(nr_pages, xxx) removes your concern, but I think Mel's version
isn't right. It should be aligned with the pageblock boundary, so I
suggested the following:
if (PageBuddy(page)) {
#ifdef CONFIG_MEMORY_ISOLATION
unsigned long order = page_order(page);
if (PageBuddy(page)) {
low_pfn += (1 << order) - 1;
low_pfn = min(low_pfn, end_pfn);
}
#endif
continue;
}
So the worst case is (pageblock_nr_pages - 1),
but we don't need to add CONFIG_MEMORY_ISOLATION, so my suggestion
is as follows:
if (PageBuddy(page)) {
unsigned long order = page_order(page);
if (PageBuddy(page)) {
low_pfn += (1 << order) - 1;
low_pfn = min(low_pfn, end_pfn);
}
continue;
}
> Thanks,
> Xishi Qiu
>
> > and the case you mentioned is right academically and I and Mel
> > already pointed out that. But how often could that happen in real
> > practice? I believe such is REALLY REALLY rare.
> > So, as Mel said, if you have some workloads to see the benefit
> > from this patch, I think we could accept the patch.
> > Could you try and respin with the number?
> > I guess a big contiguous memory range or memory-hotplug which are
> > full of free pages in embedded CPU which is rather slower than server
> > or desktop side could have benefit.
> >
> > Thanks.
> >
> >>
> >>> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> >>> low_pfn += nr_pages;
> >>> continue;
> >>> }
> >>> }
> >>>
> >>> It's still race-prone meaning that it really should be backed by some
> >>> performance data justifying it.
> >>>
> >>
> >>
> >>
> >
>
>
>
--
Kind regards,
Minchan Kim
On Thu, Aug 15, 2013 at 01:17:55PM +0900, Minchan Kim wrote:
> Hello,
>
> On Thu, Aug 15, 2013 at 11:46:07AM +0800, Xishi Qiu wrote:
> > On 2013/8/15 10:44, Minchan Kim wrote:
> >
> > > Hi Xishi,
> > >
> > > On Thu, Aug 15, 2013 at 10:32:50AM +0800, Xishi Qiu wrote:
> > >> On 2013/8/15 2:00, Mel Gorman wrote:
> > >>
> > >>>>> Even if the page is still page buddy, there is no guarantee that it's
> > >>>>> the same page order as the first read. It could currently be
> > >>>>> merging with adjacent buddies for example. There is also a really
> > >>>>> small race that a page was freed, allocated with some number stuffed
> > >>>>> into page->private and freed again before the second PageBuddy check.
> > >>>>> It's a bit of a hand grenade. How much of a performance benefit is there
> > >>>>
> > >>>> 1. The worst case is just skipping pageblock_nr_pages.
> > >>>
> > >>> No, the worst case is that page_order returns a number that is
> > >>> completely garbage and low_pfn goes off the end of the zone
> > >>>
> > >>>> 2. The race is really small.
> > >>>> 3. Higher-order page allocation customers always have a graceful fallback.
> > >>>>
> > >>
> > >> Hi Minchan,
> > >> I think in this case, we may get the wrong value from page_order(page).
> > >>
> > >> 1. page is in page buddy
> > >>
> > >>> if (PageBuddy(page)) {
> > >>
> > >> 2. someone allocated the page, and set page->private to another value
> > >>
> > >>> int nr_pages = (1 << page_order(page)) - 1;
> > >>
> > >> 3. someone freed the page
> > >>
> > >>> if (PageBuddy(page)) {
> > >>
> > >> 4. we will skip wrong pages
> > >
> > > So, what's the result by that?
> > > As I said, it's just skipping (pageblock_nr_pages -1) at worst case
> >
> > Hi Minchan,
> > I mean if the private is set to a large number, it will skip 2^private
> > pages, not (pageblock_nr_pages -1). I find somewhere will use page->private,
> > such as the fs code. Here is the comment about private:
> > /* Mapping-private opaque data:
> > * usually used for buffer_heads
> > * if PagePrivate set; used for
> > * swp_entry_t if PageSwapCache;
> > * indicates order in the buddy
> > * system if PG_buddy is set.
> > */
>
> Please read full thread in detail.
>
> Mel suggested following as
>
> if (PageBuddy(page)) {
> int nr_pages = (1 << page_order(page)) - 1;
> if (PageBuddy(page)) {
> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> low_pfn += nr_pages;
> continue;
> }
> }
>
> min(nr_pages, xxx) removes your concern but I think Mel's version
> isn't right. It should be aligned with pageblock boundary so I
> suggested following.
>
> if (PageBuddy(page)) {
> #ifdef CONFIG_MEMORY_ISOLATION
> unsigned long order = page_order(page);
> if (PageBuddy(page)) {
> low_pfn += (1 << order) - 1;
> low_pfn = min(low_pfn, end_pfn);
> }
> #endif
> continue;
> }
>
> so worst case is (pageblock_nr_pages - 1).
> but we don't need to add CONFIG_MEMORY_ISOLATION so my suggestion
> is following as.
>
> if (PageBuddy(page)) {
> unsigned long order = page_order(page);
> if (PageBuddy(page)) {
> low_pfn += (1 << order) - 1;
> low_pfn = min(low_pfn, end_pfn);
Maybe it should be low_pfn = min(low_pfn, end_pfn - 1).
> }
> continue;
> }
>
>
--
Kind regards,
Minchan Kim
On 2013/8/15 12:17, Minchan Kim wrote:
>
> Please read full thread in detail.
>
> Mel suggested following as
>
> if (PageBuddy(page)) {
> int nr_pages = (1 << page_order(page)) - 1;
> if (PageBuddy(page)) {
> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> low_pfn += nr_pages;
> continue;
> }
> }
>
> min(nr_pages, xxx) removes your concern but I think Mel's version
> isn't right. It should be aligned with pageblock boundary so I
> suggested following.
>
> if (PageBuddy(page)) {
> #ifdef CONFIG_MEMORY_ISOLATION
> unsigned long order = page_order(page);
> if (PageBuddy(page)) {
> low_pfn += (1 << order) - 1;
> low_pfn = min(low_pfn, end_pfn);
Hi Minchan,
I understand now, but why use "end_pfn" here?
Do you mean we should use pageblock_nr_pages instead of MAX_ORDER_NR_PAGES?
Just like this:
if (PageBuddy(page)) {
unsigned long order = page_order(page);
order = min(order, pageblock_order);
if (PageBuddy(page))
low_pfn += (1 << order) - 1;
continue;
}
Thanks,
Xishi Qiu
> }
> #endif
> continue;
> }
>
> so worst case is (pageblock_nr_pages - 1).
> but we don't need to add CONFIG_MEMORY_ISOLATION so my suggestion
> is following as.
>
> if (PageBuddy(page)) {
> unsigned long order = page_order(page);
> if (PageBuddy(page)) {
> low_pfn += (1 << order) - 1;
> low_pfn = min(low_pfn, end_pfn);
> }
> continue;
> }
>
>
On 2013/8/15 12:24, Minchan Kim wrote:
>> Please read full thread in detail.
>>
>> Mel suggested following as
>>
>> if (PageBuddy(page)) {
>> int nr_pages = (1 << page_order(page)) - 1;
>> if (PageBuddy(page)) {
>> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
>> low_pfn += nr_pages;
>> continue;
>> }
>> }
>>
>> min(nr_pages, xxx) removes your concern but I think Mel's version
>> isn't right. It should be aligned with pageblock boundary so I
>> suggested following.
>>
>> if (PageBuddy(page)) {
>> #ifdef CONFIG_MEMORY_ISOLATION
>> unsigned long order = page_order(page);
>> if (PageBuddy(page)) {
>> low_pfn += (1 << order) - 1;
>> low_pfn = min(low_pfn, end_pfn);
>> }
>> #endif
>> continue;
>> }
>>
Hi Minchan,
I understand now, but why use "end_pfn" here?
Maybe like this:
if (PageBuddy(page)) {
/*
	 * page_order is racy without zone->lock, but the worst case
	 * of the race is just skipping pageblock_nr_pages.
*/
unsigned long nr_pages = 1 << page_order(page);
if (likely(PageBuddy(page))) {
nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES);
/* Align with pageblock boundary */
if ((low_pfn & (pageblock_nr_pages - 1)) + nr_pages >
pageblock_nr_pages)
low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
else
low_pfn += nr_pages - 1;
}
continue;
}
Thanks,
Xishi Qiu
>> so worst case is (pageblock_nr_pages - 1).
>> but we don't need to add CONFIG_MEMORY_ISOLATION so my suggestion
>> is following as.
>>
>> if (PageBuddy(page)) {
>> unsigned long order = page_order(page);
>> if (PageBuddy(page)) {
>> low_pfn += (1 << order) - 1;
>> low_pfn = min(low_pfn, end_pfn);
>
> Maybe it should be low_pfn = min(low_pfn, end_pfn - 1).
>
>
>> }
>> continue;
>> }
>>
>>
>
On 2013/8/15 17:51, Wanpeng Li wrote:
> On Thu, Aug 15, 2013 at 03:45:11PM +0800, Xishi Qiu wrote:
>> On 2013/8/15 12:24, Minchan Kim wrote:
>>
>>>> Please read full thread in detail.
>>>>
>>>> Mel suggested following as
>>>>
>>>> if (PageBuddy(page)) {
>>>> int nr_pages = (1 << page_order(page)) - 1;
>>>> if (PageBuddy(page)) {
>>>> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
>>>> low_pfn += nr_pages;
>>>> continue;
>>>> }
>>>> }
>>>>
>>>> min(nr_pages, xxx) removes your concern but I think Mel's version
>>>> isn't right. It should be aligned with pageblock boundary so I
>>>> suggested following.
>>>>
>>>> if (PageBuddy(page)) {
>>>> #ifdef CONFIG_MEMORY_ISOLATION
>>>> unsigned long order = page_order(page);
>>>> if (PageBuddy(page)) {
>>>> low_pfn += (1 << order) - 1;
>>>> low_pfn = min(low_pfn, end_pfn);
>>>> }
>>>> #endif
>>>> continue;
>>>> }
>>>>
>>
>> Hi Minchan,
>>
>> I understand now, but why use "end_pfn" here?
>> Maybe like this:
>>
>> if (PageBuddy(page)) {
>> /*
>> * page_order is racy without zone->lock but worst case
>> * by the racing is just skipping pageblock_nr_pages.
>> */
>> unsigned long nr_pages = 1 << page_order(page);
>> if (likely(PageBuddy(page))) {
>> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES);
>
> How much sense does that make? nr_pages will still be equal to itself,
> since nr_pages can't be larger than MAX_ORDER_NR_PAGES.
>
Hi Wanpeng,
Mel pointed out that "page_order cannot be used unless zone->lock is held"
and that "even if the page is still page buddy, there is no guarantee that
it's the same page order as the first read. It could currently be merging
with adjacent buddies for example."
If someone uses the page during the double PageBuddy check, the value
of private may be wrong. In my opinion, just keep the code unchanged.
Thanks,
Xishi Qiu
>>
>> /* Align with pageblock boundary */
>> if ((low_pfn & (pageblock_nr_pages - 1)) + nr_pages >
>> pageblock_nr_pages)
>> low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
>> else
>> low_pfn += nr_pages - 1;
>> }
>> continue;
>> }
>>
>> Thanks,
>> Xishi Qiu
>>
>>>> so worst case is (pageblock_nr_pages - 1).
>>>> but we don't need to add CONFIG_MEMORY_ISOLATION so my suggestion
>>>> is following as.
>>>>
>>>> if (PageBuddy(page)) {
>>>> unsigned long order = page_order(page);
>>>> if (PageBuddy(page)) {
>>>> low_pfn += (1 << order) - 1;
>>>> low_pfn = min(low_pfn, end_pfn);
>>>
>>> Maybe it should be low_pfn = min(low_pfn, end_pfn - 1).
>>>
>>>
>>>> }
>>>> continue;
>>>> }
>>>>
On Thu, Aug 15, 2013 at 01:17:55PM +0900, Minchan Kim wrote:
> Hello,
>
Well, this thread managed to get out of control for no good reason!
> > > <SNIP>
> > > So, what's the result by that?
> > > As I said, it's just skipping (pageblock_nr_pages -1) at worst case
> >
> > Hi Minchan,
> > I mean if the private is set to a large number, it will skip 2^private
> > pages, not (pageblock_nr_pages -1). I find somewhere will use page->private,
> > such as fs. Here is the comment about parivate.
> > /* Mapping-private opaque data:
> > * usually used for buffer_heads
> > * if PagePrivate set; used for
> > * swp_entry_t if PageSwapCache;
> > * indicates order in the buddy
> > * system if PG_buddy is set.
> > */
>
> Please read full thread in detail.
>
> Mel suggested following as
>
> if (PageBuddy(page)) {
> int nr_pages = (1 << page_order(page)) - 1;
> if (PageBuddy(page)) {
> nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> low_pfn += nr_pages;
> continue;
> }
> }
>
> min(nr_pages, xxx) removes your concern but I think Mel's version
> isn't right. It should be aligned with pageblock boundary so I
> suggested following.
>
Why? We're looking for pages to migrate. If the page is free and at the
maximum order then there is no point searching in the middle of a free
page.
> if (PageBuddy(page)) {
> #ifdef CONFIG_MEMORY_ISOLATION
> unsigned long order = page_order(page);
> if (PageBuddy(page)) {
> low_pfn += (1 << order) - 1;
> low_pfn = min(low_pfn, end_pfn);
> }
> #endif
> continue;
> }
>
> so worst case is (pageblock_nr_pages - 1).
No it isn't. The worst case is that the whole region being searched is
skipped. For THP allocations, it would happen to work as being the
pageblock boundary but it is not required by the API. I expect that
end_pfn is not necessarily the next pageblock boundary for CMA
allocations.
--
Mel Gorman
SUSE Labs
Hi Mel,
On Thu, Aug 15, 2013 at 12:30:19PM +0100, Mel Gorman wrote:
> On Thu, Aug 15, 2013 at 01:17:55PM +0900, Minchan Kim wrote:
> > Hello,
> >
>
> Well, this thread managed to get out of control for no good reason!
>
> > > > <SNIP>
> > > > So, what's the result by that?
> > > > As I said, it's just skipping (pageblock_nr_pages -1) at worst case
> > >
> > > Hi Minchan,
> > > I mean if the private is set to a large number, it will skip 2^private
> > > pages, not (pageblock_nr_pages -1). I find somewhere will use page->private,
> > > such as fs. Here is the comment about parivate.
> > > /* Mapping-private opaque data:
> > > * usually used for buffer_heads
> > > * if PagePrivate set; used for
> > > * swp_entry_t if PageSwapCache;
> > > * indicates order in the buddy
> > > * system if PG_buddy is set.
> > > */
> >
> > Please read full thread in detail.
> >
> > Mel suggested following as
> >
> > if (PageBuddy(page)) {
> > int nr_pages = (1 << page_order(page)) - 1;
> > if (PageBuddy(page)) {
> > nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> > low_pfn += nr_pages;
> > continue;
> > }
> > }
> >
> > min(nr_pages, xxx) removes your concern but I think Mel's version
> > isn't right. It should be aligned with pageblock boundary so I
> > suggested following.
> >
>
> Why? We're looking for pages to migrate. If the page is free and at the
> maximum order then there is no point searching in the middle of a free
> page.
The isolate_migratepages_range API works with [low_pfn, end_pfn),
and we can't guarantee page_order in the normal compaction path,
so I'd like to conservatively limit the skipping by end_pfn.
>
> > if (PageBuddy(page)) {
> > #ifdef CONFIG_MEMORY_ISOLATION
> > unsigned long order = page_order(page);
> > if (PageBuddy(page)) {
> > low_pfn += (1 << order) - 1;
> > low_pfn = min(low_pfn, end_pfn);
> > }
> > #endif
> > continue;
> > }
> >
> > so worst case is (pageblock_nr_pages - 1).
>
> No it isn't. The worst case is that the whole region being searched is
> skipped. For THP allocations, it would happen to work as being the
> pageblock boundary but it is not required by the API. I expect that
> end_pfn is not necessarily the next pageblock boundary for CMA
> allocations.
Mel, as I said earlier, CMA and memory-hotplug don't have the page_order
race problem, so we only need to consider the normal compaction path,
i.e. high-order allocations (e.g. THP). So, for this race problem, the
worst case is skipping (pageblock_nr_pages - 1) pages.
>
> --
> Mel Gorman
> SUSE Labs
>
--
Kind regards,
Minchan Kim
On Thu, Aug 15, 2013 at 10:19:35PM +0900, Minchan Kim wrote:
> >
> > Why? We're looking for pages to migrate. If the page is free and at the
> > maximum order then there is no point searching in the middle of a free
> > page.
>
> isolate_migratepages_range API works with [low_pfn, end_pfn)
> and we can't guarantee page_order in normal compaction path
> so I'd like to limit the skipping by end_pfn conservatively.
>
Fine
s/MAX_ORDER_NR_PAGES/pageblock_nr_pages/
and take the min of it and
low_pfn = min(low_pfn, end_pfn - 1)
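Putting those two adjustments together, the form the thread converges on
would look roughly like this (an untested sketch assembled from the
messages above, not a committed patch):

	if (PageBuddy(page)) {
		unsigned long nr_pages = (1 << page_order(page)) - 1;

		if (PageBuddy(page)) {	/* double check narrows the race */
			nr_pages = min_t(unsigned long, nr_pages,
					 pageblock_nr_pages - 1);
			low_pfn += nr_pages;
			low_pfn = min(low_pfn, end_pfn - 1);
		}
		continue;
	}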
--
Mel Gorman
SUSE Labs
On Thu, Aug 15, 2013 at 02:42:09PM +0100, Mel Gorman wrote:
> On Thu, Aug 15, 2013 at 10:19:35PM +0900, Minchan Kim wrote:
> > >
> > > Why? We're looking for pages to migrate. If the page is free and at the
> > > maximum order then there is no point searching in the middle of a free
> > > page.
> >
> > isolate_migratepages_range API works with [low_pfn, end_pfn)
> > and we can't guarantee page_order in normal compaction path
> > so I'd like to limit the skipping by end_pfn conservatively.
> >
>
> Fine
>
> s/MAX_ORDER_NR_PAGES/pageblock_nr_pages/
>
> and take the min of it and
>
> low_pfn = min(low_pfn, end_pfn - 1)
True. That was really what I suggested to Xishi.
The only thing we need now is a number that justifies the benefit.
Thanks.
--
Kind regards,
Minchan Kim
Developers occasionally try and optimise PFN scanners by using page_order
but miss that in general it requires zone->lock. This has happened twice for
compaction.c and been rejected both times. This patch clarifies the documentation
of page_order and adds a note to compaction.c why page_order is not used.
Signed-off-by: Mel Gorman <[email protected]>
---
mm/compaction.c | 5 ++++-
mm/internal.h | 8 +++++---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index f58bcd0..f91d26b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -522,7 +522,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
if (!isolation_suitable(cc, page))
goto next_pageblock;
- /* Skip if free */
+ /*
+ * Skip if free. page_order cannot be used without zone->lock
+ * as nothing prevents parallel allocations or buddy merging.
+ */
if (PageBuddy(page))
continue;
diff --git a/mm/internal.h b/mm/internal.h
index 684f7aa..09cd8be 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -144,9 +144,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
#endif
/*
- * function for dealing with page's order in buddy system.
- * zone->lock is already acquired when we use these.
- * So, we don't need atomic page->flags operations here.
+ * This function returns the order of a free page in the buddy system.
+ * In general, page_zone(page)->lock must be held by the caller to prevent
+ * the page being allocated in parallel and returning garbage as the order.
+ * If the caller does not hold page_zone(page), they must guarantee that
+ * the page cannot be allocated or merged in parallel.
*/
static inline unsigned long page_order(struct page *page)
{
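For contrast with the lockless scanner path, a caller that genuinely needs a
trustworthy order has to take zone->lock and recheck under it. A hypothetical
helper sketching that pattern (illustration only, not code from this patch):
	/* Returns true and stores the order iff the page is a free buddy page */
	static bool page_order_locked(struct zone *zone, struct page *page,
				      unsigned long *order)
	{
		unsigned long flags;
		bool free;

		spin_lock_irqsave(&zone->lock, flags);
		/* under zone->lock the page can be neither allocated nor merged */
		free = PageBuddy(page);
		if (free)
			*order = page_order(page);
		spin_unlock_irqrestore(&zone->lock, flags);

		return free;
	}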
On Fri, Jan 17, 2014 at 02:32:21PM +0000, Mel Gorman wrote:
> Developers occasionally try and optimise PFN scanners by using page_order
> but miss that in general it requires zone->lock. This has happened twice for
> compaction.c and been rejected both times. This patch clarifies the documentation
> of page_order and adds a note to compaction.c explaining why page_order is not used.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/compaction.c | 5 ++++-
> mm/internal.h | 8 +++++---
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f58bcd0..f91d26b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -522,7 +522,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> if (!isolation_suitable(cc, page))
> goto next_pageblock;
>
> - /* Skip if free */
> + /*
> + * Skip if free. page_order cannot be used without zone->lock
> + * as nothing prevents parallel allocations or buddy merging.
> + */
> if (PageBuddy(page))
> continue;
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 684f7aa..09cd8be 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -144,9 +144,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> #endif
>
> /*
> - * function for dealing with page's order in buddy system.
> - * zone->lock is already acquired when we use these.
> - * So, we don't need atomic page->flags operations here.
> + * This function returns the order of a free page in the buddy system.
> + * In general, page_zone(page)->lock must be held by the caller to prevent
> + * the page being allocated in parallel and returning garbage as the order.
> + * If the caller does not hold page_zone(page), they must guarantee that
> + * the page cannot be allocated or merged in parallel.
> */
> static inline unsigned long page_order(struct page *page)
> {
Acked-by: Rafael Aquini <[email protected]>
On 1/17/2014 6:32 AM, Mel Gorman wrote:
> Developers occasionally try and optimise PFN scanners by using page_order
> but miss that in general it requires zone->lock. This has happened twice for
> compaction.c and been rejected both times. This patch clarifies the documentation
> of page_order and adds a note to compaction.c explaining why page_order is not used.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/compaction.c | 5 ++++-
> mm/internal.h | 8 +++++---
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f58bcd0..f91d26b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -522,7 +522,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> if (!isolation_suitable(cc, page))
> goto next_pageblock;
>
> - /* Skip if free */
> + /*
> + * Skip if free. page_order cannot be used without zone->lock
> + * as nothing prevents parallel allocations or buddy merging.
> + */
> if (PageBuddy(page))
> continue;
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 684f7aa..09cd8be 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -144,9 +144,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> #endif
>
> /*
> - * function for dealing with page's order in buddy system.
> - * zone->lock is already acquired when we use these.
> - * So, we don't need atomic page->flags operations here.
> + * This function returns the order of a free page in the buddy system.
> + * In general, page_zone(page)->lock must be held by the caller to prevent
> + * the page being allocated in parallel and returning garbage as the order.
> + * If the caller does not hold page_zone(page), they must guarantee that
page_zone(page)->lock here?
> + * the page cannot be allocated or merged in parallel.
> */
> static inline unsigned long page_order(struct page *page)
> {
>
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
On Fri, Jan 17, 2014 at 10:53:50AM -0800, Laura Abbott wrote:
> On 1/17/2014 6:32 AM, Mel Gorman wrote:
> >Developers occasionally try and optimise PFN scanners by using page_order
> >but miss that in general it requires zone->lock. This has happened twice for
> >compaction.c and been rejected both times. This patch clarifies the documentation
> >of page_order and adds a note to compaction.c explaining why page_order is not used.
> >
> >Signed-off-by: Mel Gorman <[email protected]>
> >---
> > mm/compaction.c | 5 ++++-
> > mm/internal.h | 8 +++++---
> > 2 files changed, 9 insertions(+), 4 deletions(-)
> >
> >diff --git a/mm/compaction.c b/mm/compaction.c
> >index f58bcd0..f91d26b 100644
> >--- a/mm/compaction.c
> >+++ b/mm/compaction.c
> >@@ -522,7 +522,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > if (!isolation_suitable(cc, page))
> > goto next_pageblock;
> >
> >- /* Skip if free */
> >+ /*
> >+ * Skip if free. page_order cannot be used without zone->lock
> >+ * as nothing prevents parallel allocations or buddy merging.
> >+ */
> > if (PageBuddy(page))
> > continue;
> >
> >diff --git a/mm/internal.h b/mm/internal.h
> >index 684f7aa..09cd8be 100644
> >--- a/mm/internal.h
> >+++ b/mm/internal.h
> >@@ -144,9 +144,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > #endif
> >
> > /*
> >- * function for dealing with page's order in buddy system.
> >- * zone->lock is already acquired when we use these.
> >- * So, we don't need atomic page->flags operations here.
> >+ * This function returns the order of a free page in the buddy system.
> >+ * In general, page_zone(page)->lock must be held by the caller to prevent
> >+ * the page being allocated in parallel and returning garbage as the order.
> >+ * If the caller does not hold page_zone(page), they must guarantee that
> page_zone(page)->lock here?
*sigh*
Yes, thanks.
--
Mel Gorman
SUSE Labs
On Fri, Jan 17, 2014 at 02:32:21PM +0000, Mel Gorman wrote:
> Developers occasionally try and optimise PFN scanners by using page_order
> but miss that in general it requires zone->lock. This has happened twice for
> compaction.c and been rejected both times. This patch clarifies the documentation
> of page_order and adds a note to compaction.c explaining why page_order is not used.
>
> Signed-off-by: Mel Gorman <[email protected]>
Apart from what Laura pointed out,
Acked-by: Minchan Kim <[email protected]>
Thanks for following up on this issue without forgetting it.
--
Kind regards,
Minchan Kim
Developers occasionally try and optimise PFN scanners by using page_order
but miss that in general it requires zone->lock. This has happened twice for
compaction.c and been rejected both times. This patch clarifies the documentation
of page_order and adds a note to compaction.c explaining why page_order is not used.
[[email protected]: Corrected a page_zone(page)->lock reference]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
Acked-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 5 ++++-
mm/internal.h | 8 +++++---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index f58bcd0..f91d26b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -522,7 +522,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
if (!isolation_suitable(cc, page))
goto next_pageblock;
- /* Skip if free */
+ /*
+ * Skip if free. page_order cannot be used without zone->lock
+ * as nothing prevents parallel allocations or buddy merging.
+ */
if (PageBuddy(page))
continue;
diff --git a/mm/internal.h b/mm/internal.h
index 684f7aa..09cd8be 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -144,9 +144,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
#endif
/*
- * function for dealing with page's order in buddy system.
- * zone->lock is already acquired when we use these.
- * So, we don't need atomic page->flags operations here.
+ * This function returns the order of a free page in the buddy system. In
+ * general, page_zone(page)->lock must be held by the caller to prevent the
+ * page being allocated in parallel and returning garbage as the order. If the
+ * caller does not hold page_zone(page)->lock, they must guarantee that the
+ * page cannot be allocated or merged in parallel.
*/
static inline unsigned long page_order(struct page *page)
{
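Postscript: with the documented rule in place, a scanner that still wants this
optimisation has to treat the order as untrusted rather than take the lock.
One possible shape for that, sketched here for illustration only — the
READ_ONCE annotation, the range check, and the clamping are this sketch's
assumptions, not something settled in this thread; page_private() is where
the buddy allocator stores the order of a free page:
	if (PageBuddy(page)) {
		/*
		 * Deliberately racy read of the buddy order. READ_ONCE
		 * stops the compiler from re-reading a value that can
		 * change underneath us.
		 */
		unsigned long order = READ_ONCE(page_private(page));

		/* garbage from a lost race fails the range check */
		if (order > 0 && order < MAX_ORDER)
			low_pfn += (1UL << order) - 1;
		/* stay inside the caller's [low_pfn, end_pfn) window */
		low_pfn = min(low_pfn, end_pfn - 1);
		continue;
	}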