2010-07-26 15:47:05

by Minchan Kim

Subject: [PATCH] Tight check of pfn_valid on sparsemem - v4

Changelog since v3
o fix my silly mistake in v3
o use set_page_private and page_private

Changelog since v2
o Change some function names
o Remove mark_memmap_hole in memmap bring up
o Change CONFIG_SPARSEMEM with CONFIG_ARCH_HAS_HOLES_MEMORYMODEL

I have the following plan for after this patch is acked.

TODO:
1) expand pfn_valid to FLATMEM in ARM
I think we can enhance pfn_valid of FLATMEM in ARM.
Now it does a binary search, which is expensive.
After this patch is merged, I will first expand it to FLATMEM on ARM.

2) remove memmap_valid_within
We can remove memmap_valid_within thanks to pfn_valid's new tight check.

3) Optimize hole check in sparsemem
In the case of sparsemem, we can optimize pfn_valid by defining a new flag
like SECTION_HAS_HOLE on mem_sections that contain holes (rough sketch below).
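A rough sketch of 3), only to illustrate the idea; SECTION_HAS_HOLE is a
hypothetical flag (it would need a free bit in section_mem_map next to
SECTION_MARKED_PRESENT/SECTION_HAS_MEM_MAP, with SECTION_MAP_LAST_BIT moved
up accordingly), nothing here is in this patch:

        #define SECTION_HAS_HOLE        (1UL<<2)        /* hypothetical */

        static inline int section_has_hole(struct mem_section *ms)
        {
                return ms->section_mem_map & SECTION_HAS_HOLE;
        }

        static inline int pfn_valid(unsigned long pfn)
        {
                struct mem_section *ms;

                if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                        return 0;
                ms = __nr_to_section(pfn_to_section_nr(pfn));
                if (!valid_section(ms))
                        return 0;
                /* only sections known to contain holes probe the memmap */
                return !section_has_hole(ms) || memmap_valid(pfn);
        }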

== CUT HERE ==

Kukjin reported an oops while changing min_free_kbytes:
http://www.spinics.net/lists/arm-kernel/msg92894.html
It is caused by the memory map layout on sparsemem.

The system has the following memory map:
section 0              section 1              section 2
0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
SECTION_SIZE_BITS 28 (256M)

This means section 0 is an incompletely filled section.
Nonetheless, the current pfn_valid of sparsemem checks the pfn loosely:
it checks only the mem_section's validity, but ARM can free the mem_map
over a hole to save memory. So in the above case, a pfn in the hole at
0x25000000 can pass pfn_valid's check. That's not what we want.
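To spell out the failure with this layout (PAGE_SHIFT = 12 and the usual
sparsemem constants, shown only for illustration):

        pfn = 0x25000000 >> PAGE_SHIFT;         /* 0x25000 */
        /* PFN_SECTION_SHIFT = SECTION_SIZE_BITS - PAGE_SHIFT = 28 - 12 = 16 */
        section_nr = pfn >> PFN_SECTION_SHIFT;  /* same 256M section as 0x20000000 */

valid_section() sees a present section (there really is memory at
0x20000000-0x25000000) and returns 1, although the mem_map backing the
hole above 0x25000000 has already been freed.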

We could match the section size to the smallest valid size (e.g., 16M in the
above case), but Russell doesn't like that because of mem_section's memory
overhead with a different configuration (e.g., 512K sections).

I tried adding a valid pfn range to mem_section, but nobody liked that either
because of the size overhead. This approach was suggested by KAMEZAWA-san;
I just fixed a compile error and changed some names.

This patch stores the page struct's own address in pg->private for every
page that backs a section's mem_map. If they match, the page backs the
section's mem_map; otherwise the page is used for another purpose and the
mem_map has a hole.

This patch is based on mmotm-2010-07-19

Signed-off-by: Minchan Kim <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Reported-by: Kukjin Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
arch/arm/mm/init.c | 9 +++++++++
include/linux/mmzone.h | 21 ++++++++++++++++++++-
mm/mmzone.c | 33 +++++++++++++++++++++++++++++++++
3 files changed, 62 insertions(+), 1 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index bc98d5d..18b255d 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -238,6 +238,15 @@ static void __init arm_bootmem_free(struct meminfo *mi, unsigned long min,
        arch_adjust_zones(zone_size, zhole_size);

        free_area_init_node(0, zone_size, min, zhole_size);
+
+        /*
+         * Mark the pages backing the mem_map as valid, using pg->private.
+         * The mem_map over holes will be freed by free_memmap() later.
+         */
+        for_each_bank(i, mi) {
+                mark_valid_memmap(bank_pfn_start(&mi->bank[i]),
+                                  bank_pfn_end(&mi->bank[i]));
+        }
}

#ifndef CONFIG_SPARSEMEM
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6e6e626..3b4d16f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -15,6 +15,7 @@
#include <linux/seqlock.h>
#include <linux/nodemask.h>
#include <linux/pageblock-flags.h>
+#include <linux/mm_types.h>
#include <generated/bounds.h>
#include <asm/atomic.h>
#include <asm/page.h>
@@ -1032,11 +1033,29 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
        return __nr_to_section(pfn_to_section_nr(pfn));
}

+void mark_valid_memmap(unsigned long start, unsigned long end);
+
+#ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
+static inline int memmap_valid(unsigned long pfn)
+{
+        struct page *page = pfn_to_page(pfn);
+        struct page *__pg = virt_to_page(page);
+        return page_private(__pg) == (unsigned long)__pg;
+}
+#else
+static inline int memmap_valid(unsigned long pfn)
+{
+        return 1;
+}
+#endif
+
static inline int pfn_valid(unsigned long pfn)
{
+        struct mem_section *ms;
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
-        return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+        ms = __nr_to_section(pfn_to_section_nr(pfn));
+        return valid_section(ms) && memmap_valid(pfn);
}

static inline int pfn_present(unsigned long pfn)
diff --git a/mm/mmzone.c b/mm/mmzone.c
index f5b7d17..8c3cf57 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -86,4 +86,37 @@ int memmap_valid_within(unsigned long pfn,

        return 1;
}
+
+/*
+ * Fill pg->private of each page backing a valid mem_map with the page
+ * itself. pfn_valid() will check this later (see include/linux/mmzone.h).
+ * Every arch that supports holes in mem_map should call
+ * mark_valid_memmap(start, end); please see the ARM usage.
+ */
+void mark_valid_memmap(unsigned long start, unsigned long end)
+{
+        struct mem_section *ms;
+        unsigned long pos, next;
+        struct page *pg;
+        void *memmap, *mapend;
+
+        for (pos = start; pos < end; pos = next) {
+                next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
+                ms = __pfn_to_section(pos);
+                if (!valid_section(ms))
+                        continue;
+
+                for (memmap = (void *)pfn_to_page(pos),
+                     /* the last page in the section */
+                     mapend = pfn_to_page(next - 1);
+                     memmap < mapend; memmap += PAGE_SIZE) {
+                        pg = virt_to_page(memmap);
+                        set_page_private(pg, (unsigned long)pg);
+                }
+        }
+}
+#else
+void mark_valid_memmap(unsigned long start, unsigned long end)
+{
+}
#endif /* CONFIG_ARCH_HAS_HOLES_MEMORYMODEL */
--
1.7.0.5


2010-07-26 16:47:31

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Tue, 27 Jul 2010, Minchan Kim wrote:

> This patch stores the page struct's own address in pg->private for every
> page that backs a section's mem_map. If they match, the page backs the
> section's mem_map; otherwise the page is used for another purpose and the
> mem_map has a hole.

What if page->private just happens to be the value of the page struct?
Even if that is not possible today, someday someone may add new
functionality to the kernel where page->private == page is used for some
reason.

Checking for PG_reserved won't work?

> +void mark_valid_memmap(unsigned long start, unsigned long end);
> +
> +#ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
> +static inline int memmap_valid(unsigned long pfn)
> +{
> + struct page *page = pfn_to_page(pfn);
> + struct page *__pg = virt_to_page(page);
> + return page_private(__pg) == (unsigned long)__pg;

Hmmm.. hmmm....

2010-07-26 22:47:30

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

Hi Christoph,

On Tue, Jul 27, 2010 at 1:40 AM, Christoph Lameter
<[email protected]> wrote:
> On Tue, 27 Jul 2010, Minchan Kim wrote:
>
>> This patch stores the page struct's own address in pg->private for every
>> page that backs a section's mem_map. If they match, the page backs the
>> section's mem_map; otherwise the page is used for another purpose and the
>> mem_map has a hole.
>
> What if page->private just happens to be the value of the page struct?
> Even if that is not possible today, someday someone may add new
> functionality to the kernel where page->private == page is used for some
> reason.

I agree.

>
> Checking for PG_reserved won't work?

Okay. It would be better to check the page pointer itself together with PG_reserved.
I will reflect your opinion in the next version. :)

Thanks, Christoph.




--
Kind regards,
Minchan Kim

2010-07-27 05:55:51

by Milton Miller

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

[Sorry if I missed or added anyone on cc; patchwork.kernel.org LKML is not
working and I'm not subscribed to the list]

On Mon Jul 26 2010 about 12:47:37 EST, Christoph Lameter wrote:
> On Tue, 27 Jul 2010, Minchan Kim wrote:
>
> > This patch stores the page struct's own address in pg->private for every
> > page that backs a section's mem_map. If they match, the page backs the
> > section's mem_map; otherwise the page is used for another purpose and the
> > mem_map has a hole.

>
> > +void mark_valid_memmap(unsigned long start, unsigned long end);
> > +
> > +#ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
> > +static inline int memmap_valid(unsigned long pfn)
> > +{
> > + struct page *page = pfn_to_page(pfn);
> > + struct page *__pg = virt_to_page(page);
> > + return page_private(__pg) == (unsigned long)__pg;
>
>
> What if page->private just happens to be the value of the page struct?
> Even if that is not possible today, someday someone may add new
> functionality to the kernel where page->private == page is used for some
> reason.
>
> Checking for PG_reserved won't work?

I had the same thought and suggest setting it to the memory section block,
since that is a unique value (unlike PG_reserved),

> > static inline int pfn_valid(unsigned long pfn)
> > {
> > + struct mem_section *ms;
> > if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> > return 0;
> > - return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> > + ms = __nr_to_section(pfn_to_section_nr(pfn));
> > + return valid_section(ms) && memmap_valid(pfn);

.. and we already have computed it when we use it so we could pass it as
a parameter (to both _valid and mark_valid).

milton

2010-07-27 06:11:19

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Tue, Jul 27, 2010 at 2:55 PM, <[email protected]> wrote:
> [Sorry if I missed or added anyone on cc; patchwork.kernel.org LKML is not
> working and I'm not subscribed to the list]

Re-added them. :)

>
> On Mon Jul 26 2010 about 12:47:37 EST, Christoph Lameter wrote:
>> On Tue, 27 Jul 2010, Minchan Kim wrote:
>>
>> > This patch stores the page struct's own address in pg->private for every
>> > page that backs a section's mem_map. If they match, the page backs the
>> > section's mem_map; otherwise the page is used for another purpose and the
>> > mem_map has a hole.
>
>>
>> > +void mark_valid_memmap(unsigned long start, unsigned long end);
>> > +
>> > +#ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
>> > +static inline int memmap_valid(unsigned long pfn)
>> > +{
>> > + struct page *page = pfn_to_page(pfn);
>> > + struct page *__pg = virt_to_page(page);
>> > + return page_private(__pg) == (unsigned long)__pg;
>>
>>
>> What if page->private just happens to be the value of the page struct?
>> Even if that is not possible today, someday someone may add new
>> functionality to the kernel where page->private == page is used for some
>> reason.
>>
>> Checking for PG_reserved won't work?
>
> I had the same thought and suggest setting it to the memory section block,
> since that is a unique value (unlike PG_reserved),

You mean setting pg->private to the mem_section address?
I hope I understand your point.

Actually, KAMEZAWA tried that in the first version, but I changed it.
That's because I want this mechanism to also support ARM FLATMEM
(which doesn't have mem_section).


>
>> > static inline int pfn_valid(unsigned long pfn)
>> > {
>> > + struct mem_section *ms;
>> > if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>> > return 0;
>> > - return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
>> > + ms = __nr_to_section(pfn_to_section_nr(pfn));
>> > + return valid_section(ms) && memmap_valid(pfn);
>
> .. and we already have computed it when we use it so we could pass it as
> a parameter (to both _valid and mark_valid).

I hope this can support FLATMEM with holes (e.g., ARM).

>
> milton
>



--
Kind regards,
Minchan Kim

2010-07-27 08:13:54

by Milton Miller

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4


On Tue Jul 27 2010 about 02:11:22 Minchan Kim wrote:
> > [Sorry if I missed or added anyone on cc; patchwork.kernel.org LKML is not
> > working and I'm not subscribed to the list]
>
> Re-added them. :)

Changed linux-mmc at vger to linux-mm at kvack.org, from my poor use of grep
on MAINTAINERS.

> On Tue, Jul 27, 2010 at 2:55 PM, <miltonm@xxxxxxx> wrote:
> > On Mon Jul 26 2010 about 12:47:37 EST, Christoph Lameter wrote:
> > > On Tue, 27 Jul 2010, Minchan Kim wrote:
> > >
> > > > This patch stores the page struct's own address in pg->private for every
> > > > page that backs a section's mem_map. If they match, the page backs the
> > > > section's mem_map; otherwise the page is used for another purpose and the
> > > > mem_map has a hole.
> >
> > >
> > > > +void mark_valid_memmap(unsigned long start, unsigned long end);
> > > > +
> > > > +#ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
> > > > +static inline int memmap_valid(unsigned long pfn)
> > > > +{
> > > > + struct page *page = pfn_to_page(pfn);
> > > > + struct page *__pg = virt_to_page(page);
> > > > + return page_private(__pg) == (unsigned long)__pg;
> > >
> > >
> > > What if page->private just happens to be the value of the page struct?
> > > Even if that is not possible today, someday someone may add new
> > > functionality to the kernel where page->private == page is used for some
> > > reason.
> > >
> > > Checking for PG_reserved won't work?
> >
> > I had the same thought and suggest setting it to the memory section block,
> > since that is a unique value (unlike PG_reserved),
>
> You mean setting pg->private to the mem_section address?
> I hope I understand your point.
>
> Actually, KAMEZAWA tried that in the first version, but I changed it.
> That's because I want this mechanism to also support ARM FLATMEM
> (which doesn't have mem_section).

> >
> > .. and we already have computed it when we use it so we could pass it as
> > a parameter (to both _valid and mark_valid).
>
> I hope this can support FLATMEM with holes (e.g., ARM).
>

If we pass a void * to this helper we should be able to find another
symbol. Looking at the pfn_valid() in arch/arm/mm/init.c I would
probably choose &meminfo as it is already used nearby, and using a single
symbol would avoid issues if a more specific symbol (eg a bank)
were to change at a pfn boundary that is not a multiple of
PAGE_SIZE / sizeof(struct page).
Similarly the asm-generic/page.h version could use &max_mapnr.

This function is a validation helper for pfn_valid, not the only check.

something like

static inline int memmap_valid(unsigned long pfn, void *validate)
{
        struct page *page = pfn_to_page(pfn);
        struct page *__pg = virt_to_page(page);
        return page_private(__pg) == (unsigned long)validate;
}

static inline int pfn_valid(unsigned long pfn)
{
        struct mem_section *ms;
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        ms = __nr_to_section(pfn_to_section_nr(pfn));
        return valid_section(ms) && memmap_valid(pfn, ms);
}

> > > > +/*
> > > > + * Fill pg->private of each page backing a valid mem_map with the page
> > > > + * itself. pfn_valid() will check this later (see include/linux/mmzone.h).
> > > > + * Every arch that supports holes in mem_map should call
> > > > + * mark_valid_memmap(start, end); please see the ARM usage.
> > > > + */
> > > > +void mark_valid_memmap(unsigned long start, unsigned long end)
> > > > +{
> > > > +        struct mem_section *ms;
> > > > +        unsigned long pos, next;
> > > > +        struct page *pg;
> > > > +        void *memmap, *mapend;
> > > > +
> > > > +        for (pos = start; pos < end; pos = next) {
> > > > +                next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
> > > > +                ms = __pfn_to_section(pos);
> > > > +                if (!valid_section(ms))
> > > > +                        continue;
> > > > +
> > > > +                for (memmap = (void *)pfn_to_page(pos),
> > > > +                     /* the last page in the section */
> > > > +                     mapend = pfn_to_page(next - 1);
> > > > +                     memmap < mapend; memmap += PAGE_SIZE) {
> > > > +                        pg = virt_to_page(memmap);
> > > > +                        set_page_private(pg, (unsigned long)pg);
> > > > +                }
> > > > +        }
> > > > +}

Hmm, this loop would need to change for sections. And sizeof(struct
page) % PAGE_SIZE may not be 0, so we want a global symbol for sparsemem
too. Perhaps the mem_section array. Using a symbol that is part of
the model pre-checks can remove a global symbol lookup and has the side
effect of making sure our pfn_valid is for the right model.

milton

2010-07-27 08:18:48

by Kamezawa Hiroyuki

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Tue, 27 Jul 2010 03:12:38 -0500
Milton Miller <[email protected]> wrote:

> > > > > +/*
> > > > > + * Fill pg->private of each page backing a valid mem_map with the page
> > > > > + * itself. pfn_valid() will check this later (see include/linux/mmzone.h).
> > > > > + * Every arch that supports holes in mem_map should call
> > > > > + * mark_valid_memmap(start, end); please see the ARM usage.
> > > > > + */
> > > > > +void mark_valid_memmap(unsigned long start, unsigned long end)
> > > > > +{
> > > > > +        struct mem_section *ms;
> > > > > +        unsigned long pos, next;
> > > > > +        struct page *pg;
> > > > > +        void *memmap, *mapend;
> > > > > +
> > > > > +        for (pos = start; pos < end; pos = next) {
> > > > > +                next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
> > > > > +                ms = __pfn_to_section(pos);
> > > > > +                if (!valid_section(ms))
> > > > > +                        continue;
> > > > > +
> > > > > +                for (memmap = (void *)pfn_to_page(pos),
> > > > > +                     /* the last page in the section */
> > > > > +                     mapend = pfn_to_page(next - 1);
> > > > > +                     memmap < mapend; memmap += PAGE_SIZE) {
> > > > > +                        pg = virt_to_page(memmap);
> > > > > +                        set_page_private(pg, (unsigned long)pg);
> > > > > +                }
> > > > > +        }
> > > > > +}
>
> Hmm, this loop would need to change for sections. And sizeof(struct
> page) % PAGE_SIZE may not be 0, so we want a global symbol for sparsemem
> too.
IIUC, sizeof(struct page) % PAGE_SIZE is not a problem.


> Perhaps the mem_section array. Using a symbol that is part of
> the model pre-checks can remove a global symbol lookup and has the side
> effect of making sure our pfn_valid is for the right model.
>

But yes, maybe it's good to make use of a fixed-(magic)-value.


Thanks,
-Kame


2010-07-27 09:56:27

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Tue, Jul 27, 2010 at 5:12 PM, Milton Miller <[email protected]> wrote:
>
> On Tue Jul 27 2010 about 02:11:22 Minchan Kim wrote:
>> > [Sorry if I missed or added anyone on cc; patchwork.kernel.org LKML is not
>> > working and I'm not subscribed to the list]
>>
>> Re-added them. :)
>
> Changed linux-mmc at vger to linux-mm at kvack.org, from my poor use of grep
> on MAINTAINERS.
>
>> On Tue, Jul 27, 2010 at 2:55 PM, <miltonm@xxxxxxx> wrote:
>> > On Mon Jul 26 2010 about 12:47:37 EST, Christoph Lameter wrote:
>> > > On Tue, 27 Jul 2010, Minchan Kim wrote:
>> > >
>> > > > This patch stores the page struct's own address in pg->private for every
>> > > > page that backs a section's mem_map. If they match, the page backs the
>> > > > section's mem_map; otherwise the page is used for another purpose and the
>> > > > mem_map has a hole.
>> >
>> > >
>> > > > +void mark_valid_memmap(unsigned long start, unsigned long end);
>> > > > +
>> > > > +#ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
>> > > > +static inline int memmap_valid(unsigned long pfn)
>> > > > +{
>> > > > + struct page *page = pfn_to_page(pfn);
>> > > > + struct page *__pg = virt_to_page(page);
>> > > > + return page_private(__pg) == (unsigned long)__pg;
>> > >
>> > >
>> > > What if page->private just happens to be the value of the page struct?
>> > > Even if that is not possible today, someday someone may add new
>> > > functionality to the kernel where page->private == page is used for some
>> > > reason.
>> > >
>> > > Checking for PG_reserved won't work?
>> >
>> > I had the same thought and suggest setting it to the memory section block,
>> > since that is a unique value (unlike PG_reserved),
>>
>> You mean setting pg->private to the mem_section address?
>> I hope I understand your point.
>>
>> Actually, KAMEZAWA tried that in the first version, but I changed it.
>> That's because I want this mechanism to also support ARM FLATMEM
>> (which doesn't have mem_section).
>
>> >
>> > .. and we already have computed it when we use it so we could pass it as
>> > a parameter (to both _valid and mark_valid).
>>
>> I hope this can support FLATMEM with holes (e.g., ARM).
>>
>
> If we pass a void * to this helper we should be able to find another
> symbol. Looking at the pfn_valid() in arch/arm/mm/init.c I would
> probably choose &meminfo as it is already used nearby, and using a single

If we use the pg itself and PG_reserved, we can remove &meminfo in FLATMEM.

> symbol would avoid issues if a more specific symbol (eg a bank)
> were to change at a pfn boundary that is not a multiple of
> PAGE_SIZE / sizeof(struct page).
> Similarly the asm-generic/page.h version could use &max_mapnr.

I didn't consider NOMMU.
I am not sure NOMMU has this problem.

>
> This function is a validation helper for pfn_valid, not the only check.
>
> something like
>
> static inline int memmap_valid(unsigned long pfn, void *validate)
> {
>         struct page *page = pfn_to_page(pfn);
>         struct page *__pg = virt_to_page(page);
>         return page_private(__pg) == (unsigned long)validate;
> }

I am not sure what benefit we get from the validate argument.

>
> static inline int pfn_valid(unsigned long pfn)
> {
>         struct mem_section *ms;
>         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>                 return 0;
>         ms = __nr_to_section(pfn_to_section_nr(pfn));
>         return valid_section(ms) && memmap_valid(pfn, ms);
> }
>
>> > > > +/*
>> > > > + * Fill pg->private of each page backing a valid mem_map with the page
>> > > > + * itself. pfn_valid() will check this later (see include/linux/mmzone.h).
>> > > > + * Every arch that supports holes in mem_map should call
>> > > > + * mark_valid_memmap(start, end); please see the ARM usage.
>> > > > + */
>> > > > +void mark_valid_memmap(unsigned long start, unsigned long end)
>> > > > +{
>> > > > +        struct mem_section *ms;
>> > > > +        unsigned long pos, next;
>> > > > +        struct page *pg;
>> > > > +        void *memmap, *mapend;
>> > > > +
>> > > > +        for (pos = start; pos < end; pos = next) {
>> > > > +                next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
>> > > > +                ms = __pfn_to_section(pos);
>> > > > +                if (!valid_section(ms))
>> > > > +                        continue;
>> > > > +
>> > > > +                for (memmap = (void *)pfn_to_page(pos),
>> > > > +                     /* the last page in the section */
>> > > > +                     mapend = pfn_to_page(next - 1);
>> > > > +                     memmap < mapend; memmap += PAGE_SIZE) {
>> > > > +                        pg = virt_to_page(memmap);
>> > > > +                        set_page_private(pg, (unsigned long)pg);
>> > > > +                }
>> > > > +        }
>> > > > +}
>
> Hmm, this loop would need to change for sections. And sizeof(struct
> page) % PAGE_SIZE may not be 0, so we want a global symbol for sparsemem

I can't understand your point. What is the problem with
sizeof(struct page) % PAGE_SIZE? AFAIK, sizeof(struct page) is always 32
bytes on a 32-bit machine, and PAGE_SIZE is usually 4K. What problem can happen?

> too. Perhaps the mem_section array. Using a symbol that is part of
> the model pre-checks can remove a global symbol lookup and has the side
> effect of making sure our pfn_valid is for the right model.

A global symbol lookup?
Hmm, please let me know how your approach would improve this patch. :)

Thanks for careful review, milton.

--
Kind regards,
Minchan Kim

2010-07-27 10:01:17

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

Hi, Kame.

On Tue, Jul 27, 2010 at 5:13 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
>> Perhaps the mem_section array. Using a symbol that is part of
>> the model pre-checks can remove a global symbol lookup and has the side
>> effect of making sure our pfn_valid is for the right model.
>>
>
> But yes, maybe it's good to make use of a fixed-(magic)-value.

A fixed magic value?
Yes. It can be good for some debugging.
But as Christoph pointed out, we need some strict check (e.g.,
PG_reserved) to prevent a stray matching value being treated as valid in
the future.
But in fact I have a concern about using PG_reserved, since PageReserved
is normally checked after pfn_valid to detect holes on non-hole systems. So I
think it's redundant.

Hmm..

--
Kind regards,
Minchan Kim

2010-07-27 14:35:06

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Tue, 27 Jul 2010, Minchan Kim wrote:

> But in fact I have a concern about using PG_reserved, since PageReserved
> is normally checked after pfn_valid to detect holes on non-hole systems. So I
> think it's redundant.

PG_reserved is already used to mark pages not handled by the page
allocator (see memmap_init_zone).

2010-07-27 22:33:10

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Tue, Jul 27, 2010 at 11:34 PM, Christoph Lameter
<[email protected]> wrote:
> On Tue, 27 Jul 2010, Minchan Kim wrote:
>
>> But in fact I have a concern about using PG_reserved, since PageReserved
>> is normally checked after pfn_valid to detect holes on non-hole systems. So I
>> think it's redundant.

Ignore me. I got confused.

>
> PG_reserved is already used to mark pages not handled by the page
> allocator (see memmap_init_zone).


I will resend with the approach below.

static inline int memmap_valid(unsigned long pfn)
{
        struct page *page = pfn_to_page(pfn);
        struct page *__pg = virt_to_page(page);
        return page_private(__pg) == MAGIC_MEMMAP && PageReserved(__pg);
}
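For illustration, the marking side would then set both marks; MAGIC_MEMMAP
is a hypothetical constant still to be chosen, so this is only a sketch of
the idea, not the next version:

        #define MAGIC_MEMMAP    0x4d6d4d70UL    /* hypothetical magic value */

        void mark_valid_memmap(unsigned long start, unsigned long end)
        {
                struct mem_section *ms;
                unsigned long pos, next;
                struct page *pg;
                void *memmap, *mapend;

                for (pos = start; pos < end; pos = next) {
                        next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
                        ms = __pfn_to_section(pos);
                        if (!valid_section(ms))
                                continue;

                        for (memmap = (void *)pfn_to_page(pos),
                             mapend = pfn_to_page(next - 1);
                             memmap < mapend; memmap += PAGE_SIZE) {
                                pg = virt_to_page(memmap);
                                /* set both marks, per Christoph's concern */
                                SetPageReserved(pg);
                                set_page_private(pg, MAGIC_MEMMAP);
                        }
                }
        }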

Thanks, all.


--
Kind regards,
Minchan Kim

2010-07-28 15:14:59

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Wed, 28 Jul 2010, Minchan Kim wrote:

> static inline int memmap_valid(unsigned long pfn)
> {
>         struct page *page = pfn_to_page(pfn);
>         struct page *__pg = virt_to_page(page);

Does that work both for vmemmap and the real memmap?

>         return page_private(__pg) == MAGIC_MEMMAP && PageReserved(__pg);
> }

Problem is that pages may be allocated for the memmap from a variety of
places. The pages in memmap_init_zone() and those allocated during boot may
have PageReserved set, whereas the pages allocated via vmemmap_alloc_block()
have PageReserved cleared since they came from the page allocator.

You need to have consistent use of PageReserved in the page structs for the
memmap in order to do this properly.

Simplest scheme would be to clear PageReserved() in all page struct
associated with valid pages and clear those for page structs that do not
refer to valid pages.

Then

memmap_valid = !PageReserved(xxx(pfn_to_page(pfn)))

2010-07-28 15:56:29

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Wed, Jul 28, 2010 at 10:14:51AM -0500, Christoph Lameter wrote:
> On Wed, 28 Jul 2010, Minchan Kim wrote:
>
> > static inline int memmap_valid(unsigned long pfn)
> > {
> >         struct page *page = pfn_to_page(pfn);
> >         struct page *__pg = virt_to_page(page);
>
> Does that work both for vmemmap and the real memmap?

When Kame suggested this idea, he didn't consider the vmemmap model.
(He prevented enabling this feature via the config dependency
!SPARSEMEM_VMEMMAP.)

config SPARSEMEM_HAS_HOLE
        bool "allow holes in sparsemem's memmap"
        depends on ARM && SPARSEMEM && !SPARSEMEM_VMEMMAP
        default n

Changing it to ARCH_HAS_HOLES_MEMORYMODEL was my mistake.
I can change it to ARCH_HAS_HOLES_MEMORYMODEL && !SPARSEMEM_VMEMMAP.

I wonder whether we should support VMEMMAP at all.
The hole problem of sparsemem is specific to ARM;
ARM folks use holes to save memory space, but VMEMMAP uses more memory.
I think that would be ironic.

>
> >         return page_private(__pg) == MAGIC_MEMMAP && PageReserved(__pg);
> > }
>
> Problem is that pages may be allocated for the memmap from a variety of
> places. The pages in memmap_init_zone() and those allocated during boot may
> have PageReserved set, whereas the pages allocated via vmemmap_alloc_block()
> have PageReserved cleared since they came from the page allocator.
>
> You need to have consistent use of PageReserved in the page structs for the
> memmap in order to do this properly.

Yes, if we support both models.

>
> Simplest scheme would be to clear PageReserved() in all page struct
> associated with valid pages and clear those for page structs that do not
> refer to valid pages.

I can't understand your words.
Clear PG_reserved on both valid pages and invalid pages?

I guess your code would clear PG_reserved on valid memmap
but set PG_reserved on invalid memmap.
Right?

Invalid memmap pages will be freed by free_memmap and can then be used
anywhere. How do we make sure they keep PG_reserved?

Maybe I don't understand your point.


>
> Then
>
> mmap_valid = !PageReserved(xxx(pfn_to_page(pfn))

--
Kind regards,
Minchan Kim

2010-07-28 17:02:21

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, 29 Jul 2010, Minchan Kim wrote:

> > Simplest scheme would be to clear PageReserved() in all page struct
> > associated with valid pages and clear those for page structs that do not
> > refer to valid pages.
>
> I can't understand your words.
> Clear PG_reserved on both valid pages and invalid pages?

Argh sorry. No. Set PageReserved for pages that do not refer to reserved
pages.

> I guess your code would clear PG_reserved on valid memmap
> but set PG_reserved on invalid memmap.
> Right?

Right.

> Invalid memmap pages will be freed by free_memmap and can then be used
> anywhere. How do we make sure they keep PG_reserved?

Not-present memmap pages already make pfn_valid fail, since there is no
entry in the page table (vmemmap) or the blocks are missing in the sparsemem
tables.

> Maybe I don't understand your point.

I thought we are worrying about holes in the memmap blocks containing page
structs. Some page structs point to valid pages and some do not. The
invalid page structs need to be marked consistently to allow the check.

2010-07-28 22:58:12

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Wed, Jul 28, 2010 at 12:02:16PM -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Minchan Kim wrote:
> > Invalid memmap pages will be freed by free_memmap and can then be used
> > anywhere. How do we make sure they keep PG_reserved?
>
> Not-present memmap pages already make pfn_valid fail, since there is no
> entry in the page table (vmemmap) or the blocks are missing in the sparsemem
> tables.
>
> > Maybe I don't understand your point.
>
> I thought we are worrying about holes in the memmap blocks containing page
> structs. Some page structs point to valid pages and some do not. The
> invalid page structs need to be marked consistently to allow the check.

The thing is that the memmap pages containing the struct page array over a hole
will be freed by free_memmap on ARM. Please look at arch/arm/mm/init.c.
And they will then be used by the page allocator as free pages.

--
Kind regards,
Minchan Kim

2010-07-29 15:46:19

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, 29 Jul 2010, Minchan Kim wrote:

> On Wed, Jul 28, 2010 at 12:02:16PM -0500, Christoph Lameter wrote:
> > On Thu, 29 Jul 2010, Minchan Kim wrote:
> > > Invalid memmap pages will be freed by free_memmap and can then be used
> > > anywhere. How do we make sure they keep PG_reserved?
> >
> > Not-present memmap pages already make pfn_valid fail, since there is no
> > entry in the page table (vmemmap) or the blocks are missing in the sparsemem
> > tables.
> >
> > > Maybe I don't understand your point.
> >
> > I thought we are worrying about holes in the memmap blocks containing page
> > structs. Some page structs point to valid pages and some do not. The
> > invalid page structs need to be marked consistently to allow the check.
>
> The thing is that the memmap pages containing the struct page array over a hole
> will be freed by free_memmap on ARM. Please look at arch/arm/mm/init.c.
> And they will then be used by the page allocator as free pages.

Argh, that's the solution to the mystery. free_memmap() is an ARM-specific hack!

Sparsemem allows you to properly handle holes already and then pfn_valid
will work correctly.

Why are the ways to manage holes in the core not used by arm?

sparsemem does a table lookup to determine valid and invalid sections of
the memmap.

from include/linux/mmzone.h:

static inline int pfn_valid(unsigned long pfn)
{
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}

2010-07-29 16:19:08

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, Jul 29, 2010 at 10:46:13AM -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Minchan Kim wrote:
>
> > On Wed, Jul 28, 2010 at 12:02:16PM -0500, Christoph Lameter wrote:
> > > On Thu, 29 Jul 2010, Minchan Kim wrote:
> > > > Invalid memmap pages will be freed by free_memmap and can then be used
> > > > anywhere. How do we make sure they keep PG_reserved?
> > >
> > > Not-present memmap pages already make pfn_valid fail, since there is no
> > > entry in the page table (vmemmap) or the blocks are missing in the sparsemem
> > > tables.
> > >
> > > > Maybe I don't understand your point.
> > >
> > > I thought we are worrying about holes in the memmap blocks containing page
> > > structs. Some page structs point to valid pages and some do not. The
> > > invalid page structs need to be marked consistently to allow the check.
> >
> > The thing is that the memmap pages containing the struct page array over a hole
> > will be freed by free_memmap on ARM. Please look at arch/arm/mm/init.c.
> > And they will then be used by the page allocator as free pages.
>
> Argh, that's the solution to the mystery. free_memmap() is an ARM-specific hack!
>
> Sparsemem allows you to properly handle holes already and then pfn_valid
> will work correctly.
>
> Why are the ways to manage holes in the core not used by arm?

I did use ARCH_HAS_HOLES_MEMORYMODEL.
It is used only by ARM now.
If you disable the config, it doesn't affect the core.

>
> sparsemem does a table lookup to determine valid and invalid sections of
> the memmap.
>
The thing is that a valid section can also have an invalid memmap.
Maybe my description wasn't clear enough.
Please look at the description and the following URL.

We already confirmed this problem.
http://www.spinics.net/lists/arm-kernel/msg92918.html

== CUT HERE ==

Kukjin reported an oops while changing min_free_kbytes:
http://www.spinics.net/lists/arm-kernel/msg92894.html
It is caused by the memory map layout on sparsemem.

The system has the following memory map:
section 0              section 1              section 2
0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
SECTION_SIZE_BITS 28 (256M)

This means section 0 is an incompletely filled section.
Nonetheless, the current pfn_valid of sparsemem checks the pfn loosely:
it checks only the mem_section's validity, but ARM can free the mem_map
over a hole to save memory. So in the above case, a pfn in the hole at
0x25000000 can pass pfn_valid's check. That's not what we want.




--
Kind regards,
Minchan Kim

2010-07-29 16:47:33

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Fri, 30 Jul 2010, Minchan Kim wrote:

> The thing is that a valid section can also have an invalid memmap.

Oww... A valid section points to a valid memmap memory block (the page
structs) but the underlying memory pages may not be present. So you can check
the (useless) page structs for the page state of the not-present pages in
the memory map. If the granularity of the sparsemem mapping is not
sufficient for your purpose then you can change the sparsemem config
(the configuration is in arch/<arch>/include/asm/sparsemem.h but does not
exist for arm).

> This means section 0 is an incompletely filled section.
> Nonetheless, the current pfn_valid of sparsemem checks the pfn loosely:
> it checks only the mem_section's validity, but ARM can free the mem_map
> over a hole to save memory. So in the above case, a pfn in the hole at
> 0x25000000 can pass pfn_valid's check. That's not what we want.

IMHO ARM should not poke holes in the memmap sections. The guarantee of
the full presence of the section is intentional to avoid having to do
these checks that you are proposing. The page allocator typically expects
to be able to check all page structs in one basic allocation unit.

Also pfn_valid then does not have to touch the page struct to perform its
function as long as we guarantee the presence of the memmap section.

2010-07-29 17:03:25

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, Jul 29, 2010 at 11:47:26AM -0500, Christoph Lameter wrote:
> On Fri, 30 Jul 2010, Minchan Kim wrote:
>
> > The thing is that a valid section can also have an invalid memmap.
>
> Oww... A valid section points to a valid memmap memory block (the page
> structs) but the underlying memory pages may not be present. So you can check
> the (useless) page structs for the page state of the not-present pages in
> the memory map. If the granularity of the sparsemem mapping is not
> sufficient for your purpose then you can change the sparsemem config
> (the configuration is in arch/<arch>/include/asm/sparsemem.h but does not
> exist for arm).
>
> > This means section 0 is an incompletely filled section.
> > Nonetheless, the current pfn_valid of sparsemem checks the pfn loosely:
> > it checks only the mem_section's validity, but ARM can free the mem_map
> > over a hole to save memory. So in the above case, a pfn in the hole at
> > 0x25000000 can pass pfn_valid's check. That's not what we want.
>
> IMHO ARM should not poke holes in the memmap sections. The guarantee of
> the full presence of the section is intentional to avoid having to do
> these checks that you are proposing. The page allocator typically expects
> to be able to check all page structs in one basic allocation unit.
>
> Also pfn_valid then does not have to touch the page struct to perform its
> function as long as we guarantee the presence of the memmap section.

Absolutely right. Many mm guys wanted to do that.
But Russell doesn't want it.
Please look at the discussion.

http://www.spinics.net/lists/arm-kernel/msg93026.html

In fact, we didn't settle on an approach at that time.
But I think we can't give up ARM's use case even though the sparse model
isn't designed for such granularity, and I think this approach
can also solve ARM FLATMEM's pfn_valid problem, which does a binary search.
So I just tried to solve this problem. But Russell has stayed quiet.

Okay. I will wait for others' opinions.
First of all, let's settle on the approach.
Russell, could you give your opinion on this approach, or your suggestion?

--
Kind regards,
Minchan Kim

2010-07-29 17:30:31

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Fri, 30 Jul 2010, Minchan Kim wrote:

> But Russell doesn't want it.
> Please look at the discussion.
>
> http://www.spinics.net/lists/arm-kernel/msg93026.html
>
> In fact, we didn't settle on an approach at that time.
> But I think we can't give up ARM's use case even though the sparse model
> isn't designed for such granularity, and I think this approach

The sparse model goes down to page size memmap granularity. The problem
that you may have is with aligning the maximum allocation unit of the
page allocator with the section size of sparsemem. If you reduce your
maximum allocation units then you can get more granularity.

> can also solve ARM FLATMEM's pfn_valid problem, which does a binary search.

OMG.

2010-07-29 18:33:50

by Russell King - ARM Linux

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, Jul 29, 2010 at 12:30:23PM -0500, Christoph Lameter wrote:
> On Fri, 30 Jul 2010, Minchan Kim wrote:
>
> > But Russell doesn't want it.
> > Please look at the discussion.
> >
> > http://www.spinics.net/lists/arm-kernel/msg93026.html
> >
> > In fact, we didn't settle on an approach at that time.
> > But I think we can't give up ARM's use case even though the sparse model
> > isn't designed for such granularity, and I think this approach
>
> The sparse model goes down to page size memmap granularity. The problem
> that you may have is with aligning the maximum allocation unit of the
> page allocator with the section size of sparsemem. If you reduce your
> maximum allocation units then you can get more granularity.

Then why is there no advantage to adding 512kB memory modules in a machine
with memory spaced 64MB apart with sparsemem - the mem_map array for
each sparsemem section is 512kB in size. So the additional 512kB memory
modules give you nothing because they're completely full of mem_map array.

_That's_ the kind of problem that makes sparsemem unsuitable for... sparse
memory layouts found in the embedded world.

And that also makes flatmem unsuitable for use on ARM when you have such
memory layouts - four banks of discrete memory spaced at 64MB over a 256MB
range, which can have a size down to 512kB each.

And no, setting the sparse section size to 512kB doesn't work - memory is
offset by 256MB already, so you need a sparsemem section array of 1024
entries just to cover that - with the full 256MB populated, that's 512
unused entries followed by 512 used entries. That too is going to waste
memory like nobody's business.

Basically, what's come out of this discussion is that the kernel really
_sucks_ when it comes to handling sparse memory layouts found on ARM.

> > can also solve ARM FLATMEM's pfn_valid problem, which does a binary search.
>
> OMG.

No, it is NOT that expensive. Most people go "omg, binary search on
a cached architecture, that's insane". That statement is so far from
reality that the statement itself is insane.

The binary search operates on a very small amount of data, and results
in two or possibly three cache lines at the most being loaded, assuming
a full 8 banks of memory information passed. Most systems pass one or
maybe two banks - so the _entire_ thing fits within one cache line - a
cache line which will have already been loaded.

So no, this binary search is not as expensive as you think it is.

2010-07-29 19:55:59

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, 29 Jul 2010, Russell King - ARM Linux wrote:

> And no, setting the sparse section size to 512kB doesn't work - memory is
> offset by 256MB already, so you need a sparsemem section array of 1024
> entries just to cover that - with the full 256MB populated, that's 512
> unused entries followed by 512 used entries. That too is going to waste
> memory like nobody's business.

SPARSEMEM_EXTREME does not handle that?

Some ARMs seem to have MMUs. If so then use SPARSEMEM_VMEMMAP. You can map
4k pages for the memmap through a page table. Redirect unused 4k blocks to
the NULL page.

2010-07-29 20:55:35

by Dave Hansen

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, 2010-07-29 at 19:33 +0100, Russell King - ARM Linux wrote:
> And no, setting the sparse section size to 512kB doesn't work - memory is
> offset by 256MB already, so you need a sparsemem section array of 1024
> entries just to cover that - with the full 256MB populated, that's 512
> unused entries followed by 512 used entries. That too is going to waste
> memory like nobody's business.

Sparsemem could use some work in the case where memory doesn't start at
0x0. But, it doesn't seem like it would be _too_ oppressive to add.
It's literally just adding an offset to all of the places where a
physical address is stuck into the system. It'll make a few of the
calculations longer, of course, but it should be manageable.

Could you give some full examples of how the memory is laid out on these
systems? I'm having a bit of a hard time visualizing it.

As Christoph mentioned, SPARSEMEM_EXTREME might be viable here, too.

If you free up parts of the mem_map[] array, how does the buddy
allocator still work? I thought we required all 'struct page's to be
contiguous and present for at least 2^MAX_ORDER-1 pages in one go.

-- Dave

2010-07-29 21:14:10

by Russell King - ARM Linux

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, Jul 29, 2010 at 02:55:53PM -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Russell King - ARM Linux wrote:
>
> > And no, setting the sparse section size to 512kB doesn't work - memory is
> > offset by 256MB already, so you need a sparsemem section array of 1024
> > entries just to cover that - with the full 256MB populated, that's 512
> > unused entries followed by 512 used entries. That too is going to waste
> > memory like nobody's business.
>
> SPARSEMEM_EXTREME does not handle that?
>
> Some ARMs seem to have MMUs. If so then use SPARSEMEM_VMEMMAP. You can map
> 4k pages for the memmap through a page table. Redirect unused 4k blocks to
> the NULL page.

We're going over old ground which has already been covered in this very
thread. I've no inclination to repeat the arguments.

2010-07-29 22:15:05

by Russell King - ARM Linux

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, Jul 29, 2010 at 01:55:19PM -0700, Dave Hansen wrote:
> Could you give some full examples of how the memory is laid out on these
> systems? I'm having a bit of a hard time visualizing it.

In the example I quote, there are four banks of memory, which start at
0x10000000, 0x14000000, 0x18000000 and 0x1c000000 physical, which can
be populated or empty, each one in multiples of 512KB up to the maximum
64MB.

There are other systems where memory starts at 0xc0000000 and 0xc8000000
physical, and the memory size is either 32MB or 64MB.

We also have one class of systems where memory starts at 0xc0000000,
0xc1000000, 0xc2000000, etc - but I don't know what the minimum
populated memory size in any one region is.

Things that we've tried over the years:
1. flatmem, remapping memory into one contiguous chunk (which can cause
problems when parts of the kernel assume that the underlying phys
space is contiguous.)
2. flatmem with holes and a 1:1 v:p mapping (was told we shouldn't be
doing this - and it becomes impossible with sparsely populated banks
of memory split over a large range.)
3. discontigmem (was told this was too heavy, we're not NUMA, we shouldn't
be using this, and it will be deprecated, use sparsemem instead)
4. sparsemem

What we need is something which allows us to handle memory scattered
in several regions of the physical memory map, each bank being a
variable size.

From what I've seen through this thread, there is no support for such
a setup. (People seem to have their opinions on this, and will tell
you what you should be using, only for someone else to tell you that
you shouldn't be using that! - *) This isn't something new for ARM,
we've had these kinds of issues for the last 10 or more years.

What is new is that we're now seeing systems where the first bank of
memory to be populated is at a higher physical address than the second
bank, and therefore people are setting up v:p mappings which switch the
ordering of these - but this I think is unrelated to the discussion at
hand.

* - this is why I'm exasperated with this latest discussion on it.

While we're here, I'll repeat a point made earlier.

We don't map lowmem using 4K pages. That would be utter madness
given the small TLB size ARM processors tend to have. Instead, we
map lowmem using 1MB section mappings (which occupy one entry in the
L1 page table.) Modifying these mappings requires all page tables
in the system to be updated - which given that we're SMP etc. now
is not practical.

So the idea that we can remap a section of memory for the mem_map
struct (as suggested several times in this thread) isn't possible
without having it allocated in something like vmalloc space.
Plus, of course, that if you did such a remapping in the lowmem
mapping, the pages which were there become unusable as they lose
their virtual mapping (thereby causing phys_to_virt/virt_to_phys
on their addresses to break.) Therefore, you only gain even more
problems by this method.

2010-07-29 22:28:20

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, 29 Jul 2010, Russell King - ARM Linux wrote:

> We don't map lowmem using 4K pages. That would be utter madness
> given the small TLB size ARM processors tend to have. Instead, we
> map lowmem using 1MB section mappings (which occupy one entry in the
> L1 page table.) Modifying these mappings requires all page tables
> in the system to be updated - which given that we're SMP etc. now
> is not practical.
>
> So the idea that we can remap a section of memory for the mem_map
> struct (as suggested several times in this thread) isn't possible
> without having it allocated in something like vmalloc space.
> Plus, of course, that if you did such a remapping in the lowmem
> mapping, the pages which were there become unusable as they lose
> their virtual mapping (thereby causing phys_to_virt/virt_to_phys
> on their addresses to break.) Therefore, you only gain even more
> problems by this method.

A 1M page dedicated to vmemmap would only be used for memmap and only be
addressed using the virtual memory address. The pfn to page and vice versa
mapping that is the basic mechanism for virt_to_page and friends is then
straightforward. Nothing breaks.

memory-model.h:
#elif defined(CONFIG_SPARSEMEM_VMEMMAP)

/* memmap is virtually contiguous. */
#define __pfn_to_page(pfn) (vmemmap + (pfn))
#define __page_to_pfn(page) (unsigned long)((page) - vmemmap)


However, if you have such a sparse address space you would not want 1M
blocks for memmap but rather 4k pages. So yes you would need to use
vmalloc space (or reserve another virtual range for that purpose).

2010-07-30 00:39:07

by Dave Hansen

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, 2010-07-29 at 23:14 +0100, Russell King - ARM Linux wrote:
> What we need is something which allows us to handle memory scattered
> in several regions of the physical memory map, each bank being a
> variable size.

Russell, it does sound like you have a pretty pathological case here. :)
It's not one that we've really attempted to address on any other
architectures.

Just to spell it out, if you have 4GB of physical address space, with
512k sections, you need 8192 sections, which means 8192*8 bytes, so it'd
eat 64k of memory. That's the normal SPARSEMEM case.

SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
You'd have 16 "section roots", each representing 256MB of address space.
Each time we put memory under one of those roots, we'd fill in a
512-section second-level table, which is designed to always fit into one
page. If you start at 256MB, you won't waste all those entries.

The disadvantage of SPARSEMEM_EXTREME is that it costs you the extra
level in the lookup. The space loss in arm's case would only be 16
pointers, which would more than be made up for by the other gains.
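For reference, the extra level is just the root dereference in
__nr_to_section(); roughly, from include/linux/mmzone.h of this era
(quoted from memory, trimmed):

        #ifdef CONFIG_SPARSEMEM_EXTREME
        #define SECTIONS_PER_ROOT       (PAGE_SIZE / sizeof (struct mem_section))
        #else
        #define SECTIONS_PER_ROOT       1
        #endif

        #define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)

        static inline struct mem_section *__nr_to_section(unsigned long nr)
        {
                if (!mem_section[SECTION_NR_TO_ROOT(nr)])
                        return NULL;
                return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
        }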

The other case where it really makes no sense is when you're populating
a single (or small number) of sections, evenly across the address space.
For instance, let's say you have 16 512k banks, evenly spaced at 256MB
intervals:

512k@0x00000000
512k@0x10000000
512k@0x20000000
...
512k@0xF0000000

If you use SPARSEMEM_EXTREME on that it will degenerate to having the
same memory consumption as classic SPARSEMEM, along with the extra
lookup of EXTREME. But, I haven't heard you say that you have this kind
of configuration, yet. :)

SPARSEMEM_EXTREME is really easy to test. You just have to set it in
your .config. To get much use out of it, you'd also need to shrink the
SECTION_SIZE, like the 512k we were talking about.


-- Dave

2010-07-30 09:32:10

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Fri, Jul 30, 2010 at 5:55 AM, Dave Hansen <[email protected]> wrote:
> On Thu, 2010-07-29 at 19:33 +0100, Russell King - ARM Linux wrote:
>> And no, setting the sparse section size to 512kB doesn't work - memory is
>> offset by 256MB already, so you need a sparsemem section array of 1024
>> entries just to cover that - with the full 256MB populated, that's 512
>> unused entries followed by 512 used entries. That too is going to waste
>> memory like nobody's business.
>
> Sparsemem could use some work in the case where memory doesn't start at
> 0x0. But, it doesn't seem like it would be _too_ oppressive to add.
> It's literally just adding an offset to all of the places where a
> physical address is stuck into the system. It'll make a few of the
> calculations longer, of course, but it should be manageable.
>
> Could you give some full examples of how the memory is laid out on these
> systems? I'm having a bit of a hard time visualizing it.
>
> As Christoph mentioned, SPARSEMEM_EXTREME might be viable here, too.
>
> If you free up parts of the mem_map[] array, how does the buddy
> allocator still work? I thought we required all 'struct page's to be
> contiguous and present for at least 2^MAX_ORDER-1 pages in one go.

I think in that case the arch should define CONFIG_HOLES_IN_ZONE to prevent
a crash. But I am not sure the ARM platforms with holes have been using it.
Kukjin's problem happens not in the buddy allocator but while walking every
pfn when echoing to min_free_kbytes.
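For reference, CONFIG_HOLES_IN_ZONE just turns the intra-MAX_ORDER-block
check into a real pfn_valid(); from include/linux/mmzone.h:

        #ifdef CONFIG_HOLES_IN_ZONE
        #define pfn_valid_within(pfn) pfn_valid(pfn)
        #else
        #define pfn_valid_within(pfn) (1)
        #endif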

>
> -- Dave
>
>



--
Kind regards,
Minchan Kim

2010-07-30 09:44:00

by Minchan Kim

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Fri, Jul 30, 2010 at 9:38 AM, Dave Hansen <[email protected]> wrote:
> On Thu, 2010-07-29 at 23:14 +0100, Russell King - ARM Linux wrote:
>> What we need is something which allows us to handle memory scattered
>> in several regions of the physical memory map, each bank being a
>> variable size.
>
> Russell, it does sound like you have a pretty pathological case here. :)
> It's not one that we've really attempted to address on any other
> architectures.
>
> Just to spell it out, if you have 4GB of physical address space, with
> 512k sections, you need 8192 sections, which means 8192*8 bytes, so it'd
> eat 64k of memory. That's the normal SPARSEMEM case.
>
> SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
> You'd have 16 "section roots", each representing 256MB of address space.
> Each time we put memory under one of those roots, we'd fill in a
> 512-section second-level table, which is designed to always fit into one
> page. If you start at 256MB, you won't waste all those entries.
>
> The disadvantage of SPARSEMEM_EXTREME is that it costs you the extra
> level in the lookup. The space loss in arm's case would only be 16
> pointers, which would more than be made up for by the other gains.
>
> The other case where it really makes no sense is when you're populating
> a single (or small number) of sections, evenly across the address space.
> For instance, let's say you have 16 512k banks, evenly spaced at 256MB
> intervals:
>
>        512k@0x00000000
>        512k@0x10000000
>        512k@0x20000000
>        ...
>        512k@0xF0000000
>
> If you use SPARSEMEM_EXTREME on that it will degenerate to having the
> same memory consumption as classic SPARSEMEM, along with the extra
> lookup of EXTREME. But, I haven't heard you say that you have this kind
> of configuration, yet. :)
>
> SPARSEMEM_EXTREME is really easy to test. You just have to set it in
> your .config. To get much use out of it, you'd also need to shrink the
> SECTION_SIZE, like the 512k we were talking about.
>

Thanks for the good explanation.
When this problem happened, I suggested using a 16M section size.
The space isn't a big cost, but that failed since Russell doesn't like it.

So I tried to enhance sparsemem to support holes, but you guys don't like
that either. Frankly speaking, I don't like this approach myself, but
somebody has to take care of the problem.

Hmm, is it better to give up on Samsung's good embedded board?
It depends on Russell's opinion.

I will hold this patch until this controversial discussion reaches a
conclusion.
Thanks, Dave.

> -- Dave
>
>



--
Kind regards,
Minchan Kim

2010-07-30 12:48:10

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Thu, 29 Jul 2010, Dave Hansen wrote:

> SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
> You'd have 16 "section roots", each representing 256MB of address space.
> Each time we put memory under one of those roots, we'd fill in a
> 512-section second-level table, which is designed to always fit into one
> page. If you start at 256MB, you won't waste all those entries.

That is certainly a solution for the !MMU case, and it would work very much
like a page table. If you have an MMU then the vmemmap sparsemem
configuration can take advantage of that to avoid the 2-level lookup.

2010-07-30 15:44:01

by Dave Hansen

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Fri, 2010-07-30 at 07:48 -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Dave Hansen wrote:
>
> > SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
> > You'd have 16 "section roots", each representing 256MB of address space.
> > Each time we put memory under one of those roots, we'd fill in a
> > 512-section second-level table, which is designed to always fit into one
> > page. If you start at 256MB, you won't waste all those entries.
>
> That is certainly a solution for the !MMU case, and it would work very much
> like a page table. If you have an MMU then the vmemmap sparsemem
> configuration can take advantage of that to avoid the 2-level lookup.

Yup, couldn't agree more, Christoph.

It wouldn't hurt to have several of them available on ARM since the
architecture is so diverse.

-- Dave

2010-07-31 10:40:08

by Russell King - ARM Linux

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Fri, Jul 30, 2010 at 06:32:04PM +0900, Minchan Kim wrote:
> On Fri, Jul 30, 2010 at 5:55 AM, Dave Hansen <[email protected]> wrote:
> > If you free up parts of the mem_map[] array, how does the buddy
> > > allocator still work? I thought we required all 'struct page's to be
> > contiguous and present for at least 2^MAX_ORDER-1 pages in one go.

(Dave, I don't seem to have your mail to reply to.)

What you say is correct, and memory banks as a rule of thumb tend to be
powers of two.

We do have the ability to change MAX_ORDER (which we need to do for some
platforms where there's only 1MB of DMA-able memory.)

However, in the case of two 512KB banks, the buddy allocator won't try
to satisfy a 1MB request as it'll only have the two separate 512K free
'pages' to deal with, and zero 1M free 'pages'.

2010-07-31 15:31:12

by Russell King - ARM Linux

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Fri, Jul 30, 2010 at 07:48:00AM -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Dave Hansen wrote:
>
> > SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
> > You'd have 16 "section roots", each representing 256MB of address space.
> > Each time we put memory under one of those roots, we'd fill in a
> > 512-section second-level table, which is designed to always fit into one
> > page. If you start at 256MB, you won't waste all those entries.
>
> That is certainly a solution for the !MMU case, and it would work very much
> like a page table. If you have an MMU then the vmemmap sparsemem
> configuration can take advantage of that to avoid the 2-level lookup.

Looking at vmemmap sparsemem, we need to fix it as the page table
allocation in there bypasses the arch defined page table setup.

This causes a problem if you have 256-entry L2 page tables with no
room for the additional Linux VM PTE support bits (such as young,
dirty, etc), and need to glue two 256-entry L2 hardware page tables
plus a Linux version to store its accounting in each page. See
arch/arm/include/asm/pgalloc.h.

So this causes a problem with vmemmap:

        pte_t entry;
        void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
        if (!p)
                return NULL;
        entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);

Are you willing for this stuff to be replaced by architectures as
necessary?

2010-08-02 15:48:55

by Christoph Lameter

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Sat, 31 Jul 2010, Russell King - ARM Linux wrote:

> Looking at vmemmap sparsemem, we need to fix it as the page table
> allocation in there bypasses the arch defined page table setup.

You are required to define your own vmemmap_populate function. In that you
can call some of the provided functions or use your own.

> This causes a problem if you have 256-entry L2 page tables with no
> room for the additional Linux VM PTE support bits (such as young,
> dirty, etc), and need to glue two 256-entry L2 hardware page tables
> plus a Linux version to store its accounting in each page. See
> arch/arm/include/asm/pgalloc.h.
>
> So this causes a problem with vmemmap:
>
> pte_t entry;
> void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
> if (!p)
> return NULL;
> entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
>
> Are you willing for this stuff to be replaced by architectures as
> necessary?

Sure, it's designed that way. If we missed anything we'd surely add it.
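For illustration, an arch override built from those provided helpers might
look roughly like this (helper names and the vmemmap_populate signature are
from memory; check this era's mm/sparse-vmemmap.c for the exact interfaces;
the pte step is where ARM would substitute its own handling of the glued L2
tables):

        int __meminit vmemmap_populate(struct page *start_page,
                                       unsigned long size, int node)
        {
                unsigned long addr = (unsigned long)start_page;
                unsigned long end = (unsigned long)(start_page + size);

                for (; addr < end; addr += PAGE_SIZE) {
                        pgd_t *pgd = vmemmap_pgd_populate(addr, node);
                        pud_t *pud;
                        pmd_t *pmd;
                        pte_t *pte;

                        if (!pgd)
                                return -ENOMEM;
                        pud = vmemmap_pud_populate(pgd, addr, node);
                        if (!pud)
                                return -ENOMEM;
                        pmd = vmemmap_pmd_populate(pud, addr, node);
                        if (!pmd)
                                return -ENOMEM;
                        /* arch-specific: map one 4k page of memmap */
                        pte = vmemmap_pte_populate(pmd, addr, node);
                        if (!pte)
                                return -ENOMEM;
                }
                return 0;
        }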

2010-08-11 15:31:57

by Dave Hansen

Subject: Re: [PATCH] Tight check of pfn_valid on sparsemem - v4

On Sat, 2010-07-31 at 11:38 +0100, Russell King - ARM Linux wrote:
> On Fri, Jul 30, 2010 at 06:32:04PM +0900, Minchan Kim wrote:
> > On Fri, Jul 30, 2010 at 5:55 AM, Dave Hansen <[email protected]> wrote:
> > > If you free up parts of the mem_map[] array, how does the buddy
> > > allocator still work? I thought we required all 'struct page's to be
> > > contiguous and present for at least 2^MAX_ORDER-1 pages in one go.
>
> (Dave, I don't seem to have your mail to reply to.)
>
> What you say is correct, and memory banks as a rule of thumb tend to be
> powers of two.
>
> We do have the ability to change MAX_ORDER (which we need to do for some
> platforms where there's only 1MB of DMA-able memory.)
>
> However, in the case of two 512KB banks, the buddy allocator won't try
> to satisfy a 1MB request as it'll only have the two separate 512K free
> 'pages' to deal with, and zero 1M free 'pages'.

Right, it won't try to _coalesce_ those pages, but it will still go
looking for the freed page's buddy in the empty area. This is probably a
pretty rare issue, but I think it's real. Take a look at
__free_one_page():

...
        while (order < MAX_ORDER-1) {
                buddy = __page_find_buddy(page, page_idx, order);
                if (!page_is_buddy(page, buddy, order))
                        break;

We look at the page, and the order of the page that just got freed. We
go looking to see whether the page's buddy at this order is in the buddy
system, and _that_ tells us whether a coalesce can be done. However, we
do this with some funky math on the original page's 'struct page *':

static inline struct page *
__page_find_buddy(struct page *page, unsigned long page_idx, unsigned int order)
{
        unsigned long buddy_idx = page_idx ^ (1 << order);

        return page + (buddy_idx - page_idx);
}

That relies on all 'struct pages' within the current 2^MAX_ORDER to be
virtually contiguous. If you free up section_mem_map[] 'struct page'
blocks within the MAX_ORDER, the free'd page's buddy's 'struct page'
might fall in the area that got freed. In that case, you'll get an
effectively random PageBuddy() value, and might mistakenly coalesce the
page.

In practice with a 1MB MAX_ORDER and 512KB banks, it'll only happen if
you free the page representing the entire 512KB bank, and if the memory
for the other half 'struct page' has already gotten reused. That's
probably why you've never seen it.

-- Dave