2008-08-21 13:28:21

by Mel Gorman

[permalink] [raw]
Subject: [PATCH] Skip memory holes in FLATMEM when reading /proc/pagetypeinfo (resend)

This is a resend. The last patch went to Russell King with lkml cc'd as I
wasn't subscribed to the linux-arm list. However, I haven't heard it being
picked up so trying linux-arm this time.

===

Ordinarily, memory holes in flatmem still have a valid memmap and is safe
to use. However, an architecture (ARM) frees up the memmap backing memory
holes on the assumption it is never used. /proc/pagetypeinfo reads the
whole range of pages in a zone believing that the memmap is valid and that
pfn_valid will return false if it is not. On ARM, freeing the memmap breaks
the page->zone linkages even though pfn_valid() returns true and the kernel
can oops shortly afterwards due to accessing a bogus struct zone *.

This patch lets architectures say when FLATMEM can have holes in the
memmap. Rather than an expensive check for valid memory, /proc/pagetypeinfo
will confirm that the page linkages are still valid by checking page->zone
is still the expected zone. The lookup of page_zone is safe as there is a
limited range of memory that is accessed when calling page_zone. Even if
page_zone happens to return the correct zone, the impact is that the counters
in /proc/pagetypeinfo are slightly off but fragmentation monitoring is
unlikely to be relevant on an embedded system.

Reported-by: H Hartley Sweeten <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Tested-by: H Hartley Sweeten <[email protected]>

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4b8acd2..70dba16 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -810,6 +810,11 @@ config OABI_COMPAT
UNPREDICTABLE (in fact it can be predicted that it won't work
at all). If in doubt say Y.

+config ARCH_FLATMEM_HAS_HOLES
+ bool
+ default y
+ depends on FLATMEM
+
config ARCH_DISCONTIGMEM_ENABLE
bool
default (ARCH_LH7A40X && !LH7A40X_CONTIGMEM)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b0d08e6..98aa882 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -516,9 +516,26 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
continue;

page = pfn_to_page(pfn);
+#ifdef CONFIG_ARCH_FLATMEM_HAS_HOLES
+ /*
+ * Ordinarily, memory holes in flatmem still have a valid
+ * memmap for the PFN range. However, an architecture for
+ * embedded systems (e.g. ARM) can free up the memmap backing
+ * holes to save memory on the assumption the memmap is
+ * never used. The page_zone linkages are then broken even
+ * though pfn_valid() returns true. Skip the page if the
+ * linkages are broken. Even if this test passed, the impact
+ * is that the counters for the movable type are off but
+ * fragmentation monitoring is likely meaningless on small
+ * systems.
+ */
+ if (page_zone(page) != zone)
+ continue;
+#endif
mtype = get_pageblock_migratetype(page);

- count[mtype]++;
+ if (mtype < MIGRATE_TYPES)
+ count[mtype]++;
}

/* Print counts */
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab


2008-08-21 16:35:17

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Skip memory holes in FLATMEM when reading /proc/pagetypeinfo (resend)

On Thu, 21 Aug 2008 14:28:05 +0100 Mel Gorman <[email protected]> wrote:

> This is a resend. The last patch went to Russell King with lkml cc'd as I
> wasn't subscribed to the linux-arm list. However, I haven't heard it being
> picked up so trying linux-arm this time.
>
> ===
>
> Ordinarily, memory holes in flatmem still have a valid memmap and is safe
> to use. However, an architecture (ARM) frees up the memmap backing memory
> holes on the assumption it is never used. /proc/pagetypeinfo reads the
> whole range of pages in a zone believing that the memmap is valid and that
> pfn_valid will return false if it is not. On ARM, freeing the memmap breaks
> the page->zone linkages even though pfn_valid() returns true and the kernel
> can oops shortly afterwards due to accessing a bogus struct zone *.
>
> This patch lets architectures say when FLATMEM can have holes in the
> memmap. Rather than an expensive check for valid memory, /proc/pagetypeinfo
> will confirm that the page linkages are still valid by checking page->zone
> is still the expected zone. The lookup of page_zone is safe as there is a
> limited range of memory that is accessed when calling page_zone. Even if
> page_zone happens to return the correct zone, the impact is that the counters
> in /proc/pagetypeinfo are slightly off but fragmentation monitoring is
> unlikely to be relevant on an embedded system.

Sounds like this might fix an oops. Does it?

The patch applies to 2.6.25 and to 2.6.26. Should it be backported?

2008-08-21 16:57:44

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH] Skip memory holes in FLATMEM when reading /proc/pagetypeinfo (resend)

On Thu, Aug 21, 2008 at 09:34:00AM -0700, Andrew Morton wrote:
> On Thu, 21 Aug 2008 14:28:05 +0100 Mel Gorman <[email protected]> wrote:
> > This patch lets architectures say when FLATMEM can have holes in the
> > memmap. Rather than an expensive check for valid memory, /proc/pagetypeinfo
> > will confirm that the page linkages are still valid by checking page->zone
> > is still the expected zone. The lookup of page_zone is safe as there is a
> > limited range of memory that is accessed when calling page_zone. Even if
> > page_zone happens to return the correct zone, the impact is that the counters
> > in /proc/pagetypeinfo are slightly off but fragmentation monitoring is
> > unlikely to be relevant on an embedded system.
>
> Sounds like this might fix an oops. Does it?
>
> The patch applies to 2.6.25 and to 2.6.26. Should it be backported?

The only concern there is with this patch is that we're still walking
over the memory, which could contain anything. We could be unlucky
and end up with page_zone(page) == zone.

It'll do as a stop gap, but I think the real solution is to switch over
to using sparsemem, and get rid of ARMs private version (which even
pre-dates discontigmem.) That first assumes that we have sparsemem
working on ARM - however folk seem to prefer discontigmem over
sparsemem so I don't know what state sparsemem on ARM is in. :(

2008-08-22 14:49:53

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] Skip memory holes in FLATMEM when reading /proc/pagetypeinfo (resend)

On (21/08/08 17:56), Russell King - ARM Linux didst pronounce:
> On Thu, Aug 21, 2008 at 09:34:00AM -0700, Andrew Morton wrote:
> > On Thu, 21 Aug 2008 14:28:05 +0100 Mel Gorman <[email protected]> wrote:
> > > This patch lets architectures say when FLATMEM can have holes in the
> > > memmap. Rather than an expensive check for valid memory, /proc/pagetypeinfo
> > > will confirm that the page linkages are still valid by checking page->zone
> > > is still the expected zone. The lookup of page_zone is safe as there is a
> > > limited range of memory that is accessed when calling page_zone. Even if
> > > page_zone happens to return the correct zone, the impact is that the counters
> > > in /proc/pagetypeinfo are slightly off but fragmentation monitoring is
> > > unlikely to be relevant on an embedded system.
> >
> > Sounds like this might fix an oops. Does it?
> >

Yes, it does. Sorry for not being clear on that.

> > The patch applies to 2.6.25 and to 2.6.26. Should it be backported?
>

It wouldn't hurt. It's not a critical functionality failure and only affects
ARM but being able to generate oops from userspace is a bit of a loss.

> The only concern there is with this patch is that we're still walking
> over the memory, which could contain anything. We could be unlucky
> and end up with page_zone(page) == zone.
>

If you do, the impact is that the counters are slightly off which is not
that big of a deal. To avoid doing it, information would have to be kept
around that might end up being larger than the memmap freed.

> It'll do as a stop gap, but I think the real solution is to switch over
> to using sparsemem, and get rid of ARMs private version (which even
> pre-dates discontigmem.) That first assumes that we have sparsemem
> working on ARM - however folk seem to prefer discontigmem over
> sparsemem so I don't know what state sparsemem on ARM is in. :(
>

No idea. I know that SPARSEMEM would be preferred as a memory model as
it is a lot less arch-specific than DISCONTIG is.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2008-08-22 16:19:37

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] Skip memory holes in FLATMEM when reading /proc/pagetypeinfo (resend)

On Thu, 2008-08-21 at 14:28 +0100, Mel Gorman wrote:
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 4b8acd2..70dba16 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -810,6 +810,11 @@ config OABI_COMPAT
> UNPREDICTABLE (in fact it can be predicted that it won't work
> at all). If in doubt say Y.
>
> +config ARCH_FLATMEM_HAS_HOLES
> + bool
> + default y
> + depends on FLATMEM
> +
> config ARCH_DISCONTIGMEM_ENABLE
> bool
> default (ARCH_LH7A40X && !LH7A40X_CONTIGMEM)
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index b0d08e6..98aa882 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -516,9 +516,26 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
> continue;
>
> page = pfn_to_page(pfn);
> +#ifdef CONFIG_ARCH_FLATMEM_HAS_HOLES
> + /*
> + * Ordinarily, memory holes in flatmem still have a valid
> + * memmap for the PFN range. However, an architecture for
> + * embedded systems (e.g. ARM) can free up the memmap backing
> + * holes to save memory on the assumption the memmap is
> + * never used.

I'm not sure where this assumption is coming from. Have you taken a
look at the ARM code?

> The page_zone linkages are then broken even
> + * though pfn_valid() returns true. Skip the page if the
> + * linkages are broken. Even if this test passed, the impact
> + * is that the counters for the movable type are off but
> + * fragmentation monitoring is likely meaningless on small
> + * systems.
> + */
> + if (page_zone(page) != zone)
> + continue;
> +#endif
> mtype = get_pageblock_migratetype(page);
>
> - count[mtype]++;
> + if (mtype < MIGRATE_TYPES)
> + count[mtype]++;
> }

Is that really worth an #ifdef? It is in code that isn't in a hot path,
and the page_zone() should be a repetitive operation on a structure that
is already (or will soon be) in the cpu cache.

Ugh, but as I think about it, this is going to be a much more widespread
problem. Looking at the arm code, the memory for the mem_map[]e *does*
get freed back into the bootmem allocator:

free_memmap(int node, unsigned long start_pfn, unsigned long end_pfn)
{
...
if (pg < pgend)
free_bootmem_node(NODE_DATA(node), pg, pgend - pg);
}

Doesn't that mean that we can get random gunk in the middle of the
mem_map[] when someone else allocates and uses this memory?

We have quite a few things across the kernel that iterate over pfn
ranges and require pfn_valid() to be working.

Can we talk ARM into converting over to sparsemem? A plain sparsemem
port for MIPS that got coded up recently was ~25 lines of arch code. A
bit more if they decide to do vmemmap. There's even a nice bit for
Documentation/ about what was done:

http://article.gmane.org/gmane.linux.ports.mips.general/21248/match=mips+sparsemem

-- Dave

2008-08-22 17:18:14

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH] Skip memory holes in FLATMEM when reading /proc/pagetypeinfo (resend)

The VMEMMAP code can also be instead of flatmem with holes. Just put the
memmap into real memory and make the populate function do nothing.