2005-11-15 22:10:11

by Mike Kravetz

[permalink] [raw]
Subject: pfn_to_nid under CONFIG_SPARSEMEM and CONFIG_NUMA

The following code/comment is in <linux/mmzone.h> if SPARSEMEM
and NUMA are configured.

/*
* These are _only_ used during initialisation, therefore they
* can use __initdata ... They could have names to indicate
* this restriction.
*/
#ifdef CONFIG_NUMA
#define pfn_to_nid early_pfn_to_nid
#endif

However, pfn_to_nid is certainly used in check_pte_range() mm/mempolicy.c.
I wouldn't be surprised to find more non init time uses if you follow all
the call chains.

On ppc64, early_pfn_to_nid now only uses __initdata. So, I would expect
policy code that calls check_pte_range to cause serious problems on ppc64.

Any suggestions on how this should really be structured? I'm thinking
of removing the above definition of pfn_to_nid to force each architecture
to provide a (non init only) version.

--
Mike


2005-11-16 03:15:06

by Yasunori Goto

[permalink] [raw]
Subject: Re: pfn_to_nid under CONFIG_SPARSEMEM and CONFIG_NUMA


On Tue, 15 Nov 2005 14:10:03 -0800
Mike Kravetz <[email protected]> wrote:

> The following code/comment is in <linux/mmzone.h> if SPARSEMEM
> and NUMA are configured.
>
> /*
> * These are _only_ used during initialisation, therefore they
> * can use __initdata ... They could have names to indicate
> * this restriction.
> */
> #ifdef CONFIG_NUMA
> #define pfn_to_nid early_pfn_to_nid
> #endif
>
> However, pfn_to_nid is certainly used in check_pte_range() mm/mempolicy.c.
> I wouldn't be surprised to find more non init time uses if you follow all
> the call chains.
>
> On ppc64, early_pfn_to_nid now only uses __initdata. So, I would expect
> policy code that calls check_pte_range to cause serious problems on ppc64.
>
> Any suggestions on how this should really be structured? I'm thinking
> of removing the above definition of pfn_to_nid to force each architecture
> to provide a (non init only) version.

Yes! I worried about same things.
How is this?

static inline int pfn_to_nid(unsigned long pfn)
{
return page_to_nid(pfn_to_page(pfn));
}

page_to_nid() and pfn_to_page() is well defined.
Probably, this will work on all architecture.
So, just we should check this should be used after that memmap
is initialized.


Bye.

--
Yasunori Goto


2005-11-16 13:00:41

by Robin Holt

[permalink] [raw]
Subject: Re: pfn_to_nid under CONFIG_SPARSEMEM and CONFIG_NUMA

On Wed, Nov 16, 2005 at 12:14:18PM +0900, Yasunori Goto wrote:
> static inline int pfn_to_nid(unsigned long pfn)
> {
> return page_to_nid(pfn_to_page(pfn));

But that does not work if the pfn points to something which does not
have a struct page behind it (uncached memory on ia64 for instance).
At the very least you would need to ensure pfn_to_page returns a struct
page * before continuing blindly.

> page_to_nid() and pfn_to_page() is well defined.
> Probably, this will work on all architecture.
> So, just we should check this should be used after that memmap
> is initialized.

Robin

2005-11-16 13:26:05

by Andy Whitcroft

[permalink] [raw]
Subject: Re: pfn_to_nid under CONFIG_SPARSEMEM and CONFIG_NUMA

Mike Kravetz wrote:
> The following code/comment is in <linux/mmzone.h> if SPARSEMEM
> and NUMA are configured.
>
> /*
> * These are _only_ used during initialisation, therefore they
> * can use __initdata ... They could have names to indicate
> * this restriction.
> */
> #ifdef CONFIG_NUMA
> #define pfn_to_nid early_pfn_to_nid
> #endif

Ok. This was a ploy to avoid lots of code churn which has bitten us.
The separation here is to indicate that pfn_to_nid isn't necessarily
safe until after the memory model is init'd. When the code was
initially implmented we only used pfn_to_nid in init code so it wasn't
an issue. What we need to do here is break this link and make sure each
user is using the right version.

I'll go and put together something now.

-apw