In the hot-add node (memory) case, vmemmap pages are always allocated from
another node, but the current logic simply skips the vmemmap_verify() check.
So we should also issue the "potential offnode page_structs" warning messages
in this case.
Lin Feng (2):
mm: vmemmap: x86: add vmemmap_verify check for hot-add node case
mm: vmemmap: arm64: add vmemmap_verify check for hot-add node case
arch/arm64/mm/mmu.c | 4 ++--
arch/x86/mm/init_64.c | 6 ++++--
2 files changed, 6 insertions(+), 4 deletions(-)
--
1.8.0.1
In the hot-add node (memory) case, vmemmap pages are always allocated from
another node, but the current logic simply skips the vmemmap_verify() check.
So we should also issue the "potential offnode page_structs" warning messages
in this case.
Cc: Christoph Lameter <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Tony Lindgren <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Andrew Morton <[email protected]>
Reported-by: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Lin Feng <[email protected]>
---
arch/arm64/mm/mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 70b8cd4..9f1e417 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -427,8 +427,8 @@ int __meminit vmemmap_populate(struct page *start_page,
return -ENOMEM;
set_pmd(pmd, __pmd(__pa(p) | prot_sect_kernel));
- } else
- vmemmap_verify((pte_t *)pmd, node, addr, next);
+ }
+ vmemmap_verify((pte_t *)pmd, node, addr, next);
} while (addr = next, addr != end);
return 0;
--
1.8.0.1
In the hot-add node (memory) case, vmemmap pages are always allocated from
another node, but the current logic simply skips the vmemmap_verify() check.
So we should also issue the "potential offnode page_structs" warning messages
in this case.
Cc: Christoph Lameter <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Andrew Morton <[email protected]>
Reported-by: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Lin Feng <[email protected]>
---
arch/x86/mm/init_64.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 474e28f..e2a7277 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1318,6 +1318,8 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
if (!p)
return -ENOMEM;
+ vmemmap_verify((pte_t *)p, node, addr, addr + PAGE_SIZE);
+
addr_end = addr + PAGE_SIZE;
p_end = p + PAGE_SIZE;
} else {
@@ -1347,8 +1349,8 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
addr_end = addr + PMD_SIZE;
p_end = p + PMD_SIZE;
- } else
- vmemmap_verify((pte_t *)pmd, node, addr, next);
+ }
+ vmemmap_verify((pte_t *)pmd, node, addr, next);
}
}
--
1.8.0.1
Hi all,
On 04/08/2013 05:56 PM, Lin Feng wrote:
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 474e28f..e2a7277 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1318,6 +1318,8 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
> if (!p)
> return -ENOMEM;
>
> + vmemmap_verify((pte_t *)p, node, addr, addr + PAGE_SIZE);
> +
> addr_end = addr + PAGE_SIZE;
> p_end = p + PAGE_SIZE;
> } else {
IIUC it seems that the original 'p_end = p + PAGE_SIZE' assignment is buggy, because:
1309 if (!cpu_has_pse) {
1310 next = (addr + PAGE_SIZE) & PAGE_MASK;
1311 pmd = vmemmap_pmd_populate(pud, addr, node);
1312
1313 if (!pmd)
1314 return -ENOMEM;
1315
1316 p = vmemmap_pte_populate(pmd, addr, node);
1317
1318 if (!p)
1319 return -ENOMEM;
1320
1321 addr_end = addr + PAGE_SIZE;
1322 p_end = p + PAGE_SIZE;
The return value of vmemmap_pte_populate() is the virtual address of the pte
itself, not the virtual address of the allocated block, unlike
vmemmap_alloc_block_buf() in the cpu_has_pse case, so adding PAGE_SIZE in the
!cpu_has_pse case makes no sense.
Or am I missing something?
thanks,
linfeng
Hello,
On Mon, Apr 08, 2013 at 10:56:40AM +0100, Lin Feng wrote:
> In the hot-add node (memory) case, vmemmap pages are always allocated from
> another node, but the current logic simply skips the vmemmap_verify() check.
> So we should also issue the "potential offnode page_structs" warning messages
> in this case.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Tony Lindgren <[email protected]>
> Cc: Ben Hutchings <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Reported-by: Yasuaki Ishimatsu <[email protected]>
> Signed-off-by: Lin Feng <[email protected]>
> ---
> arch/arm64/mm/mmu.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 70b8cd4..9f1e417 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -427,8 +427,8 @@ int __meminit vmemmap_populate(struct page *start_page,
> return -ENOMEM;
>
> set_pmd(pmd, __pmd(__pa(p) | prot_sect_kernel));
> - } else
> - vmemmap_verify((pte_t *)pmd, node, addr, next);
> + }
> + vmemmap_verify((pte_t *)pmd, node, addr, next);
> } while (addr = next, addr != end);
>
> return 0;
Given that we don't have NUMA support or memory-hotplug on arm64 yet, I'm
not sure that this change makes much sense at the moment. early_pfn_to_nid
will always return 0 and we only ever have one node.
To be honest, I'm not sure what that vmemmap_verify check is trying to
achieve anyway. ia64 does some funky node affinity initialisation early on
but, for the rest of us, it looks like we always just check the distance
from node 0.
Will
On Mon, Apr 8, 2013 at 2:56 AM, Lin Feng <[email protected]> wrote:
> In the hot-add node (memory) case, vmemmap pages are always allocated from
> another node,
that is broken, and should be fixed.
vmemmap should be on local node even for hot add node.
Thanks
Yinghai
On Mon, 8 Apr 2013 11:40:11 -0700 Yinghai Lu <[email protected]> wrote:
> On Mon, Apr 8, 2013 at 2:56 AM, Lin Feng <[email protected]> wrote:
> > In the hot-add node (memory) case, vmemmap pages are always allocated from
> > another node,
>
> that is broken, and should be fixed.
> vmemmap should be on local node even for hot add node.
>
That would be nice.
I don't see much value in the added warnings, really. Because there's
nothing the user can *do* about them, apart from a) stop using NUMA, b)
stop using memory hotplug, c) become a kernel MM developer or d) switch
to Windows.
Hi Andrew,
On 04/09/2013 04:55 AM, Andrew Morton wrote:
> On Mon, 8 Apr 2013 11:40:11 -0700 Yinghai Lu <[email protected]> wrote:
>
>> On Mon, Apr 8, 2013 at 2:56 AM, Lin Feng <[email protected]> wrote:
>>> In the hot-add node (memory) case, vmemmap pages are always allocated from
>>> another node,
>>
>> that is broken, and should be fixed.
>> vmemmap should be on local node even for hot add node.
>>
>
> That would be nice.
>
> I don't see much value in the added warnings, really. Because there's
> nothing the user can *do* about them, apart from a) stop using NUMA, b)
> stop using memory hotplug, c) become a kernel MM developer or d) switch
> to Windows.
>
>
I agree that we can't do anything helpful in response to such warnings at the
moment, but maybe someone can at least take your option c) if that is what he
really cares about. ;-)
This patch was sent because we found that we get such warnings on an old kernel
but not on the latest kernel. It looks as if the problem had been fixed, but in
fact it is because sizeof(struct page) is 64-byte aligned now, which was not the
case on the old kernel. Now the struct pages for a section always total 2MB, so
every time we populate the vmemmap for a section we get a new pmd and the
vmemmap_verify() check is simply skipped. This phenomenon is misleading.
struct page {
...
}
#ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
__aligned(2 * sizeof(unsigned long))
#endif
;
Anyway, the current logic around vmemmap_verify() is broken :(
thanks,
linfeng
Hi Yinghai,
On 04/09/2013 02:40 AM, Yinghai Lu wrote:
> On Mon, Apr 8, 2013 at 2:56 AM, Lin Feng <[email protected]> wrote:
>> In the hot-add node (memory) case, vmemmap pages are always allocated from
>> another node,
>
> that is broken, and should be fixed.
> vmemmap should be on local node even for hot add node.
>
Have you sent any related patchset for this? Maybe we can run some tests
on it ;-)
thanks,
linfeng
Hi Will,
On 04/08/2013 06:55 PM, Will Deacon wrote:
> Given that we don't have NUMA support or memory-hotplug on arm64 yet, I'm
> not sure that this change makes much sense at the moment. early_pfn_to_nid
> will always return 0 and we only ever have one node.
>
> To be honest, I'm not sure what that vmemmap_verify check is trying to
> achieve anyway. ia64 does some funky node affinity initialisation early on
> but, for the rest of us, it looks like we always just check the distance
> from node 0.
Sorry for the noise to the ARM people.
Yes, not everyone cares about vmemmap_verify(); as you described, it's not
necessary on arm64 at all.
thanks,
linfeng
Hi Yinghai,
(Add cc Liu Jiang.)
On 04/09/2013 02:40 AM, Yinghai Lu wrote:
> On Mon, Apr 8, 2013 at 2:56 AM, Lin Feng<[email protected]> wrote:
>> In the hot-add node (memory) case, vmemmap pages are always allocated from
>> another node,
>
> that is broken, and should be fixed.
> vmemmap should be on local node even for hot add node.
>
I'd like to share some info. :)
Here is the work I'm trying to do.
1. As most people don't like the movablemem_map idea, we decided to
drop the "specifying physical address" approach and restart with a new
solution that uses SRAT info only.
We want to modify movablecore to support "movablecore=acpi" to
enable/disable limiting hotpluggable memory to ZONE_MOVABLE.
And we have dropped all the old design and data structures.
2. As Liu Jiang mentioned before, we can add a flag to memblock to mark
special memory. Since we are dropping all the old data structures,
I think I will reuse his idea to reserve movable memory with memblock
at boot time.
3. If we add a flag to memblock, we can mark different kinds of memory. And I
remember you mentioned before that we can use memblock to reserve local node
memory for node-life-cycle data, like vmemmap and pagetable.
So are you doing similar work now?
If not, I think I can merge it into mine and push a new patch-set with the
hot-add and hot-remove code modified to support putting vmemmap, pagetable,
pgdat, page_cgroup, ..., on the local node.
If you are doing similar work, I will just finish my part and wait
for your patch.
Thanks. :)
On Thu, Apr 11, 2013 at 12:41 AM, Tang Chen <[email protected]> wrote:
>
> 3. If we add flag to memblock, we can mark different memory. And I remember
> you mentioned before that we can use memblock to reserve local node data
> for node-life-cycle data, like vmemmap, pagetable.
>
> So are you doing the similar work now ?
No, I did not start it yet.
>
> If not, I think I can merge it into mine, and push a new patch-set with
> hot-add, hot-remove code modified to support putting vmemmap, pagetable,
> pgdat, page_cgroup, ..., on local node.
That needs to be kept separate from the moving_zone work.
1. rework memblock to keep it alive all the way through, for hotplug usage.
2. put the pagetable and vmemmap on the local node range with the help of memblock.
Thanks
Yinghai
On 04/11/2013 11:10 PM, Yinghai Lu wrote:
> On Thu, Apr 11, 2013 at 12:41 AM, Tang Chen<[email protected]> wrote:
>>
>> 3. If we add flag to memblock, we can mark different memory. And I remember
>> you mentioned before that we can use memblock to reserve local node data
>> for node-life-cycle data, like vmemmap, pagetable.
>>
>> So are you doing the similar work now ?
>
> No, I did not start it yet.
>
>>
>> If not, I think I can merge it into mine, and push a new patch-set with
>> hot-add, hot-remove code modified to support putting vmemmap, pagetable,
>> pgdat, page_cgroup, ..., on local node.
>
> That needs to be kept separate from the moving_zone work.
>
> 1. rework memblock to keep it alive all the way through, for hotplug usage.
> 2. put the pagetable and vmemmap on the local node range with the help of memblock.
>
OK, thanks for the comments. I'll merge it into my work and post an RFC
patch-set soon.
Thanks. :)