2018-04-26 16:00:22

by Pavel Tatashin

Subject: [PATCH] mm: sections are not offlined during memory hotremove

Memory hotplug and hotremove operate with per-block granularity. If a
machine has a large amount of memory (more than 64G), a memory block
can span multiple sections. By mistake, during hotremove we set
only the first section to the offline state.

The bug was discovered because a kernel selftest started to fail:
https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop

The failure appeared after commit "mm/memory_hotplug: optimize probe
routine", but the bug is older than that commit. That optimization also
added a check that sections are in the proper state during hotplug
operations.

Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")

Signed-off-by: Pavel Tatashin <[email protected]>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 62eef264a7bd..73dc2fcc0eab 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -629,7 +629,7 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 	unsigned long pfn;
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
-		unsigned long section_nr = pfn_to_section_nr(start_pfn);
+		unsigned long section_nr = pfn_to_section_nr(pfn);
 		struct mem_section *ms;
 
 		/*
--
1.8.3.1
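
The effect of the one-line change can be reproduced in user space. What follows is only a minimal sketch, not kernel code: the flag array section_online[], the constants (4 KiB pages, 128 MiB sections, a 16-section range) and every helper name except pfn_to_section_nr are assumptions made for illustration. It models the loop from offline_mem_sections() before and after the patch and counts how many sections are still marked online.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration: 4 KiB pages and 128 MiB sections. */
#define PAGES_PER_SECTION 32768UL
/* Assumed size of the range being offlined, in sections. */
#define NR_SECTIONS 16UL

/* Simplified stand-in for the per-section online state. */
static bool section_online[NR_SECTIONS];

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn / PAGES_PER_SECTION;
}

static void mark_offline(unsigned long section_nr)
{
	assert(section_nr < NR_SECTIONS);
	section_online[section_nr] = false;
}

/* Pre-patch loop: the index is computed from start_pfn on every pass. */
static void offline_sections_buggy(unsigned long start_pfn,
				   unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION)
		mark_offline(pfn_to_section_nr(start_pfn));
}

/* Patched loop: the index follows the loop variable. */
static void offline_sections_fixed(unsigned long start_pfn,
				   unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION)
		mark_offline(pfn_to_section_nr(pfn));
}

/*
 * Mark every section online, offline the whole range with the chosen
 * loop, and return how many sections are still marked online.
 */
static unsigned long sections_left_online(bool use_fixed)
{
	unsigned long end_pfn = NR_SECTIONS * PAGES_PER_SECTION;
	unsigned long i, left = 0;

	for (i = 0; i < NR_SECTIONS; i++)
		section_online[i] = true;

	if (use_fixed)
		offline_sections_fixed(0, end_pfn);
	else
		offline_sections_buggy(0, end_pfn);

	for (i = 0; i < NR_SECTIONS; i++)
		if (section_online[i])
			left++;

	return left;
}
```

With these assumed constants, sections_left_online(false) returns 15 (only the first section was offlined) and sections_left_online(true) returns 0. A small main() that prints the two return values is enough to try it out.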



2018-04-26 19:12:39

by Michal Hocko

Subject: Re: [PATCH] mm: sections are not offlined during memory hotremove

On Thu 26-04-18 11:58:34, Pavel Tatashin wrote:
> Memory hotplug and hotremove operate with per-block granularity. If a
> machine has a large amount of memory (more than 64G), a memory block
> can span multiple sections. By mistake, during hotremove we set
> only the first section to the offline state.
>
> The bug was discovered because a kernel selftest started to fail:
> https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop
>
> The failure appeared after commit "mm/memory_hotplug: optimize probe
> routine", but the bug is older than that commit. That optimization also
> added a check that sections are in the proper state during hotplug
> operations.
>
> Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")

Dohh. When I saw this I had the feeling that I had already fixed it
and that the fix must have gotten lost somewhere. But no, that was the
same bug in a different path: b4ccec41af82 ("mm/sparse.c: fix typo in
online_mem_sections"). I wonder why I didn't notice the same pattern
in the offline path.

Thanks for noticing and fixing this.

> Signed-off-by: Pavel Tatashin <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/sparse.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 62eef264a7bd..73dc2fcc0eab 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -629,7 +629,7 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  	unsigned long pfn;
>
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> -		unsigned long section_nr = pfn_to_section_nr(start_pfn);
> +		unsigned long section_nr = pfn_to_section_nr(pfn);
>  		struct mem_section *ms;
>
>  		/*
> --
> 1.8.3.1
>

--
Michal Hocko
SUSE Labs

2018-04-26 19:13:22

by Michal Hocko

Subject: Re: [PATCH] mm: sections are not offlined during memory hotremove

On Thu 26-04-18 21:11:11, Michal Hocko wrote:
> On Thu 26-04-18 11:58:34, Pavel Tatashin wrote:
> > Memory hotplug and hotremove operate with per-block granularity. If a
> > machine has a large amount of memory (more than 64G), a memory block
> > can span multiple sections. By mistake, during hotremove we set
> > only the first section to the offline state.
> >
> > The bug was discovered because a kernel selftest started to fail:
> > https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop
> >
> > The failure appeared after commit "mm/memory_hotplug: optimize probe
> > routine", but the bug is older than that commit. That optimization also
> > added a check that sections are in the proper state during hotplug
> > operations.
> >
> > Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
>
> Dohh. When I saw this I had the feeling that I had already fixed it
> and that the fix must have gotten lost somewhere. But no, that was the
> same bug in a different path: b4ccec41af82 ("mm/sparse.c: fix typo in
> online_mem_sections"). I wonder why I didn't notice the same pattern
> in the offline path.
>
> Thanks for noticing and fixing this.
>
> > Signed-off-by: Pavel Tatashin <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>

Btw. Cc: stable would be appropriate.

>
> > mm/sparse.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 62eef264a7bd..73dc2fcc0eab 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -629,7 +629,7 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
> >  	unsigned long pfn;
> >
> >  	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> > -		unsigned long section_nr = pfn_to_section_nr(start_pfn);
> > +		unsigned long section_nr = pfn_to_section_nr(pfn);
> >  		struct mem_section *ms;
> >
> >  		/*
> > --
> > 1.8.3.1
> >
>
> --
> Michal Hocko
> SUSE Labs

--
Michal Hocko
SUSE Labs