During reviewing David's code, I found current sub-section hotplug is
broken in several places.
The following link fix one potential issue, while David and I still spot
some suspicious points.
https://lkml.org/lkml/2020/2/6/334
The problem here is mainly about the memmap manipulation:
* memmap would be overwrite if not use SPARSEMEM_VMEMMAP
* only need to adjust memmap for SPARSEMEM_VMEMMAP
Wei Yang (3):
mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP
mm/sparsemem: get physical address to page struct instead of virtual
address to pfn
mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP
mm/sparse.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
--
2.17.1
memmap should be the physical address to page struct instead of virtual
address to pfn.
Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
this point.
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <[email protected]>
CC: Dan Williams <[email protected]>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/sparse.c b/mm/sparse.c
index b5da121bdd6e..56816f653588 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
/* Align memmap to section boundary in the subsection case */
if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
section_nr_to_pfn(section_nr) != start_pfn)
- memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
+ memmap = pfn_to_page(section_nr_to_pfn(section_nr));
sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
return 0;
--
2.17.1
Only when SPARSEMEM_VMEMMAP is set, memmap returned from
section_activate() points to sub-section page struct. Otherwise, memmap
already points to the whole section page struct.
This means only for SPARSEMEM_VMEMMAP, we need to adjust memmap for
sub-section case.
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <[email protected]>
CC: Dan Williams <[email protected]>
---
mm/sparse.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/sparse.c b/mm/sparse.c
index 586d85662978..b5da121bdd6e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -886,7 +886,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
section_mark_present(ms);
/* Align memmap to section boundary in the subsection case */
- if (section_nr_to_pfn(section_nr) != start_pfn)
+ if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
+ section_nr_to_pfn(section_nr) != start_pfn)
memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
--
2.17.1
In case of SPARSEMEM, populate_section_memmap() would allocate memmap
for the whole section, even we just want a sub-section. This would lead
to memmap overwrite if we a sub-section to an already populated section.
Just return the populated memmap for non-SPARSEMEM_VMEMMAP case.
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <[email protected]>
CC: Dan Williams <[email protected]>
---
mm/sparse.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/sparse.c b/mm/sparse.c
index 56816f653588..c75ca40db513 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -836,6 +836,16 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
if (nr_pages < PAGES_PER_SECTION && early_section(ms))
return pfn_to_page(pfn);
+ /*
+ * If it is not SPARSEMEM_VMEMMAP, we always populate memmap for the
+ * whole section, even for a sub-section.
+ *
+ * Return its memmap if already populated to avoid memmap overwrite.
+ */
+ if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
+ valid_section(ms))
+ return __section_mem_map_addr(ms);
+
memmap = populate_section_memmap(pfn, nr_pages, nid, altmap);
if (!memmap) {
section_deactivate(pfn, nr_pages, altmap);
--
2.17.1
On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>
> Only when SPARSEMEM_VMEMMAP is set, memmap returned from
> section_activate() points to sub-section page struct. Otherwise, memmap
> already points to the whole section page struct.
>
> This means only for SPARSEMEM_VMEMMAP, we need to adjust memmap for
> sub-section case.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <[email protected]>
> CC: Dan Williams <[email protected]>
> ---
> mm/sparse.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 586d85662978..b5da121bdd6e 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -886,7 +886,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> section_mark_present(ms);
>
> /* Align memmap to section boundary in the subsection case */
> - if (section_nr_to_pfn(section_nr) != start_pfn)
> + if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> + section_nr_to_pfn(section_nr) != start_pfn)
Aren't we assured that start_pfn is always section aligned in the
SPARSEMEM case? That's the role of check_pfn_span(). Does the change
have a runtime impact or is this just theoretical?
On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>
> In case of SPARSEMEM, populate_section_memmap() would allocate memmap
> for the whole section, even we just want a sub-section. This would lead
> to memmap overwrite if we a sub-section to an already populated section.
>
> Just return the populated memmap for non-SPARSEMEM_VMEMMAP case.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <[email protected]>
> CC: Dan Williams <[email protected]>
> ---
> mm/sparse.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 56816f653588..c75ca40db513 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -836,6 +836,16 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
> if (nr_pages < PAGES_PER_SECTION && early_section(ms))
> return pfn_to_page(pfn);
>
> + /*
> + * If it is not SPARSEMEM_VMEMMAP, we always populate memmap for the
> + * whole section, even for a sub-section.
> + *
> + * Return its memmap if already populated to avoid memmap overwrite.
> + */
> + if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> + valid_section(ms))
> + return __section_mem_map_addr(ms);
Again, is check_pfn_span() failing to prevent this path?
On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>
> memmap should be the physical address to page struct instead of virtual
> address to pfn.
>
> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> this point.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <[email protected]>
> CC: Dan Williams <[email protected]>
> ---
> mm/sparse.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b5da121bdd6e..56816f653588 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> /* Align memmap to section boundary in the subsection case */
> if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> section_nr_to_pfn(section_nr) != start_pfn)
> - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
Yes, this looks obviously correct. This might be tripping up
makedumpfile. Do you see any practical effects of this bug? The kernel
mostly avoids ->section_mem_map in the vmemmap case and in the
!vmemmap case section_nr_to_pfn(section_nr) should always equal
start_pfn.
Hi Dan,
On 02/06/20 at 06:19pm, Dan Williams wrote:
> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index b5da121bdd6e..56816f653588 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > /* Align memmap to section boundary in the subsection case */
> > if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > section_nr_to_pfn(section_nr) != start_pfn)
> > - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
> Yes, this looks obviously correct. This might be tripping up
> makedumpfile. Do you see any practical effects of this bug? The kernel
> mostly avoids ->section_mem_map in the vmemmap case and in the
> !vmemmap case section_nr_to_pfn(section_nr) should always equal
> start_pfn.
The practical effects is that the memmap for the first unaligned section will be lost
when destroy namespace to hot remove it. Because we encode the ->section_mem_map
into mem_section, and get memmap from the related mem_section to free it in
section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
with memmap.
By the way, sub-section support is only valid in vmemmap case, right?
Seems yes from code, but I don't find any document to prove it.
Thanks
Baoquan
On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <[email protected]> wrote:
>
> Hi Dan,
>
> On 02/06/20 at 06:19pm, Dan Williams wrote:
> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index b5da121bdd6e..56816f653588 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > > /* Align memmap to section boundary in the subsection case */
> > > if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > > section_nr_to_pfn(section_nr) != start_pfn)
> > > - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >
> > Yes, this looks obviously correct. This might be tripping up
> > makedumpfile. Do you see any practical effects of this bug? The kernel
> > mostly avoids ->section_mem_map in the vmemmap case and in the
> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > start_pfn.
>
> The practical effects is that the memmap for the first unaligned section will be lost
> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> into mem_section, and get memmap from the related mem_section to free it in
> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> with memmap.
Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
> By the way, sub-section support is only valid in vmemmap case, right?
Yes.
> Seems yes from code, but I don't find any document to prove it.
check_pfn_span() enforces this requirement.
On 02/06/20 at 07:21pm, Dan Williams wrote:
> On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <[email protected]> wrote:
> >
> > Hi Dan,
> >
> > On 02/06/20 at 06:19pm, Dan Williams wrote:
> > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index b5da121bdd6e..56816f653588 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > > > /* Align memmap to section boundary in the subsection case */
> > > > if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > > > section_nr_to_pfn(section_nr) != start_pfn)
> > > > - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > > + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> > >
> > > Yes, this looks obviously correct. This might be tripping up
> > > makedumpfile. Do you see any practical effects of this bug? The kernel
> > > mostly avoids ->section_mem_map in the vmemmap case and in the
> > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > > start_pfn.
> >
> > The practical effects is that the memmap for the first unaligned section will be lost
> > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> > into mem_section, and get memmap from the related mem_section to free it in
> > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> > with memmap.
>
> Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
I think no, the lost memmap should only happen in vmemmap case.
>
> > By the way, sub-section support is only valid in vmemmap case, right?
>
> Yes.
>
> > Seems yes from code, but I don't find any document to prove it.
>
> check_pfn_span() enforces this requirement.
Thanks for your confirmation. Do you mind if I add some document
sentences somewhere make clear this?
On 02/06/20 at 06:06pm, Dan Williams wrote:
> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
> >
> > In case of SPARSEMEM, populate_section_memmap() would allocate memmap
> > for the whole section, even we just want a sub-section. This would lead
> > to memmap overwrite if we a sub-section to an already populated section.
> >
> > Just return the populated memmap for non-SPARSEMEM_VMEMMAP case.
> >
> > Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > Signed-off-by: Wei Yang <[email protected]>
> > CC: Dan Williams <[email protected]>
> > ---
> > mm/sparse.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 56816f653588..c75ca40db513 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -836,6 +836,16 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
> > if (nr_pages < PAGES_PER_SECTION && early_section(ms))
> > return pfn_to_page(pfn);
> >
> > + /*
> > + * If it is not SPARSEMEM_VMEMMAP, we always populate memmap for the
> > + * whole section, even for a sub-section.
> > + *
> > + * Return its memmap if already populated to avoid memmap overwrite.
> > + */
> > + if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > + valid_section(ms))
> > + return __section_mem_map_addr(ms);
>
> Again, is check_pfn_span() failing to prevent this path?
The answer should be yes, this patch is not needed.
>
On Thu, Feb 6, 2020 at 7:36 PM Baoquan He <[email protected]> wrote:
>
> On 02/06/20 at 07:21pm, Dan Williams wrote:
> > On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <[email protected]> wrote:
> > >
> > > Hi Dan,
> > >
> > > On 02/06/20 at 06:19pm, Dan Williams wrote:
> > > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
> > > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > > index b5da121bdd6e..56816f653588 100644
> > > > > --- a/mm/sparse.c
> > > > > +++ b/mm/sparse.c
> > > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > > > > /* Align memmap to section boundary in the subsection case */
> > > > > if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > > > > section_nr_to_pfn(section_nr) != start_pfn)
> > > > > - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > > > + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> > > >
> > > > Yes, this looks obviously correct. This might be tripping up
> > > > makedumpfile. Do you see any practical effects of this bug? The kernel
> > > > mostly avoids ->section_mem_map in the vmemmap case and in the
> > > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > > > start_pfn.
> > >
> > > The practical effects is that the memmap for the first unaligned section will be lost
> > > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> > > into mem_section, and get memmap from the related mem_section to free it in
> > > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> > > with memmap.
> >
> > Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
> I think no, the lost memmap should only happen in vmemmap case.
>
> >
> > > By the way, sub-section support is only valid in vmemmap case, right?
> >
> > Yes.
> >
> > > Seems yes from code, but I don't find any document to prove it.
> >
> > check_pfn_span() enforces this requirement.
>
> Thanks for your confirmation. Do you mind if I add some document
> sentences somewhere make clear this?
>
Sure, I'd be happy to review as well.
On 02/07/20 at 07:16am, Wei Yang wrote:
> memmap should be the physical address to page struct instead of virtual
> address to pfn.
Maybe not, memmap stores a virtual address.
>
> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> this point.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <[email protected]>
> CC: Dan Williams <[email protected]>
> ---
> mm/sparse.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b5da121bdd6e..56816f653588 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> /* Align memmap to section boundary in the subsection case */
> if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> section_nr_to_pfn(section_nr) != start_pfn)
> - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
With Dan's confirmation, sub-section is only valid in vmemmap case. I
think the old if (section_nr_to_pfn(section_nr) != start_pfn) is enough
to filter out non vmemmap case. So only below code is good:
+ memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>
> return 0;
> --
> 2.17.1
>
On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>>
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>>
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <[email protected]>
>> CC: Dan Williams <[email protected]>
>> ---
>> mm/sparse.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> /* Align memmap to section boundary in the subsection case */
>> if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> section_nr_to_pfn(section_nr) != start_pfn)
>> - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>Yes, this looks obviously correct. This might be tripping up
>makedumpfile. Do you see any practical effects of this bug? The kernel
>mostly avoids ->section_mem_map in the vmemmap case and in the
>!vmemmap case section_nr_to_pfn(section_nr) should always equal
>start_pfn.
To summarize:
* vmemmap, ->section_mem_map is not used mostly
* !vmemmap, we are sure range is section aligned
Sounds we don't need to handle this?
--
Wei Yang
Help you, Help me
On Fri, Feb 07, 2020 at 12:11:34PM +0800, Baoquan He wrote:
>On 02/07/20 at 07:16am, Wei Yang wrote:
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>
>Maybe not, memmap stores a virtual address.
>
>>
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <[email protected]>
>> CC: Dan Williams <[email protected]>
>> ---
>> mm/sparse.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> /* Align memmap to section boundary in the subsection case */
>> if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> section_nr_to_pfn(section_nr) != start_pfn)
>> - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>With Dan's confirmation, sub-section is only valid in vmemmap case. I
>think the old if (section_nr_to_pfn(section_nr) != start_pfn) is enough
>to filter out non vmemmap case. So only below code is good:
>
> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
You mean replace pfn_to_kaddr with pfn_to_page ?
>> sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>>
>> return 0;
>> --
>> 2.17.1
>>
--
Wei Yang
Help you, Help me
On Thu, Feb 06, 2020 at 06:06:54PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>>
>> In case of SPARSEMEM, populate_section_memmap() would allocate memmap
>> for the whole section, even we just want a sub-section. This would lead
>> to memmap overwrite if we a sub-section to an already populated section.
>>
>> Just return the populated memmap for non-SPARSEMEM_VMEMMAP case.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <[email protected]>
>> CC: Dan Williams <[email protected]>
>> ---
>> mm/sparse.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index 56816f653588..c75ca40db513 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -836,6 +836,16 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
>> if (nr_pages < PAGES_PER_SECTION && early_section(ms))
>> return pfn_to_page(pfn);
>>
>> + /*
>> + * If it is not SPARSEMEM_VMEMMAP, we always populate memmap for the
>> + * whole section, even for a sub-section.
>> + *
>> + * Return its memmap if already populated to avoid memmap overwrite.
>> + */
>> + if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> + valid_section(ms))
>> + return __section_mem_map_addr(ms);
>
>Again, is check_pfn_span() failing to prevent this path?
Oh, you are right. Thanks
--
Wei Yang
Help you, Help me
On Thu, Feb 06, 2020 at 06:00:13PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>>
>> Only when SPARSEMEM_VMEMMAP is set, memmap returned from
>> section_activate() points to sub-section page struct. Otherwise, memmap
>> already points to the whole section page struct.
>>
>> This means only for SPARSEMEM_VMEMMAP, we need to adjust memmap for
>> sub-section case.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <[email protected]>
>> CC: Dan Williams <[email protected]>
>> ---
>> mm/sparse.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index 586d85662978..b5da121bdd6e 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -886,7 +886,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> section_mark_present(ms);
>>
>> /* Align memmap to section boundary in the subsection case */
>> - if (section_nr_to_pfn(section_nr) != start_pfn)
>> + if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> + section_nr_to_pfn(section_nr) != start_pfn)
>
>Aren't we assured that start_pfn is always section aligned in the
>SPARSEMEM case? That's the role of check_pfn_span(). Does the change
>have a runtime impact or is this just theoretical?
You are right, I missed this point.
Thanks
--
Wei Yang
Help you, Help me
On Fri, Feb 07, 2020 at 11:36:36AM +0800, Baoquan He wrote:
>On 02/06/20 at 07:21pm, Dan Williams wrote:
>> On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <[email protected]> wrote:
>> >
>> > Hi Dan,
>> >
>> > On 02/06/20 at 06:19pm, Dan Williams wrote:
>> > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>> > > > diff --git a/mm/sparse.c b/mm/sparse.c
>> > > > index b5da121bdd6e..56816f653588 100644
>> > > > --- a/mm/sparse.c
>> > > > +++ b/mm/sparse.c
>> > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> > > > /* Align memmap to section boundary in the subsection case */
>> > > > if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> > > > section_nr_to_pfn(section_nr) != start_pfn)
>> > > > - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> > > > + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>> > >
>> > > Yes, this looks obviously correct. This might be tripping up
>> > > makedumpfile. Do you see any practical effects of this bug? The kernel
>> > > mostly avoids ->section_mem_map in the vmemmap case and in the
>> > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
>> > > start_pfn.
>> >
>> > The practical effects is that the memmap for the first unaligned section will be lost
>> > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
>> > into mem_section, and get memmap from the related mem_section to free it in
>> > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
>> > with memmap.
>>
>> Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
>I think no, the lost memmap should only happen in vmemmap case.
>
>>
>> > By the way, sub-section support is only valid in vmemmap case, right?
>>
>> Yes.
>>
>> > Seems yes from code, but I don't find any document to prove it.
>>
>> check_pfn_span() enforces this requirement.
I saw this function, but those combination of vmemmap and !vmemmap make my
brain not work properly.
>
>Thanks for your confirmation. Do you mind if I add some document
>sentences somewhere make clear this?
Thanks, hope this would help the future audience.
--
Wei Yang
Help you, Help me
On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>>
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>>
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <[email protected]>
>> CC: Dan Williams <[email protected]>
>> ---
>> mm/sparse.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> /* Align memmap to section boundary in the subsection case */
>> if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> section_nr_to_pfn(section_nr) != start_pfn)
>> - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>Yes, this looks obviously correct. This might be tripping up
>makedumpfile. Do you see any practical effects of this bug? The kernel
>mostly avoids ->section_mem_map in the vmemmap case and in the
>!vmemmap case section_nr_to_pfn(section_nr) should always equal
>start_pfn.
I took another look into the code. Looks there is no practical effect after
this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
the real memmap.
But leave a inconsistent data in section_mem_map is not a good practice.
--
Wei Yang
Help you, Help me
On Thu, Feb 06, 2020 at 07:21:49PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <[email protected]> wrote:
>>
>> Hi Dan,
>>
>> On 02/06/20 at 06:19pm, Dan Williams wrote:
>> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>> > > diff --git a/mm/sparse.c b/mm/sparse.c
>> > > index b5da121bdd6e..56816f653588 100644
>> > > --- a/mm/sparse.c
>> > > +++ b/mm/sparse.c
>> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> > > /* Align memmap to section boundary in the subsection case */
>> > > if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> > > section_nr_to_pfn(section_nr) != start_pfn)
>> > > - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> > > + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>> >
>> > Yes, this looks obviously correct. This might be tripping up
>> > makedumpfile. Do you see any practical effects of this bug? The kernel
>> > mostly avoids ->section_mem_map in the vmemmap case and in the
>> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
>> > start_pfn.
>>
>> The practical effects is that the memmap for the first unaligned section will be lost
>> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
>> into mem_section, and get memmap from the related mem_section to free it in
>> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
>> with memmap.
>
>Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
>> By the way, sub-section support is only valid in vmemmap case, right?
>
>Yes.
Just one question from curiosity. Why we don't want sub-section for !vmemmap
case? Because it will wast memory for memmap?
>
>> Seems yes from code, but I don't find any document to prove it.
>
>check_pfn_span() enforces this requirement.
--
Wei Yang
Help you, Help me
On Fri, Feb 7, 2020 at 4:15 AM Wei Yang <[email protected]> wrote:
>
> On Thu, Feb 06, 2020 at 07:21:49PM -0800, Dan Williams wrote:
> >On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <[email protected]> wrote:
> >>
> >> Hi Dan,
> >>
> >> On 02/06/20 at 06:19pm, Dan Williams wrote:
> >> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
> >> > > diff --git a/mm/sparse.c b/mm/sparse.c
> >> > > index b5da121bdd6e..56816f653588 100644
> >> > > --- a/mm/sparse.c
> >> > > +++ b/mm/sparse.c
> >> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> >> > > /* Align memmap to section boundary in the subsection case */
> >> > > if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> >> > > section_nr_to_pfn(section_nr) != start_pfn)
> >> > > - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> >> > > + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >> >
> >> > Yes, this looks obviously correct. This might be tripping up
> >> > makedumpfile. Do you see any practical effects of this bug? The kernel
> >> > mostly avoids ->section_mem_map in the vmemmap case and in the
> >> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> >> > start_pfn.
> >>
> >> The practical effects is that the memmap for the first unaligned section will be lost
> >> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> >> into mem_section, and get memmap from the related mem_section to free it in
> >> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> >> with memmap.
> >
> >Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
> >
> >> By the way, sub-section support is only valid in vmemmap case, right?
> >
> >Yes.
>
> Just one question from curiosity. Why we don't want sub-section for !vmemmap
> case? Because it will wast memory for memmap?
The effort and maintenance burden outweighs the benefit.
On 02/07/20 at 11:26am, Wei Yang wrote:
> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
> >On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
> >>
> >> memmap should be the physical address to page struct instead of virtual
> >> address to pfn.
> >>
> >> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> >> this point.
> >>
> >> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> >> Signed-off-by: Wei Yang <[email protected]>
> >> CC: Dan Williams <[email protected]>
> >> ---
> >> mm/sparse.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/sparse.c b/mm/sparse.c
> >> index b5da121bdd6e..56816f653588 100644
> >> --- a/mm/sparse.c
> >> +++ b/mm/sparse.c
> >> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> >> /* Align memmap to section boundary in the subsection case */
> >> if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> >> section_nr_to_pfn(section_nr) != start_pfn)
> >> - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> >> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >
> >Yes, this looks obviously correct. This might be tripping up
> >makedumpfile. Do you see any practical effects of this bug? The kernel
> >mostly avoids ->section_mem_map in the vmemmap case and in the
> >!vmemmap case section_nr_to_pfn(section_nr) should always equal
> >start_pfn.
>
> I took another look into the code. Looks there is no practical effect after
> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
> the real memmap.
>
> But leave a inconsistent data in section_mem_map is not a good practice.
Yeah, it does has no pratical effect. I tried to create sub-section
alighed namespace, then trigger crash, makedumpfile isn't impacted.
Because pmem memory is only added, but not onlined. We don't report it
to kdump, makedumpfile will ignore it.
I think it's worth fixing it to encode a correct memmap address. We
don't know if in the future this will break anything.
Thanks
Baoquan
> Am 09.02.2020 um 14:50 schrieb Baoquan He <[email protected]>:
>
> On 02/07/20 at 11:26am, Wei Yang wrote:
>>> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>>> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>>>>
>>>> memmap should be the physical address to page struct instead of virtual
>>>> address to pfn.
>>>>
>>>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>>>> this point.
>>>>
>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>> Signed-off-by: Wei Yang <[email protected]>
>>>> CC: Dan Williams <[email protected]>
>>>> ---
>>>> mm/sparse.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>> index b5da121bdd6e..56816f653588 100644
>>>> --- a/mm/sparse.c
>>>> +++ b/mm/sparse.c
>>>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>> /* Align memmap to section boundary in the subsection case */
>>>> if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>>> section_nr_to_pfn(section_nr) != start_pfn)
>>>> - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>>
>>> Yes, this looks obviously correct. This might be tripping up
>>> makedumpfile. Do you see any practical effects of this bug? The kernel
>>> mostly avoids ->section_mem_map in the vmemmap case and in the
>>> !vmemmap case section_nr_to_pfn(section_nr) should always equal
>>> start_pfn.
>>
>> I took another look into the code. Looks there is no practical effect after
>> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
>> the real memmap.
>>
>> But leave a inconsistent data in section_mem_map is not a good practice.
>
> Yeah, it does has no pratical effect. I tried to create sub-section
> alighed namespace, then trigger crash, makedumpfile isn't impacted.
> Because pmem memory is only added, but not onlined. We don't report it
> to kdump, makedumpfile will ignore it.
>
> I think it's worth fixing it to encode a correct memmap address. We
> don't know if in the future this will break anything.
We can have system memory and devmem overlap within a section (which is still buggy and to be fixed in other regard - e.g., pfn_to_online_page() does not work correctly).
E.g., 64 mb of (boot) system memory in a section. Then you can hot-add devmem that spans the remaining 64 mb of that section.
So some of that memory will be kdumped - and should be fixed if broken.
Cheers
>
> Thanks
> Baoquan
On Sun, Feb 09, 2020 at 03:14:28PM +0100, David Hildenbrand wrote:
>
>
>> Am 09.02.2020 um 14:50 schrieb Baoquan He <[email protected]>:
>>
>> On 02/07/20 at 11:26am, Wei Yang wrote:
>>>> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>>>> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <[email protected]> wrote:
>>>>>
>>>>> memmap should be the physical address to page struct instead of virtual
>>>>> address to pfn.
>>>>>
>>>>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>>>>> this point.
>>>>>
>>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>>> Signed-off-by: Wei Yang <[email protected]>
>>>>> CC: Dan Williams <[email protected]>
>>>>> ---
>>>>> mm/sparse.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>>> index b5da121bdd6e..56816f653588 100644
>>>>> --- a/mm/sparse.c
>>>>> +++ b/mm/sparse.c
>>>>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>>> /* Align memmap to section boundary in the subsection case */
>>>>> if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>>>> section_nr_to_pfn(section_nr) != start_pfn)
>>>>> - memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>>> + memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>>>
>>>> Yes, this looks obviously correct. This might be tripping up
>>>> makedumpfile. Do you see any practical effects of this bug? The kernel
>>>> mostly avoids ->section_mem_map in the vmemmap case and in the
>>>> !vmemmap case section_nr_to_pfn(section_nr) should always equal
>>>> start_pfn.
>>>
>>> I took another look into the code. Looks there is no practical effect after
>>> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
>>> the real memmap.
>>>
>>> But leave a inconsistent data in section_mem_map is not a good practice.
>>
>> Yeah, it does has no pratical effect. I tried to create sub-section
>> alighed namespace, then trigger crash, makedumpfile isn't impacted.
>> Because pmem memory is only added, but not onlined. We don't report it
>> to kdump, makedumpfile will ignore it.
>>
>> I think it's worth fixing it to encode a correct memmap address. We
>> don't know if in the future this will break anything.
>
>We can have system memory and devmem overlap within a section (which is still buggy and to be fixed in other regard - e.g., pfn_to_online_page() does not work correctly).
>
>E.g., 64 mb of (boot) system memory in a section. Then you can hot-add devmem that spans the remaining 64 mb of that section.
>
>So some of that memory will be kdumped - and should be fixed if broken.
>
>Cheers
Thanks for the explanation, I will add this in the changelog.
>
>
>>
>> Thanks
>> Baoquan
--
Wei Yang
Help you, Help me