The total mapcount is useful information for debugging, but we can't
call total_mapcount() directly since it contains assertions which may
be triggered, as commit 6dc5ea16c86f ("mm, dump_page: do not crash
with bad compound_mapcount()") ran into.

We could write yet another implementation just for dump_page(), but it
would still be limited when the mapcount of an individual subpage is
corrupted.

Actually the total mapcount could be decoded from refcount, pincount and
compound mapcount, although it may not be very precise due to some
transient references.

Signed-off-by: Yang Shi <[email protected]>
---
I think we are on the same page that the total mapcount is useful
information and that it would be ideal to print it when dumping a
page, if possible. But how to implement it safely seems controversial.
Some ideas and potential problems have been discussed in
https://lore.kernel.org/linux-mm/[email protected]/.

So I prepared this patch to show a possible approach and to get some
feedback. The reader of a page dump could decode the same thing by
applying the formula used in this patch, but it seems more convenient
to have the kernel do the math.

mm/debug.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/mm/debug.c b/mm/debug.c
index e73fe0a8ec3d..129efcfcaf79 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -54,8 +54,13 @@ static void __dump_page(struct page *page)
 	 * inaccuracy here due to racing.
 	 */
 	bool page_cma = is_migrate_cma_page(page);
-	int mapcount;
+	int mapcount, total_mapcount;
+	int nr;
+	int refcount;
+	int pincount = 0;
+	int comp_mapcnt;
 	char *type = "";
+	bool is_slab = PageSlab(head);
 
 	if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) {
 		/*
@@ -82,22 +87,40 @@ static void __dump_page(struct page *page)
 	 * page->_mapcount space in struct page is used by sl[aou]b pages to
 	 * encode own info.
 	 */
-	mapcount = PageSlab(head) ? 0 : page_mapcount(page);
+	mapcount = is_slab ? 0 : page_mapcount(page);
+
+	refcount = page_ref_count(head);
 
 	pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
-			page, page_ref_count(head), mapcount, mapping,
+			page, refcount, mapcount, mapping,
 			page_to_pgoff(page), page_to_pfn(page));
 	if (compound) {
+		comp_mapcnt = head_compound_mapcount(head);
 		if (hpage_pincount_available(page)) {
+			pincount = head_compound_pincount(head);
 			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
 					head, compound_order(head),
-					head_compound_mapcount(head),
-					head_compound_pincount(head));
+					comp_mapcnt, pincount);
 		} else {
 			pr_warn("head:%p order:%u compound_mapcount:%d\n",
 					head, compound_order(head),
-					head_compound_mapcount(head));
+					comp_mapcnt);
+		}
+
+		nr = compound_nr(head);
+		if (is_slab)
+			total_mapcount = 0;
+		else if (PageHuge(head))
+			total_mapcount = comp_mapcnt;
+		else {
+			if (mapping) {
+				if (!PageAnon(head))
+					nr = nr * (comp_mapcnt + 1) - comp_mapcnt;
+			} else
+				nr = 0;
+			total_mapcount = refcount - pincount - nr;
 		}
+		pr_warn("total_mapcount(estimated):%d\n", total_mapcount);
 	}
 
 #ifdef CONFIG_MEMCG
--
2.26.2
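
For readers who want to redo the arithmetic by hand from a page dump,
the last hunk can be sketched in userspace roughly as follows. This is
only an illustration: struct page_snapshot and
estimate_total_mapcount() are hypothetical names, and the fields stand
in for the values that the patch reads from the head page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical snapshot of the values the patch reads from the head
 * page. This is not a kernel type; it only mirrors the patch's locals.
 */
struct page_snapshot {
	int refcount;		/* page_ref_count(head) */
	int pincount;		/* head_compound_pincount(head), else 0 */
	int comp_mapcount;	/* head_compound_mapcount(head) */
	int nr_pages;		/* compound_nr(head) */
	bool is_slab;		/* PageSlab(head) */
	bool is_hugetlb;	/* PageHuge(head) */
	bool has_mapping;	/* mapping != NULL */
	bool is_anon;		/* PageAnon(head) */
};

/* Same branches as the patch, written as a pure function. */
static int estimate_total_mapcount(const struct page_snapshot *s)
{
	int nr;

	assert(s != NULL);
	nr = s->nr_pages;

	if (s->is_slab)
		return 0;
	if (s->is_hugetlb)
		return s->comp_mapcount;

	if (s->has_mapping) {
		/* Only the non-anon case adjusts nr, as in the patch. */
		if (!s->is_anon)
			nr = nr * (s->comp_mapcount + 1) - s->comp_mapcount;
	} else {
		nr = 0;
	}

	return s->refcount - s->pincount - nr;
}
```

For example, an anonymous compound page of 512 subpages with
refcount 515 and pincount 0 yields an estimate of 515 - 0 - 512 = 3.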
On Fri, May 28, 2021 at 10:54:03AM -0700, Yang Shi wrote:
> So I prepared this patch to show a possible approach to get some
> feedback. The same thing could be decoded by the reader of page dump
> as well by using the same formula used by this patch. However it sounds
> more convenient to have kernel do the math.
You haven't taken enough things into consideration ...
> + bool is_slab = PageSlab(head);
We should probably have a separate dump_slab_page(). Almost nothing
in __dump_page() is really useful for slab pages (eg, mapping, index,
mapcount, compound_mapcount, compound_pincount, aops), and the flags
(such as are used) have different meanings.
> +		nr = compound_nr(head);
> +		if (is_slab)
> +			total_mapcount = 0;
> +		else if (PageHuge(head))
> +			total_mapcount = comp_mapcnt;
> +		else {
> +			if (mapping) {
> +				if (!PageAnon(head))
> +					nr = nr * (comp_mapcnt + 1) - comp_mapcnt;
> +			} else
> +				nr = 0;
> +			total_mapcount = refcount - pincount - nr;
I see what you're trying to do here, but there are so many other things
which take a refcount on a page. The LRU, the page cache, private fs
data, random temporary "gets" (eg, buffered reads, buffered writes,
get_user_pages(), readahead, truncate, migration). I think this is
likely to be so inaccurate as to be confusing.
I had to think hard about it though. I like what you're trying to do,
I just don't think it works ;-(
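
To make the concern concrete, here is a toy userspace model (not kernel
code). The struct and its field names are hypothetical, and the split of
references into these buckets is an assumption made purely for
illustration. It shows that every reference holder the formula does not
subtract ends up directly in the estimate.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical breakdown of who holds references on a compound page.
 * None of these names exist in the kernel; this is a userspace toy.
 */
struct ref_holders {
	int mappings;	/* the number the estimate is trying to recover */
	int accounted;	/* references the formula subtracts as "nr" */
	int pins;	/* references also counted in the pincount */
	int lru;	/* e.g. a reference held on behalf of the LRU */
	int fs_private;	/* private fs data */
	int transient;	/* buffered I/O, readahead, truncate, migration */
};

/* The refcount a dump would observe: the sum over all holders. */
static int observed_refcount(const struct ref_holders *h)
{
	assert(h != NULL);
	return h->mappings + h->accounted + h->pins +
	       h->lru + h->fs_private + h->transient;
}

/* estimate - truth, where estimate = refcount - pincount - nr */
static int estimate_error(const struct ref_holders *h)
{
	int estimate = observed_refcount(h) - h->pins - h->accounted;

	return estimate - h->mappings;
}
```

With mappings = 2, accounted = 512, pins = 1, lru = 1, fs_private = 1
and transient = 3, the observed refcount is 520 and the estimate is 7,
so the error of 5 is exactly the number of holders the formula does not
know about.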
On 5/28/21 10:54 AM, Yang Shi wrote:
> The total mapcount is a useful information for debugging, but we can't
> call total_mapcount() directly since it calls some assertions which may
> be triggered as commit 6dc5ea16c86f ("mm,
> dump_page: do not crash with bad compound_mapcount()") met.
>
> We could implement yet another implementation for dump_page() but
> it has the limitation when individual mapcount of subpages is corrupted.
>
> Actually the total mapcount could be decoded from refcount, pincount and
> compound mapcount although it may be not very precise due to some
> transient references.
If the mapcount calculation were in a separate routine, *and* if something
else in addition to dump_page() used it, then I'd be interested in
calling it from dump_page().
But, just adding a calculation glob like this is not a good idea. If
the reader really needs the calculation, then that person can, as you
say, work it out from the other information.
Debug and dump routines are actually supposed to remain fairly simple,
so that they themselves do not end up with bugs, or stale assumptions
(which this calculation is very much susceptible to). This goes in the
wrong direction.
So best to just not do this, IMHO.
thanks,
--
John Hubbard
NVIDIA
On Fri, May 28, 2021 at 11:22 AM Matthew Wilcox <[email protected]> wrote:
>
> On Fri, May 28, 2021 at 10:54:03AM -0700, Yang Shi wrote:
> > So I prepared this patch to show a possible approach to get some
> > feedback. The same thing could be decoded by the reader of page dump
> > as well by using the same formula used by this patch. However it sounds
> > more convenient to have kernel do the math.
>
> You haven't taken enough things into consideration ...
>
> > + bool is_slab = PageSlab(head);
>
> We should probably have a separate dump_slab_page(). Almost nothing
> in __dump_page() is really useful for slab pages (eg, mapping, index,
> mapcount, compound_mapcount, compound_pincount, aops), and the flags
> (such as are used) have different meanings.
Yes, a slab page dump has been missing for a long time.
>
> > +		nr = compound_nr(head);
> > +		if (is_slab)
> > +			total_mapcount = 0;
> > +		else if (PageHuge(head))
> > +			total_mapcount = comp_mapcnt;
> > +		else {
> > +			if (mapping) {
> > +				if (!PageAnon(head))
> > +					nr = nr * (comp_mapcnt + 1) - comp_mapcnt;
> > +			} else
> > +				nr = 0;
> > +			total_mapcount = refcount - pincount - nr;
>
> I see what you're trying to do here, but there are so many other things
> which take a refcount on a page. The LRU, the page cache, private fs
> data, random temporary "gets" (eg, buffered reads, buffered writes,
> get_user_pages(), readahead, truncate, migration). I think this is
> likely to be so inaccurate as to be confusing.
Yes, it is inaccurate in some cases. There is no simple way to rule
out those random transient references. The page cache has been taken
into account by this patch, but I overlooked the private page and LRU
cache cases; they seem simple to filter out by page flags.
>
> I had to think hard about it though. I like what you're trying to do,
> I just don't think it works ;-(
The random transient references are annoying and could push the number
far from the real value. That would make it too confusing to be worth
printing. But I'm not sure how often or how badly that would happen.
On Fri, May 28, 2021 at 11:26 AM John Hubbard <[email protected]> wrote:
>
> On 5/28/21 10:54 AM, Yang Shi wrote:
> > The total mapcount is a useful information for debugging, but we can't
> > call total_mapcount() directly since it calls some assertions which may
> > be triggered as commit 6dc5ea16c86f ("mm,
> > dump_page: do not crash with bad compound_mapcount()") met.
> >
> > We could implement yet another implementation for dump_page() but
> > it has the limitation when individual mapcount of subpages is corrupted.
> >
> > Actually the total mapcount could be decoded from refcount, pincount and
> > compound mapcount although it may be not very precise due to some
> > transient references.
>
> If the mapcount calculation were in a separate routine, *and* if something
> else in addition to dump_page() used it, then I'd be interested in
> calling it from dump_page().
There is: total_mapcount() is used by mm code. But as I mentioned in
the commit log and in the linked discussion, it is not safe to call it
directly in the dump_page() path.
>
> But, just adding a calculation glob like this is not a good idea. If
> the reader really needs the calculation, then that person can, as you
> say, work it out from the other information.
>
> Debug and dump routines are actually supposed to remain fairly simple,
> so that they themselves do not end up with bugs, or stale assumptions
> (which this calculation is very much susceptible to). This goes in the
> wrong direction.
>
> So best to just not do this, IMHO.
>
> thanks,
> --
> John Hubbard
> NVIDIA
On Fri, 28 May 2021, Yang Shi wrote:
> The total mapcount is a useful information for debugging, but we can't
> call total_mapcount() directly since it calls some assertions which may
> be triggered as commit 6dc5ea16c86f ("mm,
> dump_page: do not crash with bad compound_mapcount()") met.
>
> We could implement yet another implementation for dump_page() but
> it has the limitation when individual mapcount of subpages is corrupted.
>
> Actually the total mapcount could be decoded from refcount, pincount and
> compound mapcount although it may be not very precise due to some
> transient references.
>
> Signed-off-by: Yang Shi <[email protected]>
> ---
> I think we are on the same page that the total mapcount is useful
Well, it may be useful (and used to be shown) in the case we've been
thinking of; but there the critical fact, page_mapped(), is evident from
the fact that your VM_WARN_ON_ONCE_PAGE(page_mapped) is shown at all:
being a number, total_mapcount() tells a little more, but not a lot.
> information and it would be ideal to print this information when dumpping
Yes, I admit I did say "ideal": but not at this cost.
I'm sorry for pointing you down (something like) this path.
If total_mapcount() itself had been assuredly safe, it would
have been nice to add in; but not this substitute.
> page if possible. But how to implement it safely seems controversial.
> Some ideas and potential problems have been discussed by
> https://lore.kernel.org/linux-mm/[email protected]/.
>
> So I prepared this patch to show a possible approach to get some
> feedback. The same thing could be decoded by the reader of page dump
> as well by using the same formula used by this patch. However it sounds
> more convenient to have kernel do the math.
>
> mm/debug.c | 35 +++++++++++++++++++++++++++++------
> 1 file changed, 29 insertions(+), 6 deletions(-)
Adding that code to come up with a deceptive approximation to a number
which most sites won't care about: speaking for me, I'll say no.
Hugh
On 5/28/21 12:03 PM, Yang Shi wrote:
> On Fri, May 28, 2021 at 11:26 AM John Hubbard <[email protected]> wrote:
>>
>> On 5/28/21 10:54 AM, Yang Shi wrote:
>>> The total mapcount is a useful information for debugging, but we can't
>>> call total_mapcount() directly since it calls some assertions which may
>>> be triggered as commit 6dc5ea16c86f ("mm,
>>> dump_page: do not crash with bad compound_mapcount()") met.
>>>
>>> We could implement yet another implementation for dump_page() but
>>> it has the limitation when individual mapcount of subpages is corrupted.
>>>
>>> Actually the total mapcount could be decoded from refcount, pincount and
>>> compound mapcount although it may be not very precise due to some
>>> transient references.
>>
>> If the mapcount calculation were in a separate routine, *and* if something
>> else in addition to dump_page() used it, then I'd be interested in
>> calling it from dump_page().
>
> There is. The total_mapcount() is used by mm code. But as I mentioned
> in the commit log and that discussion email, it is not safe to call it
> directly in dump_page() path.
>
Right! I apologize for missing the point that it is a separate function.
But unfortunately, the conclusion here is still the same, if you are
unable to actually call that function.
Basically, if you can make simple calls to retrieve and print out
information, then that's a good situation for diagnostics routines such
as dump_page(). But if you have to open-code calculations and those start
to get complex, then you have probably gone off in the wrong direction.
Keep in mind that the diag routines themselves have to be correct, and
they usually don't have the same level of testing that other routines
do.
thanks,
--
John Hubbard
NVIDIA
On Fri, May 28, 2021 at 12:47:31PM -0700, Hugh Dickins wrote:
> > page if possible. But how to implement it safely seems controversial.
> > Some ideas and potential problems have been discussed by
> > https://lore.kernel.org/linux-mm/[email protected]/.
> >
> > So I prepared this patch to show a possible approach to get some
> > feedback. The same thing could be decoded by the reader of page dump
> > as well by using the same formula used by this patch. However it sounds
> > more convenient to have kernel do the math.
> >
> > mm/debug.c | 35 +++++++++++++++++++++++++++++------
> > 1 file changed, 29 insertions(+), 6 deletions(-)
>
> Adding that code to come up with a deceptive approximation to a number
> which most sites won't care about: speaking for me, I'll say no.
I agree. The approximation may cause more confusion than it helps with
debugging.
--
Kirill A. Shutemov