2018-12-04 22:46:57

by Anthony Yznaga

Subject: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

Certain pages that are never mapped to userspace have a type
indicated in the page_type field of their struct pages (e.g. PG_buddy).
page_type overlaps with _mapcount so set the count to 0 and avoid
calling page_mapcount() for these pages.

Signed-off-by: Anthony Yznaga <[email protected]>
---
fs/proc/page.c | 2 +-
include/linux/page-flags.h | 7 +++++++
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 6c517b11acf8..40b05e0d4274 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -46,7 +46,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
             ppage = pfn_to_page(pfn);
         else
             ppage = NULL;
-        if (!ppage || PageSlab(ppage))
+        if (!ppage || PageSlab(ppage) || page_has_type(ppage))
             pcount = 0;
         else
             pcount = page_mapcount(ppage);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 50ce1bddaf56..f9a1c50ccefc 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -673,10 +673,17 @@ static inline int TestClearPageDoubleMap(struct page *page)
 #define PG_balloon    0x00000100
 #define PG_kmemcg    0x00000200
 #define PG_table    0x00000400
+#define PAGE_TYPE_ALL    (PG_buddy | PG_balloon | PG_kmemcg | PG_table)
 
 #define PageType(page, flag)                        \
     ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
 
+static inline int page_has_type(struct page *page)
+{
+    return (PageType(page, 0) &&
+        ((page->page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL));
+}
+
 #define PAGE_TYPE_OPS(uname, lname)                    \
 static __always_inline int Page##uname(struct page *page)        \
 {                                    \
--
1.8.3.1



2018-12-05 00:49:37

by Matthew Wilcox

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

On Tue, Dec 04, 2018 at 02:45:26PM -0800, Anthony Yznaga wrote:
> Certain pages that are never mapped to userspace have a type
> indicated in the page_type field of their struct pages (e.g. PG_buddy).
> page_type overlaps with _mapcount so set the count to 0 and avoid
> calling page_mapcount() for these pages.
>
> Signed-off-by: Anthony Yznaga <[email protected]>
> ---
> fs/proc/page.c | 2 +-
> include/linux/page-flags.h | 7 +++++++
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 6c517b11acf8..40b05e0d4274 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -46,7 +46,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
> ppage = pfn_to_page(pfn);
> else
> ppage = NULL;
> - if (!ppage || PageSlab(ppage))
> + if (!ppage || PageSlab(ppage) || page_has_type(ppage))
> pcount = 0;
> else
> pcount = page_mapcount(ppage);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 50ce1bddaf56..f9a1c50ccefc 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -673,10 +673,17 @@ static inline int TestClearPageDoubleMap(struct page *page)
> #define PG_balloon 0x00000100
> #define PG_kmemcg 0x00000200
> #define PG_table 0x00000400
> +#define PAGE_TYPE_ALL (PG_buddy | PG_balloon | PG_kmemcg | PG_table)
>
> #define PageType(page, flag) \
> ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>
> +static inline int page_has_type(struct page *page)
> +{
> + return (PageType(page, 0) &&
> + ((page->page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL));
> +}
> +
> #define PAGE_TYPE_OPS(uname, lname) \

I think this is a bit complex, and a bit of a pain to update as we add
new page types. How about this?

return (int)page_type < -128;

(I'm open to appropriate #defines to make this more obvious that it's ~0x7F)

2018-12-05 01:18:03

by Anthony Yznaga

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped



On 12/04/2018 04:48 PM, Matthew Wilcox wrote:
> On Tue, Dec 04, 2018 at 02:45:26PM -0800, Anthony Yznaga wrote:
>> Certain pages that are never mapped to userspace have a type
>> indicated in the page_type field of their struct pages (e.g. PG_buddy).
>> page_type overlaps with _mapcount so set the count to 0 and avoid
>> calling page_mapcount() for these pages.
>>
>> Signed-off-by: Anthony Yznaga <[email protected]>
>> ---
>> fs/proc/page.c | 2 +-
>> include/linux/page-flags.h | 7 +++++++
>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/proc/page.c b/fs/proc/page.c
>> index 6c517b11acf8..40b05e0d4274 100644
>> --- a/fs/proc/page.c
>> +++ b/fs/proc/page.c
>> @@ -46,7 +46,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
>> ppage = pfn_to_page(pfn);
>> else
>> ppage = NULL;
>> - if (!ppage || PageSlab(ppage))
>> + if (!ppage || PageSlab(ppage) || page_has_type(ppage))
>> pcount = 0;
>> else
>> pcount = page_mapcount(ppage);
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 50ce1bddaf56..f9a1c50ccefc 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -673,10 +673,17 @@ static inline int TestClearPageDoubleMap(struct page *page)
>> #define PG_balloon 0x00000100
>> #define PG_kmemcg 0x00000200
>> #define PG_table 0x00000400
>> +#define PAGE_TYPE_ALL (PG_buddy | PG_balloon | PG_kmemcg | PG_table)
>>
>> #define PageType(page, flag) \
>> ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>>
>> +static inline int page_has_type(struct page *page)
>> +{
>> + return (PageType(page, 0) &&
>> + ((page->page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL));
>> +}
>> +
>> #define PAGE_TYPE_OPS(uname, lname) \
> I think this is a bit complex, and a bit of a pain to update as we add
> new page types. How about this?
>
> return (int)page_type < -128;
>
> (I'm open to appropriate #defines to make this more obvious that it's ~0x7F)


I thought about having this:

#define PAGE_TYPE_END    0xffffff80

static int inline page_has_type(struct page *page)
{
    return page->page_type > PAGE_TYPE_BASE &&
           page->page_type < PAGE_TYPE_END;
}

But I opted for the additional complexity to avoid more false-positives from
possibly corrupted values.  I'm certainly fine with a simple approach, though.
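
To illustrate the trade-off described above, here is a minimal userspace sketch comparing the mask-based check from the posted patch with the range-based alternative. The constants are copied from the quoted page-flags.h hunk; the function names has_type_mask() and has_type_range() are hypothetical, and a plain uint32_t stands in for page->page_type.

```c
#include <stdint.h>

/* Constants as quoted in the patch (page-flags.h at the time of posting). */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_buddy       0x00000080u
#define PG_balloon     0x00000100u
#define PG_kmemcg      0x00000200u
#define PG_table       0x00000400u
#define PAGE_TYPE_ALL  (PG_buddy | PG_balloon | PG_kmemcg | PG_table)
#define PAGE_TYPE_END  0xffffff80u

/*
 * Mask-based check, as in the posted patch: the base bits must be set and
 * at least one known type bit must be clear (PageType() tests for a
 * cleared flag bit).
 */
static int has_type_mask(uint32_t page_type)
{
    return (page_type & PAGE_TYPE_BASE) == PAGE_TYPE_BASE &&
           (page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL;
}

/* Range-based alternative: any value strictly between BASE and END. */
static int has_type_range(uint32_t page_type)
{
    return page_type > PAGE_TYPE_BASE && page_type < PAGE_TYPE_END;
}
```

Both checks agree on 0xffffff7f (PG_buddy cleared) and on 0xffffffff (a mapcount of 0). They differ on a value such as 0xfffff7ff, where only an undefined bit (0x800) is clear: the range check reports a type, while the mask check does not.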



2018-12-05 01:26:30

by Matthew Wilcox

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

On Tue, Dec 04, 2018 at 05:18:32PM -0800, [email protected] wrote:
> On 12/04/2018 04:48 PM, Matthew Wilcox wrote:
> > On Tue, Dec 04, 2018 at 02:45:26PM -0800, Anthony Yznaga wrote:
> >> +static inline int page_has_type(struct page *page)
> >> +{
> >> + return (PageType(page, 0) &&
> >> + ((page->page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL));
> >> +}
> >> +
> >
> > I think this is a bit complex, and a bit of a pain to update as we add
> > new page types. How about this?
> >
> > return (int)page_type < -128;
> >
> > (I'm open to appropriate #defines to make this more obvious that it's ~0x7F)
>
> I thought about having this:
>
> #define PAGE_TYPE_END    0xffffff80
>
> static int inline page_has_type(struct page *page)
> {
>     return page->page_type > PAGE_TYPE_BASE &&
>            page->page_type < PAGE_TYPE_END;
> }
>
> But I opted for the additional complexity to avoid more false-positives from
> possibly corrupted values.  I'm certainly fine with a simple approach, though.

The way I'm thinking about this field is that usually it's _mapcount
which is 0xffffffff to represent 0. We allow a certain small amount
of underflow and still treat it as a mapcount. We also allow for some
amount of overflow. So to be utterly precise, what you had there would
have been fine, but for simplicity, I'd rather just do a signed compare
against -128.
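
As a rough userspace sketch of the signed compare described above (not the kernel code itself): the field is read as a signed 32-bit value, so small negative values are still treated as a mapcount and anything below -128 is treated as a page type. The macro name PAGE_MAPCOUNT_LIMIT and the function has_type_signed() are hypothetical, and a plain uint32_t stands in for page->page_type.

```c
#include <stdint.h>

/* Hypothetical name; -128 is the same bit pattern as ~0x7F in a signed int. */
#define PAGE_MAPCOUNT_LIMIT (-128)

static int has_type_signed(uint32_t page_type)
{
    /*
     * Converting an out-of-range unsigned value to a signed type is
     * implementation-defined in older C standards; this assumes the
     * two's-complement wraparound that GCC and Clang provide.
     */
    return (int32_t)page_type < PAGE_MAPCOUNT_LIMIT;
}
```

With this check, 0xffffffff (-1, a mapcount of 0) and 0xfffffffe (-2, a small underflow) are not reported as typed, and neither is the boundary value 0xffffff80 (-128). 0xffffff7f (-129, the PG_buddy bit cleared) is reported as typed.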

2018-12-05 08:29:44

by David Hildenbrand

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

On 05.12.18 01:48, Matthew Wilcox wrote:
> On Tue, Dec 04, 2018 at 02:45:26PM -0800, Anthony Yznaga wrote:
>> Certain pages that are never mapped to userspace have a type
>> indicated in the page_type field of their struct pages (e.g. PG_buddy).
>> page_type overlaps with _mapcount so set the count to 0 and avoid
>> calling page_mapcount() for these pages.
>>
>> Signed-off-by: Anthony Yznaga <[email protected]>
>> ---
>> fs/proc/page.c | 2 +-
>> include/linux/page-flags.h | 7 +++++++
>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/proc/page.c b/fs/proc/page.c
>> index 6c517b11acf8..40b05e0d4274 100644
>> --- a/fs/proc/page.c
>> +++ b/fs/proc/page.c
>> @@ -46,7 +46,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
>> ppage = pfn_to_page(pfn);
>> else
>> ppage = NULL;
>> - if (!ppage || PageSlab(ppage))
>> + if (!ppage || PageSlab(ppage) || page_has_type(ppage))
>> pcount = 0;
>> else
>> pcount = page_mapcount(ppage);
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 50ce1bddaf56..f9a1c50ccefc 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -673,10 +673,17 @@ static inline int TestClearPageDoubleMap(struct page *page)
>> #define PG_balloon 0x00000100
>> #define PG_kmemcg 0x00000200
>> #define PG_table 0x00000400
>> +#define PAGE_TYPE_ALL (PG_buddy | PG_balloon | PG_kmemcg | PG_table)
>>
>> #define PageType(page, flag) \
>> ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>>
>> +static inline int page_has_type(struct page *page)
>> +{
>> + return (PageType(page, 0) &&
>> + ((page->page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL));
>> +}
>> +
>> #define PAGE_TYPE_OPS(uname, lname) \
>
> I think this is a bit complex, and a bit of a pain to update as we add
> new page types. How about this?
>
> return (int)page_type < -128;
>
> (I'm open to appropriate #defines to make this more obvious that it's ~0x7F)
>
There was already a collision on linux-next, where PG_balloon was
renamed to PG_offline.

--

Thanks,

David / dhildenb

2018-12-05 19:39:39

by Anthony Yznaga

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped



On 12/04/2018 05:25 PM, Matthew Wilcox wrote:
> On Tue, Dec 04, 2018 at 05:18:32PM -0800, [email protected] wrote:
>> On 12/04/2018 04:48 PM, Matthew Wilcox wrote:
>>> On Tue, Dec 04, 2018 at 02:45:26PM -0800, Anthony Yznaga wrote:
>>>> +static inline int page_has_type(struct page *page)
>>>> +{
>>>> + return (PageType(page, 0) &&
>>>> + ((page->page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL));
>>>> +}
>>>> +
>>> I think this is a bit complex, and a bit of a pain to update as we add
>>> new page types. How about this?
>>>
>>> return (int)page_type < -128;
>>>
>>> (I'm open to appropriate #defines to make this more obvious that it's ~0x7F)
>> I thought about having this:
>>
>> #define PAGE_TYPE_END    0xffffff80
>>
>> static int inline page_has_type(struct page *page)
>> {
>>     return page->page_type > PAGE_TYPE_BASE &&
>>            page->page_type < PAGE_TYPE_END;
>> }
>>
>> But I opted for the additional complexity to avoid more false-positives from
>> possibly corrupted values.  I'm certainly fine with a simple approach, though.
> The way I'm thinking about this field is that usually it's _mapcount
> which is 0xffffffff to represent 0. We allow a certain small amount
> of underflow and still treat it as a mapcount. We also allow for some
> amount of overflow. So to be utterly precise, what you had there would
> have been fine, but for simplicity, I'd rather just do a signed compare
> against -128.
The signed compare does not allow for mapcount overflow.  Is that acceptable?
False-positives would be benign for /proc/kpagecount though from a debug
perspective it could be helpful to see overflowed mapcounts.  Some future
caller would need separate consideration.
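
A small userspace sketch of the overflow concern, assuming an overflowed counter wraps to a large negative value such as 0x80000000. The function names are hypothetical and a plain uint32_t stands in for page->page_type.

```c
#include <stdint.h>

#define PAGE_TYPE_BASE 0xf0000000u
#define PAGE_TYPE_END  0xffffff80u

/* Signed compare: everything below -128 is treated as a page type.
 * Assumes two's-complement conversion, as provided by GCC and Clang. */
static int has_type_signed(uint32_t v)
{
    return (int32_t)v < -128;
}

/* Unsigned range check: only values between BASE and END are types. */
static int has_type_range(uint32_t v)
{
    return v > PAGE_TYPE_BASE && v < PAGE_TYPE_END;
}
```

For 0x80000000 the signed compare reports a page type, so /proc/kpagecount would show 0, while the range check does not and the wrapped count would still be visible. For 0x7fffffff, neither check reports a type.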




2018-12-05 19:45:08

by Matthew Wilcox

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

On Wed, Dec 05, 2018 at 11:40:51AM -0800, Anthony Yznaga wrote:
> On 12/04/2018 05:25 PM, Matthew Wilcox wrote:
> > On Tue, Dec 04, 2018 at 05:18:32PM -0800, [email protected] wrote:
> >> On 12/04/2018 04:48 PM, Matthew Wilcox wrote:
> >>> On Tue, Dec 04, 2018 at 02:45:26PM -0800, Anthony Yznaga wrote:
> >>>> +static inline int page_has_type(struct page *page)
> >>>> +{
> >>>> + return (PageType(page, 0) &&
> >>>> + ((page->page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL));
> >>>> +}
> >>>> +
> >>> I think this is a bit complex, and a bit of a pain to update as we add
> >>> new page types. How about this?
> >>>
> >>> return (int)page_type < -128;
> >>>
> >>> (I'm open to appropriate #defines to make this more obvious that it's ~0x7F)
> >> I thought about having this:
> >>
> >> #define PAGE_TYPE_END    0xffffff80
> >>
> >> static int inline page_has_type(struct page *page)
> >> {
> >>     return page->page_type > PAGE_TYPE_BASE &&
> >>            page->page_type < PAGE_TYPE_END;
> >> }
> >>
> >> But I opted for the additional complexity to avoid more false-positives from
> >> possibly corrupted values.  I'm certainly fine with a simple approach, though.
> > The way I'm thinking about this field is that usually it's _mapcount
> > which is 0xffffffff to represent 0. We allow a certain small amount
> > of underflow and still treat it as a mapcount. We also allow for some
> > amount of overflow. So to be utterly precise, what you had there would
> > have been fine, but for simplicity, I'd rather just do a signed compare
> > against -128.
> The signed compare does not allow for mapcount overflow.  Is that acceptable?
> False-positives would be benign for /proc/kpagecount though from a debug
> perspective it could be helpful to see overflowed mapcounts.  Some future
> caller would need separate consideration.

Nobody seems terribly interested in mapcount overflows. I got no response
to https://lkml.org/lkml/2018/3/2/991

2018-12-06 00:44:44

by Anthony Yznaga

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped



On 12/05/2018 11:44 AM, Matthew Wilcox wrote:
> On Wed, Dec 05, 2018 at 11:40:51AM -0800, Anthony Yznaga wrote:
>> On 12/04/2018 05:25 PM, Matthew Wilcox wrote:
>>> On Tue, Dec 04, 2018 at 05:18:32PM -0800, [email protected] wrote:
>>>> On 12/04/2018 04:48 PM, Matthew Wilcox wrote:
>>>>> On Tue, Dec 04, 2018 at 02:45:26PM -0800, Anthony Yznaga wrote:
>>>>>> +static inline int page_has_type(struct page *page)
>>>>>> +{
>>>>>> + return (PageType(page, 0) &&
>>>>>> + ((page->page_type & PAGE_TYPE_ALL) != PAGE_TYPE_ALL));
>>>>>> +}
>>>>>> +
>>>>> I think this is a bit complex, and a bit of a pain to update as we add
>>>>> new page types. How about this?
>>>>>
>>>>> return (int)page_type < -128;
>>>>>
>>>>> (I'm open to appropriate #defines to make this more obvious that it's ~0x7F)
>>>> I thought about having this:
>>>>
>>>> #define PAGE_TYPE_END    0xffffff80
>>>>
>>>> static int inline page_has_type(struct page *page)
>>>> {
>>>>     return page->page_type > PAGE_TYPE_BASE &&
>>>>            page->page_type < PAGE_TYPE_END;
>>>> }
>>>>
>>>> But I opted for the additional complexity to avoid more false-positives from
>>>> possibly corrupted values.  I'm certainly fine with a simple approach, though.
>>> The way I'm thinking about this field is that usually it's _mapcount
>>> which is 0xffffffff to represent 0. We allow a certain small amount
>>> of underflow and still treat it as a mapcount. We also allow for some
>>> amount of overflow. So to be utterly precise, what you had there would
>>> have been fine, but for simplicity, I'd rather just do a signed compare
>>> against -128.
>> The signed compare does not allow for mapcount overflow.  Is that acceptable?
>> False-positives would be benign for /proc/kpagecount though from a debug
>> perspective it could be helpful to see overflowed mapcounts.  Some future
>> caller would need separate consideration.
> Nobody seems terribly interested in mapcount overflows. I got no response
> to https://lkml.org/lkml/2018/3/2/991

Okay.  Thanks for the background.

How about this, then:

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 50ce1bddaf56..39b4494e29f1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -669,6 +669,7 @@ static inline int TestClearPageDoubleMap(struct page *page)
 
 #define PAGE_TYPE_BASE    0xf0000000
 /* Reserve        0x0000007f to catch underflows of page_mapcount */
+#define PAGE_MAPCOUNT_RESERVE    -128
 #define PG_buddy    0x00000080
 #define PG_balloon    0x00000100
 #define PG_kmemcg    0x00000200
@@ -677,6 +678,11 @@ static inline int TestClearPageDoubleMap(struct page *page)
 #define PageType(page, flag)                        \
     ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
 
+static inline int page_has_type(struct page *page)
+{
+    return (int)page->page_type < PAGE_MAPCOUNT_RESERVE;
+}
+
 #define PAGE_TYPE_OPS(uname, lname)                    \
 static __always_inline int Page##uname(struct page *page)        \
 {                                    \


2018-12-06 04:27:32

by Matthew Wilcox

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

On Wed, Dec 05, 2018 at 04:44:15PM -0800, Anthony Yznaga wrote:
> On 12/05/2018 11:44 AM, Matthew Wilcox wrote:
> > Nobody seems terribly interested in mapcount overflows. I got no response
> > to https://lkml.org/lkml/2018/3/2/991
>
> Okay.  Thanks for the background.
>
> How about this, then:

Acked-by: Matthew Wilcox <[email protected]>

2018-12-06 06:06:48

by Anthony Yznaga

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped



On 12/05/2018 08:26 PM, Matthew Wilcox wrote:
> On Wed, Dec 05, 2018 at 04:44:15PM -0800, Anthony Yznaga wrote:
>> On 12/05/2018 11:44 AM, Matthew Wilcox wrote:
>>> Nobody seems terribly interested in mapcount overflows. I got no response
>>> to https://lkml.org/lkml/2018/3/2/991
>> Okay.  Thanks for the background.
>>
>> How about this, then:
> Acked-by: Matthew Wilcox <[email protected]>

Thanks for all of the feedback, Matthew.

Andrew,
Would you like me to submit a revised patch?  An -mm tree diff?

Thanks,
Anthony

2018-12-08 00:28:52

by Andrew Morton

Subject: Re: [PATCH] /proc/kpagecount: return 0 for special pages that are never mapped

On Wed, 5 Dec 2018 22:07:37 -0800 Anthony Yznaga <[email protected]> wrote:

> Would you like me to submit a revised patch?  An -mm tree diff?

Either is OK. I usually turn replacements into deltas so we can see
what changed.