2019-08-06 03:09:42

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: [PATCH V2] fork: Improve error message for corrupted page tables

When a user process exits, the kernel cleans up the mm_struct of the user
process and during cleanup, check_mm() checks the page tables of the user
process for corruption (E.g: unexpected page flags set/cleared). For
corrupted page tables, the error message printed by check_mm() isn't very
clear as it prints the loop index instead of page table type (E.g: Resident
file mapping pages vs Resident shared memory pages). The loop index in
check_mm() is used to index rss_stat[] which represents individual memory
type stats. Hence, instead of printing index, print memory type, thereby
improving error message.

Without patch:
--------------
[ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
[ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
[ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
[ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480

With patch:
-----------
[ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
[ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
[ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
[ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480

Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
that it matches the other print statement.

Cc: Ingo Molnar <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
---

Changes from V1 to V2:
----------------------
1. Move struct definition from header file to fork.c file, so that it won't be
included in every compilation unit. As this struct is used *only* in fork.c,
include the definition in fork.c itself.
2. Index the struct to match respective macros.
3. Mention about print function change in commit message.

kernel/fork.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..f34f441c50c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -125,6 +125,13 @@ int nr_threads; /* The idle threads do not count.. */

static int max_threads; /* tunable limit on nr_threads */

+static const char * const resident_page_types[NR_MM_COUNTERS] = {
+ [MM_FILEPAGES] = "MM_FILEPAGES",
+ [MM_ANONPAGES] = "MM_ANONPAGES",
+ [MM_SWAPENTS] = "MM_SWAPENTS",
+ [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
+};
+
DEFINE_PER_CPU(unsigned long, process_counts) = 0;

__cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
@@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm)
long x = atomic_long_read(&mm->rss_stat.count[i]);

if (unlikely(x))
- printk(KERN_ALERT "BUG: Bad rss-counter state "
- "mm:%p idx:%d val:%ld\n", mm, i, x);
+ pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
+ mm, resident_page_types[i], x);
}

if (mm_pgtables_bytes(mm))
--
2.7.4


2019-08-06 07:55:18

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH V2] fork: Improve error message for corrupted page tables


On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote:
> When a user process exits, the kernel cleans up the mm_struct of the user
> process and during cleanup, check_mm() checks the page tables of the user
> process for corruption (E.g: unexpected page flags set/cleared). For
> corrupted page tables, the error message printed by check_mm() isn't very
> clear as it prints the loop index instead of page table type (E.g: Resident
> file mapping pages vs Resident shared memory pages). The loop index in
> check_mm() is used to index rss_stat[] which represents individual memory
> type stats. Hence, instead of printing index, print memory type, thereby
> improving error message.
>
> Without patch:
> --------------
> [ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
> [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
> [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
> [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
>
> With patch:
> -----------
> [ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
> [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
> [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
>
> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> that it matches the other print statement.
>
> Cc: Ingo Molnar <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Acked-by: Dave Hansen <[email protected]>
> Suggested-by: Dave Hansen <[email protected]>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

I would also add something like this to reduce risk of breaking it in the
future:

----8<----
diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index d7016dcb245e..a6f83cbe4603 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -36,6 +36,9 @@ struct vmacache {
struct vm_area_struct *vmas[VMACACHE_SIZE];
};

+/*
+ * When touching this, update also resident_page_types in kernel/fork.c
+ */
enum {
MM_FILEPAGES, /* Resident file mapping pages */
MM_ANONPAGES, /* Resident anonymous pages */

2019-08-06 08:32:58

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH V2] fork: Improve error message for corrupted page tables



On 08/06/2019 01:23 PM, Vlastimil Babka wrote:
>
> On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote:
>> When a user process exits, the kernel cleans up the mm_struct of the user
>> process and during cleanup, check_mm() checks the page tables of the user
>> process for corruption (E.g: unexpected page flags set/cleared). For
>> corrupted page tables, the error message printed by check_mm() isn't very
>> clear as it prints the loop index instead of page table type (E.g: Resident
>> file mapping pages vs Resident shared memory pages). The loop index in
>> check_mm() is used to index rss_stat[] which represents individual memory
>> type stats. Hence, instead of printing index, print memory type, thereby
>> improving error message.
>>
>> Without patch:
>> --------------
>> [ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
>> [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
>> [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
>> [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
>>
>> With patch:
>> -----------
>> [ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
>> [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
>> [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
>> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
>>
>> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
>> that it matches the other print statement.
>>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Vlastimil Babka <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Anshuman Khandual <[email protected]>
>> Acked-by: Dave Hansen <[email protected]>
>> Suggested-by: Dave Hansen <[email protected]>
>> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
>
> Acked-by: Vlastimil Babka <[email protected]>
>
> I would also add something like this to reduce risk of breaking it in the
> future:
>
> ----8<----
> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> index d7016dcb245e..a6f83cbe4603 100644
> --- a/include/linux/mm_types_task.h
> +++ b/include/linux/mm_types_task.h
> @@ -36,6 +36,9 @@ struct vmacache {
> struct vm_area_struct *vmas[VMACACHE_SIZE];
> };
>
> +/*
> + * When touching this, update also resident_page_types in kernel/fork.c
> + */
> enum {
> MM_FILEPAGES, /* Resident file mapping pages */
> MM_ANONPAGES, /* Resident anonymous pages */
>

Agreed and with that

Reviewed-by: Anshuman Khandual <[email protected]>

2019-08-06 08:38:23

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH V2] fork: Improve error message for corrupted page tables

On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote:
> When a user process exits, the kernel cleans up the mm_struct of the user
> process and during cleanup, check_mm() checks the page tables of the user
> process for corruption (E.g: unexpected page flags set/cleared). For
> corrupted page tables, the error message printed by check_mm() isn't very
> clear as it prints the loop index instead of page table type (E.g: Resident
> file mapping pages vs Resident shared memory pages). The loop index in
> check_mm() is used to index rss_stat[] which represents individual memory
> type stats. Hence, instead of printing index, print memory type, thereby
> improving error message.
>
> Without patch:
> --------------
> [ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
> [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
> [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
> [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
>
> With patch:
> -----------
> [ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
> [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
> [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480

I like this. On any occasion I am investigating an issue with an rss
inbalance I have to go back to kernel sources to see which pte type that
is.

> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> that it matches the other print statement.

good change as well. Maybe we should also lower the loglevel (in a
separate patch) as well. While this is not nice because we are
apparently leaking memory behind it shouldn't be really critical enough
to jump on normal consoles.

> Cc: Ingo Molnar <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Acked-by: Dave Hansen <[email protected]>
> Suggested-by: Dave Hansen <[email protected]>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
>
> Changes from V1 to V2:
> ----------------------
> 1. Move struct definition from header file to fork.c file, so that it won't be
> included in every compilation unit. As this struct is used *only* in fork.c,
> include the definition in fork.c itself.
> 2. Index the struct to match respective macros.
> 3. Mention about print function change in commit message.
>
> kernel/fork.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d8ae0f1b4148..f34f441c50c0 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -125,6 +125,13 @@ int nr_threads; /* The idle threads do not count.. */
>
> static int max_threads; /* tunable limit on nr_threads */
>
> +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> + [MM_FILEPAGES] = "MM_FILEPAGES",
> + [MM_ANONPAGES] = "MM_ANONPAGES",
> + [MM_SWAPENTS] = "MM_SWAPENTS",
> + [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> +};
> +
> DEFINE_PER_CPU(unsigned long, process_counts) = 0;
>
> __cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
> @@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm)
> long x = atomic_long_read(&mm->rss_stat.count[i]);
>
> if (unlikely(x))
> - printk(KERN_ALERT "BUG: Bad rss-counter state "
> - "mm:%p idx:%d val:%ld\n", mm, i, x);
> + pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
> + mm, resident_page_types[i], x);
> }
>
> if (mm_pgtables_bytes(mm))
> --
> 2.7.4

--
Michal Hocko
SUSE Labs

2019-08-06 16:31:41

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH V2] fork: Improve error message for corrupted page tables

On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote:
> +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> + [MM_FILEPAGES] = "MM_FILEPAGES",
> + [MM_ANONPAGES] = "MM_ANONPAGES",
> + [MM_SWAPENTS] = "MM_SWAPENTS",
> + [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> +};

One trick to ensure that this gets updated if the names are ever
updated. You can do:

#define NAMED_ARRAY_INDEX(x) [x] = __stringify(x),

and

static const char * const resident_page_types[NR_MM_COUNTERS] = {
NAMED_ARRAY_INDEX(MM_FILE_PAGES),
NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
...
};

That makes sure that any name changes make it into the strings. Then
stick a:

BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types));

somewhere. That makes sure that any new array indexes get a string
added in the array. Otherwise you get nice, early, compile-time errors.

2019-08-06 17:15:57

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2] fork: Improve error message for corrupted page tables

> >> Without patch:
> >> --------------
> >> [ 204.836425] mm/pgtable-generic.c:29: bad p4d
> >> 0000000089eb4e92(800000025f941467)
> >> [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0
> >> val:2 [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea
> >> idx:1 val:5 [ 204.836685] BUG: non-zero pgtables_bytes on freeing
> >> mm: 20480
> >>
> >> With patch:
> >> -----------
> >> [ 69.815453] mm/pgtable-generic.c:29: bad p4d
> 0000000084653642(800000025ca37467)
> >> [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03
> type:MM_FILEPAGES val:2
> >> [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03
> type:MM_ANONPAGES val:5
> >> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> >>
> >> Also, change print function (from printk(KERN_ALERT, ..) to
> >> pr_alert()) so that it matches the other print statement.
> >>
> >> Cc: Ingo Molnar <[email protected]>
> >> Cc: Vlastimil Babka <[email protected]>
> >> Cc: Peter Zijlstra <[email protected]>
> >> Cc: Andrew Morton <[email protected]>
> >> Cc: Anshuman Khandual <[email protected]>
> >> Acked-by: Dave Hansen <[email protected]>
> >> Suggested-by: Dave Hansen <[email protected]>
> >> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> >
> > Acked-by: Vlastimil Babka <[email protected]>
> >
> > I would also add something like this to reduce risk of breaking it in
> > the
> > future:
> >
> > ----8<----
> > diff --git a/include/linux/mm_types_task.h
> > b/include/linux/mm_types_task.h index d7016dcb245e..a6f83cbe4603
> > 100644
> > --- a/include/linux/mm_types_task.h
> > +++ b/include/linux/mm_types_task.h
> > @@ -36,6 +36,9 @@ struct vmacache {
> > struct vm_area_struct *vmas[VMACACHE_SIZE]; };
> >
> > +/*
> > + * When touching this, update also resident_page_types in
> > +kernel/fork.c */
> > enum {
> > MM_FILEPAGES, /* Resident file mapping pages */
> > MM_ANONPAGES, /* Resident anonymous pages */
> >
>
> Agreed and with that
>
> Reviewed-by: Anshuman Khandual <[email protected]>

Thanks for the review and helping me in improving the patch :)

Regards,
Sai

2019-08-06 17:16:06

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: RE: [PATCH V2] fork: Improve error message for corrupted page tables

> > With patch:
> > -----------
> > [ 69.815453] mm/pgtable-generic.c:29: bad p4d
> 0000000084653642(800000025ca37467)
> > [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03
> type:MM_FILEPAGES val:2
> > [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03
> type:MM_ANONPAGES val:5
> > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> >
> > Also, change print function (from printk(KERN_ALERT, ..) to
> > pr_alert()) so that it matches the other print statement.
> >
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Vlastimil Babka <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Anshuman Khandual <[email protected]>
> > Acked-by: Dave Hansen <[email protected]>
> > Suggested-by: Dave Hansen <[email protected]>
> > Signed-off-by: Sai Praneeth Prakhya <[email protected]>
>
> Acked-by: Vlastimil Babka <[email protected]>
>
> I would also add something like this to reduce risk of breaking it in the
> future:

Sure! Sounds good to me. Will add it to V3.

Regards,
Sai

2019-08-06 18:34:22

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: Re: [PATCH V2] fork: Improve error message for corrupted page tables

On Tue, 2019-08-06 at 09:30 -0700, Dave Hansen wrote:
> On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote:
> > +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> > + [MM_FILEPAGES] = "MM_FILEPAGES",
> > + [MM_ANONPAGES] = "MM_ANONPAGES",
> > + [MM_SWAPENTS] = "MM_SWAPENTS",
> > + [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> > +};
>
> One trick to ensure that this gets updated if the names are ever
> updated. You can do:
>
> #define NAMED_ARRAY_INDEX(x) [x] = __stringify(x),
>
> and
>
> static const char * const resident_page_types[NR_MM_COUNTERS] = {
> NAMED_ARRAY_INDEX(MM_FILE_PAGES),
> NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
> ...
> };

Thanks for the suggestion Dave. I will add this in V3.
Even with this, (if ever) anyone who changes the name of page types or adds an
new entry would still need to update struct resident_page_types[]. So, I will
add the comment as suggested by Vlastimil.

>
> That makes sure that any name changes make it into the strings. Then
> stick a:
>
> BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types));
>
> somewhere. That makes sure that any new array indexes get a string
> added in the array. Otherwise you get nice, early, compile-time errors.

Sure! this sounds good and a small nit-bit :)
For the BUILD_BUG_ON() to work, the definition of struct should be changed as
below

static const char * const resident_page_types[] = {
...
}

i.e. we should not specify the size of array.

Regards,
Sai

2019-08-06 21:27:21

by Prakhya, Sai Praneeth

[permalink] [raw]
Subject: Re: [PATCH V2] fork: Improve error message for corrupted page tables

On Tue, 2019-08-06 at 10:36 +0200, Michal Hocko wrote:
> On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote:
> > When a user process exits, the kernel cleans up the mm_struct of the user
> > process and during cleanup, check_mm() checks the page tables of the user
> > process for corruption (E.g: unexpected page flags set/cleared). For
> > corrupted page tables, the error message printed by check_mm() isn't very
> > clear as it prints the loop index instead of page table type (E.g:
> > Resident
> > file mapping pages vs Resident shared memory pages). The loop index in
> > check_mm() is used to index rss_stat[] which represents individual memory
> > type stats. Hence, instead of printing index, print memory type, thereby
> > improving error message.
> >
> > Without patch:
> > --------------
> > [ 204.836425] mm/pgtable-generic.c:29: bad p4d
> > 0000000089eb4e92(800000025f941467)
> > [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
> > [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
> > [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
> >
> > With patch:
> > -----------
> > [ 69.815453] mm/pgtable-generic.c:29: bad p4d
> > 0000000084653642(800000025ca37467)
> > [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03
> > type:MM_FILEPAGES val:2
> > [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03
> > type:MM_ANONPAGES val:5
> > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
>
> I like this. On any occasion I am investigating an issue with an rss
> inbalance I have to go back to kernel sources to see which pte type that
> is.
>

Hopefully, this patch will be useful to you the next time you run into any rss
imbalance issues.

> > Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> > that it matches the other print statement.
>
> good change as well. Maybe we should also lower the loglevel (in a
> separate patch) as well. While this is not nice because we are
> apparently leaking memory behind it shouldn't be really critical enough
> to jump on normal consoles.

Ya.. I think, probably could be lowered to pr_err() or pr_warn().

Regards,
Sai