When a user process exits, the kernel cleans up the mm_struct of the user
process and during cleanup, check_mm() checks the page tables of the user
process for corruption (E.g: unexpected page flags set/cleared). For
corrupted page tables, the error message printed by check_mm() isn't very
clear as it prints the loop index instead of page table type (E.g: Resident
file mapping pages vs Resident shared memory pages). The loop index in
check_mm() is used to index rss_stat[] which represents individual memory
type stats. Hence, instead of printing index, print memory type, thereby
improving error message.
Without patch:
--------------
[ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
[ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
[ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
[ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
With patch:
-----------
[ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
[ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
[ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
[ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
that it matches the other print statement.
Cc: Ingo Molnar <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
---
Changes from V1 to V2:
----------------------
1. Move struct definition from header file to fork.c file, so that it won't be
included in every compilation unit. As this struct is used *only* in fork.c,
include the definition in fork.c itself.
2. Index the struct to match respective macros.
3. Mention about print function change in commit message.
kernel/fork.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..f34f441c50c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -125,6 +125,13 @@ int nr_threads; /* The idle threads do not count.. */
static int max_threads; /* tunable limit on nr_threads */
+static const char * const resident_page_types[NR_MM_COUNTERS] = {
+ [MM_FILEPAGES] = "MM_FILEPAGES",
+ [MM_ANONPAGES] = "MM_ANONPAGES",
+ [MM_SWAPENTS] = "MM_SWAPENTS",
+ [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
+};
+
DEFINE_PER_CPU(unsigned long, process_counts) = 0;
__cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
@@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm)
long x = atomic_long_read(&mm->rss_stat.count[i]);
if (unlikely(x))
- printk(KERN_ALERT "BUG: Bad rss-counter state "
- "mm:%p idx:%d val:%ld\n", mm, i, x);
+ pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
+ mm, resident_page_types[i], x);
}
if (mm_pgtables_bytes(mm))
--
2.7.4
On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote:
> When a user process exits, the kernel cleans up the mm_struct of the user
> process and during cleanup, check_mm() checks the page tables of the user
> process for corruption (E.g: unexpected page flags set/cleared). For
> corrupted page tables, the error message printed by check_mm() isn't very
> clear as it prints the loop index instead of page table type (E.g: Resident
> file mapping pages vs Resident shared memory pages). The loop index in
> check_mm() is used to index rss_stat[] which represents individual memory
> type stats. Hence, instead of printing index, print memory type, thereby
> improving error message.
>
> Without patch:
> --------------
> [ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
> [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
> [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
> [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
>
> With patch:
> -----------
> [ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
> [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
> [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
>
> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> that it matches the other print statement.
>
> Cc: Ingo Molnar <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Acked-by: Dave Hansen <[email protected]>
> Suggested-by: Dave Hansen <[email protected]>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
I would also add something like this to reduce risk of breaking it in the
future:
----8<----
diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index d7016dcb245e..a6f83cbe4603 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -36,6 +36,9 @@ struct vmacache {
struct vm_area_struct *vmas[VMACACHE_SIZE];
};
+/*
+ * When touching this, update also resident_page_types in kernel/fork.c
+ */
enum {
MM_FILEPAGES, /* Resident file mapping pages */
MM_ANONPAGES, /* Resident anonymous pages */
On 08/06/2019 01:23 PM, Vlastimil Babka wrote:
>
> On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote:
>> When a user process exits, the kernel cleans up the mm_struct of the user
>> process and during cleanup, check_mm() checks the page tables of the user
>> process for corruption (E.g: unexpected page flags set/cleared). For
>> corrupted page tables, the error message printed by check_mm() isn't very
>> clear as it prints the loop index instead of page table type (E.g: Resident
>> file mapping pages vs Resident shared memory pages). The loop index in
>> check_mm() is used to index rss_stat[] which represents individual memory
>> type stats. Hence, instead of printing index, print memory type, thereby
>> improving error message.
>>
>> Without patch:
>> --------------
>> [ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
>> [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
>> [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
>> [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
>>
>> With patch:
>> -----------
>> [ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
>> [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
>> [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
>> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
>>
>> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
>> that it matches the other print statement.
>>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Vlastimil Babka <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Anshuman Khandual <[email protected]>
>> Acked-by: Dave Hansen <[email protected]>
>> Suggested-by: Dave Hansen <[email protected]>
>> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
>
> Acked-by: Vlastimil Babka <[email protected]>
>
> I would also add something like this to reduce risk of breaking it in the
> future:
>
> ----8<----
> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> index d7016dcb245e..a6f83cbe4603 100644
> --- a/include/linux/mm_types_task.h
> +++ b/include/linux/mm_types_task.h
> @@ -36,6 +36,9 @@ struct vmacache {
> struct vm_area_struct *vmas[VMACACHE_SIZE];
> };
>
> +/*
> + * When touching this, update also resident_page_types in kernel/fork.c
> + */
> enum {
> MM_FILEPAGES, /* Resident file mapping pages */
> MM_ANONPAGES, /* Resident anonymous pages */
>
Agreed and with that
Reviewed-by: Anshuman Khandual <[email protected]>
On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote:
> When a user process exits, the kernel cleans up the mm_struct of the user
> process and during cleanup, check_mm() checks the page tables of the user
> process for corruption (E.g: unexpected page flags set/cleared). For
> corrupted page tables, the error message printed by check_mm() isn't very
> clear as it prints the loop index instead of page table type (E.g: Resident
> file mapping pages vs Resident shared memory pages). The loop index in
> check_mm() is used to index rss_stat[] which represents individual memory
> type stats. Hence, instead of printing index, print memory type, thereby
> improving error message.
>
> Without patch:
> --------------
> [ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
> [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
> [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
> [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
>
> With patch:
> -----------
> [ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
> [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
> [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
I like this. On any occasion I am investigating an issue with an rss
inbalance I have to go back to kernel sources to see which pte type that
is.
> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> that it matches the other print statement.
good change as well. Maybe we should also lower the loglevel (in a
separate patch) as well. While this is not nice because we are
apparently leaking memory behind it shouldn't be really critical enough
to jump on normal consoles.
> Cc: Ingo Molnar <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Acked-by: Dave Hansen <[email protected]>
> Suggested-by: Dave Hansen <[email protected]>
> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Acked-by: Michal Hocko <[email protected]>
> ---
>
> Changes from V1 to V2:
> ----------------------
> 1. Move struct definition from header file to fork.c file, so that it won't be
> included in every compilation unit. As this struct is used *only* in fork.c,
> include the definition in fork.c itself.
> 2. Index the struct to match respective macros.
> 3. Mention about print function change in commit message.
>
> kernel/fork.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d8ae0f1b4148..f34f441c50c0 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -125,6 +125,13 @@ int nr_threads; /* The idle threads do not count.. */
>
> static int max_threads; /* tunable limit on nr_threads */
>
> +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> + [MM_FILEPAGES] = "MM_FILEPAGES",
> + [MM_ANONPAGES] = "MM_ANONPAGES",
> + [MM_SWAPENTS] = "MM_SWAPENTS",
> + [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> +};
> +
> DEFINE_PER_CPU(unsigned long, process_counts) = 0;
>
> __cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
> @@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm)
> long x = atomic_long_read(&mm->rss_stat.count[i]);
>
> if (unlikely(x))
> - printk(KERN_ALERT "BUG: Bad rss-counter state "
> - "mm:%p idx:%d val:%ld\n", mm, i, x);
> + pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
> + mm, resident_page_types[i], x);
> }
>
> if (mm_pgtables_bytes(mm))
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote:
> +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> + [MM_FILEPAGES] = "MM_FILEPAGES",
> + [MM_ANONPAGES] = "MM_ANONPAGES",
> + [MM_SWAPENTS] = "MM_SWAPENTS",
> + [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> +};
One trick to ensure that this gets updated if the names are ever
updated. You can do:
#define NAMED_ARRAY_INDEX(x) [x] = __stringify(x),
and
static const char * const resident_page_types[NR_MM_COUNTERS] = {
NAMED_ARRAY_INDEX(MM_FILE_PAGES),
NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
...
};
That makes sure that any name changes make it into the strings. Then
stick a:
BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types));
somewhere. That makes sure that any new array indexes get a string
added in the array. Otherwise you get nice, early, compile-time errors.
> >> Without patch:
> >> --------------
> >> [ 204.836425] mm/pgtable-generic.c:29: bad p4d
> >> 0000000089eb4e92(800000025f941467)
> >> [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0
> >> val:2 [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea
> >> idx:1 val:5 [ 204.836685] BUG: non-zero pgtables_bytes on freeing
> >> mm: 20480
> >>
> >> With patch:
> >> -----------
> >> [ 69.815453] mm/pgtable-generic.c:29: bad p4d
> 0000000084653642(800000025ca37467)
> >> [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03
> type:MM_FILEPAGES val:2
> >> [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03
> type:MM_ANONPAGES val:5
> >> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> >>
> >> Also, change print function (from printk(KERN_ALERT, ..) to
> >> pr_alert()) so that it matches the other print statement.
> >>
> >> Cc: Ingo Molnar <[email protected]>
> >> Cc: Vlastimil Babka <[email protected]>
> >> Cc: Peter Zijlstra <[email protected]>
> >> Cc: Andrew Morton <[email protected]>
> >> Cc: Anshuman Khandual <[email protected]>
> >> Acked-by: Dave Hansen <[email protected]>
> >> Suggested-by: Dave Hansen <[email protected]>
> >> Signed-off-by: Sai Praneeth Prakhya <[email protected]>
> >
> > Acked-by: Vlastimil Babka <[email protected]>
> >
> > I would also add something like this to reduce risk of breaking it in
> > the
> > future:
> >
> > ----8<----
> > diff --git a/include/linux/mm_types_task.h
> > b/include/linux/mm_types_task.h index d7016dcb245e..a6f83cbe4603
> > 100644
> > --- a/include/linux/mm_types_task.h
> > +++ b/include/linux/mm_types_task.h
> > @@ -36,6 +36,9 @@ struct vmacache {
> > struct vm_area_struct *vmas[VMACACHE_SIZE]; };
> >
> > +/*
> > + * When touching this, update also resident_page_types in
> > +kernel/fork.c */
> > enum {
> > MM_FILEPAGES, /* Resident file mapping pages */
> > MM_ANONPAGES, /* Resident anonymous pages */
> >
>
> Agreed and with that
>
> Reviewed-by: Anshuman Khandual <[email protected]>
Thanks for the review and helping me in improving the patch :)
Regards,
Sai
> > With patch:
> > -----------
> > [ 69.815453] mm/pgtable-generic.c:29: bad p4d
> 0000000084653642(800000025ca37467)
> > [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03
> type:MM_FILEPAGES val:2
> > [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03
> type:MM_ANONPAGES val:5
> > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> >
> > Also, change print function (from printk(KERN_ALERT, ..) to
> > pr_alert()) so that it matches the other print statement.
> >
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Vlastimil Babka <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Anshuman Khandual <[email protected]>
> > Acked-by: Dave Hansen <[email protected]>
> > Suggested-by: Dave Hansen <[email protected]>
> > Signed-off-by: Sai Praneeth Prakhya <[email protected]>
>
> Acked-by: Vlastimil Babka <[email protected]>
>
> I would also add something like this to reduce risk of breaking it in the
> future:
Sure! Sounds good to me. Will add it to V3.
Regards,
Sai
On Tue, 2019-08-06 at 09:30 -0700, Dave Hansen wrote:
> On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote:
> > +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> > + [MM_FILEPAGES] = "MM_FILEPAGES",
> > + [MM_ANONPAGES] = "MM_ANONPAGES",
> > + [MM_SWAPENTS] = "MM_SWAPENTS",
> > + [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> > +};
>
> One trick to ensure that this gets updated if the names are ever
> updated. You can do:
>
> #define NAMED_ARRAY_INDEX(x) [x] = __stringify(x),
>
> and
>
> static const char * const resident_page_types[NR_MM_COUNTERS] = {
> NAMED_ARRAY_INDEX(MM_FILE_PAGES),
> NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
> ...
> };
Thanks for the suggestion Dave. I will add this in V3.
Even with this, (if ever) anyone who changes the name of page types or adds an
new entry would still need to update struct resident_page_types[]. So, I will
add the comment as suggested by Vlastimil.
>
> That makes sure that any name changes make it into the strings. Then
> stick a:
>
> BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types));
>
> somewhere. That makes sure that any new array indexes get a string
> added in the array. Otherwise you get nice, early, compile-time errors.
Sure! this sounds good and a small nit-bit :)
For the BUILD_BUG_ON() to work, the definition of struct should be changed as
below
static const char * const resident_page_types[] = {
...
}
i.e. we should not specify the size of array.
Regards,
Sai
On Tue, 2019-08-06 at 10:36 +0200, Michal Hocko wrote:
> On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote:
> > When a user process exits, the kernel cleans up the mm_struct of the user
> > process and during cleanup, check_mm() checks the page tables of the user
> > process for corruption (E.g: unexpected page flags set/cleared). For
> > corrupted page tables, the error message printed by check_mm() isn't very
> > clear as it prints the loop index instead of page table type (E.g:
> > Resident
> > file mapping pages vs Resident shared memory pages). The loop index in
> > check_mm() is used to index rss_stat[] which represents individual memory
> > type stats. Hence, instead of printing index, print memory type, thereby
> > improving error message.
> >
> > Without patch:
> > --------------
> > [ 204.836425] mm/pgtable-generic.c:29: bad p4d
> > 0000000089eb4e92(800000025f941467)
> > [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
> > [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
> > [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
> >
> > With patch:
> > -----------
> > [ 69.815453] mm/pgtable-generic.c:29: bad p4d
> > 0000000084653642(800000025ca37467)
> > [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03
> > type:MM_FILEPAGES val:2
> > [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03
> > type:MM_ANONPAGES val:5
> > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
>
> I like this. On any occasion I am investigating an issue with an rss
> inbalance I have to go back to kernel sources to see which pte type that
> is.
>
Hopefully, this patch will be useful to you the next time you run into any rss
imbalance issues.
> > Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> > that it matches the other print statement.
>
> good change as well. Maybe we should also lower the loglevel (in a
> separate patch) as well. While this is not nice because we are
> apparently leaking memory behind it shouldn't be really critical enough
> to jump on normal consoles.
Ya.. I think, probably could be lowered to pr_err() or pr_warn().
Regards,
Sai