This is the second version of patchset which originally aimed to improve the
page_owner functionality. Thanks to feedback from v1 and some bugs I
discovered along the way, it is now larger in scope and number of patches.
It's based on next-20151124.
For page_owner, the main changes are
o Use static key to further reduce overhead when compiled in but not enabled.
o Improve output wrt. page and pageblock migratetypes
o Transfer the info on page migrations and track last migration reason.
o Dump the info as part of dump_page() to hopefully help debugging.
For the last point, Kirill requested a human readable printing of gfp_mask and
migratetype after v1. At that point it probably makes a lot of sense to do the
same for page alloc failure and OOM warnings. The flags have been undergoing
revisions recently, and we might be getting reports from various kernel
versions that differ. The ./scripts/gfp-translate tool needs to be pointed at
the corresponding sources to be accurate. The downside is potentially breaking
scripts that grep these warnings, but it's not a first change done there over
the years.
Note I'm not entirely happy about the dump_gfpflag_names() implementation, due
to usage of pr_cont() unreliable on SMP (and I've seen spurious newlines in
dmesg output, while being correct on serial console or /var/log/messages).
It also doesn't allow plugging the gfp_mask translation into
/sys/kernel/debug/page_owner where it also could make sense. Maybe a new
*printf formatting flag? Too specialized maybe? Or just prepare the string in
a buffer on stack with strscpy?
Other changes since v1:
o Change placement of page owner migration calls to cover missing cases (Hugh)
o Move dump_page_owner() call up from dump_page_badflags(), so the latter can
be used for adding debugging prints without page owner info (Kirill)
Vlastimil Babka (9):
mm, debug: fix wrongly filtered flags in dump_vma()
mm, page_owner: print symbolic migratetype of both page and pageblock
mm, page_owner: convert page_owner_inited to static key
mm, page_owner: copy page owner info during migration
mm, page_owner: track and print last migrate reason
mm, debug: introduce dump_gfpflag_names() for symbolic printing of
gfp_flags
mm, page_owner: dump page owner info from dump_page()
mm, page_alloc: print symbolic gfp_flags on allocation failure
mm, oom: print symbolic gfp_flags in oom warning
Documentation/vm/page_owner.txt | 9 +++--
include/linux/migrate.h | 6 ++-
include/linux/mmdebug.h | 1 +
include/linux/mmzone.h | 3 ++
include/linux/page_ext.h | 1 +
include/linux/page_owner.h | 50 ++++++++++++++++++-------
include/trace/events/gfpflags.h | 14 +++++--
mm/debug.c | 44 ++++++++++++++++------
mm/migrate.c | 23 ++++++++++--
mm/oom_kill.c | 10 +++--
mm/page_alloc.c | 18 ++++++++-
mm/page_owner.c | 82 +++++++++++++++++++++++++++++++++++++----
mm/vmstat.c | 15 +-------
13 files changed, 213 insertions(+), 63 deletions(-)
--
2.6.3
The dump_vma() function uses dump_flags() for printing the flags as symbolic
names. That function however does a page-flags specific filtering of bits
higher than NR_PAGEFLAGS in order to remove the zone id part. For dump_vma()
this results in removing several VM_* flags from the symbolic translation.
Fix this by refactoring dump_flags() to dump_flag_names(), which only prints
the symbolic names in parentheses. Printing the raw flag value with a prefix,
and any filtering is left to the caller. In addition to fixing the bug, this
allows better flexibility, which will be useful to print gfp_flags by a later
patch.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/debug.c | 32 ++++++++++++++++++++------------
1 file changed, 20 insertions(+), 12 deletions(-)
diff --git a/mm/debug.c b/mm/debug.c
index 8362765..d9718fc 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -46,17 +46,14 @@ static const struct trace_print_flags pageflag_names[] = {
#endif
};
-static void dump_flags(unsigned long flags,
+static void dump_flag_names(unsigned long flags,
const struct trace_print_flags *names, int count)
{
const char *delim = "";
unsigned long mask;
int i;
- pr_emerg("flags: %#lx(", flags);
-
- /* remove zone id */
- flags &= (1UL << NR_PAGEFLAGS) - 1;
+ pr_cont("(");
for (i = 0; i < count && flags; i++) {
@@ -79,6 +76,8 @@ static void dump_flags(unsigned long flags,
void dump_page_badflags(struct page *page, const char *reason,
unsigned long badflags)
{
+ unsigned long printflags = page->flags;
+
pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx",
page, atomic_read(&page->_count), page_mapcount(page),
page->mapping, page->index);
@@ -86,13 +85,19 @@ void dump_page_badflags(struct page *page, const char *reason,
pr_cont(" compound_mapcount: %d", compound_mapcount(page));
pr_cont("\n");
BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS);
- dump_flags(page->flags, pageflag_names, ARRAY_SIZE(pageflag_names));
+
+ pr_emerg("flags: %#lx", printflags);
+ /* remove zone id */
+ printflags &= (1UL << NR_PAGEFLAGS) - 1;
+ dump_flag_names(printflags, pageflag_names, ARRAY_SIZE(pageflag_names));
+
if (reason)
pr_alert("page dumped because: %s\n", reason);
if (page->flags & badflags) {
- pr_alert("bad because of flags:\n");
- dump_flags(page->flags & badflags,
- pageflag_names, ARRAY_SIZE(pageflag_names));
+ printflags = page->flags & badflags;
+ pr_alert("bad because of flags: %#lx:", printflags);
+ dump_flag_names(printflags, pageflag_names,
+ ARRAY_SIZE(pageflag_names));
}
#ifdef CONFIG_MEMCG
if (page->mem_cgroup)
@@ -162,7 +167,9 @@ void dump_vma(const struct vm_area_struct *vma)
(unsigned long)pgprot_val(vma->vm_page_prot),
vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
vma->vm_file, vma->vm_private_data);
- dump_flags(vma->vm_flags, vmaflags_names, ARRAY_SIZE(vmaflags_names));
+ pr_emerg("flags: %#lx", vma->vm_flags);
+ dump_flag_names(vma->vm_flags, vmaflags_names,
+ ARRAY_SIZE(vmaflags_names));
}
EXPORT_SYMBOL(dump_vma);
@@ -233,8 +240,9 @@ void dump_mm(const struct mm_struct *mm)
"" /* This is here to not have a comma! */
);
- dump_flags(mm->def_flags, vmaflags_names,
- ARRAY_SIZE(vmaflags_names));
+ pr_emerg("def_flags: %#lx(", mm->def_flags);
+ dump_flag_names(mm->def_flags, vmaflags_names,
+ ARRAY_SIZE(vmaflags_names));
}
#endif /* CONFIG_DEBUG_VM */
--
2.6.3
The information in /sys/kernel/debug/page_owner includes the migratetype of
the pageblock the page belongs to. This is also checked against the page's
migratetype (as declared by gfp_flags during its allocation), and the page is
reported as Fallback if its migratetype differs from the pageblock's one.
This is somewhat misleading because in fact fallback allocation is not the only
reason why these two can differ. It also doesn't direcly provide the page's
migratetype, although it's possible to derive that from the gfp_flags.
It's arguably better to print both page and pageblock's migratetype and leave
the interpretation to the consumer than to suggest fallback allocation as the
only possible reason. While at it, we can print the migratetypes as string
the same way as /proc/pagetypeinfo does, as some of the numeric values depend
on kernel configuration. For that, this patch moves the migratetype_names
array from #ifdef CONFIG_PROC_FS part of mm/vmstat.c to mm/page_alloc.c and
exports it.
Example page_owner entry after the patch:
Page allocated via order 0, mask 0x2420848
PFN 512 type Reclaimable Block 1 type Reclaimable Flags R LA
[<ffffffff81164e8a>] __alloc_pages_nodemask+0x15a/0xa30
[<ffffffff811ab808>] alloc_pages_current+0x88/0x120
[<ffffffff8115bc36>] __page_cache_alloc+0xe6/0x120
[<ffffffff8115c226>] pagecache_get_page+0x56/0x200
[<ffffffff81205892>] __getblk_slow+0xd2/0x2b0
[<ffffffff81205ab0>] __getblk_gfp+0x40/0x50
[<ffffffff81206ad7>] __breadahead+0x17/0x50
[<ffffffffa0437b27>] __ext4_get_inode_loc+0x397/0x3e0 [ext4]
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/mmzone.h | 3 +++
mm/page_alloc.c | 13 +++++++++++++
mm/page_owner.c | 6 +++---
mm/vmstat.c | 13 -------------
4 files changed, 19 insertions(+), 16 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3b6fb71..2bfad18 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -63,6 +63,9 @@ enum {
MIGRATE_TYPES
};
+/* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
+extern char * const migratetype_names[MIGRATE_TYPES];
+
#ifdef CONFIG_CMA
# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
#else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35ab351..61a023a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -229,6 +229,19 @@ static char * const zone_names[MAX_NR_ZONES] = {
#endif
};
+char * const migratetype_names[MIGRATE_TYPES] = {
+ "Unmovable",
+ "Movable",
+ "Reclaimable",
+ "HighAtomic",
+#ifdef CONFIG_CMA
+ "CMA",
+#endif
+#ifdef CONFIG_MEMORY_ISOLATION
+ "Isolate",
+#endif
+};
+
compound_page_dtor * const compound_page_dtors[] = {
NULL,
free_compound_page,
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 983c3a1..f35826e 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -110,11 +110,11 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
pageblock_mt = get_pfnblock_migratetype(page, pfn);
page_mt = gfpflags_to_migratetype(page_ext->gfp_mask);
ret += snprintf(kbuf + ret, count - ret,
- "PFN %lu Block %lu type %d %s Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
+ "PFN %lu type %s Block %lu type %s Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
pfn,
+ migratetype_names[page_mt],
pfn >> pageblock_order,
- pageblock_mt,
- pageblock_mt != page_mt ? "Fallback" : " ",
+ migratetype_names[pageblock_mt],
PageLocked(page) ? "K" : " ",
PageError(page) ? "E" : " ",
PageReferenced(page) ? "R" : " ",
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f7ebad2..53b722b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -921,19 +921,6 @@ static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
#endif
#ifdef CONFIG_PROC_FS
-static char * const migratetype_names[MIGRATE_TYPES] = {
- "Unmovable",
- "Movable",
- "Reclaimable",
- "HighAtomic",
-#ifdef CONFIG_CMA
- "CMA",
-#endif
-#ifdef CONFIG_MEMORY_ISOLATION
- "Isolate",
-#endif
-};
-
static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
struct zone *zone)
{
--
2.6.3
CONFIG_PAGE_OWNER attempts to impose negligible runtime overhead when enabled
during compilation, but not actually enabled during runtime by boot param
page_owner=on. This overhead can be further reduced using the static key
mechanism, which this patch does.
Signed-off-by: Vlastimil Babka <[email protected]>
---
Documentation/vm/page_owner.txt | 9 +++++----
include/linux/page_owner.h | 22 ++++++++++------------
mm/page_owner.c | 9 +++++----
mm/vmstat.c | 2 +-
4 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/Documentation/vm/page_owner.txt b/Documentation/vm/page_owner.txt
index 8f3ce9b..ffff143 100644
--- a/Documentation/vm/page_owner.txt
+++ b/Documentation/vm/page_owner.txt
@@ -28,10 +28,11 @@ with page owner and page owner is disabled in runtime due to no enabling
boot option, runtime overhead is marginal. If disabled in runtime, it
doesn't require memory to store owner information, so there is no runtime
memory overhead. And, page owner inserts just two unlikely branches into
-the page allocator hotpath and if it returns false then allocation is
-done like as the kernel without page owner. These two unlikely branches
-would not affect to allocation performance. Following is the kernel's
-code size change due to this facility.
+the page allocator hotpath and if not enabled, then allocation is done
+like as the kernel without page owner. These two unlikely branches should
+not affect to allocation performance, especially if the static keys jump
+label patching functionality is available. Following is the kernel's code
+size change due to this facility.
- Without page owner
text data bss dec hex filename
diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index cacaabe..8e2eb15 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -1,8 +1,10 @@
#ifndef __LINUX_PAGE_OWNER_H
#define __LINUX_PAGE_OWNER_H
+#include <linux/jump_label.h>
+
#ifdef CONFIG_PAGE_OWNER
-extern bool page_owner_inited;
+extern struct static_key_false page_owner_inited;
extern struct page_ext_operations page_owner_ops;
extern void __reset_page_owner(struct page *page, unsigned int order);
@@ -12,27 +14,23 @@ extern gfp_t __get_page_owner_gfp(struct page *page);
static inline void reset_page_owner(struct page *page, unsigned int order)
{
- if (likely(!page_owner_inited))
- return;
-
- __reset_page_owner(page, order);
+ if (static_branch_unlikely(&page_owner_inited))
+ __reset_page_owner(page, order);
}
static inline void set_page_owner(struct page *page,
unsigned int order, gfp_t gfp_mask)
{
- if (likely(!page_owner_inited))
- return;
-
- __set_page_owner(page, order, gfp_mask);
+ if (static_branch_unlikely(&page_owner_inited))
+ __set_page_owner(page, order, gfp_mask);
}
static inline gfp_t get_page_owner_gfp(struct page *page)
{
- if (likely(!page_owner_inited))
+ if (static_branch_unlikely(&page_owner_inited))
+ return __get_page_owner_gfp(page);
+ else
return 0;
-
- return __get_page_owner_gfp(page);
}
#else
static inline void reset_page_owner(struct page *page, unsigned int order)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index f35826e..10a6a46 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -5,10 +5,11 @@
#include <linux/bootmem.h>
#include <linux/stacktrace.h>
#include <linux/page_owner.h>
+#include <linux/jump_label.h>
#include "internal.h"
static bool page_owner_disabled = true;
-bool page_owner_inited __read_mostly;
+DEFINE_STATIC_KEY_FALSE(page_owner_inited);
static void init_early_allocated_pages(void);
@@ -37,7 +38,7 @@ static void init_page_owner(void)
if (page_owner_disabled)
return;
- page_owner_inited = true;
+ static_branch_enable(&page_owner_inited);
init_early_allocated_pages();
}
@@ -157,7 +158,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
struct page *page;
struct page_ext *page_ext;
- if (!page_owner_inited)
+ if (!static_branch_unlikely(&page_owner_inited))
return -EINVAL;
page = NULL;
@@ -305,7 +306,7 @@ static int __init pageowner_init(void)
{
struct dentry *dentry;
- if (!page_owner_inited) {
+ if (!static_branch_unlikely(&page_owner_inited)) {
pr_info("page_owner is disabled\n");
return 0;
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 53b722b..3f7ec14 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1117,7 +1117,7 @@ static void pagetypeinfo_showmixedcount(struct seq_file *m, pg_data_t *pgdat)
#ifdef CONFIG_PAGE_OWNER
int mtype;
- if (!page_owner_inited)
+ if (!static_branch_unlikely(&page_owner_inited))
return;
drain_all_pages(NULL);
--
2.6.3
The page_owner mechanism stores gfp_flags of an allocation and stack trace
that lead to it. During page migration, the original information is
practically replaced by the allocation of free page as the migration target.
Arguably this is less useful and might lead to all the page_owner info for
migratable pages gradually converge towards compaction or numa balancing
migrations. It has also lead to inaccuracies such as one fixed by commit
e2cfc91120fa ("mm/page_owner: set correct gfp_mask on page_owner").
This patch thus introduces copying the page_owner info during migration.
However, since the fact that the page has been migrated from its original
place might be useful for debugging, the next patch will introduce a way to
track that information as well.
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/page_owner.h | 10 +++++++++-
mm/migrate.c | 3 +++
mm/page_owner.c | 25 +++++++++++++++++++++++++
3 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 8e2eb15..6440daa 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -11,6 +11,7 @@ extern void __reset_page_owner(struct page *page, unsigned int order);
extern void __set_page_owner(struct page *page,
unsigned int order, gfp_t gfp_mask);
extern gfp_t __get_page_owner_gfp(struct page *page);
+extern void __copy_page_owner(struct page *oldpage, struct page *newpage);
static inline void reset_page_owner(struct page *page, unsigned int order)
{
@@ -32,6 +33,11 @@ static inline gfp_t get_page_owner_gfp(struct page *page)
else
return 0;
}
+static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
+{
+ if (static_branch_unlikely(&page_owner_inited))
+ __copy_page_owner(oldpage, newpage);
+}
#else
static inline void reset_page_owner(struct page *page, unsigned int order)
{
@@ -44,6 +50,8 @@ static inline gfp_t get_page_owner_gfp(struct page *page)
{
return 0;
}
-
+static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
+{
+}
#endif /* CONFIG_PAGE_OWNER */
#endif /* __LINUX_PAGE_OWNER_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index b1034f9..863a0f1 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -38,6 +38,7 @@
#include <linux/balloon_compaction.h>
#include <linux/mmu_notifier.h>
#include <linux/page_idle.h>
+#include <linux/page_owner.h>
#include <asm/tlbflush.h>
@@ -578,6 +579,8 @@ void migrate_page_copy(struct page *newpage, struct page *page)
*/
if (PageWriteback(newpage))
end_page_writeback(newpage);
+
+ copy_page_owner(page, newpage);
}
/************************************************************
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 10a6a46..f571e55 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -84,6 +84,31 @@ gfp_t __get_page_owner_gfp(struct page *page)
return page_ext->gfp_mask;
}
+void __copy_page_owner(struct page *oldpage, struct page *newpage)
+{
+ struct page_ext *old_ext = lookup_page_ext(oldpage);
+ struct page_ext *new_ext = lookup_page_ext(newpage);
+ int i;
+
+ new_ext->order = old_ext->order;
+ new_ext->gfp_mask = old_ext->gfp_mask;
+ new_ext->nr_entries = old_ext->nr_entries;
+
+ for (i = 0; i < ARRAY_SIZE(new_ext->trace_entries); i++)
+ new_ext->trace_entries[i] = old_ext->trace_entries[i];
+
+ /*
+ * We don't clear the bit on the oldpage as it's going to be freed
+ * after migration. Until then, the info can be useful in case of
+ * a bug, and the overal stats will be off a bit only temporarily.
+ * Also, migrate_misplaced_transhuge_page() can still fail the
+ * migration and then we want the oldpage to retain the info. But
+ * in that case we also don't need to explicitly clear the info from
+ * the new page, which will be freed.
+ */
+ __set_bit(PAGE_EXT_OWNER, &new_ext->flags);
+}
+
static ssize_t
print_page_owner(char __user *buf, size_t count, unsigned long pfn,
struct page *page, struct page_ext *page_ext)
--
2.6.3
During migration, page_owner info is now copied with the rest of the page, so
the stacktrace leading to free page allocation during migration is overwritten.
For debugging purposes, it might be however useful to know that the page has
been migrated since its initial allocation. This might happen many times during
the lifetime for different reasons and fully tracking this, especially with
stacktraces would incur extra memory costs. As a compromise, store and print
the migrate_reason of the last migration that occured to the page. This is
enough to distinguish compaction, numa balancing etc.
Example page_owner entry after the patch:
Page allocated via order 0, mask 0x24280ca
PFN 669757 type Reclaimable Block 1308 type Reclaimable Flags UDLA
[<ffffffff81164e8a>] __alloc_pages_nodemask+0x15a/0xa30
[<ffffffff811ad435>] alloc_pages_vma+0xb5/0x250
[<ffffffff8118ba54>] handle_mm_fault+0x1304/0x1820
[<ffffffff81051213>] __do_page_fault+0x183/0x3f0
[<ffffffff810514a2>] do_page_fault+0x22/0x30
[<ffffffff81573ba8>] page_fault+0x28/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
Page has been migrated, last migrate reason: compaction
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/migrate.h | 6 +++++-
include/linux/page_ext.h | 1 +
include/linux/page_owner.h | 9 +++++++++
mm/migrate.c | 20 +++++++++++++++++---
mm/page_owner.c | 17 +++++++++++++++++
5 files changed, 49 insertions(+), 4 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index cac1c09..ab92a8c 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -23,9 +23,13 @@ enum migrate_reason {
MR_SYSCALL, /* also applies to cpusets */
MR_MEMPOLICY_MBIND,
MR_NUMA_MISPLACED,
- MR_CMA
+ MR_CMA,
+ MR_TYPES
};
+/* In mm/migrate.c; also keep sync with include/trace/events/migrate.h */
+extern char * migrate_reason_names[MR_TYPES];
+
#ifdef CONFIG_MIGRATION
extern void putback_movable_pages(struct list_head *l);
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 17f118a..e1fe7cf 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -45,6 +45,7 @@ struct page_ext {
unsigned int order;
gfp_t gfp_mask;
unsigned int nr_entries;
+ int last_migrate_reason;
unsigned long trace_entries[8];
#endif
};
diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 6440daa..555893b 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -12,6 +12,7 @@ extern void __set_page_owner(struct page *page,
unsigned int order, gfp_t gfp_mask);
extern gfp_t __get_page_owner_gfp(struct page *page);
extern void __copy_page_owner(struct page *oldpage, struct page *newpage);
+extern void __set_page_owner_migrate_reason(struct page *page, int reason);
static inline void reset_page_owner(struct page *page, unsigned int order)
{
@@ -38,6 +39,11 @@ static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
if (static_branch_unlikely(&page_owner_inited))
__copy_page_owner(oldpage, newpage);
}
+static inline void set_page_owner_migrate_reason(struct page *page, int reason)
+{
+ if (static_branch_unlikely(&page_owner_inited))
+ __set_page_owner_migrate_reason(page, reason);
+}
#else
static inline void reset_page_owner(struct page *page, unsigned int order)
{
@@ -53,5 +59,8 @@ static inline gfp_t get_page_owner_gfp(struct page *page)
static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
{
}
+static inline void set_page_owner_migrate_reason(struct page *page, int reason)
+{
+}
#endif /* CONFIG_PAGE_OWNER */
#endif /* __LINUX_PAGE_OWNER_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 863a0f1..12e9ab9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -47,6 +47,16 @@
#include "internal.h"
+char *migrate_reason_names[MR_TYPES] = {
+ "compaction",
+ "memory_failure",
+ "memory_hotplug",
+ "syscall_or_cpuset",
+ "mempolicy_mbind",
+ "numa_misplaced",
+ "cma",
+};
+
/*
* migrate_prep() needs to be called before we start compiling a list of pages
* to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
@@ -955,8 +965,10 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
}
rc = __unmap_and_move(page, newpage, force, mode);
- if (rc == MIGRATEPAGE_SUCCESS)
+ if (rc == MIGRATEPAGE_SUCCESS) {
put_new_page = NULL;
+ set_page_owner_migrate_reason(newpage, reason);
+ }
out:
if (rc != -EAGAIN) {
@@ -1021,7 +1033,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
static int unmap_and_move_huge_page(new_page_t get_new_page,
free_page_t put_new_page, unsigned long private,
struct page *hpage, int force,
- enum migrate_mode mode)
+ enum migrate_mode mode, int reason)
{
int rc = -EAGAIN;
int *result = NULL;
@@ -1079,6 +1091,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
if (rc == MIGRATEPAGE_SUCCESS) {
hugetlb_cgroup_migrate(hpage, new_hpage);
put_new_page = NULL;
+ set_page_owner_migrate_reason(new_hpage, reason);
}
unlock_page(hpage);
@@ -1151,7 +1164,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
if (PageHuge(page))
rc = unmap_and_move_huge_page(get_new_page,
put_new_page, private, page,
- pass > 2, mode);
+ pass > 2, mode, reason);
else
rc = unmap_and_move(get_new_page, put_new_page,
private, page, pass > 2, mode,
@@ -1842,6 +1855,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
set_page_memcg(new_page, page_memcg(page));
set_page_memcg(page, NULL);
page_remove_rmap(page, true);
+ set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
diff --git a/mm/page_owner.c b/mm/page_owner.c
index f571e55..59fd6f1 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -6,6 +6,7 @@
#include <linux/stacktrace.h>
#include <linux/page_owner.h>
#include <linux/jump_label.h>
+#include <linux/migrate.h>
#include "internal.h"
static bool page_owner_disabled = true;
@@ -73,10 +74,18 @@ void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
page_ext->order = order;
page_ext->gfp_mask = gfp_mask;
page_ext->nr_entries = trace.nr_entries;
+ page_ext->last_migrate_reason = -1;
__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
}
+void __set_page_owner_migrate_reason(struct page *page, int reason)
+{
+ struct page_ext *page_ext = lookup_page_ext(page);
+
+ page_ext->last_migrate_reason = reason;
+}
+
gfp_t __get_page_owner_gfp(struct page *page)
{
struct page_ext *page_ext = lookup_page_ext(page);
@@ -161,6 +170,14 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
if (ret >= count)
goto err;
+ if (page_ext->last_migrate_reason != -1) {
+ ret += snprintf(kbuf + ret, count - ret,
+ "Page has been migrated, last migrate reason: %s\n",
+ migrate_reason_names[page_ext->last_migrate_reason]);
+ if (ret >= count)
+ goto err;
+ }
+
ret += snprintf(kbuf + ret, count - ret, "\n");
if (ret >= count)
goto err;
--
2.6.3
It would be useful to convert gfp_flags into string representation when
printing them in case of allocation failure, OOM etc. There's a script
./scripts/gfp-translate to make this simpler, but it needs the matching version
of the sources to be accurate, and the flags have been undergoing some changes
recently.
The ftrace framework already has this translation in the form of
show_gfp_flags() defined in include/trace/events/gfpflags.h which defines the
translation table internally. Allow reusing the table outside ftrace by putting
it behind __def_gfpflag_names definition and introduce dump_gfpflag_names() to
handle the printing.
While at it, also fill in the names for the flags and flag combinations that
have been missing in the table. GFP_NOWAIT no longer equals to "no flags", so
change the output for no flags to "none".
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/mmdebug.h | 1 +
include/trace/events/gfpflags.h | 14 +++++++++++---
mm/debug.c | 10 ++++++++++
3 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index c447d80..3b77fab 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -10,6 +10,7 @@ struct mm_struct;
extern void dump_page(struct page *page, const char *reason);
extern void dump_page_badflags(struct page *page, const char *reason,
unsigned long badflags);
+extern void dump_gfpflag_names(unsigned long gfp_flags);
void dump_vma(const struct vm_area_struct *vma);
void dump_mm(const struct mm_struct *mm);
diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h
index dde6bf0..3d580fd 100644
--- a/include/trace/events/gfpflags.h
+++ b/include/trace/events/gfpflags.h
@@ -8,8 +8,8 @@
*
* Thus most bits set go first.
*/
-#define show_gfp_flags(flags) \
- (flags) ? __print_flags(flags, "|", \
+
+#define __def_gfpflag_names \
{(unsigned long)GFP_TRANSHUGE, "GFP_TRANSHUGE"}, \
{(unsigned long)GFP_HIGHUSER_MOVABLE, "GFP_HIGHUSER_MOVABLE"}, \
{(unsigned long)GFP_HIGHUSER, "GFP_HIGHUSER"}, \
@@ -19,9 +19,13 @@
{(unsigned long)GFP_NOFS, "GFP_NOFS"}, \
{(unsigned long)GFP_ATOMIC, "GFP_ATOMIC"}, \
{(unsigned long)GFP_NOIO, "GFP_NOIO"}, \
+ {(unsigned long)GFP_NOWAIT, "GFP_NOWAIT"}, \
+ {(unsigned long)__GFP_DMA, "GFP_DMA"}, \
+ {(unsigned long)__GFP_DMA32, "GFP_DMA32"}, \
{(unsigned long)__GFP_HIGH, "GFP_HIGH"}, \
{(unsigned long)__GFP_ATOMIC, "GFP_ATOMIC"}, \
{(unsigned long)__GFP_IO, "GFP_IO"}, \
+ {(unsigned long)__GFP_FS, "GFP_FS"}, \
{(unsigned long)__GFP_COLD, "GFP_COLD"}, \
{(unsigned long)__GFP_NOWARN, "GFP_NOWARN"}, \
{(unsigned long)__GFP_REPEAT, "GFP_REPEAT"}, \
@@ -36,8 +40,12 @@
{(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \
{(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \
{(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \
+ {(unsigned long)__GFP_WRITE, "GFP_WRITE"}, \
{(unsigned long)__GFP_DIRECT_RECLAIM, "GFP_DIRECT_RECLAIM"}, \
{(unsigned long)__GFP_KSWAPD_RECLAIM, "GFP_KSWAPD_RECLAIM"}, \
{(unsigned long)__GFP_OTHER_NODE, "GFP_OTHER_NODE"} \
- ) : "GFP_NOWAIT"
+#define show_gfp_flags(flags) \
+ (flags) ? __print_flags(flags, "|", \
+ __def_gfpflag_names \
+ ) : "none"
diff --git a/mm/debug.c b/mm/debug.c
index d9718fc..1a71a3b 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -9,6 +9,7 @@
#include <linux/mm.h>
#include <linux/trace_events.h>
#include <linux/memcontrol.h>
+#include <trace/events/gfpflags.h>
static const struct trace_print_flags pageflag_names[] = {
{1UL << PG_locked, "locked" },
@@ -46,6 +47,10 @@ static const struct trace_print_flags pageflag_names[] = {
#endif
};
+static const struct trace_print_flags gfpflag_names[] = {
+ __def_gfpflag_names
+};
+
static void dump_flag_names(unsigned long flags,
const struct trace_print_flags *names, int count)
{
@@ -73,6 +78,11 @@ static void dump_flag_names(unsigned long flags,
pr_cont(")\n");
}
+void dump_gfpflag_names(unsigned long gfp_flags)
+{
+ dump_flag_names(gfp_flags, gfpflag_names, ARRAY_SIZE(gfpflag_names));
+}
+
void dump_page_badflags(struct page *page, const char *reason,
unsigned long badflags)
{
--
2.6.3
The page_owner mechanism is useful for dealing with memory leaks. By reading
/sys/kernel/debug/page_owner one can determine the stack traces leading to
allocations of all pages, and find e.g. a buggy driver.
This information might be also potentially useful for debugging, such as the
VM_BUG_ON_PAGE() calls to dump_page(). So let's print the stored info from
dump_page().
Example output:
page:ffffea0002868a00 count:1 mapcount:0 mapping:ffff8800bba8e958 index:0x63a22c
flags: 0x1fffff80000060(lru|active)
page dumped because: VM_BUG_ON_PAGE(1)
page->mem_cgroup:ffff880138efdc00
page allocated via order 0, migratetype Movable, gfp_mask 0x2420848(GFP_NOFS|GFP_NOFAIL|GFP_HARDWALL|GFP_MOVABLE)
[<ffffffff81164e8a>] __alloc_pages_nodemask+0x15a/0xa30
[<ffffffff811ab808>] alloc_pages_current+0x88/0x120
[<ffffffff8115bc36>] __page_cache_alloc+0xe6/0x120
[<ffffffff8115c226>] pagecache_get_page+0x56/0x200
[<ffffffff812058c2>] __getblk_slow+0xd2/0x2b0
[<ffffffff81205ae0>] __getblk_gfp+0x40/0x50
[<ffffffffa0283abe>] jbd2_journal_get_descriptor_buffer+0x3e/0x90 [jbd2]
[<ffffffffa027c793>] jbd2_journal_commit_transaction+0x8e3/0x1870 [jbd2]
page has been migrated, last migrate reason: compaction
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/page_owner.h | 9 +++++++++
mm/debug.c | 2 ++
mm/page_alloc.c | 1 +
mm/page_owner.c | 25 +++++++++++++++++++++++++
4 files changed, 37 insertions(+)
diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 555893b..46f1b93 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -13,6 +13,7 @@ extern void __set_page_owner(struct page *page,
extern gfp_t __get_page_owner_gfp(struct page *page);
extern void __copy_page_owner(struct page *oldpage, struct page *newpage);
extern void __set_page_owner_migrate_reason(struct page *page, int reason);
+extern void __dump_page_owner(struct page *page);
static inline void reset_page_owner(struct page *page, unsigned int order)
{
@@ -44,6 +45,11 @@ static inline void set_page_owner_migrate_reason(struct page *page, int reason)
if (static_branch_unlikely(&page_owner_inited))
__set_page_owner_migrate_reason(page, reason);
}
+static inline void dump_page_owner(struct page *page)
+{
+ if (static_branch_unlikely(&page_owner_inited))
+ __dump_page_owner(page);
+}
#else
static inline void reset_page_owner(struct page *page, unsigned int order)
{
@@ -62,5 +68,8 @@ static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
static inline void set_page_owner_migrate_reason(struct page *page, int reason)
{
}
+static inline void dump_page_owner(struct page *page)
+{
+}
#endif /* CONFIG_PAGE_OWNER */
#endif /* __LINUX_PAGE_OWNER_H */
diff --git a/mm/debug.c b/mm/debug.c
index 1a71a3b..c3300ee 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -10,6 +10,7 @@
#include <linux/trace_events.h>
#include <linux/memcontrol.h>
#include <trace/events/gfpflags.h>
+#include <linux/page_owner.h>
static const struct trace_print_flags pageflag_names[] = {
{1UL << PG_locked, "locked" },
@@ -118,6 +119,7 @@ void dump_page_badflags(struct page *page, const char *reason,
void dump_page(struct page *page, const char *reason)
{
dump_page_badflags(page, reason, 0);
+ dump_page_owner(page);
}
EXPORT_SYMBOL(dump_page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 61a023a..f806a1a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -448,6 +448,7 @@ static void bad_page(struct page *page, const char *reason,
printk(KERN_ALERT "BUG: Bad page state in process %s pfn:%05lx\n",
current->comm, page_to_pfn(page));
dump_page_badflags(page, reason, bad_flags);
+ dump_page_owner(page);
print_modules();
dump_stack();
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 59fd6f1..a81cfa0 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -193,6 +193,31 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
return -ENOMEM;
}
+void __dump_page_owner(struct page *page)
+{
+ struct page_ext *page_ext = lookup_page_ext(page);
+ struct stack_trace trace = {
+ .nr_entries = page_ext->nr_entries,
+ .entries = &page_ext->trace_entries[0],
+ };
+ gfp_t gfp_mask = page_ext->gfp_mask;
+ int mt = gfpflags_to_migratetype(gfp_mask);
+
+ if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) {
+ pr_alert("page_owner info is not active (free page?)\n");
+ return;
+ }
+ ;
+ pr_alert("page allocated via order %u, migratetype %s, gfp_mask 0x%x",
+ page_ext->order, migratetype_names[mt], gfp_mask);
+ dump_gfpflag_names(gfp_mask);
+ print_stack_trace(&trace, 0);
+
+ if (page_ext->last_migrate_reason != -1)
+ pr_alert("page has been migrated, last migrate reason: %s\n",
+ migrate_reason_names[page_ext->last_migrate_reason]);
+}
+
static ssize_t
read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
{
--
2.6.3
It would be useful to translate gfp_flags into string representation when
printing in case of an allocation failure, especially as the flags have been
undergoing some changes recently and the script ./scripts/gfp-translate needs
a matching source version to be accurate.
Example output:
stapio: page allocation failure: order:9, mode:0x2080020(GFP_ATOMIC)
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f806a1a..80349ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2711,9 +2711,9 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
va_end(args);
}
- pr_warn("%s: page allocation failure: order:%u, mode:0x%x\n",
+ pr_warn("%s: page allocation failure: order:%u, mode:0x%x",
current->comm, order, gfp_mask);
-
+ dump_gfpflag_names(gfp_mask);
dump_stack();
if (!should_suppress_show_mem())
show_mem(filter);
--
2.6.3
It would be useful to translate gfp_flags into string representation when
printing in case of an OOM, especially as the flags have been undergoing some
changes recently and the script ./scripts/gfp-translate needs a matching source
version to be accurate.
Example output:
a.out invoked oom-killer: order=0, oom_score_adj=0, gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|GFP_ZERO)
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/oom_kill.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5314b20..542d56c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -386,10 +386,12 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
static void dump_header(struct oom_control *oc, struct task_struct *p,
struct mem_cgroup *memcg)
{
- pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
- "oom_score_adj=%hd\n",
- current->comm, oc->gfp_mask, oc->order,
- current->signal->oom_score_adj);
+ pr_warning("%s invoked oom-killer: order=%d, oom_score_adj=%hd, "
+ "gfp_mask=0x%x",
+ current->comm, oc->order, current->signal->oom_score_adj,
+ oc->gfp_mask);
+ dump_gfpflag_names(oc->gfp_mask);
+
cpuset_print_current_mems_allowed();
dump_stack();
if (memcg)
--
2.6.3
On Tue, Nov 24, 2015 at 01:36:14PM +0100, Vlastimil Babka wrote:
> The information in /sys/kernel/debug/page_owner includes the migratetype of
> the pageblock the page belongs to. This is also checked against the page's
> migratetype (as declared by gfp_flags during its allocation), and the page is
> reported as Fallback if its migratetype differs from the pageblock's one.
>
> This is somewhat misleading because in fact fallback allocation is not the only
> reason why these two can differ. It also doesn't direcly provide the page's
> migratetype, although it's possible to derive that from the gfp_flags.
>
> It's arguably better to print both page and pageblock's migratetype and leave
> the interpretation to the consumer than to suggest fallback allocation as the
> only possible reason. While at it, we can print the migratetypes as string
> the same way as /proc/pagetypeinfo does, as some of the numeric values depend
> on kernel configuration. For that, this patch moves the migratetype_names
> array from #ifdef CONFIG_PROC_FS part of mm/vmstat.c to mm/page_alloc.c and
> exports it.
>
> Example page_owner entry after the patch:
>
> Page allocated via order 0, mask 0x2420848
> PFN 512 type Reclaimable Block 1 type Reclaimable Flags R LA
> [<ffffffff81164e8a>] __alloc_pages_nodemask+0x15a/0xa30
> [<ffffffff811ab808>] alloc_pages_current+0x88/0x120
> [<ffffffff8115bc36>] __page_cache_alloc+0xe6/0x120
> [<ffffffff8115c226>] pagecache_get_page+0x56/0x200
> [<ffffffff81205892>] __getblk_slow+0xd2/0x2b0
> [<ffffffff81205ab0>] __getblk_gfp+0x40/0x50
> [<ffffffff81206ad7>] __breadahead+0x17/0x50
> [<ffffffffa0437b27>] __ext4_get_inode_loc+0x397/0x3e0 [ext4]
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/mmzone.h | 3 +++
> mm/page_alloc.c | 13 +++++++++++++
> mm/page_owner.c | 6 +++---
> mm/vmstat.c | 13 -------------
> 4 files changed, 19 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 3b6fb71..2bfad18 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -63,6 +63,9 @@ enum {
> MIGRATE_TYPES
> };
>
> +/* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
> +extern char * const migratetype_names[MIGRATE_TYPES];
> +
> #ifdef CONFIG_CMA
> # define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> #else
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 35ab351..61a023a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -229,6 +229,19 @@ static char * const zone_names[MAX_NR_ZONES] = {
> #endif
> };
>
> +char * const migratetype_names[MIGRATE_TYPES] = {
> + "Unmovable",
> + "Movable",
> + "Reclaimable",
> + "HighAtomic",
> +#ifdef CONFIG_CMA
> + "CMA",
> +#endif
> +#ifdef CONFIG_MEMORY_ISOLATION
> + "Isolate",
> +#endif
> +};
> +
> compound_page_dtor * const compound_page_dtors[] = {
> NULL,
> free_compound_page,
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 983c3a1..f35826e 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -110,11 +110,11 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> pageblock_mt = get_pfnblock_migratetype(page, pfn);
> page_mt = gfpflags_to_migratetype(page_ext->gfp_mask);
> ret += snprintf(kbuf + ret, count - ret,
> - "PFN %lu Block %lu type %d %s Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
> + "PFN %lu type %s Block %lu type %s Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
How about generalizing dump_flag_names() more and using it here?
This output is neat than dump_flag_names() but not complete.
Thanks.
On Tue, Nov 24, 2015 at 01:36:17PM +0100, Vlastimil Babka wrote:
> During migration, page_owner info is now copied with the rest of the page, so
> the stacktrace leading to free page allocation during migration is overwritten.
> For debugging purposes, it might be however useful to know that the page has
> been migrated since its initial allocation. This might happen many times during
> the lifetime for different reasons and fully tracking this, especially with
> stacktraces would incur extra memory costs. As a compromise, store and print
> the migrate_reason of the last migration that occured to the page. This is
> enough to distinguish compaction, numa balancing etc.
>
> Example page_owner entry after the patch:
>
> Page allocated via order 0, mask 0x24280ca
> PFN 669757 type Reclaimable Block 1308 type Reclaimable Flags UDLA
> [<ffffffff81164e8a>] __alloc_pages_nodemask+0x15a/0xa30
> [<ffffffff811ad435>] alloc_pages_vma+0xb5/0x250
> [<ffffffff8118ba54>] handle_mm_fault+0x1304/0x1820
> [<ffffffff81051213>] __do_page_fault+0x183/0x3f0
> [<ffffffff810514a2>] do_page_fault+0x22/0x30
> [<ffffffff81573ba8>] page_fault+0x28/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
> Page has been migrated, last migrate reason: compaction
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/migrate.h | 6 +++++-
> include/linux/page_ext.h | 1 +
> include/linux/page_owner.h | 9 +++++++++
> mm/migrate.c | 20 +++++++++++++++++---
> mm/page_owner.c | 17 +++++++++++++++++
> 5 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index cac1c09..ab92a8c 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -23,9 +23,13 @@ enum migrate_reason {
> MR_SYSCALL, /* also applies to cpusets */
> MR_MEMPOLICY_MBIND,
> MR_NUMA_MISPLACED,
> - MR_CMA
> + MR_CMA,
> + MR_TYPES
> };
>
> +/* In mm/migrate.c; also keep sync with include/trace/events/migrate.h */
> +extern char * migrate_reason_names[MR_TYPES];
> +
> #ifdef CONFIG_MIGRATION
>
> extern void putback_movable_pages(struct list_head *l);
> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
> index 17f118a..e1fe7cf 100644
> --- a/include/linux/page_ext.h
> +++ b/include/linux/page_ext.h
> @@ -45,6 +45,7 @@ struct page_ext {
> unsigned int order;
> gfp_t gfp_mask;
> unsigned int nr_entries;
> + int last_migrate_reason;
> unsigned long trace_entries[8];
> #endif
> };
> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> index 6440daa..555893b 100644
> --- a/include/linux/page_owner.h
> +++ b/include/linux/page_owner.h
> @@ -12,6 +12,7 @@ extern void __set_page_owner(struct page *page,
> unsigned int order, gfp_t gfp_mask);
> extern gfp_t __get_page_owner_gfp(struct page *page);
> extern void __copy_page_owner(struct page *oldpage, struct page *newpage);
> +extern void __set_page_owner_migrate_reason(struct page *page, int reason);
>
> static inline void reset_page_owner(struct page *page, unsigned int order)
> {
> @@ -38,6 +39,11 @@ static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> if (static_branch_unlikely(&page_owner_inited))
> __copy_page_owner(oldpage, newpage);
> }
> +static inline void set_page_owner_migrate_reason(struct page *page, int reason)
> +{
> + if (static_branch_unlikely(&page_owner_inited))
> + __set_page_owner_migrate_reason(page, reason);
> +}
> #else
> static inline void reset_page_owner(struct page *page, unsigned int order)
> {
> @@ -53,5 +59,8 @@ static inline gfp_t get_page_owner_gfp(struct page *page)
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> {
> }
> +static inline void set_page_owner_migrate_reason(struct page *page, int reason)
> +{
> +}
> #endif /* CONFIG_PAGE_OWNER */
> #endif /* __LINUX_PAGE_OWNER_H */
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 863a0f1..12e9ab9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -47,6 +47,16 @@
>
> #include "internal.h"
>
> +char *migrate_reason_names[MR_TYPES] = {
> + "compaction",
> + "memory_failure",
> + "memory_hotplug",
> + "syscall_or_cpuset",
> + "mempolicy_mbind",
> + "numa_misplaced",
> + "cma",
> +};
> +
> /*
> * migrate_prep() needs to be called before we start compiling a list of pages
> * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
> @@ -955,8 +965,10 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
> }
>
> rc = __unmap_and_move(page, newpage, force, mode);
> - if (rc == MIGRATEPAGE_SUCCESS)
> + if (rc == MIGRATEPAGE_SUCCESS) {
> put_new_page = NULL;
> + set_page_owner_migrate_reason(newpage, reason);
> + }
>
> out:
> if (rc != -EAGAIN) {
> @@ -1021,7 +1033,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
> static int unmap_and_move_huge_page(new_page_t get_new_page,
> free_page_t put_new_page, unsigned long private,
> struct page *hpage, int force,
> - enum migrate_mode mode)
> + enum migrate_mode mode, int reason)
> {
> int rc = -EAGAIN;
> int *result = NULL;
> @@ -1079,6 +1091,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> if (rc == MIGRATEPAGE_SUCCESS) {
> hugetlb_cgroup_migrate(hpage, new_hpage);
> put_new_page = NULL;
> + set_page_owner_migrate_reason(new_hpage, reason);
> }
>
> unlock_page(hpage);
> @@ -1151,7 +1164,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> if (PageHuge(page))
> rc = unmap_and_move_huge_page(get_new_page,
> put_new_page, private, page,
> - pass > 2, mode);
> + pass > 2, mode, reason);
> else
> rc = unmap_and_move(get_new_page, put_new_page,
> private, page, pass > 2, mode,
> @@ -1842,6 +1855,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> set_page_memcg(new_page, page_memcg(page));
> set_page_memcg(page, NULL);
> page_remove_rmap(page, true);
> + set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
>
> spin_unlock(ptl);
> mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index f571e55..59fd6f1 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -6,6 +6,7 @@
> #include <linux/stacktrace.h>
> #include <linux/page_owner.h>
> #include <linux/jump_label.h>
> +#include <linux/migrate.h>
> #include "internal.h"
>
> static bool page_owner_disabled = true;
> @@ -73,10 +74,18 @@ void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask)
> page_ext->order = order;
> page_ext->gfp_mask = gfp_mask;
> page_ext->nr_entries = trace.nr_entries;
> + page_ext->last_migrate_reason = -1;
>
> __set_bit(PAGE_EXT_OWNER, &page_ext->flags);
> }
>
> +void __set_page_owner_migrate_reason(struct page *page, int reason)
> +{
> + struct page_ext *page_ext = lookup_page_ext(page);
> +
> + page_ext->last_migrate_reason = reason;
> +}
> +
> gfp_t __get_page_owner_gfp(struct page *page)
> {
> struct page_ext *page_ext = lookup_page_ext(page);
> @@ -161,6 +170,14 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> if (ret >= count)
> goto err;
>
> + if (page_ext->last_migrate_reason != -1) {
> + ret += snprintf(kbuf + ret, count - ret,
> + "Page has been migrated, last migrate reason: %s\n",
> + migrate_reason_names[page_ext->last_migrate_reason]);
> + if (ret >= count)
> + goto err;
> + }
> +
migrate_reason_names is defined if CONFIG_MIGRATION is enabled so
it would cause build failure in case of !CONFIG_MIGRATION and
CONFIG_PAGE_OWNER.
Thanks.
On Tue, Nov 24, 2015 at 01:36:18PM +0100, Vlastimil Babka wrote:
> It would be useful to convert gfp_flags into string representation when
> printing them in case of allocation failure, OOM etc. There's a script
> ./scripts/gfp-translate to make this simpler, but it needs the matching version
> of the sources to be accurate, and the flags have been undergoing some changes
> recently.
>
> The ftrace framework already has this translation in the form of
> show_gfp_flags() defined in include/trace/events/gfpflags.h which defines the
> translation table internally. Allow reusing the table outside ftrace by putting
> it behind __def_gfpflag_names definition and introduce dump_gfpflag_names() to
> handle the printing.
>
> While at it, also fill in the names for the flags and flag combinations that
> have been missing in the table. GFP_NOWAIT no longer equals to "no flags", so
> change the output for no flags to "none".
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/mmdebug.h | 1 +
> include/trace/events/gfpflags.h | 14 +++++++++++---
> mm/debug.c | 10 ++++++++++
> 3 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> index c447d80..3b77fab 100644
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -10,6 +10,7 @@ struct mm_struct;
> extern void dump_page(struct page *page, const char *reason);
> extern void dump_page_badflags(struct page *page, const char *reason,
> unsigned long badflags);
> +extern void dump_gfpflag_names(unsigned long gfp_flags);
> void dump_vma(const struct vm_area_struct *vma);
> void dump_mm(const struct mm_struct *mm);
>
> diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h
> index dde6bf0..3d580fd 100644
> --- a/include/trace/events/gfpflags.h
> +++ b/include/trace/events/gfpflags.h
> @@ -8,8 +8,8 @@
> *
> * Thus most bits set go first.
> */
> -#define show_gfp_flags(flags) \
> - (flags) ? __print_flags(flags, "|", \
> +
> +#define __def_gfpflag_names \
> {(unsigned long)GFP_TRANSHUGE, "GFP_TRANSHUGE"}, \
> {(unsigned long)GFP_HIGHUSER_MOVABLE, "GFP_HIGHUSER_MOVABLE"}, \
> {(unsigned long)GFP_HIGHUSER, "GFP_HIGHUSER"}, \
> @@ -19,9 +19,13 @@
> {(unsigned long)GFP_NOFS, "GFP_NOFS"}, \
> {(unsigned long)GFP_ATOMIC, "GFP_ATOMIC"}, \
> {(unsigned long)GFP_NOIO, "GFP_NOIO"}, \
> + {(unsigned long)GFP_NOWAIT, "GFP_NOWAIT"}, \
> + {(unsigned long)__GFP_DMA, "GFP_DMA"}, \
> + {(unsigned long)__GFP_DMA32, "GFP_DMA32"}, \
> {(unsigned long)__GFP_HIGH, "GFP_HIGH"}, \
> {(unsigned long)__GFP_ATOMIC, "GFP_ATOMIC"}, \
> {(unsigned long)__GFP_IO, "GFP_IO"}, \
> + {(unsigned long)__GFP_FS, "GFP_FS"}, \
> {(unsigned long)__GFP_COLD, "GFP_COLD"}, \
> {(unsigned long)__GFP_NOWARN, "GFP_NOWARN"}, \
> {(unsigned long)__GFP_REPEAT, "GFP_REPEAT"}, \
> @@ -36,8 +40,12 @@
> {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \
> {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \
> {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \
> + {(unsigned long)__GFP_WRITE, "GFP_WRITE"}, \
> {(unsigned long)__GFP_DIRECT_RECLAIM, "GFP_DIRECT_RECLAIM"}, \
> {(unsigned long)__GFP_KSWAPD_RECLAIM, "GFP_KSWAPD_RECLAIM"}, \
> {(unsigned long)__GFP_OTHER_NODE, "GFP_OTHER_NODE"} \
> - ) : "GFP_NOWAIT"
>
> +#define show_gfp_flags(flags) \
> + (flags) ? __print_flags(flags, "|", \
> + __def_gfpflag_names \
> + ) : "none"
How about moving this to gfp.h or something?
Now, we use it in out of tracepoints so there is no need to keep it
in include/trace/events/xxx.
Thanks.
> diff --git a/mm/debug.c b/mm/debug.c
> index d9718fc..1a71a3b 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -9,6 +9,7 @@
> #include <linux/mm.h>
> #include <linux/trace_events.h>
> #include <linux/memcontrol.h>
> +#include <trace/events/gfpflags.h>
>
> static const struct trace_print_flags pageflag_names[] = {
> {1UL << PG_locked, "locked" },
> @@ -46,6 +47,10 @@ static const struct trace_print_flags pageflag_names[] = {
> #endif
> };
>
> +static const struct trace_print_flags gfpflag_names[] = {
> + __def_gfpflag_names
> +};
> +
> static void dump_flag_names(unsigned long flags,
> const struct trace_print_flags *names, int count)
> {
> @@ -73,6 +78,11 @@ static void dump_flag_names(unsigned long flags,
> pr_cont(")\n");
> }
>
> +void dump_gfpflag_names(unsigned long gfp_flags)
> +{
> + dump_flag_names(gfp_flags, gfpflag_names, ARRAY_SIZE(gfpflag_names));
> +}
> +
> void dump_page_badflags(struct page *page, const char *reason,
> unsigned long badflags)
> {
> --
> 2.6.3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On 11/25/2015 09:16 AM, Joonsoo Kim wrote:
> On Tue, Nov 24, 2015 at 01:36:18PM +0100, Vlastimil Babka wrote:
>> --- a/include/trace/events/gfpflags.h
>> +++ b/include/trace/events/gfpflags.h
>> @@ -8,8 +8,8 @@
>> *
>> * Thus most bits set go first.
>> */
>> -#define show_gfp_flags(flags) \
>> - (flags) ? __print_flags(flags, "|", \
>> +
>> +#define __def_gfpflag_names \
>> {(unsigned long)GFP_TRANSHUGE, "GFP_TRANSHUGE"}, \
>> {(unsigned long)GFP_HIGHUSER_MOVABLE, "GFP_HIGHUSER_MOVABLE"}, \
>> {(unsigned long)GFP_HIGHUSER, "GFP_HIGHUSER"}, \
>> @@ -19,9 +19,13 @@
>> {(unsigned long)GFP_NOFS, "GFP_NOFS"}, \
>> {(unsigned long)GFP_ATOMIC, "GFP_ATOMIC"}, \
>> {(unsigned long)GFP_NOIO, "GFP_NOIO"}, \
>> + {(unsigned long)GFP_NOWAIT, "GFP_NOWAIT"}, \
>> + {(unsigned long)__GFP_DMA, "GFP_DMA"}, \
>> + {(unsigned long)__GFP_DMA32, "GFP_DMA32"}, \
>> {(unsigned long)__GFP_HIGH, "GFP_HIGH"}, \
>> {(unsigned long)__GFP_ATOMIC, "GFP_ATOMIC"}, \
>> {(unsigned long)__GFP_IO, "GFP_IO"}, \
>> + {(unsigned long)__GFP_FS, "GFP_FS"}, \
>> {(unsigned long)__GFP_COLD, "GFP_COLD"}, \
>> {(unsigned long)__GFP_NOWARN, "GFP_NOWARN"}, \
>> {(unsigned long)__GFP_REPEAT, "GFP_REPEAT"}, \
>> @@ -36,8 +40,12 @@
>> {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \
>> {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \
>> {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \
>> + {(unsigned long)__GFP_WRITE, "GFP_WRITE"}, \
>> {(unsigned long)__GFP_DIRECT_RECLAIM, "GFP_DIRECT_RECLAIM"}, \
>> {(unsigned long)__GFP_KSWAPD_RECLAIM, "GFP_KSWAPD_RECLAIM"}, \
>> {(unsigned long)__GFP_OTHER_NODE, "GFP_OTHER_NODE"} \
>> - ) : "GFP_NOWAIT"
>>
>> +#define show_gfp_flags(flags) \
>> + (flags) ? __print_flags(flags, "|", \
>> + __def_gfpflag_names \
>> + ) : "none"
>
> How about moving this to gfp.h or something?
> Now, we use it in out of tracepoints so there is no need to keep it
> in include/trace/events/xxx.
Hm I didn't want to pollute such widely included header with such defines. And
show_gfp_flags shouldn't be there definitely as it depends on __print_flags.
What do others think?
On Tue 24-11-15 13:36:12, Vlastimil Babka wrote:
[...]
> For the last point, Kirill requested a human readable printing of gfp_mask and
> migratetype after v1. At that point it probably makes a lot of sense to do the
> same for page alloc failure and OOM warnings. The flags have been undergoing
> revisions recently, and we might be getting reports from various kernel
> versions that differ. The ./scripts/gfp-translate tool needs to be pointed at
> the corresponding sources to be accurate. The downside is potentially breaking
> scripts that grep these warnings, but it's not a first change done there over
> the years.
Yes this is very helpful! Thanks for doing this.
> Note I'm not entirely happy about the dump_gfpflag_names() implementation, due
> to usage of pr_cont() unreliable on SMP (and I've seen spurious newlines in
> dmesg output, while being correct on serial console or /var/log/messages).
> It also doesn't allow plugging the gfp_mask translation into
> /sys/kernel/debug/page_owner where it also could make sense. Maybe a new
> *printf formatting flag?
I wouldn't object. gfp_mask has its own "type" so having a specific
formatter sounds like a good idea to me.
--
Michal Hocko
SUSE Labs
On Tue 24-11-15 13:36:21, Vlastimil Babka wrote:
> It would be useful to translate gfp_flags into string representation when
> printing in case of an OOM, especially as the flags have been undergoing some
> changes recently and the script ./scripts/gfp-translate needs a matching source
> version to be accurate.
>
> Example output:
>
> a.out invoked oom-killer: order=0, oom_score_adj=0, gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|GFP_ZERO)
I like this _very much_
> Signed-off-by: Vlastimil Babka <[email protected]>
If this can be done with a printk formatter it would be even nicer but
this is good enough for the OOM purpose.
Acked-by: Michal Hocko <[email protected]>
> ---
> mm/oom_kill.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5314b20..542d56c 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -386,10 +386,12 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
> static void dump_header(struct oom_control *oc, struct task_struct *p,
> struct mem_cgroup *memcg)
> {
> - pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
> - "oom_score_adj=%hd\n",
> - current->comm, oc->gfp_mask, oc->order,
> - current->signal->oom_score_adj);
> + pr_warning("%s invoked oom-killer: order=%d, oom_score_adj=%hd, "
> + "gfp_mask=0x%x",
> + current->comm, oc->order, current->signal->oom_score_adj,
> + oc->gfp_mask);
> + dump_gfpflag_names(oc->gfp_mask);
> +
> cpuset_print_current_mems_allowed();
> dump_stack();
> if (memcg)
> --
> 2.6.3
--
Michal Hocko
SUSE Labs
On Tue 24-11-15 13:36:20, Vlastimil Babka wrote:
> It would be useful to translate gfp_flags into string representation when
> printing in case of an allocation failure, especially as the flags have been
> undergoing some changes recently and the script ./scripts/gfp-translate needs
> a matching source version to be accurate.
>
> Example output:
>
> stapio: page allocation failure: order:9, mode:0x2080020(GFP_ATOMIC)
I like this _very much_
> Signed-off-by: Vlastimil Babka <[email protected]>
If this can be done with a printk formatter it would be even nicer but
this is good enough for the OOM purpose.
Acked-by: Michal Hocko <[email protected]>
> ---
> mm/page_alloc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f806a1a..80349ac 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2711,9 +2711,9 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
> va_end(args);
> }
>
> - pr_warn("%s: page allocation failure: order:%u, mode:0x%x\n",
> + pr_warn("%s: page allocation failure: order:%u, mode:0x%x",
> current->comm, order, gfp_mask);
> -
> + dump_gfpflag_names(gfp_mask);
> dump_stack();
> if (!should_suppress_show_mem())
> show_mem(filter);
> --
> 2.6.3
--
Michal Hocko
SUSE Labs
On Tue 24-11-15 13:36:15, Vlastimil Babka wrote:
> CONFIG_PAGE_OWNER attempts to impose negligible runtime overhead when enabled
> during compilation, but not actually enabled during runtime by boot param
> page_owner=on. This overhead can be further reduced using the static key
> mechanism, which this patch does.
Is this really worth doing? If we do not have jump labels then the check
will be atomic rather than a simple access, so it would be more costly,
no? Or am I missing something?
--
Michal Hocko
SUSE Labs
On Tue 24-11-15 13:36:19, Vlastimil Babka wrote:
> The page_owner mechanism is useful for dealing with memory leaks. By reading
> /sys/kernel/debug/page_owner one can determine the stack traces leading to
> allocations of all pages, and find e.g. a buggy driver.
>
> This information might be also potentially useful for debugging, such as the
> VM_BUG_ON_PAGE() calls to dump_page(). So let's print the stored info from
> dump_page().
>
> Example output:
>
> page:ffffea0002868a00 count:1 mapcount:0 mapping:ffff8800bba8e958 index:0x63a22c
> flags: 0x1fffff80000060(lru|active)
> page dumped because: VM_BUG_ON_PAGE(1)
> page->mem_cgroup:ffff880138efdc00
> page allocated via order 0, migratetype Movable, gfp_mask 0x2420848(GFP_NOFS|GFP_NOFAIL|GFP_HARDWALL|GFP_MOVABLE)
> [<ffffffff81164e8a>] __alloc_pages_nodemask+0x15a/0xa30
> [<ffffffff811ab808>] alloc_pages_current+0x88/0x120
> [<ffffffff8115bc36>] __page_cache_alloc+0xe6/0x120
> [<ffffffff8115c226>] pagecache_get_page+0x56/0x200
> [<ffffffff812058c2>] __getblk_slow+0xd2/0x2b0
> [<ffffffff81205ae0>] __getblk_gfp+0x40/0x50
> [<ffffffffa0283abe>] jbd2_journal_get_descriptor_buffer+0x3e/0x90 [jbd2]
> [<ffffffffa027c793>] jbd2_journal_commit_transaction+0x8e3/0x1870 [jbd2]
> page has been migrated, last migrate reason: compaction
Nice! This can be really helpful.
> Signed-off-by: Vlastimil Babka <[email protected]>
Appart from a typo below, looks good to me
Acked-by: Michal Hocko <[email protected]>
[...]
> +void __dump_page_owner(struct page *page)
> +{
> + struct page_ext *page_ext = lookup_page_ext(page);
> + struct stack_trace trace = {
> + .nr_entries = page_ext->nr_entries,
> + .entries = &page_ext->trace_entries[0],
> + };
> + gfp_t gfp_mask = page_ext->gfp_mask;
> + int mt = gfpflags_to_migratetype(gfp_mask);
> +
> + if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) {
> + pr_alert("page_owner info is not active (free page?)\n");
> + return;
> + }
> + ;
Typo?
> + pr_alert("page allocated via order %u, migratetype %s, gfp_mask 0x%x",
> + page_ext->order, migratetype_names[mt], gfp_mask);
> + dump_gfpflag_names(gfp_mask);
> + print_stack_trace(&trace, 0);
> +
> + if (page_ext->last_migrate_reason != -1)
> + pr_alert("page has been migrated, last migrate reason: %s\n",
> + migrate_reason_names[page_ext->last_migrate_reason]);
> +}
> +
> static ssize_t
> read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
> {
> --
> 2.6.3
--
Michal Hocko
SUSE Labs
[+CC PeterZ]
On 11/25/2015 03:52 PM, Michal Hocko wrote:
> On Tue 24-11-15 13:36:15, Vlastimil Babka wrote:
>> CONFIG_PAGE_OWNER attempts to impose negligible runtime overhead when enabled
>> during compilation, but not actually enabled during runtime by boot param
>> page_owner=on. This overhead can be further reduced using the static key
>> mechanism, which this patch does.
>
> Is this really worth doing?
Well, I assume that jump labels exist for a reason, and allocation hot paths are
sufficiently sensitive to be worth it? It's not an extra maintenance burden for
us anyway. Just a bit different content of the if () line.
> If we do not have jump labels then the check
> will be atomic rather than a simple access, so it would be more costly,
> no? Or am I missing something?
Well, atomic read is a simple READ_ONCE on x86_64. That excludes some compiler
optimizations, but it's not expensive for the CPU. The optimization would be
caching the value of the flag to a register, which would only potentially affect
multiple checks from the same function (and its inlines). Which doesn't happen
AFAIK, as it's just once in the allocation and once in the free path?
Now I admit I have no idea if there are architectures that don't support jump
labels *and* have an expensive atomic read, and whether we care?
On Wed, Nov 25, 2015 at 04:08:11PM +0100, Vlastimil Babka wrote:
> Now I admit I have no idea if there are architectures that don't support jump
> labels *and* have an expensive atomic read, and whether we care?
atomic_read() is basically always READ_ONCE(), there's a few archs that
implement it in asm with a 'weird' load instruction, but its still a
load. The worst is I think an uncached load for blackfin or somesuch.
There's plenty archs that do not support the jump label bits, but
typically you don't care much about those. I'm not aware of an arch that
cannot fundamentally implement jump_label support if they wanted to.
On Wed 25-11-15 16:25:33, Peter Zijlstra wrote:
> On Wed, Nov 25, 2015 at 04:08:11PM +0100, Vlastimil Babka wrote:
> > Now I admit I have no idea if there are architectures that don't support jump
> > labels *and* have an expensive atomic read, and whether we care?
>
> atomic_read() is basically always READ_ONCE(), there's a few archs that
> implement it in asm with a 'weird' load instruction, but its still a
> load. The worst is I think an uncached load for blackfin or somesuch.
>
> There's plenty archs that do not support the jump label bits, but
> typically you don't care much about those. I'm not aware of an arch that
> cannot fundamentally implement jump_label support if they wanted to.
OK, I see. Thanks for the clarification! Then I do not have any
objections.
--
Michal Hocko
SUSE Labs
On 11/25/2015 09:13 AM, Joonsoo Kim wrote:
>> + if (page_ext->last_migrate_reason != -1) {
>> + ret += snprintf(kbuf + ret, count - ret,
>> + "Page has been migrated, last migrate reason: %s\n",
>> + migrate_reason_names[page_ext->last_migrate_reason]);
>> + if (ret >= count)
>> + goto err;
>> + }
>> +
>
> migrate_reason_names is defined if CONFIG_MIGRATION is enabled so
> it would cause build failure in case of !CONFIG_MIGRATION and
> CONFIG_PAGE_OWNER.
>
> Thanks.
Ugh right, linking gives warnings... Thanks.
I think instead of adding #ifdefs here, let's move migrate_reason_names to
mm/debug.c as we gradually do with these things. Also enum
migrate_reason is defined regardless of CONFIG_MIGRATION, so match that
for migrate_reason_names as well.
------8<------
>From 7b650fd613ed382aaa6f11f4b779e883a6af10aa Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Thu, 26 Nov 2015 11:31:09 +0100
Subject: mm, page_owner: track and print last migrate reason-fix
Move migrate_reason_names from mm/migrate.c to mm/debug.c so that link doesn't
warn with CONFIG_MIGRATION disabled.
---
include/linux/migrate.h | 2 +-
mm/debug.c | 11 +++++++++++
mm/migrate.c | 10 ----------
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ab92a8cf4c93..8fdc033e527a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -27,7 +27,7 @@ enum migrate_reason {
MR_TYPES
};
-/* In mm/migrate.c; also keep sync with include/trace/events/migrate.h */
+/* In mm/debug.c; also keep sync with include/trace/events/migrate.h */
extern char * migrate_reason_names[MR_TYPES];
#ifdef CONFIG_MIGRATION
diff --git a/mm/debug.c b/mm/debug.c
index d9718fc8377a..a4cd0c093ff6 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -9,6 +9,17 @@
#include <linux/mm.h>
#include <linux/trace_events.h>
#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+
+char *migrate_reason_names[MR_TYPES] = {
+ "compaction",
+ "memory_failure",
+ "memory_hotplug",
+ "syscall_or_cpuset",
+ "mempolicy_mbind",
+ "numa_misplaced",
+ "cma",
+};
static const struct trace_print_flags pageflag_names[] = {
{1UL << PG_locked, "locked" },
diff --git a/mm/migrate.c b/mm/migrate.c
index 12e9ab9de446..1c11b73cd834 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -47,16 +47,6 @@
#include "internal.h"
-char *migrate_reason_names[MR_TYPES] = {
- "compaction",
- "memory_failure",
- "memory_hotplug",
- "syscall_or_cpuset",
- "mempolicy_mbind",
- "numa_misplaced",
- "cma",
-};
-
/*
* migrate_prep() needs to be called before we start compiling a list of pages
* to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
--
2.6.3
On 11/25/2015 03:58 PM, Michal Hocko wrote:
> Nice! This can be really helpful.
>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>
> Appart from a typo below, looks good to me
> Acked-by: Michal Hocko <[email protected]>
Thanks!
> [...]
>
>> +void __dump_page_owner(struct page *page)
>> +{
>> + struct page_ext *page_ext = lookup_page_ext(page);
>> + struct stack_trace trace = {
>> + .nr_entries = page_ext->nr_entries,
>> + .entries = &page_ext->trace_entries[0],
>> + };
>> + gfp_t gfp_mask = page_ext->gfp_mask;
>> + int mt = gfpflags_to_migratetype(gfp_mask);
>> +
>> + if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) {
>> + pr_alert("page_owner info is not active (free page?)\n");
>> + return;
>> + }
>> + ;
>
> Typo?
The cat did it!
------8<------
From: Vlastimil Babka <[email protected]>
Date: Thu, 26 Nov 2015 11:41:11 +0100
Subject: mm, page_owner: dump page owner info from dump_page()-fix
Remove stray semicolon.
---
mm/page_owner.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index a81cfa0c13c3..f4acd2452c35 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -207,7 +207,7 @@ void __dump_page_owner(struct page *page)
pr_alert("page_owner info is not active (free page?)\n");
return;
}
- ;
+
pr_alert("page allocated via order %u, migratetype %s, gfp_mask 0x%x",
page_ext->order, migratetype_names[mt], gfp_mask);
dump_gfpflag_names(gfp_mask);
--
2.6.3
> On Nov 25, 2015, at 18:28, Vlastimil Babka <[email protected]> wrote:
>
> On 11/25/2015 09:16 AM, Joonsoo Kim wrote:
>> On Tue, Nov 24, 2015 at 01:36:18PM +0100, Vlastimil Babka wrote:
>>> --- a/include/trace/events/gfpflags.h
>>> +++ b/include/trace/events/gfpflags.h
>>> @@ -8,8 +8,8 @@
>>> *
>>> * Thus most bits set go first.
>>> */
>>> -#define show_gfp_flags(flags) \
>>> - (flags) ? __print_flags(flags, "|", \
>>> +
>>> +#define __def_gfpflag_names \
>>> {(unsigned long)GFP_TRANSHUGE, "GFP_TRANSHUGE"}, \
>>> {(unsigned long)GFP_HIGHUSER_MOVABLE, "GFP_HIGHUSER_MOVABLE"}, \
>>> {(unsigned long)GFP_HIGHUSER, "GFP_HIGHUSER"}, \
>>> @@ -19,9 +19,13 @@
>>> {(unsigned long)GFP_NOFS, "GFP_NOFS"}, \
>>> {(unsigned long)GFP_ATOMIC, "GFP_ATOMIC"}, \
>>> {(unsigned long)GFP_NOIO, "GFP_NOIO"}, \
>>> + {(unsigned long)GFP_NOWAIT, "GFP_NOWAIT"}, \
>>> + {(unsigned long)__GFP_DMA, "GFP_DMA"}, \
>>> + {(unsigned long)__GFP_DMA32, "GFP_DMA32"}, \
>>> {(unsigned long)__GFP_HIGH, "GFP_HIGH"}, \
>>> {(unsigned long)__GFP_ATOMIC, "GFP_ATOMIC"}, \
>>> {(unsigned long)__GFP_IO, "GFP_IO"}, \
>>> + {(unsigned long)__GFP_FS, "GFP_FS"}, \
>>> {(unsigned long)__GFP_COLD, "GFP_COLD"}, \
>>> {(unsigned long)__GFP_NOWARN, "GFP_NOWARN"}, \
>>> {(unsigned long)__GFP_REPEAT, "GFP_REPEAT"}, \
>>> @@ -36,8 +40,12 @@
>>> {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \
>>> {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \
>>> {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \
>>> + {(unsigned long)__GFP_WRITE, "GFP_WRITE"}, \
>>> {(unsigned long)__GFP_DIRECT_RECLAIM, "GFP_DIRECT_RECLAIM"}, \
>>> {(unsigned long)__GFP_KSWAPD_RECLAIM, "GFP_KSWAPD_RECLAIM"}, \
>>> {(unsigned long)__GFP_OTHER_NODE, "GFP_OTHER_NODE"} \
>>> - ) : "GFP_NOWAIT"
>>>
>>> +#define show_gfp_flags(flags) \
>>> + (flags) ? __print_flags(flags, "|", \
>>> + __def_gfpflag_names \
>>> + ) : "none"
>>
>> How about moving this to gfp.h or something?
>> Now, we use it in out of tracepoints so there is no need to keep it
>> in include/trace/events/xxx.
>
> Hm I didn't want to pollute such widely included header with such defines. And
> show_gfp_flags shouldn't be there definitely as it depends on __print_flags.
> What do others think?
how about add this into standard printk() format ?
like cpu mask print in printk use %*pb[l] ,
it define a macro cpumask_pr_args to print cpumask .
we can also define a new format like %pG means print flag ,
then it will be useful for other code to use , like dump vma / mm flags ..
Thanks
From: Vlastimil Babka <[email protected]>
Date: Thu, 26 Nov 2015 15:39:27 +0100
Subject: [PATCH] mm, debug: fix wrongly filtered flags in dump_vma()-fix
Don't print opening parenthesis twice.
---
mm/debug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/debug.c b/mm/debug.c
index d9718fc8377a..68118399c2b6 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -240,7 +240,7 @@ void dump_mm(const struct mm_struct *mm)
"" /* This is here to not have a comma! */
);
- pr_emerg("def_flags: %#lx(", mm->def_flags);
+ pr_emerg("def_flags: %#lx", mm->def_flags);
dump_flag_names(mm->def_flags, vmaflags_names,
ARRAY_SIZE(vmaflags_names));
}
--
2.6.3
In mm we use several kinds of flags bitfields that are sometimes printed for
debugging purposes, or exported to userspace via sysfs. To make them easier to
interpret independently on kernel version and config, we want to dump also the
symbolic flag names. So far this has been done with repeated calls to
pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
To get a more reliable and universal solution, this patch extends printk()
format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
and vma flags (%pgv). Existing users of dump_flag_names() are converted and
simplified.
It would be possible to pass flags by value instead of pointer, but the %p
format string for pointers already has extensions for various kernel
structures, so it's a good fit, and the extra indirection in a non-critical
path is negligible.
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
---
I'm sending it on top of the page_owner series, as it's already in mmotm.
But to reduce churn (in case this approach is accepted), I can later
incorporate it and resend it whole.
Documentation/printk-formats.txt | 14 ++++
include/linux/mmdebug.h | 5 +-
lib/vsprintf.c | 31 ++++++++
mm/debug.c | 150 ++++++++++++++++++++++-----------------
mm/oom_kill.c | 5 +-
mm/page_alloc.c | 5 +-
mm/page_owner.c | 5 +-
7 files changed, 140 insertions(+), 75 deletions(-)
diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index b784c270105f..4b5156e74b09 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -292,6 +292,20 @@ Raw pointer value SHOULD be printed with %p. The kernel supports
Passed by reference.
+Flags bitfields such as page flags, gfp_flags:
+
+ %pgp 0x1fffff8000086c(referenced|uptodate|lru|active|private)
+ %pgg 0x24202c4(GFP_USER|GFP_DMA32|GFP_NOWARN)
+ %pgv 0x875(read|exec|mayread|maywrite|mayexec|denywrite)
+
+ For printing raw values of flags bitfields together with symbolic
+ strings that would construct the value. The type of flags is given by
+ the third character. Currently supported are [p]age flags, [g]fp_flags
+ and [v]ma_flags. The flag names and print order depends on the
+ particular type.
+
+ Passed by reference.
+
Network device features:
%pNF 0x000000000000c000
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 3b77fab7ad28..e6518df259ca 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -2,6 +2,7 @@
#define LINUX_MM_DEBUG_H 1
#include <linux/stringify.h>
+#include <linux/types.h>
struct page;
struct vm_area_struct;
@@ -10,7 +11,9 @@ struct mm_struct;
extern void dump_page(struct page *page, const char *reason);
extern void dump_page_badflags(struct page *page, const char *reason,
unsigned long badflags);
-extern void dump_gfpflag_names(unsigned long gfp_flags);
+extern char *format_page_flags(unsigned long flags, char *buf, char *end);
+extern char *format_vma_flags(unsigned long flags, char *buf, char *end);
+extern char *format_gfp_flags(gfp_t gfp_flags, char *buf, char*end);
void dump_vma(const struct vm_area_struct *vma);
void dump_mm(const struct mm_struct *mm);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index f9cee8e1233c..41cd122bd307 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -31,6 +31,7 @@
#include <linux/dcache.h>
#include <linux/cred.h>
#include <net/addrconf.h>
+#include <linux/mmdebug.h>
#include <asm/page.h> /* for PAGE_SIZE */
#include <asm/sections.h> /* for dereference_function_descriptor() */
@@ -1361,6 +1362,29 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec,
}
}
+static noinline_for_stack
+char *flags_string(char *buf, char *end, void *flags_ptr,
+ struct printf_spec spec, const char *fmt)
+{
+ unsigned long flags;
+ gfp_t gfp_flags;
+
+ switch (fmt[1]) {
+ case 'p':
+ flags = *(unsigned long *)flags_ptr;
+ return format_page_flags(flags, buf, end);
+ case 'v':
+ flags = *(unsigned long *)flags_ptr;
+ return format_vma_flags(flags, buf, end);
+ case 'g':
+ gfp_flags = *(gfp_t *)flags_ptr;
+ return format_gfp_flags(gfp_flags, buf, end);
+ default:
+ WARN_ONCE(1, "Unsupported flags modifier: %c\n", fmt[1]);
+ return 0;
+ }
+}
+
int kptr_restrict __read_mostly;
/*
@@ -1448,6 +1472,11 @@ int kptr_restrict __read_mostly;
* - 'Cn' For a clock, it prints the name (Common Clock Framework) or address
* (legacy clock framework) of the clock
* - 'Cr' For a clock, it prints the current rate of the clock
+ * - 'g' For flags to be printed as a collection of symbolic strings that would
+ * construct the specific value. Supported flags given by option:
+ * p page flags (see struct page) given as pointer to unsigned long
+ * g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
+ * v vma flags (VM_*) given as pointer to unsigned long
*
* ** Please update also Documentation/printk-formats.txt when making changes **
*
@@ -1600,6 +1629,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
return dentry_name(buf, end,
((const struct file *)ptr)->f_path.dentry,
spec, fmt);
+ case 'g':
+ return flags_string(buf, end, ptr, spec, fmt);
}
spec.flags |= SMALL;
if (spec.field_width == -1) {
diff --git a/mm/debug.c b/mm/debug.c
index 2fdf0999e6f9..a092111920e7 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -59,40 +59,109 @@ static const struct trace_print_flags pageflag_names[] = {
#endif
};
+static const struct trace_print_flags vmaflags_names[] = {
+ {VM_READ, "read" },
+ {VM_WRITE, "write" },
+ {VM_EXEC, "exec" },
+ {VM_SHARED, "shared" },
+ {VM_MAYREAD, "mayread" },
+ {VM_MAYWRITE, "maywrite" },
+ {VM_MAYEXEC, "mayexec" },
+ {VM_MAYSHARE, "mayshare" },
+ {VM_GROWSDOWN, "growsdown" },
+ {VM_PFNMAP, "pfnmap" },
+ {VM_DENYWRITE, "denywrite" },
+ {VM_LOCKONFAULT, "lockonfault" },
+ {VM_LOCKED, "locked" },
+ {VM_IO, "io" },
+ {VM_SEQ_READ, "seqread" },
+ {VM_RAND_READ, "randread" },
+ {VM_DONTCOPY, "dontcopy" },
+ {VM_DONTEXPAND, "dontexpand" },
+ {VM_ACCOUNT, "account" },
+ {VM_NORESERVE, "noreserve" },
+ {VM_HUGETLB, "hugetlb" },
+#if defined(CONFIG_X86)
+ {VM_PAT, "pat" },
+#elif defined(CONFIG_PPC)
+ {VM_SAO, "sao" },
+#elif defined(CONFIG_PARISC) || defined(CONFIG_METAG) || defined(CONFIG_IA64)
+ {VM_GROWSUP, "growsup" },
+#elif !defined(CONFIG_MMU)
+ {VM_MAPPED_COPY, "mappedcopy" },
+#else
+ {VM_ARCH_1, "arch_1" },
+#endif
+ {VM_DONTDUMP, "dontdump" },
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ {VM_SOFTDIRTY, "softdirty" },
+#endif
+ {VM_MIXEDMAP, "mixedmap" },
+ {VM_HUGEPAGE, "hugepage" },
+ {VM_NOHUGEPAGE, "nohugepage" },
+ {VM_MERGEABLE, "mergeable" },
+};
+
static const struct trace_print_flags gfpflag_names[] = {
__def_gfpflag_names
};
-static void dump_flag_names(unsigned long flags,
- const struct trace_print_flags *names, int count)
+static char *format_flag_names(unsigned long flags, unsigned long mask_out,
+ const struct trace_print_flags *names, int count,
+ char *buf, char *end)
{
const char *delim = "";
unsigned long mask;
int i;
- pr_cont("(");
+ buf += snprintf(buf, end - buf, "%#lx(", flags);
+
+ flags &= ~mask_out;
for (i = 0; i < count && flags; i++) {
+ if (buf >= end)
+ break;
mask = names[i].mask;
if ((flags & mask) != mask)
continue;
flags &= ~mask;
- pr_cont("%s%s", delim, names[i].name);
+ buf += snprintf(buf, end - buf, "%s%s", delim, names[i].name);
delim = "|";
}
/* check for left over flags */
- if (flags)
- pr_cont("%s%#lx", delim, flags);
+ if (flags && (buf < end))
+ buf += snprintf(buf, end - buf, "%s%#lx", delim, flags);
+
+ if (buf < end) {
+ *buf = ')';
+ buf++;
+ }
- pr_cont(")\n");
+ return buf;
}
-void dump_gfpflag_names(unsigned long gfp_flags)
+char *format_page_flags(unsigned long flags, char *buf, char *end)
{
- dump_flag_names(gfp_flags, gfpflag_names, ARRAY_SIZE(gfpflag_names));
+ /* remove zone id */
+ unsigned long mask = (1UL << NR_PAGEFLAGS) - 1;
+
+ return format_flag_names(flags, ~mask, pageflag_names,
+ ARRAY_SIZE(pageflag_names), buf, end);
+}
+
+char *format_vma_flags(unsigned long flags, char *buf, char *end)
+{
+ return format_flag_names(flags, 0, vmaflags_names,
+ ARRAY_SIZE(vmaflags_names), buf, end);
+}
+
+char *format_gfp_flags(gfp_t gfp_flags, char *buf, char *end)
+{
+ return format_flag_names(gfp_flags, 0, gfpflag_names,
+ ARRAY_SIZE(gfpflag_names), buf, end);
}
void dump_page_badflags(struct page *page, const char *reason,
@@ -108,18 +177,15 @@ void dump_page_badflags(struct page *page, const char *reason,
pr_cont("\n");
BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS);
- pr_emerg("flags: %#lx", printflags);
+ pr_emerg("flags: %pgp\n", &printflags);
/* remove zone id */
printflags &= (1UL << NR_PAGEFLAGS) - 1;
- dump_flag_names(printflags, pageflag_names, ARRAY_SIZE(pageflag_names));
if (reason)
pr_alert("page dumped because: %s\n", reason);
if (page->flags & badflags) {
printflags = page->flags & badflags;
- pr_alert("bad because of flags: %#lx:", printflags);
- dump_flag_names(printflags, pageflag_names,
- ARRAY_SIZE(pageflag_names));
+ pr_alert("bad because of flags: %pgp\n", &printflags);
}
#ifdef CONFIG_MEMCG
if (page->mem_cgroup)
@@ -136,63 +202,19 @@ EXPORT_SYMBOL(dump_page);
#ifdef CONFIG_DEBUG_VM
-static const struct trace_print_flags vmaflags_names[] = {
- {VM_READ, "read" },
- {VM_WRITE, "write" },
- {VM_EXEC, "exec" },
- {VM_SHARED, "shared" },
- {VM_MAYREAD, "mayread" },
- {VM_MAYWRITE, "maywrite" },
- {VM_MAYEXEC, "mayexec" },
- {VM_MAYSHARE, "mayshare" },
- {VM_GROWSDOWN, "growsdown" },
- {VM_PFNMAP, "pfnmap" },
- {VM_DENYWRITE, "denywrite" },
- {VM_LOCKONFAULT, "lockonfault" },
- {VM_LOCKED, "locked" },
- {VM_IO, "io" },
- {VM_SEQ_READ, "seqread" },
- {VM_RAND_READ, "randread" },
- {VM_DONTCOPY, "dontcopy" },
- {VM_DONTEXPAND, "dontexpand" },
- {VM_ACCOUNT, "account" },
- {VM_NORESERVE, "noreserve" },
- {VM_HUGETLB, "hugetlb" },
-#if defined(CONFIG_X86)
- {VM_PAT, "pat" },
-#elif defined(CONFIG_PPC)
- {VM_SAO, "sao" },
-#elif defined(CONFIG_PARISC) || defined(CONFIG_METAG) || defined(CONFIG_IA64)
- {VM_GROWSUP, "growsup" },
-#elif !defined(CONFIG_MMU)
- {VM_MAPPED_COPY, "mappedcopy" },
-#else
- {VM_ARCH_1, "arch_1" },
-#endif
- {VM_DONTDUMP, "dontdump" },
-#ifdef CONFIG_MEM_SOFT_DIRTY
- {VM_SOFTDIRTY, "softdirty" },
-#endif
- {VM_MIXEDMAP, "mixedmap" },
- {VM_HUGEPAGE, "hugepage" },
- {VM_NOHUGEPAGE, "nohugepage" },
- {VM_MERGEABLE, "mergeable" },
-};
-
void dump_vma(const struct vm_area_struct *vma)
{
pr_emerg("vma %p start %p end %p\n"
"next %p prev %p mm %p\n"
"prot %lx anon_vma %p vm_ops %p\n"
- "pgoff %lx file %p private_data %p\n",
+ "pgoff %lx file %p private_data %p\n"
+ "flags: %pgv\n",
vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_next,
vma->vm_prev, vma->vm_mm,
(unsigned long)pgprot_val(vma->vm_page_prot),
vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
- vma->vm_file, vma->vm_private_data);
- pr_emerg("flags: %#lx", vma->vm_flags);
- dump_flag_names(vma->vm_flags, vmaflags_names,
- ARRAY_SIZE(vmaflags_names));
+ vma->vm_file, vma->vm_private_data,
+ &vma->vm_flags);
}
EXPORT_SYMBOL(dump_vma);
@@ -263,9 +285,7 @@ void dump_mm(const struct mm_struct *mm)
"" /* This is here to not have a comma! */
);
- pr_emerg("def_flags: %#lx", mm->def_flags);
- dump_flag_names(mm->def_flags, vmaflags_names,
- ARRAY_SIZE(vmaflags_names));
+ pr_emerg("def_flags: %pgv\n", &mm->def_flags);
}
#endif /* CONFIG_DEBUG_VM */
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 542d56c93209..63a68b62ee68 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -387,10 +387,9 @@ static void dump_header(struct oom_control *oc, struct task_struct *p,
struct mem_cgroup *memcg)
{
pr_warning("%s invoked oom-killer: order=%d, oom_score_adj=%hd, "
- "gfp_mask=0x%x",
+ "gfp_mask=%pgg\n",
current->comm, oc->order, current->signal->oom_score_adj,
- oc->gfp_mask);
- dump_gfpflag_names(oc->gfp_mask);
+ &oc->gfp_mask);
cpuset_print_current_mems_allowed();
dump_stack();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 80349acd8c17..77d2c75f80e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2711,9 +2711,8 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
va_end(args);
}
- pr_warn("%s: page allocation failure: order:%u, mode:0x%x",
- current->comm, order, gfp_mask);
- dump_gfpflag_names(gfp_mask);
+ pr_warn("%s: page allocation failure: order:%u, mode:%pgg\n",
+ current->comm, order, &gfp_mask);
dump_stack();
if (!should_suppress_show_mem())
show_mem(filter);
diff --git a/mm/page_owner.c b/mm/page_owner.c
index f4acd2452c35..ff862b6d12da 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -208,9 +208,8 @@ void __dump_page_owner(struct page *page)
return;
}
- pr_alert("page allocated via order %u, migratetype %s, gfp_mask 0x%x",
- page_ext->order, migratetype_names[mt], gfp_mask);
- dump_gfpflag_names(gfp_mask);
+ pr_alert("page allocated via order %u, migratetype %s, gfp_mask %pgg\n",
+ page_ext->order, migratetype_names[mt], &gfp_mask);
print_stack_trace(&trace, 0);
if (page_ext->last_migrate_reason != -1)
--
2.6.3
With the new format strings for flags, we can now provide symbolic page and gfp
flags in the /sys/kernel/debug/page_owner file. This replaces the positional
printing of page flags as single letters, which might have looked nicer, but
was limited to a subset of flags, and required the user to remember the
letters.
Example of the adjusted format:
Page allocated via order 0, mask 0x24213ca(GFP_HIGHUSER_MOVABLE|GFP_COLD|GFP_NOWARN|GFP_NORETRY)
PFN 674308 type Movable Block 1317 type Movable Flags 0x1fffff80010068(uptodate|lru|active|mappedtodisk)
[<ffffffff81164e9a>] __alloc_pages_nodemask+0x15a/0xa30
[<ffffffff811ab938>] alloc_pages_current+0x88/0x120
[<ffffffff8115bc46>] __page_cache_alloc+0xe6/0x120
[<ffffffff81168b9b>] __do_page_cache_readahead+0xdb/0x200
[<ffffffff81168df5>] ondemand_readahead+0x135/0x260
[<ffffffff81168f8c>] page_cache_async_readahead+0x6c/0x70
[<ffffffff8115d5f8>] generic_file_read_iter+0x378/0x590
[<ffffffff811d12a7>] __vfs_read+0xa7/0xd0
Page has been migrated, last migrate reason: compaction
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/page_owner.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index ff862b6d12da..421765a53c68 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -135,8 +135,8 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
return -ENOMEM;
ret = snprintf(kbuf, count,
- "Page allocated via order %u, mask 0x%x\n",
- page_ext->order, page_ext->gfp_mask);
+ "Page allocated via order %u, mask %pgg\n",
+ page_ext->order, &page_ext->gfp_mask);
if (ret >= count)
goto err;
@@ -145,23 +145,12 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
pageblock_mt = get_pfnblock_migratetype(page, pfn);
page_mt = gfpflags_to_migratetype(page_ext->gfp_mask);
ret += snprintf(kbuf + ret, count - ret,
- "PFN %lu type %s Block %lu type %s Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
+ "PFN %lu type %s Block %lu type %s Flags %pgp\n",
pfn,
migratetype_names[page_mt],
pfn >> pageblock_order,
migratetype_names[pageblock_mt],
- PageLocked(page) ? "K" : " ",
- PageError(page) ? "E" : " ",
- PageReferenced(page) ? "R" : " ",
- PageUptodate(page) ? "U" : " ",
- PageDirty(page) ? "D" : " ",
- PageLRU(page) ? "L" : " ",
- PageActive(page) ? "A" : " ",
- PageSlab(page) ? "S" : " ",
- PageWriteback(page) ? "W" : " ",
- PageCompound(page) ? "C" : " ",
- PageSwapCache(page) ? "B" : " ",
- PageMappedToDisk(page) ? "M" : " ");
+ &page->flags);
if (ret >= count)
goto err;
--
2.6.3
On Mon, Nov 30 2015, Vlastimil Babka <[email protected]> wrote:
> In mm we use several kinds of flags bitfields that are sometimes printed for
> debugging purposes, or exported to userspace via sysfs. To make them easier to
> interpret independently on kernel version and config, we want to dump also the
> symbolic flag names. So far this has been done with repeated calls to
> pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
>
> To get a more reliable and universal solution, this patch extends printk()
> format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
> and vma flags (%pgv). Existing users of dump_flag_names() are converted and
> simplified.
>
> It would be possible to pass flags by value instead of pointer, but the %p
> format string for pointers already has extensions for various kernel
> structures, so it's a good fit, and the extra indirection in a non-critical
> path is negligible.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Rasmus Villemoes <[email protected]>
> ---
> I'm sending it on top of the page_owner series, as it's already in mmotm.
> But to reduce churn (in case this approach is accepted), I can later
> incorporate it and resend it whole.
>
> Documentation/printk-formats.txt | 14 ++++
> include/linux/mmdebug.h | 5 +-
> lib/vsprintf.c | 31 ++++++++
> mm/debug.c | 150 ++++++++++++++++++++++-----------------
> mm/oom_kill.c | 5 +-
> mm/page_alloc.c | 5 +-
> mm/page_owner.c | 5 +-
> 7 files changed, 140 insertions(+), 75 deletions(-)
I'd prefer to have the formatting code in vsprintf.c, so that we'd avoid
having to call vsnprintf recursively (and repeatedly - not that this is
going to be used in hot paths, but if the box is going down it might be
nice to get the debug info out a few thousand cycles earlier). That'll
also make it easier to avoid the bugs below.
> diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
> index b784c270105f..4b5156e74b09 100644
> --- a/Documentation/printk-formats.txt
> +++ b/Documentation/printk-formats.txt
> @@ -292,6 +292,20 @@ Raw pointer value SHOULD be printed with %p. The kernel supports
>
> Passed by reference.
>
> +Flags bitfields such as page flags, gfp_flags:
> +
> + %pgp 0x1fffff8000086c(referenced|uptodate|lru|active|private)
> + %pgg 0x24202c4(GFP_USER|GFP_DMA32|GFP_NOWARN)
> + %pgv 0x875(read|exec|mayread|maywrite|mayexec|denywrite)
> +
I think it would be better (and more flexible) if %pg* only stood for
printing the | chain of strings. Let people pass the flags twice if they
also want the numeric value; then they're also able to choose 0-padding
and whatnot, can use other kinds of parentheses, etc., etc. So
pr_emerg("flags: 0x%08lu [%pgp]\n", printflags, &printflags)
> + For printing raw values of flags bitfields together with symbolic
> + strings that would construct the value. The type of flags is given by
> + the third character. Currently supported are [p]age flags, [g]fp_flags
> + and [v]ma_flags. The flag names and print order depends on the
> + particular type.
> +
> + Passed by reference.
> +
> Network device features:
>
> %pNF 0x000000000000c000
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> index 3b77fab7ad28..e6518df259ca 100644
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -2,6 +2,7 @@
> #define LINUX_MM_DEBUG_H 1
>
> #include <linux/stringify.h>
> +#include <linux/types.h>
>
> struct page;
> struct vm_area_struct;
> @@ -10,7 +11,9 @@ struct mm_struct;
> extern void dump_page(struct page *page, const char *reason);
> extern void dump_page_badflags(struct page *page, const char *reason,
> unsigned long badflags);
> -extern void dump_gfpflag_names(unsigned long gfp_flags);
> +extern char *format_page_flags(unsigned long flags, char *buf, char *end);
> +extern char *format_vma_flags(unsigned long flags, char *buf, char *end);
> +extern char *format_gfp_flags(gfp_t gfp_flags, char *buf, char*end);
> void dump_vma(const struct vm_area_struct *vma);
> void dump_mm(const struct mm_struct *mm);
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index f9cee8e1233c..41cd122bd307 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -31,6 +31,7 @@
> #include <linux/dcache.h>
> #include <linux/cred.h>
> #include <net/addrconf.h>
> +#include <linux/mmdebug.h>
>
> #include <asm/page.h> /* for PAGE_SIZE */
> #include <asm/sections.h> /* for dereference_function_descriptor() */
> @@ -1361,6 +1362,29 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec,
> }
> }
>
> +static noinline_for_stack
> +char *flags_string(char *buf, char *end, void *flags_ptr,
> + struct printf_spec spec, const char *fmt)
> +{
> + unsigned long flags;
> + gfp_t gfp_flags;
> +
> + switch (fmt[1]) {
> + case 'p':
> + flags = *(unsigned long *)flags_ptr;
> + return format_page_flags(flags, buf, end);
> + case 'v':
> + flags = *(unsigned long *)flags_ptr;
> + return format_vma_flags(flags, buf, end);
> + case 'g':
> + gfp_flags = *(gfp_t *)flags_ptr;
> + return format_gfp_flags(gfp_flags, buf, end);
> + default:
> + WARN_ONCE(1, "Unsupported flags modifier: %c\n", fmt[1]);
> + return 0;
> + }
> +}
> +
That return 0 aka return NULL will lead to an oops when the next thing
is printed. Did you mean 'return buf;'?
> int kptr_restrict __read_mostly;
>
> /*
> @@ -1448,6 +1472,11 @@ int kptr_restrict __read_mostly;
> * - 'Cn' For a clock, it prints the name (Common Clock Framework) or address
> * (legacy clock framework) of the clock
> * - 'Cr' For a clock, it prints the current rate of the clock
> + * - 'g' For flags to be printed as a collection of symbolic strings that would
> + * construct the specific value. Supported flags given by option:
> + * p page flags (see struct page) given as pointer to unsigned long
> + * g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
> + * v vma flags (VM_*) given as pointer to unsigned long
> *
> * ** Please update also Documentation/printk-formats.txt when making changes **
> *
> @@ -1600,6 +1629,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
> return dentry_name(buf, end,
> ((const struct file *)ptr)->f_path.dentry,
> spec, fmt);
> + case 'g':
> + return flags_string(buf, end, ptr, spec, fmt);
> }
> spec.flags |= SMALL;
> if (spec.field_width == -1) {
> diff --git a/mm/debug.c b/mm/debug.c
> index 2fdf0999e6f9..a092111920e7 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -59,40 +59,109 @@ static const struct trace_print_flags pageflag_names[] = {
> #endif
> };
>
> +static const struct trace_print_flags vmaflags_names[] = {
> + {VM_READ, "read" },
> + {VM_WRITE, "write" },
> + {VM_EXEC, "exec" },
> + {VM_SHARED, "shared" },
> + {VM_MAYREAD, "mayread" },
> + {VM_MAYWRITE, "maywrite" },
> + {VM_MAYEXEC, "mayexec" },
> + {VM_MAYSHARE, "mayshare" },
> + {VM_GROWSDOWN, "growsdown" },
> + {VM_PFNMAP, "pfnmap" },
> + {VM_DENYWRITE, "denywrite" },
> + {VM_LOCKONFAULT, "lockonfault" },
> + {VM_LOCKED, "locked" },
> + {VM_IO, "io" },
> + {VM_SEQ_READ, "seqread" },
> + {VM_RAND_READ, "randread" },
> + {VM_DONTCOPY, "dontcopy" },
> + {VM_DONTEXPAND, "dontexpand" },
> + {VM_ACCOUNT, "account" },
> + {VM_NORESERVE, "noreserve" },
> + {VM_HUGETLB, "hugetlb" },
> +#if defined(CONFIG_X86)
> + {VM_PAT, "pat" },
> +#elif defined(CONFIG_PPC)
> + {VM_SAO, "sao" },
> +#elif defined(CONFIG_PARISC) || defined(CONFIG_METAG) || defined(CONFIG_IA64)
> + {VM_GROWSUP, "growsup" },
> +#elif !defined(CONFIG_MMU)
> + {VM_MAPPED_COPY, "mappedcopy" },
> +#else
> + {VM_ARCH_1, "arch_1" },
> +#endif
> + {VM_DONTDUMP, "dontdump" },
> +#ifdef CONFIG_MEM_SOFT_DIRTY
> + {VM_SOFTDIRTY, "softdirty" },
> +#endif
> + {VM_MIXEDMAP, "mixedmap" },
> + {VM_HUGEPAGE, "hugepage" },
> + {VM_NOHUGEPAGE, "nohugepage" },
> + {VM_MERGEABLE, "mergeable" },
> +};
> +
> static const struct trace_print_flags gfpflag_names[] = {
> __def_gfpflag_names
> };
>
> -static void dump_flag_names(unsigned long flags,
> - const struct trace_print_flags *names, int count)
> +static char *format_flag_names(unsigned long flags, unsigned long mask_out,
> + const struct trace_print_flags *names, int count,
> + char *buf, char *end)
> {
> const char *delim = "";
> unsigned long mask;
> int i;
>
> - pr_cont("(");
> + buf += snprintf(buf, end - buf, "%#lx(", flags);
Sorry, you can't do it like this. The buf you've been passed from inside
vsnprintf may be beyond end, so end-buf is a negative number which will
(get converted to a huge positive size_t and) trigger a WARN_ONCE and
get you a return value of 0.
> + flags &= ~mask_out;
>
> for (i = 0; i < count && flags; i++) {
> + if (buf >= end)
> + break;
Even if you fix the above, this is also wrong. We have to return the
length of the string that would be generated if there was room enough,
so we cannot make an early return like this. As I said above, the
easiest way to do that is to do it inside vsprintf.c, where we have
e.g. string() available. So I'd do something like
char *format_flags(char *buf, char *end, unsigned long flags,
const struct trace_print_flags *names)
{
unsigned long mask;
const struct printf_spec strspec = {/* appropriate defaults*/}
const struct printf_spec numspec = {/* appropriate defaults*/}
for ( ; flags && names->mask; names++) {
mask = names->mask;
if ((flags & mask) != mask)
continue;
flags &= ~mask;
buf = string(buf, end, names->name, strspec);
if (flags) {
if (buf < end)
*buf = '|';
buf++;
}
}
if (flags)
buf = number(buf, end, flags, numspec);
return buf;
}
[where I've assumed that the trace_print_flags array is terminated with
an entry with 0 mask. Passing its length is also possible, but maybe a
little awkward if the arrays are defined in mm/ and contents depend on
.config.]
Then flags_string() would call this directly with an appropriate array
for names, and we avoid the individual tiny helper
functions. flags_string() can still do the mask_out thing for page
flags, especially when/if the numeric and string representations are not
done at the same time.
Rasmus
> On Nov 30, 2015, at 08:10, Vlastimil Babka <[email protected]> wrote:
>
> In mm we use several kinds of flags bitfields that are sometimes printed for
> debugging purposes, or exported to userspace via sysfs. To make them easier to
> interpret independently on kernel version and config, we want to dump also the
> symbolic flag names. So far this has been done with repeated calls to
> pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
>
> To get a more reliable and universal solution, this patch extends printk()
> format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
> and vma flags (%pgv). Existing users of dump_flag_names() are converted and
> simplified.
>
> It would be possible to pass flags by value instead of pointer, but the %p
> format string for pointers already has extensions for various kernel
> structures, so it's a good fit, and the extra indirection in a non-critical
> path is negligible.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Rasmus Villemoes <[email protected]>
> ---
> I'm sending it on top of the page_owner series, as it's already in mmotm.
> But to reduce churn (in case this approach is accepted), I can later
> incorporate it and resend it whole.
>
> Documentation/printk-formats.txt | 14 ++++
> include/linux/mmdebug.h | 5 +-
> lib/vsprintf.c | 31 ++++++++
> mm/debug.c | 150 ++++++++++++++++++++++-----------------
> mm/oom_kill.c | 5 +-
> mm/page_alloc.c | 5 +-
> mm/page_owner.c | 5 +-
> 7 files changed, 140 insertions(+), 75 deletions(-)
>
> diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
> index b784c270105f..4b5156e74b09 100644
> --- a/Documentation/printk-formats.txt
> +++ b/Documentation/printk-formats.txt
> @@ -292,6 +292,20 @@ Raw pointer value SHOULD be printed with %p. The kernel supports
>
> Passed by reference.
>
> +Flags bitfields such as page flags, gfp_flags:
> +
> + %pgp 0x1fffff8000086c(referenced|uptodate|lru|active|private)
> + %pgg 0x24202c4(GFP_USER|GFP_DMA32|GFP_NOWARN)
> + %pgv 0x875(read|exec|mayread|maywrite|mayexec|denywrite)
> +
> + For printing raw values of flags bitfields together with symbolic
> + strings that would construct the value. The type of flags is given by
> + the third character. Currently supported are [p]age flags, [g]fp_flags
> + and [v]ma_flags. The flag names and print order depends on the
> + particular type.
> +
> + Passed by reference.
> +
> Network device features:
>
> %pNF 0x000000000000c000
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> index 3b77fab7ad28..e6518df259ca 100644
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -2,6 +2,7 @@
> #define LINUX_MM_DEBUG_H 1
>
> #include <linux/stringify.h>
> +#include <linux/types.h>
>
> struct page;
> struct vm_area_struct;
> @@ -10,7 +11,9 @@ struct mm_struct;
> extern void dump_page(struct page *page, const char *reason);
> extern void dump_page_badflags(struct page *page, const char *reason,
> unsigned long badflags);
> -extern void dump_gfpflag_names(unsigned long gfp_flags);
> +extern char *format_page_flags(unsigned long flags, char *buf, char *end);
> +extern char *format_vma_flags(unsigned long flags, char *buf, char *end);
> +extern char *format_gfp_flags(gfp_t gfp_flags, char *buf, char*end);
> void dump_vma(const struct vm_area_struct *vma);
> void dump_mm(const struct mm_struct *mm);
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index f9cee8e1233c..41cd122bd307 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -31,6 +31,7 @@
> #include <linux/dcache.h>
> #include <linux/cred.h>
> #include <net/addrconf.h>
> +#include <linux/mmdebug.h>
>
> #include <asm/page.h> /* for PAGE_SIZE */
> #include <asm/sections.h> /* for dereference_function_descriptor() */
> @@ -1361,6 +1362,29 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec,
> }
> }
>
> +static noinline_for_stack
> +char *flags_string(char *buf, char *end, void *flags_ptr,
> + struct printf_spec spec, const char *fmt)
> +{
> + unsigned long flags;
> + gfp_t gfp_flags;
> +
> + switch (fmt[1]) {
> + case 'p':
> + flags = *(unsigned long *)flags_ptr;
> + return format_page_flags(flags, buf, end);
> + case 'v':
> + flags = *(unsigned long *)flags_ptr;
> + return format_vma_flags(flags, buf, end);
> + case 'g':
> + gfp_flags = *(gfp_t *)flags_ptr;
> + return format_gfp_flags(gfp_flags, buf, end);
> + default:
> + WARN_ONCE(1, "Unsupported flags modifier: %c\n", fmt[1]);
> + return 0;
> + }
> +}
> +
> int kptr_restrict __read_mostly;
>
> /*
> @@ -1448,6 +1472,11 @@ int kptr_restrict __read_mostly;
> * - 'Cn' For a clock, it prints the name (Common Clock Framework) or address
> * (legacy clock framework) of the clock
> * - 'Cr' For a clock, it prints the current rate of the clock
> + * - 'g' For flags to be printed as a collection of symbolic strings that would
> + * construct the specific value. Supported flags given by option:
> + * p page flags (see struct page) given as pointer to unsigned long
> + * g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
> + * v vma flags (VM_*) given as pointer to unsigned long
> *
> * ** Please update also Documentation/printk-formats.txt when making changes **
> *
> @@ -1600,6 +1629,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
> return dentry_name(buf, end,
> ((const struct file *)ptr)->f_path.dentry,
> spec, fmt);
> + case 'g':
> + return flags_string(buf, end, ptr, spec, fmt);
> }
> spec.flags |= SMALL;
> if (spec.field_width == -1) {
> diff --git a/mm/debug.c b/mm/debug.c
> index 2fdf0999e6f9..a092111920e7 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -59,40 +59,109 @@ static const struct trace_print_flags pageflag_names[] = {
> #endif
> };
>
> +static const struct trace_print_flags vmaflags_names[] = {
> + {VM_READ, "read" },
> + {VM_WRITE, "write" },
> + {VM_EXEC, "exec" },
> + {VM_SHARED, "shared" },
> + {VM_MAYREAD, "mayread" },
> + {VM_MAYWRITE, "maywrite" },
> + {VM_MAYEXEC, "mayexec" },
> + {VM_MAYSHARE, "mayshare" },
> + {VM_GROWSDOWN, "growsdown" },
> + {VM_PFNMAP, "pfnmap" },
> + {VM_DENYWRITE, "denywrite" },
> + {VM_LOCKONFAULT, "lockonfault" },
> + {VM_LOCKED, "locked" },
> + {VM_IO, "io" },
> + {VM_SEQ_READ, "seqread" },
> + {VM_RAND_READ, "randread" },
> + {VM_DONTCOPY, "dontcopy" },
> + {VM_DONTEXPAND, "dontexpand" },
> + {VM_ACCOUNT, "account" },
> + {VM_NORESERVE, "noreserve" },
> + {VM_HUGETLB, "hugetlb" },
> +#if defined(CONFIG_X86)
> + {VM_PAT, "pat" },
> +#elif defined(CONFIG_PPC)
> + {VM_SAO, "sao" },
> +#elif defined(CONFIG_PARISC) || defined(CONFIG_METAG) || defined(CONFIG_IA64)
> + {VM_GROWSUP, "growsup" },
> +#elif !defined(CONFIG_MMU)
> + {VM_MAPPED_COPY, "mappedcopy" },
> +#else
> + {VM_ARCH_1, "arch_1" },
> +#endif
> + {VM_DONTDUMP, "dontdump" },
> +#ifdef CONFIG_MEM_SOFT_DIRTY
> + {VM_SOFTDIRTY, "softdirty" },
> +#endif
> + {VM_MIXEDMAP, "mixedmap" },
> + {VM_HUGEPAGE, "hugepage" },
> + {VM_NOHUGEPAGE, "nohugepage" },
> + {VM_MERGEABLE, "mergeable" },
> +};
> +
> static const struct trace_print_flags gfpflag_names[] = {
> __def_gfpflag_names
> };
>
> -static void dump_flag_names(unsigned long flags,
> - const struct trace_print_flags *names, int count)
> +static char *format_flag_names(unsigned long flags, unsigned long mask_out,
> + const struct trace_print_flags *names, int count,
> + char *buf, char *end)
> {
> const char *delim = "";
> unsigned long mask;
> int i;
>
> - pr_cont("(");
> + buf += snprintf(buf, end - buf, "%#lx(", flags);
> +
> + flags &= ~mask_out;
>
> for (i = 0; i < count && flags; i++) {
> + if (buf >= end)
> + break;
>
> mask = names[i].mask;
> if ((flags & mask) != mask)
> continue;
>
> flags &= ~mask;
> - pr_cont("%s%s", delim, names[i].name);
> + buf += snprintf(buf, end - buf, "%s%s", delim, names[i].name);
> delim = "|";
> }
>
> /* check for left over flags */
> - if (flags)
> - pr_cont("%s%#lx", delim, flags);
> + if (flags && (buf < end))
> + buf += snprintf(buf, end - buf, "%s%#lx", delim, flags);
> +
> + if (buf < end) {
> + *buf = ')';
> + buf++;
> + }
>
> - pr_cont(")\n");
> + return buf;
> }
>
> -void dump_gfpflag_names(unsigned long gfp_flags)
> +char *format_page_flags(unsigned long flags, char *buf, char *end)
> {
> - dump_flag_names(gfp_flags, gfpflag_names, ARRAY_SIZE(gfpflag_names));
> + /* remove zone id */
> + unsigned long mask = (1UL << NR_PAGEFLAGS) - 1;
> +
> + return format_flag_names(flags, ~mask, pageflag_names,
> + ARRAY_SIZE(pageflag_names), buf, end);
> +}
> +
> +char *format_vma_flags(unsigned long flags, char *buf, char *end)
> +{
> + return format_flag_names(flags, 0, vmaflags_names,
> + ARRAY_SIZE(vmaflags_names), buf, end);
> +}
> +
> +char *format_gfp_flags(gfp_t gfp_flags, char *buf, char *end)
> +{
> + return format_flag_names(gfp_flags, 0, gfpflag_names,
> + ARRAY_SIZE(gfpflag_names), buf, end);
> }
>
> void dump_page_badflags(struct page *page, const char *reason,
> @@ -108,18 +177,15 @@ void dump_page_badflags(struct page *page, const char *reason,
> pr_cont("\n");
> BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS);
>
> - pr_emerg("flags: %#lx", printflags);
> + pr_emerg("flags: %pgp\n", &printflags);
> /* remove zone id */
> printflags &= (1UL << NR_PAGEFLAGS) - 1;
> - dump_flag_names(printflags, pageflag_names, ARRAY_SIZE(pageflag_names));
>
> if (reason)
> pr_alert("page dumped because: %s\n", reason);
> if (page->flags & badflags) {
> printflags = page->flags & badflags;
> - pr_alert("bad because of flags: %#lx:", printflags);
> - dump_flag_names(printflags, pageflag_names,
> - ARRAY_SIZE(pageflag_names));
> + pr_alert("bad because of flags: %pgp\n", &printflags);
> }
> #ifdef CONFIG_MEMCG
> if (page->mem_cgroup)
> @@ -136,63 +202,19 @@ EXPORT_SYMBOL(dump_page);
>
> #ifdef CONFIG_DEBUG_VM
>
> -static const struct trace_print_flags vmaflags_names[] = {
> - {VM_READ, "read" },
> - {VM_WRITE, "write" },
> - {VM_EXEC, "exec" },
> - {VM_SHARED, "shared" },
> - {VM_MAYREAD, "mayread" },
> - {VM_MAYWRITE, "maywrite" },
> - {VM_MAYEXEC, "mayexec" },
> - {VM_MAYSHARE, "mayshare" },
> - {VM_GROWSDOWN, "growsdown" },
> - {VM_PFNMAP, "pfnmap" },
> - {VM_DENYWRITE, "denywrite" },
> - {VM_LOCKONFAULT, "lockonfault" },
> - {VM_LOCKED, "locked" },
> - {VM_IO, "io" },
> - {VM_SEQ_READ, "seqread" },
> - {VM_RAND_READ, "randread" },
> - {VM_DONTCOPY, "dontcopy" },
> - {VM_DONTEXPAND, "dontexpand" },
> - {VM_ACCOUNT, "account" },
> - {VM_NORESERVE, "noreserve" },
> - {VM_HUGETLB, "hugetlb" },
> -#if defined(CONFIG_X86)
> - {VM_PAT, "pat" },
> -#elif defined(CONFIG_PPC)
> - {VM_SAO, "sao" },
> -#elif defined(CONFIG_PARISC) || defined(CONFIG_METAG) || defined(CONFIG_IA64)
> - {VM_GROWSUP, "growsup" },
> -#elif !defined(CONFIG_MMU)
> - {VM_MAPPED_COPY, "mappedcopy" },
> -#else
> - {VM_ARCH_1, "arch_1" },
> -#endif
> - {VM_DONTDUMP, "dontdump" },
> -#ifdef CONFIG_MEM_SOFT_DIRTY
> - {VM_SOFTDIRTY, "softdirty" },
> -#endif
> - {VM_MIXEDMAP, "mixedmap" },
> - {VM_HUGEPAGE, "hugepage" },
> - {VM_NOHUGEPAGE, "nohugepage" },
> - {VM_MERGEABLE, "mergeable" },
> -};
> -
> void dump_vma(const struct vm_area_struct *vma)
> {
> pr_emerg("vma %p start %p end %p\n"
> "next %p prev %p mm %p\n"
> "prot %lx anon_vma %p vm_ops %p\n"
> - "pgoff %lx file %p private_data %p\n",
> + "pgoff %lx file %p private_data %p\n"
> + "flags: %pgv\n",
> vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_next,
> vma->vm_prev, vma->vm_mm,
> (unsigned long)pgprot_val(vma->vm_page_prot),
> vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
> - vma->vm_file, vma->vm_private_data);
> - pr_emerg("flags: %#lx", vma->vm_flags);
> - dump_flag_names(vma->vm_flags, vmaflags_names,
> - ARRAY_SIZE(vmaflags_names));
> + vma->vm_file, vma->vm_private_data,
> + &vma->vm_flags);
> }
> EXPORT_SYMBOL(dump_vma);
>
> @@ -263,9 +285,7 @@ void dump_mm(const struct mm_struct *mm)
> "" /* This is here to not have a comma! */
> );
>
> - pr_emerg("def_flags: %#lx", mm->def_flags);
> - dump_flag_names(mm->def_flags, vmaflags_names,
> - ARRAY_SIZE(vmaflags_names));
> + pr_emerg("def_flags: %pgv\n", &mm->def_flags);
> }
>
> #endif /* CONFIG_DEBUG_VM */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 542d56c93209..63a68b62ee68 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -387,10 +387,9 @@ static void dump_header(struct oom_control *oc, struct task_struct *p,
> struct mem_cgroup *memcg)
> {
> pr_warning("%s invoked oom-killer: order=%d, oom_score_adj=%hd, "
> - "gfp_mask=0x%x",
> + "gfp_mask=%pgg\n",
> current->comm, oc->order, current->signal->oom_score_adj,
> - oc->gfp_mask);
> - dump_gfpflag_names(oc->gfp_mask);
> + &oc->gfp_mask);
>
> cpuset_print_current_mems_allowed();
> dump_stack();
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 80349acd8c17..77d2c75f80e4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2711,9 +2711,8 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
> va_end(args);
> }
>
> - pr_warn("%s: page allocation failure: order:%u, mode:0x%x",
> - current->comm, order, gfp_mask);
> - dump_gfpflag_names(gfp_mask);
> + pr_warn("%s: page allocation failure: order:%u, mode:%pgg\n",
> + current->comm, order, &gfp_mask);
> dump_stack();
> if (!should_suppress_show_mem())
> show_mem(filter);
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index f4acd2452c35..ff862b6d12da 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -208,9 +208,8 @@ void __dump_page_owner(struct page *page)
> return;
> }
>
> - pr_alert("page allocated via order %u, migratetype %s, gfp_mask 0x%x",
> - page_ext->order, migratetype_names[mt], gfp_mask);
> - dump_gfpflag_names(gfp_mask);
> + pr_alert("page allocated via order %u, migratetype %s, gfp_mask %pgg\n",
> + page_ext->order, migratetype_names[mt], &gfp_mask);
> print_stack_trace(&trace, 0);
>
> if (page_ext->last_migrate_reason != -1)
> --
> 2.6.3
>
i am thinking why not make %pg* to be more generic ?
not restricted to only GFP / vma flags / page flags .
so could we change format like this ?
define a flag spec struct to include flag and trace_print_flags and some other option :
typedef struct {
unsigned long flag;
struct trace_print_flags *flags;
unsigned long option; } flag_sec;
flag_sec my_flag;
in printk we only pass like this :
printk(“%pg\n”, &my_flag) ;
then it can print any flags defined by user .
more useful for other drivers to use .
Thanks
On 12/02/2015 12:01 PM, Rasmus Villemoes wrote:
> On Mon, Nov 30 2015, Vlastimil Babka <[email protected]> wrote:
>
> I'd prefer to have the formatting code in vsprintf.c, so that we'd avoid
> having to call vsnprintf recursively (and repeatedly - not that this is
> going to be used in hot paths, but if the box is going down it might be
> nice to get the debug info out a few thousand cycles earlier). That'll
> also make it easier to avoid the bugs below.
OK, I'll try.
>> diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
>> index b784c270105f..4b5156e74b09 100644
>> --- a/Documentation/printk-formats.txt
>> +++ b/Documentation/printk-formats.txt
>> @@ -292,6 +292,20 @@ Raw pointer value SHOULD be printed with %p. The kernel supports
>>
>> Passed by reference.
>>
>> +Flags bitfields such as page flags, gfp_flags:
>> +
>> + %pgp 0x1fffff8000086c(referenced|uptodate|lru|active|private)
>> + %pgg 0x24202c4(GFP_USER|GFP_DMA32|GFP_NOWARN)
>> + %pgv 0x875(read|exec|mayread|maywrite|mayexec|denywrite)
>> +
>
> I think it would be better (and more flexible) if %pg* only stood for
> printing the | chain of strings. Let people pass the flags twice if they
> also want the numeric value; then they're also able to choose 0-padding
> and whatnot, can use other kinds of parentheses, etc., etc. So
>
> pr_emerg("flags: 0x%08lu [%pgp]\n", printflags, &printflags)
I had it initially like this, but then thought it was somewhat repetitive and
all current users did use the same format. But I agree it's more generic to do
it as you say so I'll change it.
>> @@ -1361,6 +1362,29 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec,
>> }
>> }
>>
>> +static noinline_for_stack
>> +char *flags_string(char *buf, char *end, void *flags_ptr,
>> + struct printf_spec spec, const char *fmt)
>> +{
>> + unsigned long flags;
>> + gfp_t gfp_flags;
>> +
>> + switch (fmt[1]) {
>> + case 'p':
>> + flags = *(unsigned long *)flags_ptr;
>> + return format_page_flags(flags, buf, end);
>> + case 'v':
>> + flags = *(unsigned long *)flags_ptr;
>> + return format_vma_flags(flags, buf, end);
>> + case 'g':
>> + gfp_flags = *(gfp_t *)flags_ptr;
>> + return format_gfp_flags(gfp_flags, buf, end);
>> + default:
>> + WARN_ONCE(1, "Unsupported flags modifier: %c\n", fmt[1]);
>> + return 0;
>> + }
>> +}
>> +
>
> That return 0 aka return NULL will lead to an oops when the next thing
> is printed. Did you mean 'return buf;'?
Uh, right.
>>
>> -static void dump_flag_names(unsigned long flags,
>> - const struct trace_print_flags *names, int count)
>> +static char *format_flag_names(unsigned long flags, unsigned long mask_out,
>> + const struct trace_print_flags *names, int count,
>> + char *buf, char *end)
>> {
>> const char *delim = "";
>> unsigned long mask;
>> int i;
>>
>> - pr_cont("(");
>> + buf += snprintf(buf, end - buf, "%#lx(", flags);
>
> Sorry, you can't do it like this. The buf you've been passed from inside
> vsnprintf may be beyond end
Ah, didn't realize that :/
> , so end-buf is a negative number which will
> (get converted to a huge positive size_t and) trigger a WARN_ONCE and
> get you a return value of 0.
>
>
>> + flags &= ~mask_out;
>>
>> for (i = 0; i < count && flags; i++) {
>> + if (buf >= end)
>> + break;
>
> Even if you fix the above, this is also wrong. We have to return the
> length of the string that would be generated if there was room enough,
> so we cannot make an early return like this. As I said above, the
> easiest way to do that is to do it inside vsprintf.c, where we have
> e.g. string() available. So I'd do something like
>
>
> char *format_flags(char *buf, char *end, unsigned long flags,
> const struct trace_print_flags *names)
> {
> unsigned long mask;
> const struct printf_spec strspec = {/* appropriate defaults*/}
> const struct printf_spec numspec = {/* appropriate defaults*/}
>
> for ( ; flags && names->mask; names++) {
> mask = names->mask;
> if ((flags & mask) != mask)
> continue;
> flags &= ~mask;
> buf = string(buf, end, names->name, strspec);
> if (flags) {
> if (buf < end)
> *buf = '|';
> buf++;
> }
> }
> if (flags)
> buf = number(buf, end, flags, numspec);
> return buf;
> }
Thanks a lot for your review and suggestions!
> [where I've assumed that the trace_print_flags array is terminated with
> an entry with 0 mask. Passing its length is also possible, but maybe a
> little awkward if the arrays are defined in mm/ and contents depend on
> .config.]
> Then flags_string() would call this directly with an appropriate array
> for names, and we avoid the individual tiny helper
> functions. flags_string() can still do the mask_out thing for page
> flags, especially when/if the numeric and string representations are not
> done at the same time.
>
> Rasmus
Zero-terminated array is a good idea to get rid of the ARRAY_SIZE with helpers
needing to live in the same .c file etc.
But if I were to keep the array definitions in mm/debug.c with declarations
(which don't know the size yet) in e.g. <linux/mmdebug.h> (which lib/vsnprintf.c
would include so that format_flags() can reference them, is there a more elegant
way than the one below?
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -7,6 +7,9 @@
struct page;
struct vm_area_struct;
struct mm_struct;
+struct trace_print_flags; // can't include trace_events.h here
+
+extern const struct trace_print_flags *pageflag_names;
extern void dump_page(struct page *page, const char *reason);
extern void dump_page_badflags(struct page *page, const char *reason,
diff --git a/mm/debug.c b/mm/debug.c
index a092111920e7..1cbc60544b87 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -23,7 +23,7 @@ char *migrate_reason_names[MR_TYPES] = {
"cma",
};
-static const struct trace_print_flags pageflag_names[] = {
+const struct trace_print_flags __pageflag_names[] = {
{1UL << PG_locked, "locked" },
{1UL << PG_error, "error" },
{1UL << PG_referenced, "referenced" },
@@ -59,6 +59,8 @@ static const struct trace_print_flags pageflag_names[] = {
#endif
};
+const struct trace_print_flags *pageflag_names = &__pageflag_names[0];
On 12/02/2015 06:40 PM, yalin wang wrote:
(please trim your reply next time, no need to quote whole patch here)
> i am thinking why not make %pg* to be more generic ?
> not restricted to only GFP / vma flags / page flags .
> so could we change format like this ?
> define a flag spec struct to include flag and trace_print_flags and some other option :
> typedef struct {
> unsigned long flag;
> struct trace_print_flags *flags;
> unsigned long option; } flag_sec;
> flag_sec my_flag;
> in printk we only pass like this :
> printk(“%pg\n”, &my_flag) ;
> then it can print any flags defined by user .
> more useful for other drivers to use .
I don't know, it sounds quite complicated given that we had no flags printing
for years and now there's just three kinds of them. The extra struct flag_sec is
IMHO nuissance. No other printk format needs such thing AFAIK? For example, if I
were to print page flags from several places, each would have to define the
struct flag_sec instance, or some header would have to provide it?
I could maybe accept passing a flag value and trace_print_flags * as two
separate parameters, but I guess that breaks an ancient invariant of one
parameter per format string...
> Thanks
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Dec 2, 2015, at 13:04, Vlastimil Babka <[email protected]> wrote:
>
> On 12/02/2015 06:40 PM, yalin wang wrote:
>
> (please trim your reply next time, no need to quote whole patch here)
>
>> i am thinking why not make %pg* to be more generic ?
>> not restricted to only GFP / vma flags / page flags .
>> so could we change format like this ?
>> define a flag spec struct to include flag and trace_print_flags and some other option :
>> typedef struct {
>> unsigned long flag;
>> structtrace_print_flags *flags;
>> unsigned long option; } flag_sec;
>> flag_sec my_flag;
>> in printk we only pass like this :
>> printk(“%pg\n”, &my_flag) ;
>> then it can print any flags defined by user .
>> more useful for other drivers to use .
>
> I don't know, it sounds quite complicated given that we had no flags printing
> for years and now there's just three kinds of them. The extra struct flag_sec is
> IMHO nuissance. No other printk format needs such thing AFAIK? For example, if I
> were to print page flags from several places, each would have to define the
> struct flag_sec instance, or some header would have to provide it?
this can be avoided by provide a macro in header file .
we can add a new struct to declare trace_print_flags :
for example:
#define DECLARE_FLAG_PRINTK_FMT(name, flags_array) flag_spec name = { .flags = flags_array};
#define FLAG_PRINTK_FMT(name, flag) ({ name.flag = flag; &name})
in source code :
DECLARE_FLAG_PRINTK_FMT(my_flag, vmaflags_names);
printk(“%pg\n”, FLAG_PRINTK_FMT(my_flag, vma->flag));
i am not if DECLARE_FLAG_PRINTK_FMT and FLAG_PRINTK_FMT macro
can be defined into one macro ?
maybe need some trick here .
is it possible ?
Thanks
On Thu, Dec 03 2015, yalin wang <[email protected]> wrote:
>> On Dec 2, 2015, at 13:04, Vlastimil Babka <[email protected]> wrote:
>>
>> On 12/02/2015 06:40 PM, yalin wang wrote:
>>
>> (please trim your reply next time, no need to quote whole patch here)
>>
>>> i am thinking why not make %pg* to be more generic ?
>>> not restricted to only GFP / vma flags / page flags .
>>> so could we change format like this ?
>>> define a flag spec struct to include flag and trace_print_flags and some other option :
>>> typedef struct {
>>> unsigned long flag;
>>> structtrace_print_flags *flags;
>>> unsigned long option; } flag_sec;
>>> flag_sec my_flag;
>>> in printk we only pass like this :
>>> printk(“%pg\n”, &my_flag) ;
>>> then it can print any flags defined by user .
>>> more useful for other drivers to use .
>>
>> I don't know, it sounds quite complicated
Agreed, I think this would be premature generalization. There's also
some value in having the individual %pgX specifiers, as that allows
individual tweaks such as the mask_out for page flags.
given that we had no flags printing
>> for years and now there's just three kinds of them. The extra struct flag_sec is
>> IMHO nuissance. No other printk format needs such thing AFAIK? For example, if I
>> were to print page flags from several places, each would have to define the
>> struct flag_sec instance, or some header would have to provide it?
> this can be avoided by provide a macro in header file .
> we can add a new struct to declare trace_print_flags :
> for example:
> #define DECLARE_FLAG_PRINTK_FMT(name, flags_array) flag_spec name = { .flags = flags_array};
> #define FLAG_PRINTK_FMT(name, flag) ({ name.flag = flag; &name})
>
> in source code :
> DECLARE_FLAG_PRINTK_FMT(my_flag, vmaflags_names);
> printk(“%pg\n”, FLAG_PRINTK_FMT(my_flag, vma->flag));
>
Compared to printk("%pgv\n", &vma->flag), I know which I'd prefer to read.
> i am not if DECLARE_FLAG_PRINTK_FMT and FLAG_PRINTK_FMT macro
> can be defined into one macro ?
> maybe need some trick here .
>
> is it possible ?
Technically, I think the answer is yes, at least in C99 (and I suppose
gcc would accept it in gnu89 mode as well).
printk("%pg\n", &(struct flag_printer){.flags = my_flags, .names = vmaflags_names});
Not tested, and I still don't think it would be particularly readable
even when macroized
printk("%pg\n", PRINTF_VMAFLAGS(my_flags));
Rasmus
On Wed, Dec 02 2015, Vlastimil Babka <[email protected]> wrote:
>> [where I've assumed that the trace_print_flags array is terminated with
>> an entry with 0 mask. Passing its length is also possible, but maybe a
>> little awkward if the arrays are defined in mm/ and contents depend on
>> .config.]
...
>
>> Rasmus
>
> Zero-terminated array is a good idea to get rid of the ARRAY_SIZE with helpers
> needing to live in the same .c file etc.
>
> But if I were to keep the array definitions in mm/debug.c with declarations
> (which don't know the size yet) in e.g. <linux/mmdebug.h> (which lib/vsnprintf.c
> would include so that format_flags() can reference them, is there a more elegant
> way than the one below?
>
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -7,6 +7,9 @@
> struct page;
> struct vm_area_struct;
> struct mm_struct;
> +struct trace_print_flags; // can't include trace_events.h here
> +
> +extern const struct trace_print_flags *pageflag_names;
>
> extern void dump_page(struct page *page, const char *reason);
> extern void dump_page_badflags(struct page *page, const char *reason,
> diff --git a/mm/debug.c b/mm/debug.c
> index a092111920e7..1cbc60544b87 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -23,7 +23,7 @@ char *migrate_reason_names[MR_TYPES] = {
> "cma",
> };
>
> -static const struct trace_print_flags pageflag_names[] = {
> +const struct trace_print_flags __pageflag_names[] = {
> {1UL << PG_locked, "locked" },
> {1UL << PG_error, "error" },
> {1UL << PG_referenced, "referenced" },
> @@ -59,6 +59,8 @@ static const struct trace_print_flags pageflag_names[] = {
> #endif
> };
>
> +const struct trace_print_flags *pageflag_names = &__pageflag_names[0];
Ugh. I think it would be better if either the definition of struct
trace_print_flags is moved somewhere where everybody can see it or to
make our own identical type definition. For now I'd go with the latter,
also since this doesn't really have anything to do with the tracing
subsystem. Then just declare the array in the header
extern const struct print_flags pageflag_names[];
(If you do the extra indirection thing, __pageflag_names could still be
static, and it would be best to declare the pointer itself const as
well, but I'd rather we don't go that way.)
Rasmus
On 12/03/2015 01:37 PM, Rasmus Villemoes wrote:
> On Wed, Dec 02 2015, Vlastimil Babka <[email protected]> wrote:
>> --- a/include/linux/mmdebug.h
>> +++ b/include/linux/mmdebug.h
>> @@ -7,6 +7,9 @@
>> struct page;
>> struct vm_area_struct;
>> struct mm_struct;
>> +struct trace_print_flags; // can't include trace_events.h here
>> +
>> +extern const struct trace_print_flags *pageflag_names;
>>
>> extern void dump_page(struct page *page, const char *reason);
>> extern void dump_page_badflags(struct page *page, const char *reason,
>> diff --git a/mm/debug.c b/mm/debug.c
>> index a092111920e7..1cbc60544b87 100644
>> --- a/mm/debug.c
>> +++ b/mm/debug.c
>> @@ -23,7 +23,7 @@ char *migrate_reason_names[MR_TYPES] = {
>> "cma",
>> };
>>
>> -static const struct trace_print_flags pageflag_names[] = {
>> +const struct trace_print_flags __pageflag_names[] = {
>> {1UL << PG_locked, "locked" },
>> {1UL << PG_error, "error" },
>> {1UL << PG_referenced, "referenced" },
>> @@ -59,6 +59,8 @@ static const struct trace_print_flags pageflag_names[] = {
>> #endif
>> };
>>
>> +const struct trace_print_flags *pageflag_names = &__pageflag_names[0];
>
> Ugh. I think it would be better if either the definition of struct
> trace_print_flags is moved somewhere where everybody can see it or to
> make our own identical type definition. For now I'd go with the latter,
> also since this doesn't really have anything to do with the tracing
> subsystem. Then just declare the array in the header
>
> extern const struct print_flags pageflag_names[];
Ugh so yesterday I copy/pasted the definition and still got an error, which I
probably didn't read closely enough. I assumed that if it needs the full
definition of "struct trace_print_flags" here to know the size, it would also
need to know the lenght of the array as well.
But now it works. Well, copy/pasting the definition fails as long as both
headers are included and it's redefining the struct (even though it's the same
thing). But looks like I can move it from trace_events.h to tracepoint.h and it
won't blow off (knock knock).
I suck at C.
> (If you do the extra indirection thing, __pageflag_names could still be
> static, and it would be best to declare the pointer itself const as
> well, but I'd rather we don't go that way.)
>
> Rasmus
>
> On Dec 3, 2015, at 00:03, Rasmus Villemoes <[email protected]> wrote:
>
> On Thu, Dec 03 2015, yalin wang <[email protected]> wrote:
>
>>> On Dec 2, 2015, at 13:04, Vlastimil Babka <[email protected]> wrote:
>>>
>>> On 12/02/2015 06:40 PM, yalin wang wrote:
>>>
>>> (please trim your reply next time, no need to quote whole patch here)
>>>
>>>> i am thinking why not make %pg* to be more generic ?
>>>> not restricted to only GFP / vma flags / page flags .
>>>> so could we change format like this ?
>>>> define a flag spec struct to include flag and trace_print_flags and some other option :
>>>> typedef struct {
>>>> unsigned long flag;
>>>> structtrace_print_flags *flags;
>>>> unsigned long option; } flag_sec;
>>>> flag_sec my_flag;
>>>> in printk we only pass like this :
>>>> printk(“%pg\n”, &my_flag) ;
>>>> then it can print any flags defined by user .
>>>> more useful for other drivers to use .
>>>
>>> I don't know, it sounds quite complicated
>
> Agreed, I think this would be premature generalization. There's also
> some value in having the individual %pgX specifiers, as that allows
> individual tweaks such as the mask_out for page flags.
>
> given that we had no flags printing
>>
if we use this generic method, %pgX where X can be used to specify some flag to
mask out some thing . it will be great .
>
> Compared to printk("%pgv\n", &vma->flag), I know which I'd prefer to read.
>
>> i am not if DECLARE_FLAG_PRINTK_FMT and FLAG_PRINTK_FMT macro
>> can be defined into one macro ?
>> maybe need some trick here .
>>
>> is it possible ?
>
> Technically, I think the answer is yes, at least in C99 (and I suppose
> gcc would accept it in gnu89 mode as well).
>
> printk("%pg\n", &(struct flag_printer){.flags = my_flags, .names = vmaflags_names});
>
> Not tested, and I still don't think it would be particularly readable
> even when macroized
>
> printk("%pg\n", PRINTF_VMAFLAGS(my_flags));
i test on gcc 4.9.3, it can work for this method,
so the final solution like this:
printk.h:
struct flag_fmt_spec {
unsigned long flag;
struct trace_print_flags *flags;
int array_size;
char delimiter; }
#define FLAG_FORMAT(flag, flag_array, delimiter) (&(struct flag_ft_spec){ .flag = flag, .flags = flag_array, .array_size = ARRAY_SIZE(flag_array), .delimiter = delimiter})
#define VMA_FLAG_FORMAT(flag) FLAG_FORMAT(flag, vmaflags_names, ‘|')
source code:
printk("%pg\n", VMA_FLAG_FORMAT(my_flags));
that’s all, see cpumask_pr_args(masks) macro,
it also use macro and %*pb to print cpu mask .
i think this method is not very complex to use .
search source code ,
there is lots of printk to print flag into hex number :
$ grep -n -r 'printk.*flag.*%x’ .
it will be great if this flag string print is generic.
Thanks
>> Technically, I think the answer is yes, at least in C99 (and I suppose
>> gcc would accept it in gnu89 mode as well).
>>
>> printk("%pg\n", &(struct flag_printer){.flags = my_flags, .names = vmaflags_names});
>>
>> Not tested, and I still don't think it would be particularly readable
>> even when macroized
>>
>> printk("%pg\n", PRINTF_VMAFLAGS(my_flags));
> i test on gcc 4.9.3, it can work for this method,
> so the final solution like this:
> printk.h:
> struct flag_fmt_spec {
> unsigned long flag;
> struct trace_print_flags *flags;
> int array_size;
> char delimiter; }
>
> #define FLAG_FORMAT(flag, flag_array, delimiter) (&(struct flag_ft_spec){ .flag = flag, .flags = flag_array, .array_size = ARRAY_SIZE(flag_array), .delimiter = delimiter})
> #define VMA_FLAG_FORMAT(flag) FLAG_FORMAT(flag, vmaflags_names, ‘|’)
a little change:
#define VMA_FLAG_FORMAT(vma) FLAG_FORMAT(vma->vm_flags, vmaflags_names, ‘|’)
> source code:
> printk("%pg\n", VMA_FLAG_FORMAT(my_flags));
a little change:
printk("%pg\n", VMA_FLAG_FORMAT(vma));
>
> that’s all, see cpumask_pr_args(masks) macro,
> it also use macro and %*pb to print cpu mask .
> i think this method is not very complex to use .
>
> search source code ,
> there is lots of printk to print flag into hex number :
> $ grep -n -r 'printk.*flag.*%x’ .
> it will be great if this flag string print is generic.
>
> Thanks
On 12/03/2015 07:38 PM, yalin wang wrote:
> that’s all, see cpumask_pr_args(masks) macro,
> it also use macro and %*pb to print cpu mask .
> i think this method is not very complex to use .
Well, one also has to write the appropriate translation tables.
> search source code ,
> there is lots of printk to print flag into hex number :
> $ grep -n -r 'printk.*flag.*%x’ .
> it will be great if this flag string print is generic.
I think it can always be done later, this is an internal API. For now we
just have 3 quite generic flags, so let's not over-engineer things right
now.
> Thanks
In mm we use several kinds of flags bitfields that are sometimes printed for
debugging purposes, or exported to userspace via sysfs. To make them easier to
interpret independently on kernel version and config, we want to dump also the
symbolic flag names. So far this has been done with repeated calls to
pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
To get a more reliable and universal solution, this patch extends printk()
format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
and vma flags (%pgv). Existing users of dump_flag_names() are converted and
simplified.
It would be possible to pass flags by value instead of pointer, but the %p
format string for pointers already has extensions for various kernel
structures, so it's a good fit, and the extra indirection in a non-critical
path is negligible.
[[email protected]: lots of good implementation suggestions]
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
---
Should hopefully apply to mmots (release mmotm doesn't have the 3 -fix patches yet?)
Documentation/printk-formats.txt | 14 ++++
include/linux/mmdebug.h | 7 +-
include/linux/trace_events.h | 10 ---
include/linux/tracepoint.h | 10 +++
lib/vsprintf.c | 75 ++++++++++++++++++++++
mm/debug.c | 135 ++++++++++++++-------------------------
mm/oom_kill.c | 5 +-
mm/page_alloc.c | 5 +-
mm/page_owner.c | 6 +-
9 files changed, 160 insertions(+), 107 deletions(-)
diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index b784c270105f..8b6ab00fcfc9 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -292,6 +292,20 @@ Raw pointer value SHOULD be printed with %p. The kernel supports
Passed by reference.
+Flags bitfields such as page flags, gfp_flags:
+
+ %pgp referenced|uptodate|lru|active|private
+ %pgg GFP_USER|GFP_DMA32|GFP_NOWARN
+ %pgv read|exec|mayread|maywrite|mayexec|denywrite
+
+ For printing flags bitfields as a collection of symbolic constants that
+ would construct the value. The type of flags is given by the third
+ character. Currently supported are [p]age flags, [g]fp_flags and
+ [v]ma_flags. The flag names and print order depends on the particular
+ type.
+
+ Passed by reference.
+
Network device features:
%pNF 0x000000000000c000
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 3b77fab7ad28..2c8286cf162e 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -2,15 +2,20 @@
#define LINUX_MM_DEBUG_H 1
#include <linux/stringify.h>
+#include <linux/types.h>
+#include <linux/tracepoint.h>
struct page;
struct vm_area_struct;
struct mm_struct;
+extern const struct trace_print_flags pageflag_names[];
+extern const struct trace_print_flags vmaflag_names[];
+extern const struct trace_print_flags gfpflag_names[];
+
extern void dump_page(struct page *page, const char *reason);
extern void dump_page_badflags(struct page *page, const char *reason,
unsigned long badflags);
-extern void dump_gfpflag_names(unsigned long gfp_flags);
void dump_vma(const struct vm_area_struct *vma);
void dump_mm(const struct mm_struct *mm);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 429fdfc3baf5..d91404f89ff2 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -15,16 +15,6 @@ struct tracer;
struct dentry;
struct bpf_prog;
-struct trace_print_flags {
- unsigned long mask;
- const char *name;
-};
-
-struct trace_print_flags_u64 {
- unsigned long long mask;
- const char *name;
-};
-
const char *trace_print_flags_seq(struct trace_seq *p, const char *delim,
unsigned long flags,
const struct trace_print_flags *flag_array);
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 7834a8a8bf1e..a5d0ab46724d 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -43,6 +43,16 @@ struct trace_enum_map {
unsigned long enum_value;
};
+struct trace_print_flags {
+ unsigned long mask;
+ const char *name;
+};
+
+struct trace_print_flags_u64 {
+ unsigned long long mask;
+ const char *name;
+};
+
#define TRACEPOINT_DEFAULT_PRIO 10
extern int
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index f9cee8e1233c..9a0697b14ea3 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -31,6 +31,7 @@
#include <linux/dcache.h>
#include <linux/cred.h>
#include <net/addrconf.h>
+#include <linux/mmdebug.h>
#include <asm/page.h> /* for PAGE_SIZE */
#include <asm/sections.h> /* for dereference_function_descriptor() */
@@ -1361,6 +1362,73 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec,
}
}
+static
+char *format_flags(char *buf, char *end, unsigned long flags,
+ const struct trace_print_flags *names)
+{
+ unsigned long mask;
+ const struct printf_spec strspec = {
+ .field_width = -1,
+ .precision = -1,
+ };
+ const struct printf_spec numspec = {
+ .flags = SPECIAL|SMALL,
+ .field_width = -1,
+ .precision = -1,
+ .base = 16,
+ };
+
+ for ( ; flags && (names->mask || names->name); names++) {
+ mask = names->mask;
+ if ((flags & mask) != mask)
+ continue;
+
+ buf = string(buf, end, names->name, strspec);
+
+ flags &= ~mask;
+ if (flags) {
+ if (buf < end)
+ *buf = '|';
+ buf++;
+ }
+ }
+
+ if (flags)
+ buf = number(buf, end, flags, numspec);
+
+ return buf;
+}
+
+static noinline_for_stack
+char *flags_string(char *buf, char *end, void *flags_ptr,
+ struct printf_spec spec, const char *fmt)
+{
+ unsigned long flags;
+ const struct trace_print_flags *names;
+
+ switch (fmt[1]) {
+ case 'p':
+ flags = *(unsigned long *)flags_ptr;
+ /* Remove zone id */
+ flags &= (1UL << NR_PAGEFLAGS) - 1;
+ names = pageflag_names;
+ break;
+ case 'v':
+ flags = *(unsigned long *)flags_ptr;
+ names = vmaflag_names;
+ break;
+ case 'g':
+ flags = *(gfp_t *)flags_ptr;
+ names = gfpflag_names;
+ break;
+ default:
+ WARN_ONCE(1, "Unsupported flags modifier: %c\n", fmt[1]);
+ return buf;
+ }
+
+ return format_flags(buf, end, flags, names);
+}
+
int kptr_restrict __read_mostly;
/*
@@ -1448,6 +1516,11 @@ int kptr_restrict __read_mostly;
* - 'Cn' For a clock, it prints the name (Common Clock Framework) or address
* (legacy clock framework) of the clock
* - 'Cr' For a clock, it prints the current rate of the clock
+ * - 'g' For flags to be printed as a collection of symbolic strings that would
+ * construct the specific value. Supported flags given by option:
+ * p page flags (see struct page) given as pointer to unsigned long
+ * g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
+ * v vma flags (VM_*) given as pointer to unsigned long
*
* ** Please update also Documentation/printk-formats.txt when making changes **
*
@@ -1600,6 +1673,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
return dentry_name(buf, end,
((const struct file *)ptr)->f_path.dentry,
spec, fmt);
+ case 'g':
+ return flags_string(buf, end, ptr, spec, fmt);
}
spec.flags |= SMALL;
if (spec.field_width == -1) {
diff --git a/mm/debug.c b/mm/debug.c
index 9416524839d8..c42bb4c13c2d 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -23,7 +23,7 @@ char *migrate_reason_names[MR_TYPES] = {
"cma",
};
-static const struct trace_print_flags pageflag_names[] = {
+const struct trace_print_flags pageflag_names[] = {
{1UL << PG_locked, "locked" },
{1UL << PG_error, "error" },
{1UL << PG_referenced, "referenced" },
@@ -57,86 +57,10 @@ static const struct trace_print_flags pageflag_names[] = {
{1UL << PG_young, "young" },
{1UL << PG_idle, "idle" },
#endif
+ {0, NULL },
};
-static const struct trace_print_flags gfpflag_names[] = {
- __def_gfpflag_names
-};
-
-static void dump_flag_names(unsigned long flags,
- const struct trace_print_flags *names, int count)
-{
- const char *delim = "";
- unsigned long mask;
- int i;
-
- pr_cont("(");
-
- for (i = 0; i < count && flags; i++) {
-
- mask = names[i].mask;
- if ((flags & mask) != mask)
- continue;
-
- flags &= ~mask;
- pr_cont("%s%s", delim, names[i].name);
- delim = "|";
- }
-
- /* check for left over flags */
- if (flags)
- pr_cont("%s%#lx", delim, flags);
-
- pr_cont(")\n");
-}
-
-void dump_gfpflag_names(unsigned long gfp_flags)
-{
- dump_flag_names(gfp_flags, gfpflag_names, ARRAY_SIZE(gfpflag_names));
-}
-
-void dump_page_badflags(struct page *page, const char *reason,
- unsigned long badflags)
-{
- unsigned long printflags = page->flags;
-
- pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx",
- page, atomic_read(&page->_count), page_mapcount(page),
- page->mapping, page->index);
- if (PageCompound(page))
- pr_cont(" compound_mapcount: %d", compound_mapcount(page));
- pr_cont("\n");
- BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS);
-
- pr_emerg("flags: %#lx", printflags);
- /* remove zone id */
- printflags &= (1UL << NR_PAGEFLAGS) - 1;
- dump_flag_names(printflags, pageflag_names, ARRAY_SIZE(pageflag_names));
-
- if (reason)
- pr_alert("page dumped because: %s\n", reason);
- if (page->flags & badflags) {
- printflags = page->flags & badflags;
- pr_alert("bad because of flags: %#lx:", printflags);
- dump_flag_names(printflags, pageflag_names,
- ARRAY_SIZE(pageflag_names));
- }
-#ifdef CONFIG_MEMCG
- if (page->mem_cgroup)
- pr_alert("page->mem_cgroup:%p\n", page->mem_cgroup);
-#endif
-}
-
-void dump_page(struct page *page, const char *reason)
-{
- dump_page_badflags(page, reason, 0);
- dump_page_owner(page);
-}
-EXPORT_SYMBOL(dump_page);
-
-#ifdef CONFIG_DEBUG_VM
-
-static const struct trace_print_flags vmaflags_names[] = {
+const struct trace_print_flags vmaflag_names[] = {
{VM_READ, "read" },
{VM_WRITE, "write" },
{VM_EXEC, "exec" },
@@ -177,22 +101,61 @@ static const struct trace_print_flags vmaflags_names[] = {
{VM_HUGEPAGE, "hugepage" },
{VM_NOHUGEPAGE, "nohugepage" },
{VM_MERGEABLE, "mergeable" },
+ {0, NULL },
+};
+
+const struct trace_print_flags gfpflag_names[] = {
+ __def_gfpflag_names,
+ {0, NULL},
};
+void dump_page_badflags(struct page *page, const char *reason,
+ unsigned long badflags)
+{
+ unsigned long printflags = page->flags;
+
+ pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx\n",
+ page, atomic_read(&page->_count), page_mapcount(page),
+ page->mapping, page->index);
+ BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
+
+ pr_emerg("flags: %#lx(%pgp)\n", printflags, &printflags);
+
+ if (reason)
+ pr_alert("page dumped because: %s\n", reason);
+ if (page->flags & badflags) {
+ printflags = page->flags & badflags;
+ pr_alert("bad because of flags: %#lx(%pgp)\n", printflags,
+ &printflags);
+ }
+#ifdef CONFIG_MEMCG
+ if (page->mem_cgroup)
+ pr_alert("page->mem_cgroup:%p\n", page->mem_cgroup);
+#endif
+}
+
+void dump_page(struct page *page, const char *reason)
+{
+ dump_page_badflags(page, reason, 0);
+ dump_page_owner(page);
+}
+EXPORT_SYMBOL(dump_page);
+
+#ifdef CONFIG_DEBUG_VM
+
void dump_vma(const struct vm_area_struct *vma)
{
pr_emerg("vma %p start %p end %p\n"
"next %p prev %p mm %p\n"
"prot %lx anon_vma %p vm_ops %p\n"
- "pgoff %lx file %p private_data %p\n",
+ "pgoff %lx file %p private_data %p\n"
+ "flags: %#lx(%pgv)\n",
vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_next,
vma->vm_prev, vma->vm_mm,
(unsigned long)pgprot_val(vma->vm_page_prot),
vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
- vma->vm_file, vma->vm_private_data);
- pr_emerg("flags: %#lx", vma->vm_flags);
- dump_flag_names(vma->vm_flags, vmaflags_names,
- ARRAY_SIZE(vmaflags_names));
+ vma->vm_file, vma->vm_private_data,
+ vma->vm_flags, &vma->vm_flags);
}
EXPORT_SYMBOL(dump_vma);
@@ -263,9 +226,7 @@ void dump_mm(const struct mm_struct *mm)
"" /* This is here to not have a comma! */
);
- pr_emerg("def_flags: %#lx", mm->def_flags);
- dump_flag_names(mm->def_flags, vmaflags_names,
- ARRAY_SIZE(vmaflags_names));
+ pr_emerg("def_flags: %#lx(%pgv)\n", mm->def_flags, &mm->def_flags);
}
#endif /* CONFIG_DEBUG_VM */
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 542d56c93209..07bba5a46b47 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -387,10 +387,9 @@ static void dump_header(struct oom_control *oc, struct task_struct *p,
struct mem_cgroup *memcg)
{
pr_warning("%s invoked oom-killer: order=%d, oom_score_adj=%hd, "
- "gfp_mask=0x%x",
+ "gfp_mask=%#x(%pgg)\n",
current->comm, oc->order, current->signal->oom_score_adj,
- oc->gfp_mask);
- dump_gfpflag_names(oc->gfp_mask);
+ oc->gfp_mask, &oc->gfp_mask);
cpuset_print_current_mems_allowed();
dump_stack();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d6d7c97c0b28..bd94c7465dea 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2711,9 +2711,8 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
va_end(args);
}
- pr_warn("%s: page allocation failure: order:%u, mode:0x%x",
- current->comm, order, gfp_mask);
- dump_gfpflag_names(gfp_mask);
+ pr_warn("%s: page allocation failure: order:%u, mode:%#x(%pgg)\n",
+ current->comm, order, gfp_mask, &gfp_mask);
dump_stack();
if (!should_suppress_show_mem())
show_mem(filter);
diff --git a/mm/page_owner.c b/mm/page_owner.c
index f4acd2452c35..313251f36d86 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -208,9 +208,9 @@ void __dump_page_owner(struct page *page)
return;
}
- pr_alert("page allocated via order %u, migratetype %s, gfp_mask 0x%x",
- page_ext->order, migratetype_names[mt], gfp_mask);
- dump_gfpflag_names(gfp_mask);
+ pr_alert("page allocated via order %u, migratetype %s, "
+ "gfp_mask %#x(%pgg)\n", page_ext->order,
+ migratetype_names[mt], gfp_mask, &gfp_mask);
print_stack_trace(&trace, 0);
if (page_ext->last_migrate_reason != -1)
--
2.6.3
With the new format strings for flags, we can now provide symbolic page and gfp
flags in the /sys/kernel/debug/page_owner file. This replaces the positional
printing of page flags as single letters, which might have looked nicer, but
was limited to a subset of flags, and required the user to remember the
letters.
Example of the adjusted format:
Page allocated via order 0, mask 0x24213ca(GFP_HIGHUSER_MOVABLE|GFP_COLD|GFP_NOWARN|GFP_NORETRY)
PFN 674308 type Movable Block 1317 type Movable Flags 0x1fffff80010068(uptodate|lru|active|mappedtodisk)
[<ffffffff81164e9a>] __alloc_pages_nodemask+0x15a/0xa30
[<ffffffff811ab938>] alloc_pages_current+0x88/0x120
[<ffffffff8115bc46>] __page_cache_alloc+0xe6/0x120
[<ffffffff81168b9b>] __do_page_cache_readahead+0xdb/0x200
[<ffffffff81168df5>] ondemand_readahead+0x135/0x260
[<ffffffff81168f8c>] page_cache_async_readahead+0x6c/0x70
[<ffffffff8115d5f8>] generic_file_read_iter+0x378/0x590
[<ffffffff811d12a7>] __vfs_read+0xa7/0xd0
Page has been migrated, last migrate reason: compaction
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
---
mm/page_owner.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 313251f36d86..011377548b4f 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -135,8 +135,9 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
return -ENOMEM;
ret = snprintf(kbuf, count,
- "Page allocated via order %u, mask 0x%x\n",
- page_ext->order, page_ext->gfp_mask);
+ "Page allocated via order %u, mask %#x(%pgg)\n",
+ page_ext->order, page_ext->gfp_mask,
+ &page_ext->gfp_mask);
if (ret >= count)
goto err;
@@ -145,23 +146,12 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
pageblock_mt = get_pfnblock_migratetype(page, pfn);
page_mt = gfpflags_to_migratetype(page_ext->gfp_mask);
ret += snprintf(kbuf + ret, count - ret,
- "PFN %lu type %s Block %lu type %s Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
+ "PFN %lu type %s Block %lu type %s Flags %#lx(%pgp)\n",
pfn,
migratetype_names[page_mt],
pfn >> pageblock_order,
migratetype_names[pageblock_mt],
- PageLocked(page) ? "K" : " ",
- PageError(page) ? "E" : " ",
- PageReferenced(page) ? "R" : " ",
- PageUptodate(page) ? "U" : " ",
- PageDirty(page) ? "D" : " ",
- PageLRU(page) ? "L" : " ",
- PageActive(page) ? "A" : " ",
- PageSlab(page) ? "S" : " ",
- PageWriteback(page) ? "W" : " ",
- PageCompound(page) ? "C" : " ",
- PageSwapCache(page) ? "B" : " ",
- PageMappedToDisk(page) ? "M" : " ");
+ page->flags, &page->flags);
if (ret >= count)
goto err;
--
2.6.3
Since bad_page() is the only user of the badflags parameter of
dump_page_badflags(), we can move the code to bad_page() and simplify a bit.
The dump_page_badflags() function is renamed to __dump_page() and can still be
called separately from dump_page() for temporary debug prints where page_owner
info is not desired.
The only user-visible change is that page->mem_cgroup is printed before the bad
flags.
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
---
include/linux/mmdebug.h | 3 +--
mm/debug.c | 14 +++-----------
mm/page_alloc.c | 10 +++++++---
3 files changed, 11 insertions(+), 16 deletions(-)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 2c8286cf162e..9b0dc2161f7a 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -14,8 +14,7 @@ extern const struct trace_print_flags vmaflag_names[];
extern const struct trace_print_flags gfpflag_names[];
extern void dump_page(struct page *page, const char *reason);
-extern void dump_page_badflags(struct page *page, const char *reason,
- unsigned long badflags);
+extern void __dump_page(struct page *page, const char *reason);
void dump_vma(const struct vm_area_struct *vma);
void dump_mm(const struct mm_struct *mm);
diff --git a/mm/debug.c b/mm/debug.c
index c42bb4c13c2d..d36aa66e9779 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -109,25 +109,17 @@ const struct trace_print_flags gfpflag_names[] = {
{0, NULL},
};
-void dump_page_badflags(struct page *page, const char *reason,
- unsigned long badflags)
+void __dump_page(struct page *page, const char *reason)
{
- unsigned long printflags = page->flags;
-
pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx\n",
page, atomic_read(&page->_count), page_mapcount(page),
page->mapping, page->index);
BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
- pr_emerg("flags: %#lx(%pgp)\n", printflags, &printflags);
+ pr_emerg("flags: %#lx(%pgp)\n", page->flags, &page->flags);
if (reason)
pr_alert("page dumped because: %s\n", reason);
- if (page->flags & badflags) {
- printflags = page->flags & badflags;
- pr_alert("bad because of flags: %#lx(%pgp)\n", printflags,
- &printflags);
- }
#ifdef CONFIG_MEMCG
if (page->mem_cgroup)
pr_alert("page->mem_cgroup:%p\n", page->mem_cgroup);
@@ -136,7 +128,7 @@ void dump_page_badflags(struct page *page, const char *reason,
void dump_page(struct page *page, const char *reason)
{
- dump_page_badflags(page, reason, 0);
+ __dump_page(page, reason);
dump_page_owner(page);
}
EXPORT_SYMBOL(dump_page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bd94c7465dea..2388496a7c6c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -435,7 +435,7 @@ static void bad_page(struct page *page, const char *reason,
goto out;
}
if (nr_unshown) {
- printk(KERN_ALERT
+ pr_alert(
"BUG: Bad page state: %lu messages suppressed\n",
nr_unshown);
nr_unshown = 0;
@@ -445,9 +445,13 @@ static void bad_page(struct page *page, const char *reason,
if (nr_shown++ == 0)
resume = jiffies + 60 * HZ;
- printk(KERN_ALERT "BUG: Bad page state in process %s pfn:%05lx\n",
+ pr_alert("BUG: Bad page state in process %s pfn:%05lx\n",
current->comm, page_to_pfn(page));
- dump_page_badflags(page, reason, bad_flags);
+ __dump_page(page, reason);
+ bad_flags &= page->flags;
+ if (bad_flags)
+ pr_alert("bad because of flags: %#lx(%pgp)\n",
+ bad_flags, &bad_flags);
dump_page_owner(page);
print_modules();
--
2.6.3
On Fri, Dec 04 2015, Vlastimil Babka <[email protected]> wrote:
> In mm we use several kinds of flags bitfields that are sometimes printed for
> debugging purposes, or exported to userspace via sysfs. To make them easier to
> interpret independently on kernel version and config, we want to dump also the
> symbolic flag names. So far this has been done with repeated calls to
> pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
>
> To get a more reliable and universal solution, this patch extends printk()
> format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
> and vma flags (%pgv).
Hm, with that $subject, I'd expect the patch to touch little beyond
lib/vsprintf.c and Documentation/printk-formats.txt (plus whatever might
be needed to make the name arrays accessible).
> Existing users of dump_flag_names() are converted and simplified.
If you do a respin, please do that part in a separate patch. That would
make it a lot easier to review.
> diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
> index b784c270105f..8b6ab00fcfc9 100644
> --- a/Documentation/printk-formats.txt
> +++ b/Documentation/printk-formats.txt
> @@ -292,6 +292,20 @@ Raw pointer value SHOULD be printed with %p. The kernel supports
>
> Passed by reference.
>
> +Flags bitfields such as page flags, gfp_flags:
> +
> + %pgp referenced|uptodate|lru|active|private
> + %pgg GFP_USER|GFP_DMA32|GFP_NOWARN
> + %pgv read|exec|mayread|maywrite|mayexec|denywrite
> +
> + For printing flags bitfields as a collection of symbolic constants that
> + would construct the value. The type of flags is given by the third
> + character. Currently supported are [p]age flags, [g]fp_flags and
> + [v]ma_flags. The flag names and print order depends on the particular
> + type.
> +
> + Passed by reference.
> +
It should probably be emphasized that %pgp and %pgv expect (unsigned
long*), while %pgg expect (gfp_t*), just as you do in vsprintf.c.
> Network device features:
>
> %pNF 0x000000000000c000
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> index 3b77fab7ad28..2c8286cf162e 100644
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -2,15 +2,20 @@
> #define LINUX_MM_DEBUG_H 1
>
> #include <linux/stringify.h>
> +#include <linux/types.h>
> +#include <linux/tracepoint.h>
>
How much header bloat does it cause by making all users of mmdebug.h
also include tracepoint.h? Couldn't we put the struct definition in
types.h instead?
> struct page;
> struct vm_area_struct;
> struct mm_struct;
>
> +extern const struct trace_print_flags pageflag_names[];
> +extern const struct trace_print_flags vmaflag_names[];
> +extern const struct trace_print_flags gfpflag_names[];
> +
> extern void dump_page(struct page *page, const char *reason);
> extern void dump_page_badflags(struct page *page, const char *reason,
> unsigned long badflags);
> -extern void dump_gfpflag_names(unsigned long gfp_flags);
> void dump_vma(const struct vm_area_struct *vma);
> void dump_mm(const struct mm_struct *mm);
>
>
> extern int
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index f9cee8e1233c..9a0697b14ea3 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -31,6 +31,7 @@
> #include <linux/dcache.h>
> #include <linux/cred.h>
> #include <net/addrconf.h>
> +#include <linux/mmdebug.h>
>
> #include <asm/page.h> /* for PAGE_SIZE */
> #include <asm/sections.h> /* for dereference_function_descriptor() */
> @@ -1361,6 +1362,73 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec,
> }
> }
>
> +static
> +char *format_flags(char *buf, char *end, unsigned long flags,
> + const struct trace_print_flags *names)
> +{
> + unsigned long mask;
> + const struct printf_spec strspec = {
> + .field_width = -1,
> + .precision = -1,
> + };
> + const struct printf_spec numspec = {
> + .flags = SPECIAL|SMALL,
> + .field_width = -1,
> + .precision = -1,
> + .base = 16,
> + };
> +
> + for ( ; flags && (names->mask || names->name); names++) {
Why test both ->mask and ->name? I could imagine some constant being
#defined to 0 due to some CONFIG_* (and stuff that tested "flag & THAT"
would then get compiled away), so maybe ->mask is insufficient. But
->name will always be non-NULL for all but the sentinel entry, right?
> + mask = names->mask;
> + if ((flags & mask) != mask)
> + continue;
And if we have some constant which is 0, this will then actually cause
us to print the corresponding string. Do we want that? If not we should
have a "if (!mask) continue;". And how helpful are these strings really
if their meaning might be .config dependent?
> + buf = string(buf, end, names->name, strspec);
So string() is robust against a NULL string (printing the string
"(null)"), but that seems silly to rely on. So I'd strongly lean towards
making the loop condition just test ->name.
> + flags &= ~mask;
> + if (flags) {
> + if (buf < end)
> + *buf = '|';
> + buf++;
> + }
> + }
> +
> + if (flags)
> + buf = number(buf, end, flags, numspec);
> +
> + return buf;
> +}
> +
> +static noinline_for_stack
> +char *flags_string(char *buf, char *end, void *flags_ptr,
> + struct printf_spec spec, const char *fmt)
Even if the user passed a field width (which is extremely unlikely), we
wouldn't honour it, so there's no reason to pass on the spec. But maybe
gcc realizes that.
> +{
> + unsigned long flags;
> + const struct trace_print_flags *names;
> +
> + switch (fmt[1]) {
> + case 'p':
> + flags = *(unsigned long *)flags_ptr;
> + /* Remove zone id */
> + flags &= (1UL << NR_PAGEFLAGS) - 1;
> + names = pageflag_names;
> + break;
> + case 'v':
> + flags = *(unsigned long *)flags_ptr;
> + names = vmaflag_names;
> + break;
> + case 'g':
> + flags = *(gfp_t *)flags_ptr;
> + names = gfpflag_names;
> + break;
> + default:
> + WARN_ONCE(1, "Unsupported flags modifier: %c\n", fmt[1]);
> + return buf;
> + }
> +
> + return format_flags(buf, end, flags, names);
> +}
> +
OK.
> int kptr_restrict __read_mostly;
>
> /*
> @@ -1448,6 +1516,11 @@ int kptr_restrict __read_mostly;
> * - 'Cn' For a clock, it prints the name (Common Clock Framework) or address
> * (legacy clock framework) of the clock
> * - 'Cr' For a clock, it prints the current rate of the clock
> + * - 'g' For flags to be printed as a collection of symbolic strings that would
> + * construct the specific value. Supported flags given by option:
> + * p page flags (see struct page) given as pointer to unsigned long
> + * g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
> + * v vma flags (VM_*) given as pointer to unsigned long
> *
> * ** Please update also Documentation/printk-formats.txt when making changes **
> *
> @@ -1600,6 +1673,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
> return dentry_name(buf, end,
> ((const struct file *)ptr)->f_path.dentry,
> spec, fmt);
> + case 'g':
> + return flags_string(buf, end, ptr, spec, fmt);
> }
OK.
I looked briefly at the conversions in mm/ and they seemed fine, but
others are more qualified to comment on that part.
Oh, and while I remember, citing from the end of printk-format.txt:
If you add other %p extensions, please extend lib/test_printf.c with
one or more test cases, if at all feasible.
Maybe I shouldn't have put that note at the end :)
Rasmus
On Friday 04 December 2015 16:16:33 Vlastimil Babka wrote:
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -2,15 +2,20 @@
> #define LINUX_MM_DEBUG_H 1
>
> #include <linux/stringify.h>
> +#include <linux/types.h>
> +#include <linux/tracepoint.h>
8<-----
Subject: mm: fix generated/bounds.h
The inclusion of linux/tracepoint.h is causing build errors for me in ARM
randconfig:
In file included from /git/arm-soc/include/linux/ktime.h:25:0,
from /git/arm-soc/include/linux/rcupdate.h:47,
from /git/arm-soc/include/linux/tracepoint.h:19,
from /git/arm-soc/include/linux/mmdebug.h:6,
from /git/arm-soc/include/linux/page-flags.h:10,
from /git/arm-soc/kernel/bounds.c:9:
/git/arm-soc/include/linux/jiffies.h:10:33: fatal error: generated/timeconst.h: No such file or directory
compilation terminated.
To work around this, we can stop including linux/mmdebug.h from linux/page_flags.h
while generating bounds.h, as we do for mm_types.h already.
Signed-off-by: Arnd Bergmann <[email protected]>
Fixes: 8c0d593d0f8f ("mm, printk: introduce new format string for flags")
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 19724e6ebd26..4efad0578a28 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -7,8 +7,8 @@
#include <linux/types.h>
#include <linux/bug.h>
-#include <linux/mmdebug.h>
#ifndef __GENERATING_BOUNDS_H
+#include <linux/mmdebug.h>
#include <linux/mm_types.h>
#include <generated/bounds.h>
#endif /* !__GENERATING_BOUNDS_H */
On 12/09/2015 12:29 PM, Arnd Bergmann wrote:
> On Friday 04 December 2015 16:16:33 Vlastimil Babka wrote:
>> --- a/include/linux/mmdebug.h
>> +++ b/include/linux/mmdebug.h
>> @@ -2,15 +2,20 @@
>> #define LINUX_MM_DEBUG_H 1
>>
>> #include <linux/stringify.h>
>> +#include <linux/types.h>
>> +#include <linux/tracepoint.h>
>
> 8<-----
> Subject: mm: fix generated/bounds.h
>
> The inclusion of linux/tracepoint.h is causing build errors for me in ARM
> randconfig:
>
> In file included from /git/arm-soc/include/linux/ktime.h:25:0,
> from /git/arm-soc/include/linux/rcupdate.h:47,
> from /git/arm-soc/include/linux/tracepoint.h:19,
> from /git/arm-soc/include/linux/mmdebug.h:6,
> from /git/arm-soc/include/linux/page-flags.h:10,
> from /git/arm-soc/kernel/bounds.c:9:
> /git/arm-soc/include/linux/jiffies.h:10:33: fatal error: generated/timeconst.h: No such file or directory
> compilation terminated.
>
> To work around this, we can stop including linux/mmdebug.h from linux/page_flags.h
> while generating bounds.h, as we do for mm_types.h already.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
Thanks and sorry. Andrew can you please include it in mmotm as -fix for now?
I plan to respin the whole of this later with some patch splitting and
reordering to reduce churn and follow Rasmus' advice.
Also I've just learned that there's a new lightweight tracepoint-defs.h in -tip
thanks to Andi, which would be a better place for struct trace_print_flags than
tracepoint.h is, so I'll look into using it for the respin, which should make
this temporary -fix redundant.
> Fixes: 8c0d593d0f8f ("mm, printk: introduce new format string for flags")
Note that the linux-next commit id is volatile here (regenerated from quilt series).
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 19724e6ebd26..4efad0578a28 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -7,8 +7,8 @@
>
> #include <linux/types.h>
> #include <linux/bug.h>
> -#include <linux/mmdebug.h>
> #ifndef __GENERATING_BOUNDS_H
> +#include <linux/mmdebug.h>
> #include <linux/mm_types.h>
> #include <generated/bounds.h>
> #endif /* !__GENERATING_BOUNDS_H */
>
Ccing, Steven to ask trace-cmd problem.
On Fri, Dec 04, 2015 at 04:16:33PM +0100, Vlastimil Babka wrote:
> In mm we use several kinds of flags bitfields that are sometimes printed for
> debugging purposes, or exported to userspace via sysfs. To make them easier to
> interpret independently on kernel version and config, we want to dump also the
> symbolic flag names. So far this has been done with repeated calls to
> pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
>
> To get a more reliable and universal solution, this patch extends printk()
> format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
> and vma flags (%pgv). Existing users of dump_flag_names() are converted and
> simplified.
>
> It would be possible to pass flags by value instead of pointer, but the %p
> format string for pointers already has extensions for various kernel
> structures, so it's a good fit, and the extra indirection in a non-critical
> path is negligible.
I'd like to use %pgp in tracepoint output. It works well when I do
'cat /sys/kernel/debug/tracing/trace' but not works well when I do
'./trace-cmd report'. It prints following error log.
[page_ref:page_ref_unfreeze] bad op token &
[page_ref:page_ref_set] bad op token &
[page_ref:page_ref_mod_unless] bad op token &
[page_ref:page_ref_mod_and_test] bad op token &
[page_ref:page_ref_mod_and_return] bad op token &
[page_ref:page_ref_mod] bad op token &
[page_ref:page_ref_freeze] bad op token &
Following is the format I used.
TP_printk("pfn=0x%lx flags=%pgp count=%d mapcount=%d mapping=%p mt=%d val=%d ret=%d",
__entry->pfn, &__entry->flags, __entry->count,
__entry->mapcount, __entry->mapping, __entry->mt,
__entry->val, __entry->ret)
Could it be solved by 'trace-cmd' itself?
Or it's better to pass flags by value?
Or should I use something like show_gfp_flags()?
Thanks.
I should have been Cc'd on this as I'm the maintainer of a few of the files
here that is being modified.
On Fri, Dec 04, 2015 at 04:16:33PM +0100, Vlastimil Babka wrote:
> In mm we use several kinds of flags bitfields that are sometimes printed for
> debugging purposes, or exported to userspace via sysfs. To make them easier to
> interpret independently on kernel version and config, we want to dump also the
> symbolic flag names. So far this has been done with repeated calls to
> pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
>
> To get a more reliable and universal solution, this patch extends printk()
> format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
> and vma flags (%pgv). Existing users of dump_flag_names() are converted and
> simplified.
>
> It would be possible to pass flags by value instead of pointer, but the %p
> format string for pointers already has extensions for various kernel
> structures, so it's a good fit, and the extra indirection in a non-critical
> path is negligible.
>
> [[email protected]: lots of good implementation suggestions]
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Rasmus Villemoes <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Sasha Levin <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Michal Hocko <[email protected]>
> ---
> Should hopefully apply to mmots (release mmotm doesn't have the 3 -fix patches yet?)
>
> Documentation/printk-formats.txt | 14 ++++
> include/linux/mmdebug.h | 7 +-
> include/linux/trace_events.h | 10 ---
> include/linux/tracepoint.h | 10 +++
> lib/vsprintf.c | 75 ++++++++++++++++++++++
> mm/debug.c | 135 ++++++++++++++-------------------------
> mm/oom_kill.c | 5 +-
> mm/page_alloc.c | 5 +-
> mm/page_owner.c | 6 +-
> 9 files changed, 160 insertions(+), 107 deletions(-)
>
> diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
> index b784c270105f..8b6ab00fcfc9 100644
> --- a/Documentation/printk-formats.txt
> +++ b/Documentation/printk-formats.txt
> @@ -292,6 +292,20 @@ Raw pointer value SHOULD be printed with %p. The kernel supports
>
> Passed by reference.
>
> +Flags bitfields such as page flags, gfp_flags:
> +
> + %pgp referenced|uptodate|lru|active|private
> + %pgg GFP_USER|GFP_DMA32|GFP_NOWARN
> + %pgv read|exec|mayread|maywrite|mayexec|denywrite
> +
> + For printing flags bitfields as a collection of symbolic constants that
> + would construct the value. The type of flags is given by the third
> + character. Currently supported are [p]age flags, [g]fp_flags and
> + [v]ma_flags. The flag names and print order depends on the particular
> + type.
> +
> + Passed by reference.
> +
> Network device features:
>
> %pNF 0x000000000000c000
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> index 3b77fab7ad28..2c8286cf162e 100644
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -2,15 +2,20 @@
> #define LINUX_MM_DEBUG_H 1
>
> #include <linux/stringify.h>
> +#include <linux/types.h>
> +#include <linux/tracepoint.h>
>
> struct page;
> struct vm_area_struct;
> struct mm_struct;
>
> +extern const struct trace_print_flags pageflag_names[];
> +extern const struct trace_print_flags vmaflag_names[];
> +extern const struct trace_print_flags gfpflag_names[];
> +
> extern void dump_page(struct page *page, const char *reason);
> extern void dump_page_badflags(struct page *page, const char *reason,
> unsigned long badflags);
> -extern void dump_gfpflag_names(unsigned long gfp_flags);
> void dump_vma(const struct vm_area_struct *vma);
> void dump_mm(const struct mm_struct *mm);
>
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 429fdfc3baf5..d91404f89ff2 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -15,16 +15,6 @@ struct tracer;
> struct dentry;
> struct bpf_prog;
>
> -struct trace_print_flags {
> - unsigned long mask;
> - const char *name;
> -};
> -
> -struct trace_print_flags_u64 {
> - unsigned long long mask;
> - const char *name;
> -};
> -
Ingo took some patches from Andi Kleen that creates a tracepoint-defs.h file
If anything, these should be moved there. That code is currently in tip.
-- Steve
> const char *trace_print_flags_seq(struct trace_seq *p, const char *delim,
> unsigned long flags,
> const struct trace_print_flags *flag_array);
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 7834a8a8bf1e..a5d0ab46724d 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -43,6 +43,16 @@ struct trace_enum_map {
> unsigned long enum_value;
> };
>
> +struct trace_print_flags {
> + unsigned long mask;
> + const char *name;
> +};
> +
> +struct trace_print_flags_u64 {
> + unsigned long long mask;
> + const char *name;
> +};
> +
> #define TRACEPOINT_DEFAULT_PRIO 10
>
> extern int
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index f9cee8e1233c..9a0697b14ea3 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -31,6 +31,7 @@
> #include <linux/dcache.h>
> #include <linux/cred.h>
> #include <net/addrconf.h>
> +#include <linux/mmdebug.h>
>
> #include <asm/page.h> /* for PAGE_SIZE */
> #include <asm/sections.h> /* for dereference_function_descriptor() */
> @@ -1361,6 +1362,73 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec,
> }
> }
>
> +static
> +char *format_flags(char *buf, char *end, unsigned long flags,
> + const struct trace_print_flags *names)
> +{
> + unsigned long mask;
> + const struct printf_spec strspec = {
> + .field_width = -1,
> + .precision = -1,
> + };
> + const struct printf_spec numspec = {
> + .flags = SPECIAL|SMALL,
> + .field_width = -1,
> + .precision = -1,
> + .base = 16,
> + };
> +
> + for ( ; flags && (names->mask || names->name); names++) {
> + mask = names->mask;
> + if ((flags & mask) != mask)
> + continue;
> +
> + buf = string(buf, end, names->name, strspec);
> +
> + flags &= ~mask;
> + if (flags) {
> + if (buf < end)
> + *buf = '|';
> + buf++;
> + }
> + }
> +
> + if (flags)
> + buf = number(buf, end, flags, numspec);
> +
> + return buf;
> +}
> +
> +static noinline_for_stack
> +char *flags_string(char *buf, char *end, void *flags_ptr,
> + struct printf_spec spec, const char *fmt)
> +{
> + unsigned long flags;
> + const struct trace_print_flags *names;
> +
> + switch (fmt[1]) {
> + case 'p':
> + flags = *(unsigned long *)flags_ptr;
> + /* Remove zone id */
> + flags &= (1UL << NR_PAGEFLAGS) - 1;
> + names = pageflag_names;
> + break;
> + case 'v':
> + flags = *(unsigned long *)flags_ptr;
> + names = vmaflag_names;
> + break;
> + case 'g':
> + flags = *(gfp_t *)flags_ptr;
> + names = gfpflag_names;
> + break;
> + default:
> + WARN_ONCE(1, "Unsupported flags modifier: %c\n", fmt[1]);
> + return buf;
> + }
> +
> + return format_flags(buf, end, flags, names);
> +}
> +
> int kptr_restrict __read_mostly;
>
> /*
> @@ -1448,6 +1516,11 @@ int kptr_restrict __read_mostly;
> * - 'Cn' For a clock, it prints the name (Common Clock Framework) or address
> * (legacy clock framework) of the clock
> * - 'Cr' For a clock, it prints the current rate of the clock
> + * - 'g' For flags to be printed as a collection of symbolic strings that would
> + * construct the specific value. Supported flags given by option:
> + * p page flags (see struct page) given as pointer to unsigned long
> + * g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
> + * v vma flags (VM_*) given as pointer to unsigned long
> *
> * ** Please update also Documentation/printk-formats.txt when making changes **
> *
> @@ -1600,6 +1673,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
> return dentry_name(buf, end,
> ((const struct file *)ptr)->f_path.dentry,
> spec, fmt);
> + case 'g':
> + return flags_string(buf, end, ptr, spec, fmt);
> }
> spec.flags |= SMALL;
> if (spec.field_width == -1) {
> diff --git a/mm/debug.c b/mm/debug.c
> index 9416524839d8..c42bb4c13c2d 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -23,7 +23,7 @@ char *migrate_reason_names[MR_TYPES] = {
> "cma",
> };
>
> -static const struct trace_print_flags pageflag_names[] = {
> +const struct trace_print_flags pageflag_names[] = {
> {1UL << PG_locked, "locked" },
> {1UL << PG_error, "error" },
> {1UL << PG_referenced, "referenced" },
> @@ -57,86 +57,10 @@ static const struct trace_print_flags pageflag_names[] = {
> {1UL << PG_young, "young" },
> {1UL << PG_idle, "idle" },
> #endif
> + {0, NULL },
> };
>
> -static const struct trace_print_flags gfpflag_names[] = {
> - __def_gfpflag_names
> -};
> -
> -static void dump_flag_names(unsigned long flags,
> - const struct trace_print_flags *names, int count)
> -{
> - const char *delim = "";
> - unsigned long mask;
> - int i;
> -
> - pr_cont("(");
> -
> - for (i = 0; i < count && flags; i++) {
> -
> - mask = names[i].mask;
> - if ((flags & mask) != mask)
> - continue;
> -
> - flags &= ~mask;
> - pr_cont("%s%s", delim, names[i].name);
> - delim = "|";
> - }
> -
> - /* check for left over flags */
> - if (flags)
> - pr_cont("%s%#lx", delim, flags);
> -
> - pr_cont(")\n");
> -}
> -
> -void dump_gfpflag_names(unsigned long gfp_flags)
> -{
> - dump_flag_names(gfp_flags, gfpflag_names, ARRAY_SIZE(gfpflag_names));
> -}
> -
> -void dump_page_badflags(struct page *page, const char *reason,
> - unsigned long badflags)
> -{
> - unsigned long printflags = page->flags;
> -
> - pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx",
> - page, atomic_read(&page->_count), page_mapcount(page),
> - page->mapping, page->index);
> - if (PageCompound(page))
> - pr_cont(" compound_mapcount: %d", compound_mapcount(page));
> - pr_cont("\n");
> - BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS);
> -
> - pr_emerg("flags: %#lx", printflags);
> - /* remove zone id */
> - printflags &= (1UL << NR_PAGEFLAGS) - 1;
> - dump_flag_names(printflags, pageflag_names, ARRAY_SIZE(pageflag_names));
> -
> - if (reason)
> - pr_alert("page dumped because: %s\n", reason);
> - if (page->flags & badflags) {
> - printflags = page->flags & badflags;
> - pr_alert("bad because of flags: %#lx:", printflags);
> - dump_flag_names(printflags, pageflag_names,
> - ARRAY_SIZE(pageflag_names));
> - }
> -#ifdef CONFIG_MEMCG
> - if (page->mem_cgroup)
> - pr_alert("page->mem_cgroup:%p\n", page->mem_cgroup);
> -#endif
> -}
> -
> -void dump_page(struct page *page, const char *reason)
> -{
> - dump_page_badflags(page, reason, 0);
> - dump_page_owner(page);
> -}
> -EXPORT_SYMBOL(dump_page);
> -
> -#ifdef CONFIG_DEBUG_VM
> -
> -static const struct trace_print_flags vmaflags_names[] = {
> +const struct trace_print_flags vmaflag_names[] = {
> {VM_READ, "read" },
> {VM_WRITE, "write" },
> {VM_EXEC, "exec" },
> @@ -177,22 +101,61 @@ static const struct trace_print_flags vmaflags_names[] = {
> {VM_HUGEPAGE, "hugepage" },
> {VM_NOHUGEPAGE, "nohugepage" },
> {VM_MERGEABLE, "mergeable" },
> + {0, NULL },
> +};
> +
> +const struct trace_print_flags gfpflag_names[] = {
> + __def_gfpflag_names,
> + {0, NULL},
> };
>
> +void dump_page_badflags(struct page *page, const char *reason,
> + unsigned long badflags)
> +{
> + unsigned long printflags = page->flags;
> +
> + pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx\n",
> + page, atomic_read(&page->_count), page_mapcount(page),
> + page->mapping, page->index);
> + BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
> +
> + pr_emerg("flags: %#lx(%pgp)\n", printflags, &printflags);
> +
> + if (reason)
> + pr_alert("page dumped because: %s\n", reason);
> + if (page->flags & badflags) {
> + printflags = page->flags & badflags;
> + pr_alert("bad because of flags: %#lx(%pgp)\n", printflags,
> + &printflags);
> + }
> +#ifdef CONFIG_MEMCG
> + if (page->mem_cgroup)
> + pr_alert("page->mem_cgroup:%p\n", page->mem_cgroup);
> +#endif
> +}
> +
> +void dump_page(struct page *page, const char *reason)
> +{
> + dump_page_badflags(page, reason, 0);
> + dump_page_owner(page);
> +}
> +EXPORT_SYMBOL(dump_page);
> +
> +#ifdef CONFIG_DEBUG_VM
> +
> void dump_vma(const struct vm_area_struct *vma)
> {
> pr_emerg("vma %p start %p end %p\n"
> "next %p prev %p mm %p\n"
> "prot %lx anon_vma %p vm_ops %p\n"
> - "pgoff %lx file %p private_data %p\n",
> + "pgoff %lx file %p private_data %p\n"
> + "flags: %#lx(%pgv)\n",
> vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_next,
> vma->vm_prev, vma->vm_mm,
> (unsigned long)pgprot_val(vma->vm_page_prot),
> vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
> - vma->vm_file, vma->vm_private_data);
> - pr_emerg("flags: %#lx", vma->vm_flags);
> - dump_flag_names(vma->vm_flags, vmaflags_names,
> - ARRAY_SIZE(vmaflags_names));
> + vma->vm_file, vma->vm_private_data,
> + vma->vm_flags, &vma->vm_flags);
> }
> EXPORT_SYMBOL(dump_vma);
>
> @@ -263,9 +226,7 @@ void dump_mm(const struct mm_struct *mm)
> "" /* This is here to not have a comma! */
> );
>
> - pr_emerg("def_flags: %#lx", mm->def_flags);
> - dump_flag_names(mm->def_flags, vmaflags_names,
> - ARRAY_SIZE(vmaflags_names));
> + pr_emerg("def_flags: %#lx(%pgv)\n", mm->def_flags, &mm->def_flags);
> }
>
> #endif /* CONFIG_DEBUG_VM */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 542d56c93209..07bba5a46b47 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -387,10 +387,9 @@ static void dump_header(struct oom_control *oc, struct task_struct *p,
> struct mem_cgroup *memcg)
> {
> pr_warning("%s invoked oom-killer: order=%d, oom_score_adj=%hd, "
> - "gfp_mask=0x%x",
> + "gfp_mask=%#x(%pgg)\n",
> current->comm, oc->order, current->signal->oom_score_adj,
> - oc->gfp_mask);
> - dump_gfpflag_names(oc->gfp_mask);
> + oc->gfp_mask, &oc->gfp_mask);
>
> cpuset_print_current_mems_allowed();
> dump_stack();
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d6d7c97c0b28..bd94c7465dea 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2711,9 +2711,8 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
> va_end(args);
> }
>
> - pr_warn("%s: page allocation failure: order:%u, mode:0x%x",
> - current->comm, order, gfp_mask);
> - dump_gfpflag_names(gfp_mask);
> + pr_warn("%s: page allocation failure: order:%u, mode:%#x(%pgg)\n",
> + current->comm, order, gfp_mask, &gfp_mask);
> dump_stack();
> if (!should_suppress_show_mem())
> show_mem(filter);
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index f4acd2452c35..313251f36d86 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -208,9 +208,9 @@ void __dump_page_owner(struct page *page)
> return;
> }
>
> - pr_alert("page allocated via order %u, migratetype %s, gfp_mask 0x%x",
> - page_ext->order, migratetype_names[mt], gfp_mask);
> - dump_gfpflag_names(gfp_mask);
> + pr_alert("page allocated via order %u, migratetype %s, "
> + "gfp_mask %#x(%pgg)\n", page_ext->order,
> + migratetype_names[mt], gfp_mask, &gfp_mask);
> print_stack_trace(&trace, 0);
>
> if (page_ext->last_migrate_reason != -1)
> --
> 2.6.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Fri, Dec 04, 2015 at 03:15:45PM +0100, Vlastimil Babka wrote:
> On 12/03/2015 07:38 PM, yalin wang wrote:
> >that’s all, see cpumask_pr_args(masks) macro,
> >it also use macro and %*pb to print cpu mask .
> >i think this method is not very complex to use .
>
> Well, one also has to write the appropriate translation tables.
>
> >search source code ,
> >there is lots of printk to print flag into hex number :
> >$ grep -n -r 'printk.*flag.*%x’ .
> >it will be great if this flag string print is generic.
>
> I think it can always be done later, this is an internal API. For now we
> just have 3 quite generic flags, so let's not over-engineer things right
> now.
>
As long as it is never used in the TP_printk() part of a tracepoint. As soon
as it is, trace-cmd and perf will update parse-events to handle that
parameter, and as soon as that is done, it becomes a userspace ABI.
Just be warned.
-- Steve
On Thu, Dec 10, 2015 at 11:59:44AM +0900, Joonsoo Kim wrote:
> Ccing, Steven to ask trace-cmd problem.
Thanks, I should have been Cc'd for the rest as well.
>
> On Fri, Dec 04, 2015 at 04:16:33PM +0100, Vlastimil Babka wrote:
> > In mm we use several kinds of flags bitfields that are sometimes printed for
> > debugging purposes, or exported to userspace via sysfs. To make them easier to
> > interpret independently on kernel version and config, we want to dump also the
> > symbolic flag names. So far this has been done with repeated calls to
> > pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
> >
> > To get a more reliable and universal solution, this patch extends printk()
> > format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
> > and vma flags (%pgv). Existing users of dump_flag_names() are converted and
> > simplified.
> >
> > It would be possible to pass flags by value instead of pointer, but the %p
> > format string for pointers already has extensions for various kernel
> > structures, so it's a good fit, and the extra indirection in a non-critical
> > path is negligible.
>
> I'd like to use %pgp in tracepoint output. It works well when I do
> 'cat /sys/kernel/debug/tracing/trace' but not works well when I do
> './trace-cmd report'. It prints following error log.
>
> [page_ref:page_ref_unfreeze] bad op token &
> [page_ref:page_ref_set] bad op token &
> [page_ref:page_ref_mod_unless] bad op token &
> [page_ref:page_ref_mod_and_test] bad op token &
> [page_ref:page_ref_mod_and_return] bad op token &
> [page_ref:page_ref_mod] bad op token &
> [page_ref:page_ref_freeze] bad op token &
>
> Following is the format I used.
>
> TP_printk("pfn=0x%lx flags=%pgp count=%d mapcount=%d mapping=%p mt=%d val=%d ret=%d",
> __entry->pfn, &__entry->flags, __entry->count,
> __entry->mapcount, __entry->mapping, __entry->mt,
> __entry->val, __entry->ret)
>
> Could it be solved by 'trace-cmd' itself?
> Or it's better to pass flags by value?
> Or should I use something like show_gfp_flags()?
Yes this can be solved in perf and trace-cmd via the parse-events.c file. And
as soon as that happens, whatever method we decide upon becomes a userspace
ABI. So don't think you can change it later.
-- Steve
On Wed, Dec 09, 2015 at 11:04:56PM -0500, Steven Rostedt wrote:
> On Thu, Dec 10, 2015 at 11:59:44AM +0900, Joonsoo Kim wrote:
> > Ccing, Steven to ask trace-cmd problem.
>
> Thanks, I should have been Cc'd for the rest as well.
>
> >
> > On Fri, Dec 04, 2015 at 04:16:33PM +0100, Vlastimil Babka wrote:
> > > In mm we use several kinds of flags bitfields that are sometimes printed for
> > > debugging purposes, or exported to userspace via sysfs. To make them easier to
> > > interpret independently on kernel version and config, we want to dump also the
> > > symbolic flag names. So far this has been done with repeated calls to
> > > pr_cont(), which is unreliable on SMP, and not usable for e.g. sysfs export.
> > >
> > > To get a more reliable and universal solution, this patch extends printk()
> > > format string for pointers to handle the page flags (%pgp), gfp_flags (%pgg)
> > > and vma flags (%pgv). Existing users of dump_flag_names() are converted and
> > > simplified.
> > >
> > > It would be possible to pass flags by value instead of pointer, but the %p
> > > format string for pointers already has extensions for various kernel
> > > structures, so it's a good fit, and the extra indirection in a non-critical
> > > path is negligible.
> >
> > I'd like to use %pgp in tracepoint output. It works well when I do
> > 'cat /sys/kernel/debug/tracing/trace' but not works well when I do
> > './trace-cmd report'. It prints following error log.
> >
> > [page_ref:page_ref_unfreeze] bad op token &
> > [page_ref:page_ref_set] bad op token &
> > [page_ref:page_ref_mod_unless] bad op token &
> > [page_ref:page_ref_mod_and_test] bad op token &
> > [page_ref:page_ref_mod_and_return] bad op token &
> > [page_ref:page_ref_mod] bad op token &
> > [page_ref:page_ref_freeze] bad op token &
> >
> > Following is the format I used.
> >
> > TP_printk("pfn=0x%lx flags=%pgp count=%d mapcount=%d mapping=%p mt=%d val=%d ret=%d",
> > __entry->pfn, &__entry->flags, __entry->count,
> > __entry->mapcount, __entry->mapping, __entry->mt,
> > __entry->val, __entry->ret)
> >
> > Could it be solved by 'trace-cmd' itself?
> > Or it's better to pass flags by value?
> > Or should I use something like show_gfp_flags()?
>
> Yes this can be solved in perf and trace-cmd via the parse-events.c file. And
> as soon as that happens, whatever method we decide upon becomes a userspace
> ABI. So don't think you can change it later.
Okay. Because it can be solved by perf and trace-cmd via the
parse-events.c, I have no preference whether it is passed by value or
not.
Thanks.
On Thu, Dec 10 2015, Steven Rostedt <[email protected]> wrote:
> On Thu, Dec 10, 2015 at 11:59:44AM +0900, Joonsoo Kim wrote:
>>
>> [page_ref:page_ref_unfreeze] bad op token &
>> [page_ref:page_ref_set] bad op token &
>> [page_ref:page_ref_mod_unless] bad op token &
>> [page_ref:page_ref_mod_and_test] bad op token &
>> [page_ref:page_ref_mod_and_return] bad op token &
>> [page_ref:page_ref_mod] bad op token &
>> [page_ref:page_ref_freeze] bad op token &
>>
>> Following is the format I used.
>>
>> TP_printk("pfn=0x%lx flags=%pgp count=%d mapcount=%d mapping=%p mt=%d val=%d ret=%d",
>> __entry->pfn, &__entry->flags, __entry->count,
>> __entry->mapcount, __entry->mapping, __entry->mt,
>> __entry->val, __entry->ret)
>>
>> Could it be solved by 'trace-cmd' itself?
>> Or it's better to pass flags by value?
>> Or should I use something like show_gfp_flags()?
>
> Yes this can be solved in perf and trace-cmd via the parse-events.c file. And
> as soon as that happens, whatever method we decide upon becomes a userspace
> ABI. So don't think you can change it later.
So somewhat off-topic, but this reminds me of a question I've been
meaning to ask: What makes it safe to stash the pointer values in
vbin_printf and only dereference them later in bstr_printf? For plain
pointer printing (%p) it's of course not a problem, but quite a few of
the %p extensions do dereference the pointer in one way or another (at
least %p[dD], %p[mM], %p[iI], %ph, %pE, %pC, %pNF, %pU, %pa and probably
soon %pg).
Rasmus
On 12/10/2015 04:51 AM, Steven Rostedt wrote:
> I should have been Cc'd on this as I'm the maintainer of a few of the files
> here that is being modified.
Sorry about that.
>> --- a/include/linux/trace_events.h
>> +++ b/include/linux/trace_events.h
>> @@ -15,16 +15,6 @@ struct tracer;
>> struct dentry;
>> struct bpf_prog;
>>
>> -struct trace_print_flags {
>> - unsigned long mask;
>> - const char *name;
>> -};
>> -
>> -struct trace_print_flags_u64 {
>> - unsigned long long mask;
>> - const char *name;
>> -};
>> -
>
> Ingo took some patches from Andi Kleen that creates a tracepoint-defs.h file
> If anything, these should be moved there. That code is currently in tip.
Yeah I noticed that yesterday and seems like a good idea. Rasmus
suggested types.h but these didn't seem general enough for that one. Thanks.
On 12/10/2015 05:04 AM, Steven Rostedt wrote:
> On Thu, Dec 10, 2015 at 11:59:44AM +0900, Joonsoo Kim wrote:
>> Ccing, Steven to ask trace-cmd problem.
>>
>> I'd like to use %pgp in tracepoint output. It works well when I do
>> 'cat /sys/kernel/debug/tracing/trace' but not works well when I do
>> './trace-cmd report'. It prints following error log.
>>
>> [page_ref:page_ref_unfreeze] bad op token &
>> [page_ref:page_ref_set] bad op token &
>> [page_ref:page_ref_mod_unless] bad op token &
>> [page_ref:page_ref_mod_and_test] bad op token &
>> [page_ref:page_ref_mod_and_return] bad op token &
>> [page_ref:page_ref_mod] bad op token &
>> [page_ref:page_ref_freeze] bad op token &
>>
>> Following is the format I used.
>>
>> TP_printk("pfn=0x%lx flags=%pgp count=%d mapcount=%d mapping=%p mt=%d val=%d ret=%d",
>> __entry->pfn, &__entry->flags, __entry->count,
>> __entry->mapcount, __entry->mapping, __entry->mt,
>> __entry->val, __entry->ret)
>>
>> Could it be solved by 'trace-cmd' itself?
You mean that trace-cmd/parse-events.c would interpret the raw value of
flags by itself? That would mean the flags became fixed ABI, not a good
idea...
>> Or it's better to pass flags by value?
If it's value (as opposed to a pointer in %pgp), that doesn't change
much wrt. having to intepret them?
>> Or should I use something like show_gfp_flags()?
Sounds like least pain to me, at least for now. We just need to have the
translation tables available as #define with __print_flags() in some
trace/events header, like the existing trace/events/gfpflags.h for gfp
flags. These tables can still be reused within mm/debug.c or printk code
without copy/paste, like I did in "[PATCH v2 6/9] mm, debug: introduce
dump_gfpflag_names() for symbolic printing of gfp_flags" [1]. Maybe it's
not the most elegant solution, but works without changing parse-events.c
using the existing format export.
So if you agree, I can do this in the next spin.
[1] https://lkml.org/lkml/2015/11/24/354
> Yes this can be solved in perf and trace-cmd via the parse-events.c file. And
> as soon as that happens, whatever method we decide upon becomes a userspace
> ABI. So don't think you can change it later.
>
> -- Steve
>
On 9 December 2015 at 11:29, Arnd Bergmann <[email protected]> wrote:
> On Friday 04 December 2015 16:16:33 Vlastimil Babka wrote:
>> --- a/include/linux/mmdebug.h
>> +++ b/include/linux/mmdebug.h
>> @@ -2,15 +2,20 @@
>> #define LINUX_MM_DEBUG_H 1
>>
>> #include <linux/stringify.h>
>> +#include <linux/types.h>
>> +#include <linux/tracepoint.h>
>
> 8<-----
> Subject: mm: fix generated/bounds.h
>
> The inclusion of linux/tracepoint.h is causing build errors for me in ARM
> randconfig:
>
> In file included from /git/arm-soc/include/linux/ktime.h:25:0,
> from /git/arm-soc/include/linux/rcupdate.h:47,
> from /git/arm-soc/include/linux/tracepoint.h:19,
> from /git/arm-soc/include/linux/mmdebug.h:6,
> from /git/arm-soc/include/linux/page-flags.h:10,
> from /git/arm-soc/kernel/bounds.c:9:
> /git/arm-soc/include/linux/jiffies.h:10:33: fatal error: generated/timeconst.h: No such file or directory
> compilation terminated.
>
> To work around this, we can stop including linux/mmdebug.h from linux/page_flags.h
> while generating bounds.h, as we do for mm_types.h already.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> Fixes: 8c0d593d0f8f ("mm, printk: introduce new format string for flags")
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 19724e6ebd26..4efad0578a28 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -7,8 +7,8 @@
>
> #include <linux/types.h>
> #include <linux/bug.h>
> -#include <linux/mmdebug.h>
> #ifndef __GENERATING_BOUNDS_H
> +#include <linux/mmdebug.h>
> #include <linux/mm_types.h>
> #include <generated/bounds.h>
> #endif /* !__GENERATING_BOUNDS_H */
Same build issue observed for metag too.
Tested-by: James Hogan <[email protected]>
Cheers
James
On Thu, Dec 10, 2015 at 11:03:34AM +0100, Vlastimil Babka wrote:
> On 12/10/2015 05:04 AM, Steven Rostedt wrote:
> >On Thu, Dec 10, 2015 at 11:59:44AM +0900, Joonsoo Kim wrote:
> >>Ccing, Steven to ask trace-cmd problem.
> >>
> >>I'd like to use %pgp in tracepoint output. It works well when I do
> >>'cat /sys/kernel/debug/tracing/trace' but not works well when I do
> >>'./trace-cmd report'. It prints following error log.
> >>
> >> [page_ref:page_ref_unfreeze] bad op token &
> >> [page_ref:page_ref_set] bad op token &
> >> [page_ref:page_ref_mod_unless] bad op token &
> >> [page_ref:page_ref_mod_and_test] bad op token &
> >> [page_ref:page_ref_mod_and_return] bad op token &
> >> [page_ref:page_ref_mod] bad op token &
> >> [page_ref:page_ref_freeze] bad op token &
> >>
> >>Following is the format I used.
> >>
> >>TP_printk("pfn=0x%lx flags=%pgp count=%d mapcount=%d mapping=%p mt=%d val=%d ret=%d",
> >> __entry->pfn, &__entry->flags, __entry->count,
> >> __entry->mapcount, __entry->mapping, __entry->mt,
> >> __entry->val, __entry->ret)
> >>
> >>Could it be solved by 'trace-cmd' itself?
>
> You mean that trace-cmd/parse-events.c would interpret the raw value
> of flags by itself? That would mean the flags became fixed ABI, not
> a good idea...
>
> >>Or it's better to pass flags by value?
>
> If it's value (as opposed to a pointer in %pgp), that doesn't change
> much wrt. having to intepret them?
>
> >>Or should I use something like show_gfp_flags()?
>
> Sounds like least pain to me, at least for now. We just need to have
> the translation tables available as #define with __print_flags() in
> some trace/events header, like the existing trace/events/gfpflags.h
> for gfp flags. These tables can still be reused within mm/debug.c or
> printk code without copy/paste, like I did in "[PATCH v2 6/9] mm,
> debug: introduce dump_gfpflag_names() for symbolic printing of
> gfp_flags" [1]. Maybe it's not the most elegant solution, but works
> without changing parse-events.c using the existing format export.
>
> So if you agree, I can do this in the next spin.
>
Okay. I'm okay with this approach.
Thanks.