2018-05-27 09:33:20

by 禹舟键

[permalink] [raw]
Subject: [PATCH v6] Refactor part of the oom report in dump_header

The dump_header does not print the memcg's name when the system
oom happened, so users cannot locate the certain container which
contains the task that has been killed by the oom killer.

I follow the advices of David Rientjes and Michal Hocko, and refactor
part of the oom report in a backwards compatible way. After this patch,
users can get the memcg's path from the oom report and check the certain
container more quickly.

Below is the part of the oom report in the dmesg
...
[ 142.158316] panic cpuset=/ mems_allowed=0-1
[ 142.158983] CPU: 15 PID: 8682 Comm: panic Not tainted 4.17.0-rc6+ #13
[ 142.159659] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
[ 142.160342] Call Trace:
[ 142.161037] dump_stack+0x78/0xb3
[ 142.161734] dump_header+0x7d/0x334
[ 142.162433] oom_kill_process+0x228/0x490
[ 142.163126] ? oom_badness+0x2a/0x130
[ 142.163821] out_of_memory+0xf0/0x280
[ 142.164532] __alloc_pages_slowpath+0x711/0xa07
[ 142.165241] __alloc_pages_nodemask+0x23f/0x260
[ 142.165947] alloc_pages_vma+0x73/0x180
[ 142.166665] do_anonymous_page+0xed/0x4e0
[ 142.167388] __handle_mm_fault+0xbd2/0xe00
[ 142.168114] handle_mm_fault+0x116/0x250
[ 142.168841] __do_page_fault+0x233/0x4d0
[ 142.169567] do_page_fault+0x32/0x130
[ 142.170303] ? page_fault+0x8/0x30
[ 142.171036] page_fault+0x1e/0x30
[ 142.171764] RIP: 0033:0x7f403000a860
[ 142.172517] RSP: 002b:00007ffc9f745c28 EFLAGS: 00010206
[ 142.173268] RAX: 00007f3f6fd7d000 RBX: 0000000000000000 RCX: 00007f3f7f5cd000
[ 142.174040] RDX: 00007f3fafd7d000 RSI: 0000000000000000 RDI: 00007f3f6fd7d000
[ 142.174806] RBP: 00007ffc9f745c50 R08: ffffffffffffffff R09: 0000000000000000
[ 142.175623] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000400490
[ 142.176542] R13: 00007ffc9f745d30 R14: 0000000000000000 R15: 0000000000000000
[ 142.177709] oom-kill: constrain=CONSTRAINT_NONE nodemask=(null) origin_memcg= kill_memcg=/test/test1/test2 task=panic pid= 8622 uid= 0
...

Changes since v5:
- add an array of const char for each constraint.
- replace all of the pr_cont with a single line print of the pr_info.
- put enum oom_constraint into the memcontrol.c file for printing oom constraint.

Changes since v4:
- rename the helper's name to mem_cgroup_print_oom_context.
- rename the mem_cgroup_print_oom_info to mem_cgroup_print_oom_meminfo.
- add the constrain info in the dump_header.

Changes since v3:
- rename the helper's name to mem_cgroup_print_oom_memcg_name.
- add the rcu lock held to the helper.
- remove the print info of memcg's name in mem_cgroup_print_oom_info.

Changes since v2:
- add the mem_cgroup_print_memcg_name helper to print the memcg's
name which contains the task that will be killed by the oom-killer.

Changes since v1:
- replace adding mem_cgroup_print_oom_info with printing the memcg's
name only.

Signed-off-by: ufo19890607 <[email protected]>
---
include/linux/memcontrol.h | 29 +++++++++++++++++++++---
mm/memcontrol.c | 55 ++++++++++++++++++++++++++++++++--------------
mm/oom_kill.c | 12 +++-------
3 files changed, 67 insertions(+), 29 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d99b71bc2c66..1c7d5da1c827 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -62,6 +62,20 @@ struct mem_cgroup_reclaim_cookie {
unsigned int generation;
};

+enum oom_constraint {
+ CONSTRAINT_NONE,
+ CONSTRAINT_CPUSET,
+ CONSTRAINT_MEMORY_POLICY,
+ CONSTRAINT_MEMCG,
+};
+
+static const char * const oom_constraint_text[] = {
+ [CONSTRAINT_NONE] = "CONSTRAINT_NONE",
+ [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
+ [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
+ [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
+};
+
#ifdef CONFIG_MEMCG

#define MEM_CGROUP_ID_SHIFT 16
@@ -464,8 +478,11 @@ void mem_cgroup_handle_over_high(void);

unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg);

-void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
- struct task_struct *p);
+void mem_cgroup_print_oom_context(struct mem_cgroup *memcg,
+ struct task_struct *p, enum oom_constraint constraint,
+ nodemask_t *nodemask);
+
+void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg);

static inline void mem_cgroup_oom_enable(void)
{
@@ -859,7 +876,13 @@ static inline unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg)
}

static inline void
-mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p,
+ enum oom_constraint constraint, nodemask_t *nodemask)
+{
+}
+
+static inline void
+mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
{
}

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2bd3df3d101a..6c05fd3291e6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1118,33 +1118,54 @@ static const char *const memcg1_stat_names[] = {
};

#define K(x) ((x) << (PAGE_SHIFT-10))
-/**
- * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
- * @memcg: The memory cgroup that went over limit
+/*
+ * mem_cgroup_print_oom_context: Print OOM context information relevant to
+ * memory controller, which includes allocation constraint, nodemask, origin
+ * memcg that has reached its limit, kill memcg that contains the killed
+ * process, killed process's command, pid and uid.
+ * @memcg: The origin memory cgroup that went over limit
* @p: Task that is going to be killed
+ * @constraint: The allocation constraint
+ * @nodemask: The allocation nodemask
*
* NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
* enabled
*/
-void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p,
+ enum oom_constraint constraint, nodemask_t *nodemask)
{
- struct mem_cgroup *iter;
- unsigned int i;
+ static char origin_memcg_name[PATH_MAX], kill_memcg_name[PATH_MAX];
+ struct cgroup *origin_cgrp, *kill_cgrp;

rcu_read_lock();
-
- if (p) {
- pr_info("Task in ");
- pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
- pr_cont(" killed as a result of limit of ");
- } else {
- pr_info("Memory limit reached of cgroup ");
+ if (memcg) {
+ origin_cgrp = memcg->css.cgroup;
+ cgroup_path(origin_cgrp, origin_memcg_name, PATH_MAX);
}
-
- pr_cont_cgroup_path(memcg->css.cgroup);
- pr_cont("\n");
-
+ kill_cgrp = task_cgroup(p, memory_cgrp_id);
+ cgroup_path(kill_cgrp, kill_memcg_name, PATH_MAX);
+
+ if (p)
+ pr_info("oom-kill: constrain=%s, nodemask=%*pbl, origin_memcg=%s, kill_memcg=%s, task=%s, pid=%5d, uid=%5d\n",
+ oom_constraint_text[constraint], nodemask_pr_args(nodemask),
+ origin_memcg_name, kill_memcg_name, p->comm, p->pid,
+ from_kuid(&init_user_ns, task_uid(p)));
+ else
+ pr_info("oom-kill: constrain=%s, nodemask=%*pbl, origin_memcg=%s, kill_memcg=%s\n",
+ oom_constraint_text[constraint], nodemask_pr_args(nodemask),
+ origin_memcg_name, kill_memcg_name);
rcu_read_unlock();
+}
+
+/**
+ * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
+ * @memcg: The memory cgroup that went over limit
+ * @p: Task that is going to be killed
+ */
+void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
+{
+ struct mem_cgroup *iter;
+ unsigned int i;

pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
K((u64)page_counter_read(&memcg->memory)),
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb88cf58..af0efab8a9e5 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -16,7 +16,6 @@
* for newbie kernel hackers. It features several pointers to major
* kernel subsystems and hints as to where to find out what things do.
*/
-
#include <linux/oom.h>
#include <linux/mm.h>
#include <linux/err.h>
@@ -237,13 +236,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
return points > 0 ? points : 1;
}

-enum oom_constraint {
- CONSTRAINT_NONE,
- CONSTRAINT_CPUSET,
- CONSTRAINT_MEMORY_POLICY,
- CONSTRAINT_MEMCG,
-};
-
/*
* Determine the type of allocation constraint.
*/
@@ -421,6 +413,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)

static void dump_header(struct oom_control *oc, struct task_struct *p)
{
+ enum oom_constraint constraint = constrained_alloc(oc);
pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), nodemask=%*pbl, order=%d, oom_score_adj=%hd\n",
current->comm, oc->gfp_mask, &oc->gfp_mask,
nodemask_pr_args(oc->nodemask), oc->order,
@@ -430,8 +423,9 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)

cpuset_print_current_mems_allowed();
dump_stack();
+ mem_cgroup_print_oom_context(oc->memcg, p, constraint, oc->nodemask);
if (is_memcg_oom(oc))
- mem_cgroup_print_oom_info(oc->memcg, p);
+ mem_cgroup_print_oom_meminfo(oc->memcg);
else {
show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
if (is_dump_unreclaim_slabs())
--
2.14.1



2018-05-28 15:33:48

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v6] Refactor part of the oom report in dump_header

On Sun 27-05-18 10:32:31, ufo19890607 wrote:
> The dump_header does not print the memcg's name when the system
> oom happened, so users cannot locate the certain container which
> contains the task that has been killed by the oom killer.
>
> I follow the advices of David Rientjes and Michal Hocko, and refactor
> part of the oom report in a backwards compatible way. After this patch,
> users can get the memcg's path from the oom report and check the certain
> container more quickly.
>
> Below is the part of the oom report in the dmesg
> ...
> [ 142.158316] panic cpuset=/ mems_allowed=0-1
> [ 142.158983] CPU: 15 PID: 8682 Comm: panic Not tainted 4.17.0-rc6+ #13
> [ 142.159659] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
> [ 142.160342] Call Trace:
> [ 142.161037] dump_stack+0x78/0xb3
> [ 142.161734] dump_header+0x7d/0x334
> [ 142.162433] oom_kill_process+0x228/0x490
> [ 142.163126] ? oom_badness+0x2a/0x130
> [ 142.163821] out_of_memory+0xf0/0x280
> [ 142.164532] __alloc_pages_slowpath+0x711/0xa07
> [ 142.165241] __alloc_pages_nodemask+0x23f/0x260
> [ 142.165947] alloc_pages_vma+0x73/0x180
> [ 142.166665] do_anonymous_page+0xed/0x4e0
> [ 142.167388] __handle_mm_fault+0xbd2/0xe00
> [ 142.168114] handle_mm_fault+0x116/0x250
> [ 142.168841] __do_page_fault+0x233/0x4d0
> [ 142.169567] do_page_fault+0x32/0x130
> [ 142.170303] ? page_fault+0x8/0x30
> [ 142.171036] page_fault+0x1e/0x30
> [ 142.171764] RIP: 0033:0x7f403000a860
> [ 142.172517] RSP: 002b:00007ffc9f745c28 EFLAGS: 00010206
> [ 142.173268] RAX: 00007f3f6fd7d000 RBX: 0000000000000000 RCX: 00007f3f7f5cd000
> [ 142.174040] RDX: 00007f3fafd7d000 RSI: 0000000000000000 RDI: 00007f3f6fd7d000
> [ 142.174806] RBP: 00007ffc9f745c50 R08: ffffffffffffffff R09: 0000000000000000
> [ 142.175623] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000400490
> [ 142.176542] R13: 00007ffc9f745d30 R14: 0000000000000000 R15: 0000000000000000
> [ 142.177709] oom-kill: constrain=CONSTRAINT_NONE nodemask=(null) origin_memcg= kill_memcg=/test/test1/test2 task=panic pid= 8622 uid= 0

Is it really helpful to dump the nodemask here again? We already have it
as a part of the "%s invoked oom-killer:" message.

Also I am worried that you are trying to do too many thing in a single
patch. One part is to provide an additional information. The other is
to guarantee a single line output. While the later is nice to have is
2*PATH_MAX static buffer justified for something that shouldn't really
occure? Also how often is the OOM report unreadable due to interleaving
messages?
--
Michal Hocko
SUSE Labs

2018-05-30 20:57:36

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH v6] Refactor part of the oom report in dump_header

On Mon, 28 May 2018, Michal Hocko wrote:

> > Below is the part of the oom report in the dmesg
> > ...
> > [ 142.158316] panic cpuset=/ mems_allowed=0-1
> > [ 142.158983] CPU: 15 PID: 8682 Comm: panic Not tainted 4.17.0-rc6+ #13
> > [ 142.159659] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
> > [ 142.160342] Call Trace:
> > [ 142.161037] dump_stack+0x78/0xb3
> > [ 142.161734] dump_header+0x7d/0x334
> > [ 142.162433] oom_kill_process+0x228/0x490
> > [ 142.163126] ? oom_badness+0x2a/0x130
> > [ 142.163821] out_of_memory+0xf0/0x280
> > [ 142.164532] __alloc_pages_slowpath+0x711/0xa07
> > [ 142.165241] __alloc_pages_nodemask+0x23f/0x260
> > [ 142.165947] alloc_pages_vma+0x73/0x180
> > [ 142.166665] do_anonymous_page+0xed/0x4e0
> > [ 142.167388] __handle_mm_fault+0xbd2/0xe00
> > [ 142.168114] handle_mm_fault+0x116/0x250
> > [ 142.168841] __do_page_fault+0x233/0x4d0
> > [ 142.169567] do_page_fault+0x32/0x130
> > [ 142.170303] ? page_fault+0x8/0x30
> > [ 142.171036] page_fault+0x1e/0x30
> > [ 142.171764] RIP: 0033:0x7f403000a860
> > [ 142.172517] RSP: 002b:00007ffc9f745c28 EFLAGS: 00010206
> > [ 142.173268] RAX: 00007f3f6fd7d000 RBX: 0000000000000000 RCX: 00007f3f7f5cd000
> > [ 142.174040] RDX: 00007f3fafd7d000 RSI: 0000000000000000 RDI: 00007f3f6fd7d000
> > [ 142.174806] RBP: 00007ffc9f745c50 R08: ffffffffffffffff R09: 0000000000000000
> > [ 142.175623] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000400490
> > [ 142.176542] R13: 00007ffc9f745d30 R14: 0000000000000000 R15: 0000000000000000
> > [ 142.177709] oom-kill: constrain=CONSTRAINT_NONE nodemask=(null) origin_memcg= kill_memcg=/test/test1/test2 task=panic pid= 8622 uid= 0
>
> Is it really helpful to dump the nodemask here again? We already have it
> as a part of the "%s invoked oom-killer:" message.
>

At the risk of making the patch more complex, it would be possible to
suppress nodemask=<mask> for constraints that are not
CONSTRAINT_MEMORY_POLICY, but the goal was to provide a single line output
that userspace can parse for all information and not rely on surrounding
lines to match oom kills with invocations (the invocation itself may have
been lost from the ring buffer), and we want this to not be subjected to
any ratelimit.

We need to eliminate the spurious spaces in the output, though, and fix
the spelling of "constrain". There should be no spaces between pid and
uid values.

2018-05-31 06:54:16

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v6] Refactor part of the oom report in dump_header

On Sun 27-05-18 10:32:31, ufo19890607 wrote:
> The dump_header does not print the memcg's name when the system
> oom happened, so users cannot locate the certain container which
> contains the task that has been killed by the oom killer.
>
> I follow the advices of David Rientjes and Michal Hocko, and refactor
> part of the oom report in a backwards compatible way. After this patch,
> users can get the memcg's path from the oom report and check the certain
> container more quickly.
>
> Below is the part of the oom report in the dmesg
> ...
> [ 142.158316] panic cpuset=/ mems_allowed=0-1
> [ 142.158983] CPU: 15 PID: 8682 Comm: panic Not tainted 4.17.0-rc6+ #13
> [ 142.159659] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
> [ 142.160342] Call Trace:
> [ 142.161037] dump_stack+0x78/0xb3
> [ 142.161734] dump_header+0x7d/0x334
> [ 142.162433] oom_kill_process+0x228/0x490
> [ 142.163126] ? oom_badness+0x2a/0x130
> [ 142.163821] out_of_memory+0xf0/0x280
> [ 142.164532] __alloc_pages_slowpath+0x711/0xa07
> [ 142.165241] __alloc_pages_nodemask+0x23f/0x260
> [ 142.165947] alloc_pages_vma+0x73/0x180
> [ 142.166665] do_anonymous_page+0xed/0x4e0
> [ 142.167388] __handle_mm_fault+0xbd2/0xe00
> [ 142.168114] handle_mm_fault+0x116/0x250
> [ 142.168841] __do_page_fault+0x233/0x4d0
> [ 142.169567] do_page_fault+0x32/0x130
> [ 142.170303] ? page_fault+0x8/0x30
> [ 142.171036] page_fault+0x1e/0x30
> [ 142.171764] RIP: 0033:0x7f403000a860
> [ 142.172517] RSP: 002b:00007ffc9f745c28 EFLAGS: 00010206
> [ 142.173268] RAX: 00007f3f6fd7d000 RBX: 0000000000000000 RCX: 00007f3f7f5cd000
> [ 142.174040] RDX: 00007f3fafd7d000 RSI: 0000000000000000 RDI: 00007f3f6fd7d000
> [ 142.174806] RBP: 00007ffc9f745c50 R08: ffffffffffffffff R09: 0000000000000000
> [ 142.175623] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000400490
> [ 142.176542] R13: 00007ffc9f745d30 R14: 0000000000000000 R15: 0000000000000000
> [ 142.177709] oom-kill: constrain=CONSTRAINT_NONE nodemask=(null) origin_memcg= kill_memcg=/test/test1/test2 task=panic pid= 8622 uid= 0
> ...
>
> Changes since v5:
> - add an array of const char for each constraint.
> - replace all of the pr_cont with a single line print of the pr_info.
> - put enum oom_constraint into the memcontrol.c file for printing oom constraint.
>
> Changes since v4:
> - rename the helper's name to mem_cgroup_print_oom_context.
> - rename the mem_cgroup_print_oom_info to mem_cgroup_print_oom_meminfo.
> - add the constrain info in the dump_header.
>
> Changes since v3:
> - rename the helper's name to mem_cgroup_print_oom_memcg_name.
> - add the rcu lock held to the helper.
> - remove the print info of memcg's name in mem_cgroup_print_oom_info.
>
> Changes since v2:
> - add the mem_cgroup_print_memcg_name helper to print the memcg's
> name which contains the task that will be killed by the oom-killer.
>
> Changes since v1:
> - replace adding mem_cgroup_print_oom_info with printing the memcg's
> name only.
>
> Signed-off-by: ufo19890607 <[email protected]>
> ---
> include/linux/memcontrol.h | 29 +++++++++++++++++++++---
> mm/memcontrol.c | 55 ++++++++++++++++++++++++++++++++--------------
> mm/oom_kill.c | 12 +++-------
> 3 files changed, 67 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d99b71bc2c66..1c7d5da1c827 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -62,6 +62,20 @@ struct mem_cgroup_reclaim_cookie {
> unsigned int generation;
> };
>
> +enum oom_constraint {
> + CONSTRAINT_NONE,
> + CONSTRAINT_CPUSET,
> + CONSTRAINT_MEMORY_POLICY,
> + CONSTRAINT_MEMCG,
> +};
> +
> +static const char * const oom_constraint_text[] = {
> + [CONSTRAINT_NONE] = "CONSTRAINT_NONE",
> + [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
> + [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
> + [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
> +};
> +
> #ifdef CONFIG_MEMCG
>
> #define MEM_CGROUP_ID_SHIFT 16
> @@ -464,8 +478,11 @@ void mem_cgroup_handle_over_high(void);
>
> unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg);
>
> -void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
> - struct task_struct *p);
> +void mem_cgroup_print_oom_context(struct mem_cgroup *memcg,
> + struct task_struct *p, enum oom_constraint constraint,
> + nodemask_t *nodemask);
> +
> +void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg);
>
> static inline void mem_cgroup_oom_enable(void)
> {
> @@ -859,7 +876,13 @@ static inline unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg)
> }
>
> static inline void
> -mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p,
> + enum oom_constraint constraint, nodemask_t *nodemask)
> +{
> +}
> +
> +static inline void
> +mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
> {
> }
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2bd3df3d101a..6c05fd3291e6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1118,33 +1118,54 @@ static const char *const memcg1_stat_names[] = {
> };
>
> #define K(x) ((x) << (PAGE_SHIFT-10))
> -/**
> - * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
> - * @memcg: The memory cgroup that went over limit
> +/*
> + * mem_cgroup_print_oom_context: Print OOM context information relevant to
> + * memory controller, which includes allocation constraint, nodemask, origin
> + * memcg that has reached its limit, kill memcg that contains the killed
> + * process, killed process's command, pid and uid.
> + * @memcg: The origin memory cgroup that went over limit
> * @p: Task that is going to be killed
> + * @constraint: The allocation constraint
> + * @nodemask: The allocation nodemask
> *
> * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
> * enabled
> */
> -void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p,
> + enum oom_constraint constraint, nodemask_t *nodemask)
> {
> - struct mem_cgroup *iter;
> - unsigned int i;
> + static char origin_memcg_name[PATH_MAX], kill_memcg_name[PATH_MAX];
> + struct cgroup *origin_cgrp, *kill_cgrp;
>
> rcu_read_lock();
> -
> - if (p) {
> - pr_info("Task in ");
> - pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> - pr_cont(" killed as a result of limit of ");
> - } else {
> - pr_info("Memory limit reached of cgroup ");
> + if (memcg) {
> + origin_cgrp = memcg->css.cgroup;
> + cgroup_path(origin_cgrp, origin_memcg_name, PATH_MAX);
> }
> -
> - pr_cont_cgroup_path(memcg->css.cgroup);
> - pr_cont("\n");
> -
> + kill_cgrp = task_cgroup(p, memory_cgrp_id);
> + cgroup_path(kill_cgrp, kill_memcg_name, PATH_MAX);
> +
> + if (p)
> + pr_info("oom-kill: constrain=%s, nodemask=%*pbl, origin_memcg=%s, kill_memcg=%s, task=%s, pid=%5d, uid=%5d\n",
> + oom_constraint_text[constraint], nodemask_pr_args(nodemask),
> + origin_memcg_name, kill_memcg_name, p->comm, p->pid,
> + from_kuid(&init_user_ns, task_uid(p)));
> + else
> + pr_info("oom-kill: constrain=%s, nodemask=%*pbl, origin_memcg=%s, kill_memcg=%s\n",
> + oom_constraint_text[constraint], nodemask_pr_args(nodemask),
> + origin_memcg_name, kill_memcg_name);
> rcu_read_unlock();

Btw. this is buggy as well. origin_memcg_name will be the previous memcg
oom killed memcg name in the follow up global oom killer context. You
would need to *origin_memcg_name = 0 for !memcg case.

Anyway, I guess we do not really have a strong case to justify the
additional memory just to get rid of pr_cont which would be much
simpler. Interleaving messages will be still possible but we would
really need to see this happening to do something about that.
--
Michal Hocko
SUSE Labs

2018-06-01 06:21:12

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] 6c2f089158: BUG:KASAN:null-ptr-deref_in_m


FYI, we noticed the following commit (built with gcc-6):

commit: 6c2f08915886cda1668ace606660f72917ce1f71 ("Refactor part of the oom report in dump_header")
ERROR# 1:

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------+------------+------------+
| | bee797529d | 6c2f089158 |
+------------------------------------------------------------------+------------+------------+
| boot_successes | 4 | 3 |
| boot_failures | 12 | 13 |
| invoked_oom-killer:gfp_mask=0x | 12 | 12 |
| Mem-Info | 12 | 1 |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 12 | |
| BUG:KASAN:null-ptr-deref_in_m | 0 | 12 |
| BUG:unable_to_handle_kernel | 0 | 12 |
| Oops:#[##] | 0 | 12 |
| RIP:mem_cgroup_print_oom_context | 0 | 12 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 12 |
| Out_of_memory:Kill_process | 0 | 1 |
+------------------------------------------------------------------+------------+------------+



[ 43.710081] BUG: KASAN: null-ptr-deref in mem_cgroup_print_oom_context+0x90/0x203
[ 43.710674] Read of size 8 at addr 0000000000000710 by task swapper/0/1
[ 43.713611]
[ 43.713611] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc6-00159-g6c2f089 #1
[ 43.713611] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 43.713611] Call Trace:
[ 43.713611] show_stack+0x6e/0x71
[ 43.713611] dump_stack+0x133/0x1c8
[ 43.713611] ? mem_cgroup_print_oom_context+0x90/0x203
[ 43.713611] kasan_report+0x313/0x35d
[ 43.713611] __asan_load8+0x7f/0x81
[ 43.723539] mem_cgroup_print_oom_context+0x90/0x203
[ 43.723539] dump_header+0x149/0x4cc
[ 43.723539] out_of_memory+0x5a1/0x665
[ 43.723539] ? unregister_oom_notifier+0x1a/0x1a
[ 43.723539] ? __alloc_pages_slowpath+0x119c/0x184f
[ 43.723539] __alloc_pages_slowpath+0x13dc/0x184f
[ 43.723539] ? get_page_from_freelist+0x17ba/0x18cf
[ 43.723539] ? __alloc_pages_cpuset_fallback+0x74/0x74
[ 43.733564] ? __asan_loadN+0xf/0x11
[ 43.733564] __alloc_pages_nodemask+0x384/0x560
[ 43.733564] ? __alloc_pages_slowpath+0x184f/0x184f
[ 43.733564] ? _find_next_bit+0x12f/0x1be
[ 43.733564] ? __asan_loadN+0xf/0x11
[ 43.733564] ? trace_irq_enable_rcuidle+0x3a/0x117
[ 43.733564] cache_grow_begin+0xe0/0x45e
[ 43.733564] fallback_alloc+0x1f3/0x27d
[ 43.733564] ____cache_alloc_node+0x16e/0x17b
[ 43.743553] kmem_cache_alloc_trace+0xe7/0x20f
[ 43.743553] maybe_link+0x1ff/0x307
[ 43.743553] do_name+0x10a/0x507
[ 43.743553] ? write_buffer+0x31/0x4c
[ 43.743553] write_buffer+0x39/0x4c
[ 43.743553] flush_buffer+0x63/0x143
[ 43.743553] __gunzip+0x65a/0x82e
[ 43.743553] ? decompress_method+0x147/0x147
[ 43.743553] ? error+0x51/0x51
[ 43.753558] gunzip+0x11/0x13
[ 43.753558] ? do_start+0x23/0x23
[ 43.753558] unpack_to_rootfs+0x2d8/0x5b5
[ 43.753558] ? do_start+0x23/0x23
[ 43.753558] ? printk+0x3a/0xc3
[ 43.753558] ? __gunzip+0x82e/0x82e
[ 43.753558] ? do_collect+0xc7/0xc7
[ 43.753558] populate_rootfs+0xf4/0x2fd
[ 43.753558] ? unpack_to_rootfs+0x5b5/0x5b5
[ 43.763573] do_one_initcall+0x1bf/0x47d
[ 43.763573] ? start_kernel+0x7f2/0x7f2
[ 43.763573] ? __asan_loadN+0xf/0x11
[ 43.763573] ? __asan_loadN+0xf/0x11
[ 43.763573] ? do_early_param+0x119/0x119
[ 43.763573] kernel_init_freeable+0x1d4/0x2e7
[ 43.763573] ? rest_init+0xdd/0xdd
[ 43.763573] kernel_init+0x11/0x20e
[ 43.773471] ? rest_init+0xdd/0xdd
[ 43.773566] ret_from_fork+0x24/0x30
[ 43.773566] ==================================================================
[ 43.773566] Disabling lock debugging due to kernel taint
[ 43.780696] BUG: unable to handle kernel NULL pointer dereference at 0000000000000710
[ 43.782654] PGD 0 P4D 0
[ 43.783382] Oops: 0000 [#1] SMP KASAN PTI
[ 43.783737] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G B 4.17.0-rc6-00159-g6c2f089 #1
[ 43.783737] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 43.783737] RIP: 0010:mem_cgroup_print_oom_context+0x90/0x203
[ 43.783737] RSP: 0000:ffff88000f4d7648 EFLAGS: 00010296
[ 43.793615] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff812d6701
[ 43.793615] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000246
[ 43.793615] RBP: ffff88000f4d7678 R08: fffffbfff0f2c7e4 R09: fffffbfff0f2c7e3
[ 43.793615] R10: 0000000000000000 R11: ffffffff87963f1f R12: 0000000000000000
[ 43.793615] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
[ 43.793615] FS: 0000000000000000(0000) GS:ffff88002cd00000(0000) knlGS:0000000000000000
[ 43.803595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 43.803595] CR2: 0000000000000710 CR3: 0000000006236000 CR4: 00000000000006e0
[ 43.803595] Call Trace:
[ 43.803595] dump_header+0x149/0x4cc
[ 43.803595] out_of_memory+0x5a1/0x665
[ 43.803595] ? unregister_oom_notifier+0x1a/0x1a
[ 43.803595] ? __alloc_pages_slowpath+0x119c/0x184f
[ 43.803595] __alloc_pages_slowpath+0x13dc/0x184f
[ 43.803595] ? get_page_from_freelist+0x17ba/0x18cf
[ 43.813619] ? __alloc_pages_cpuset_fallback+0x74/0x74
[ 43.813619] ? __asan_loadN+0xf/0x11
[ 43.813619] __alloc_pages_nodemask+0x384/0x560
[ 43.813619] ? __alloc_pages_slowpath+0x184f/0x184f
[ 43.813619] ? _find_next_bit+0x12f/0x1be
[ 43.813619] ? __asan_loadN+0xf/0x11
[ 43.813619] ? trace_irq_enable_rcuidle+0x3a/0x117
[ 43.813619] cache_grow_begin+0xe0/0x45e
[ 43.813619] fallback_alloc+0x1f3/0x27d
[ 43.813619] ____cache_alloc_node+0x16e/0x17b
[ 43.823627] kmem_cache_alloc_trace+0xe7/0x20f
[ 43.823627] maybe_link+0x1ff/0x307
[ 43.823627] do_name+0x10a/0x507
[ 43.823627] ? write_buffer+0x31/0x4c
[ 43.823627] write_buffer+0x39/0x4c
[ 43.823627] flush_buffer+0x63/0x143
[ 43.823627] __gunzip+0x65a/0x82e
[ 43.823627] ? decompress_method+0x147/0x147
[ 43.823627] ? error+0x51/0x51
[ 43.823627] gunzip+0x11/0x13
[ 43.833595] ? do_start+0x23/0x23
[ 43.833595] unpack_to_rootfs+0x2d8/0x5b5
[ 43.833595] ? do_start+0x23/0x23
[ 43.833595] ? printk+0x3a/0xc3
[ 43.833595] ? __gunzip+0x82e/0x82e
[ 43.833595] ? do_collect+0xc7/0xc7
[ 43.833595] populate_rootfs+0xf4/0x2fd
[ 43.833595] ? unpack_to_rootfs+0x5b5/0x5b5
[ 43.833595] do_one_initcall+0x1bf/0x47d
[ 43.833595] ? start_kernel+0x7f2/0x7f2
[ 43.833595] ? __asan_loadN+0xf/0x11
[ 43.843603] ? __asan_loadN+0xf/0x11
[ 43.843603] ? do_early_param+0x119/0x119
[ 43.843603] kernel_init_freeable+0x1d4/0x2e7
[ 43.843603] ? rest_init+0xdd/0xdd
[ 43.843603] kernel_init+0x11/0x20e
[ 43.843603] ? rest_init+0xdd/0xdd
[ 43.843603] ret_from_fork+0x24/0x30
[ 43.843603] Code: e8 45 33 ff ff 49 8b bd 10 01 00 00 b9 00 10 00 00 48 c7 c2 c0 c8 a8 88 31 f6 e8 3b 69 12 00 48 8d bb 10 07 00 00 e8 1f 33 ff ff <4c> 8b ab 10 07 00 00 49 8d 7d 10 e8 0f 33 ff ff 4d 8b 6d 10 4c
[ 43.853611] RIP: mem_cgroup_print_oom_context+0x90/0x203 RSP: ffff88000f4d7648
[ 43.853611] CR2: 0000000000000710
[ 43.853611] random: get_random_bytes called from init_oops_id+0x51/0x64 with crng_init=0
[ 43.853611] ---[ end trace 0d0e0f91afa36bfe ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Xiaolong


Attachments:
(No filename) (8.36 kB)
config-4.17.0-rc6-00159-g6c2f089 (133.38 kB)
job-script (3.97 kB)
dmesg.xz (8.93 kB)
Download all attachments