From: yuzhoujian <[email protected]>
The current system-wide oom report prints information about the victim
and the allocation context and restrictions. It, however, doesn't
provide any information about the memory cgroup the victim belongs to.
This information can be interesting for container users because it lets
them find the victim's container much more easily.
Following the advice of David Rientjes and Michal Hocko, this patch
refactors part of the oom report. After this patch, users can get the
memcg's path from the oom report and check the corresponding container
more quickly.
The oom report after this patch:
oom-kill:constraint=<constraint>,nodemask=<nodemask>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
Signed-off-by: yuzhoujian <[email protected]>
---
Below is the part of the oom report in the dmesg
...
[ 126.168182] panic invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
[ 126.169115] panic cpuset=/ mems_allowed=0-1
[ 126.169806] CPU: 23 PID: 8668 Comm: panic Not tainted 4.18.0-rc2+ #36
[ 126.170494] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
[ 126.171197] Call Trace:
[ 126.171901] dump_stack+0x5a/0x73
[ 126.172593] dump_header+0x58/0x2dc
[ 126.173294] oom_kill_process+0x228/0x420
[ 126.173999] ? oom_badness+0x2a/0x130
[ 126.174705] out_of_memory+0x11a/0x4a0
[ 126.175415] __alloc_pages_slowpath+0x7cc/0xa1e
[ 126.176128] ? __alloc_pages_slowpath+0x194/0xa1e
[ 126.176853] ? page_counter_try_charge+0x54/0xc0
[ 126.177580] __alloc_pages_nodemask+0x277/0x290
[ 126.178319] alloc_pages_vma+0x73/0x180
[ 126.179058] do_anonymous_page+0xed/0x5a0
[ 126.179825] __handle_mm_fault+0xbb3/0xe70
[ 126.180566] handle_mm_fault+0xfa/0x210
[ 126.181313] __do_page_fault+0x233/0x4c0
[ 126.182063] do_page_fault+0x32/0x140
[ 126.182812] ? page_fault+0x8/0x30
[ 126.183560] page_fault+0x1e/0x30
[ 126.184311] RIP: 0033:0x7f62c9e65860
[ 126.185059] Code: Bad RIP value.
[ 126.185819] RSP: 002b:00007ffcf7bc9288 EFLAGS: 00010206
[ 126.186589] RAX: 00007f6209bd8000 RBX: 0000000000000000 RCX: 00007f6236914000
[ 126.187383] RDX: 00007f6249bd8000 RSI: 0000000000000000 RDI: 00007f6209bd8000
[ 126.188179] RBP: 00007ffcf7bc92b0 R08: ffffffffffffffff R09: 0000000000000000
[ 126.188981] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000400490
[ 126.189793] R13: 00007ffcf7bc9390 R14: 0000000000000000 R15: 0000000000000000
[ 126.190619] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),global_oom,task_memcg=/test/test1/test2,task=panic,pid= 8673,uid= 0
...
Changes since v10:
- divide the patch v8 into two parts. One part is to add the array of const char and put enum
oom_constraint into oom.h; the other adds a new function to print the missing information for the
system-wide oom report.
Changes since v9:
- divide the patch v8 into two parts. One part is to move enum oom_constraint into memcontrol.h; the
other refactors the output info in the dump_header.
- replace origin_memcg and kill_memcg with oom_memcg and task_memcg respectively.
Changes since v8:
- add the constraint in the oom_control structure.
- put enum oom_constraint and constraint array into the oom.h file.
- simplify the description for mem_cgroup_print_oom_context.
Changes since v7:
- add the constraint parameter to dump_header and oom_kill_process.
- remove the static char array in the mem_cgroup_print_oom_context, and
invoke pr_cont_cgroup_path to print the memcg's name.
- combine the patchset v6 into one.
Changes since v6:
- divide the patch v5 into two parts. One part is to add an array of const char and
put enum oom_constraint into the memcontrol.h; the other refactors the output
in the dump_header.
- limit the memory usage for the static char array by using NAME_MAX in the mem_cgroup_print_oom_context.
- eliminate the spurious spaces in the oom's output and fix the spelling of "constrain".
Changes since v5:
- add an array of const char for each constraint.
- replace all of the pr_cont with a single line print of the pr_info.
- put enum oom_constraint into the memcontrol.c file for printing oom constraint.
Changes since v4:
- rename the helper's name to mem_cgroup_print_oom_context.
- rename the mem_cgroup_print_oom_info to mem_cgroup_print_oom_meminfo.
- add the constrain info in the dump_header.
Changes since v3:
- rename the helper's name to mem_cgroup_print_oom_memcg_name.
- add the rcu lock held to the helper.
- remove the print info of memcg's name in mem_cgroup_print_oom_info.
Changes since v2:
- add the mem_cgroup_print_memcg_name helper to print the memcg's
name which contains the task that will be killed by the oom-killer.
Changes since v1:
- replace adding mem_cgroup_print_oom_info with printing the memcg's
name only.
include/linux/oom.h | 17 +++++++++++++++++
mm/oom_kill.c | 31 ++++++++++++++-----------------
2 files changed, 31 insertions(+), 17 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 6adac113e96d..5bed78d4bfb8 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -15,6 +15,20 @@ struct notifier_block;
struct mem_cgroup;
struct task_struct;
+enum oom_constraint {
+ CONSTRAINT_NONE,
+ CONSTRAINT_CPUSET,
+ CONSTRAINT_MEMORY_POLICY,
+ CONSTRAINT_MEMCG,
+};
+
+static const char * const oom_constraint_text[] = {
+ [CONSTRAINT_NONE] = "CONSTRAINT_NONE",
+ [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
+ [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
+ [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
+};
+
/*
* Details of the page allocation that triggered the oom killer that are used to
* determine what should be killed.
@@ -42,6 +56,9 @@ struct oom_control {
unsigned long totalpages;
struct task_struct *chosen;
unsigned long chosen_points;
+
+ /* Used to print the constraint info. */
+ enum oom_constraint constraint;
};
extern struct mutex oom_lock;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 84081e77bc51..f9b08e455fd1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -237,13 +237,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
return points > 0 ? points : 1;
}
-enum oom_constraint {
- CONSTRAINT_NONE,
- CONSTRAINT_CPUSET,
- CONSTRAINT_MEMORY_POLICY,
- CONSTRAINT_MEMCG,
-};
-
/*
* Determine the type of allocation constraint.
*/
@@ -421,15 +414,20 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
static void dump_header(struct oom_control *oc, struct task_struct *p)
{
- pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), nodemask=%*pbl, order=%d, oom_score_adj=%hd\n",
- current->comm, oc->gfp_mask, &oc->gfp_mask,
- nodemask_pr_args(oc->nodemask), oc->order,
+ pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, oom_score_adj=%hd\n",
+ current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order,
current->signal->oom_score_adj);
if (!IS_ENABLED(CONFIG_COMPACTION) && oc->order)
pr_warn("COMPACTION is disabled!!!\n");
cpuset_print_current_mems_allowed();
dump_stack();
+
+ /* one line summary of the oom killer context. */
+ pr_info("oom-kill:constraint=%s,nodemask=%*pbl,task=%s,pid=%5d,uid=%5d",
+ oom_constraint_text[oc->constraint],
+ nodemask_pr_args(oc->nodemask),
+ p->comm, p->pid, from_kuid(&init_user_ns, task_uid(p)));
if (is_memcg_oom(oc))
mem_cgroup_print_oom_info(oc->memcg, p);
else {
@@ -973,8 +971,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
/*
* Determines whether the kernel must panic because of the panic_on_oom sysctl.
*/
-static void check_panic_on_oom(struct oom_control *oc,
- enum oom_constraint constraint)
+static void check_panic_on_oom(struct oom_control *oc)
{
if (likely(!sysctl_panic_on_oom))
return;
@@ -984,7 +981,7 @@ static void check_panic_on_oom(struct oom_control *oc,
* does not panic for cpuset, mempolicy, or memcg allocation
* failures.
*/
- if (constraint != CONSTRAINT_NONE)
+ if (oc->constraint != CONSTRAINT_NONE)
return;
}
/* Do not panic for oom kills triggered by sysrq */
@@ -1021,8 +1018,8 @@ EXPORT_SYMBOL_GPL(unregister_oom_notifier);
bool out_of_memory(struct oom_control *oc)
{
unsigned long freed = 0;
- enum oom_constraint constraint = CONSTRAINT_NONE;
+ oc->constraint = CONSTRAINT_NONE;
if (oom_killer_disabled)
return false;
@@ -1057,10 +1054,10 @@ bool out_of_memory(struct oom_control *oc)
* Check if there were limitations on the allocation (only relevant for
* NUMA and memcg) that may require different handling.
*/
- constraint = constrained_alloc(oc);
- if (constraint != CONSTRAINT_MEMORY_POLICY)
+ oc->constraint = constrained_alloc(oc);
+ if (oc->constraint != CONSTRAINT_MEMORY_POLICY)
oc->nodemask = NULL;
- check_panic_on_oom(oc, constraint);
+ check_panic_on_oom(oc);
if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
current->mm && !oom_unkillable_task(current, NULL, oc->nodemask) &&
--
2.14.1
From: yuzhoujian <[email protected]>
Add a new func mem_cgroup_print_oom_context to print missing information
for the system-wide oom report which includes the oom memcg that has
reached its limit, task memcg that contains the killed task.
Signed-off-by: yuzhoujian <[email protected]>
---
include/linux/memcontrol.h | 15 ++++++++++++---
mm/memcontrol.c | 36 ++++++++++++++++++++++--------------
mm/oom_kill.c | 10 ++++++----
3 files changed, 40 insertions(+), 21 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6c6fb116e925..90855880bca2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -28,6 +28,7 @@
#include <linux/eventfd.h>
#include <linux/mm.h>
#include <linux/vmstat.h>
+#include <linux/oom.h>
#include <linux/writeback.h>
#include <linux/page-flags.h>
@@ -491,8 +492,10 @@ void mem_cgroup_handle_over_high(void);
unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg);
-void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
- struct task_struct *p);
+void mem_cgroup_print_oom_context(struct mem_cgroup *memcg,
+ struct task_struct *p);
+
+void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg);
static inline void mem_cgroup_oom_enable(void)
{
@@ -903,7 +906,13 @@ static inline unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg)
}
static inline void
-mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+mem_cgroup_print_oom_context(struct mem_cgroup *memcg,
+ struct task_struct *p)
+{
+}
+
+static inline void
+mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
{
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e6f0d5ef320a..18deea974cfd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1119,32 +1119,40 @@ static const char *const memcg1_stat_names[] = {
#define K(x) ((x) << (PAGE_SHIFT-10))
/**
- * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
- * @memcg: The memory cgroup that went over limit
+ * mem_cgroup_print_oom_context: Print OOM context information relevant to
+ * memory controller.
+ * @memcg: The origin memory cgroup that went over limit
* @p: Task that is going to be killed
*
* NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
* enabled
*/
-void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p)
{
- struct mem_cgroup *iter;
- unsigned int i;
+ struct cgroup *origin_cgrp, *kill_cgrp;
rcu_read_lock();
-
+ if (memcg) {
+ pr_cont(",oom_memcg=");
+ pr_cont_cgroup_path(memcg->css.cgroup);
+ } else
+ pr_cont(",global_oom");
if (p) {
- pr_info("Task in ");
+ pr_cont(",task_memcg=");
pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
- pr_cont(" killed as a result of limit of ");
- } else {
- pr_info("Memory limit reached of cgroup ");
}
-
- pr_cont_cgroup_path(memcg->css.cgroup);
- pr_cont("\n");
-
rcu_read_unlock();
+}
+
+/**
+ * mem_cgroup_print_oom_meminfo: Print OOM memory information relevant to
+ * memory controller.
+ * @memcg: The memory cgroup that went over limit
+ */
+void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
+{
+ struct mem_cgroup *iter;
+ unsigned int i;
pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
K((u64)page_counter_read(&memcg->memory)),
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f9b08e455fd1..e990c45d2e7d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -424,12 +424,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
dump_stack();
/* one line summary of the oom killer context. */
- pr_info("oom-kill:constraint=%s,nodemask=%*pbl,task=%s,pid=%5d,uid=%5d",
+ pr_info("oom-kill:constraint=%s,nodemask=%*pbl",
oom_constraint_text[oc->constraint],
- nodemask_pr_args(oc->nodemask),
- p->comm, p->pid, from_kuid(&init_user_ns, task_uid(p)));
+ nodemask_pr_args(oc->nodemask));
+ mem_cgroup_print_oom_context(oc->memcg, p);
+ pr_cont(",task=%s,pid=%5d,uid=%5d\n", p->comm, p->pid,
+ from_kuid(&init_user_ns, task_uid(p)));
if (is_memcg_oom(oc))
- mem_cgroup_print_oom_info(oc->memcg, p);
+ mem_cgroup_print_oom_meminfo(oc->memcg);
else {
show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
if (is_dump_unreclaim_slabs())
--
2.14.1
On Sun 01-07-18 00:38:59, [email protected] wrote:
> From: yuzhoujian <[email protected]>
>
> Add a new func mem_cgroup_print_oom_context to print missing information
> for the system-wide oom report which includes the oom memcg that has
> reached its limit, task memcg that contains the killed task.
A proper changelog should contain the motivation. It is trivial to see
what the patch does from the diff. The motivation is less clear. What
about the following
"
The current oom report doesn't display victim's memcg context during the
global OOM situation. While this information is not strictly needed it
can be really useful for containerized environments to see which
container has lost a process (+ add more arguments I am just guessing
from your not really specific statements). Now that we have a single
line for the oom context we can trivially add both the oom memcg (this
can be either global_oom or a specific memcg which hits its hard limits)
and task_memcg which is the victim's memcg.
"
--
Michal Hocko
SUSE Labs
On Sun 01-07-18 00:38:58, [email protected] wrote:
> From: yuzhoujian <[email protected]>
>
> The current system wide oom report prints information about the victim
> and the allocation context and restrictions. It, however, doesn't
> provide any information about memory cgroup the victim belongs to. This
> information can be interesting for container users because they can find
> the victim's container much more easily.
>
> I follow the advices of David Rientjes and Michal Hocko, and refactor
> part of the oom report. After this patch, users can get the memcg's
> path from the oom report and check the certain container more quickly.
>
> The oom print info after this patch:
> oom-kill:constraint=<constraint>,nodemask=<nodemask>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
This changelog doesn't correspond to the patch. Also while we were
discussing this off-list, I have suggested to pull the cpuset info into
the single line output.
What about the following?
"
OOM report contains several sections. The first one is the allocation
context that has triggered the OOM. Then we have cpuset context
followed by the stack trace of the OOM path. Followed by the oom
eligible tasks and the information about the chosen oom victim.
One thing that makes parsing more awkward than necessary is that we do
not have a single and easily parsable line about the oom context. This
patch is reorganizing the oom report to
1) who invoked oom and what was the allocation request
[ 126.168182] panic invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
2) OOM stack trace
[ 126.169806] CPU: 23 PID: 8668 Comm: panic Not tainted 4.18.0-rc2+ #36
[ 126.170494] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
[ 126.171197] Call Trace:
[ 126.171901] dump_stack+0x5a/0x73
[ 126.172593] dump_header+0x58/0x2dc
[ 126.173294] oom_kill_process+0x228/0x420
[ 126.173999] ? oom_badness+0x2a/0x130
[ 126.174705] out_of_memory+0x11a/0x4a0
[ 126.175415] __alloc_pages_slowpath+0x7cc/0xa1e
[ 126.176128] ? __alloc_pages_slowpath+0x194/0xa1e
[ 126.176853] ? page_counter_try_charge+0x54/0xc0
[ 126.177580] __alloc_pages_nodemask+0x277/0x290
[ 126.178319] alloc_pages_vma+0x73/0x180
[ 126.179058] do_anonymous_page+0xed/0x5a0
[ 126.179825] __handle_mm_fault+0xbb3/0xe70
[ 126.180566] handle_mm_fault+0xfa/0x210
[ 126.181313] __do_page_fault+0x233/0x4c0
[ 126.182063] do_page_fault+0x32/0x140
[ 126.182812] ? page_fault+0x8/0x30
[ 126.183560] page_fault+0x1e/0x30
3) oom context (constraints and the chosen victim)
[ 126.190619] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,task=panic,pid= 8673,uid= 0
An admin can easily get the full oom context at a single line which
makes parsing much easier.
"
--
Michal Hocko
SUSE Labs
Hi Michal
cpuset_print_current_mems_allowed is also invoked by
warn_alloc (page_alloc.c). So, can I remove the current->comm output in
the pr_info?
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index d8b12e0d39cd..09b8ef6186c6 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2666,9 +2666,9 @@ void cpuset_print_current_mems_allowed(void)
rcu_read_lock();
cgrp = task_cs(current)->css.cgroup;
- pr_info("%s cpuset=", current->comm);
+ pr_info(",cpuset=");
pr_cont_cgroup_name(cgrp);
- pr_cont(" mems_allowed=%*pbl\n",
+ pr_cont(",mems_allowed=%*pbl",
nodemask_pr_args(&current->mems_allowed));
>
> On Sun 01-07-18 00:38:58, [email protected] wrote:
> > From: yuzhoujian <[email protected]>
> >
> > The current system wide oom report prints information about the victim
> > and the allocation context and restrictions. It, however, doesn't
> > provide any information about memory cgroup the victim belongs to. This
> > information can be interesting for container users because they can find
> > the victim's container much more easily.
> >
> > I follow the advices of David Rientjes and Michal Hocko, and refactor
> > part of the oom report. After this patch, users can get the memcg's
> > path from the oom report and check the certain container more quickly.
> >
> > The oom print info after this patch:
> > oom-kill:constraint=<constraint>,nodemask=<nodemask>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
>
> This changelog doesn't correspond to the patch. Also while we were
> discussing this off-list, I have suggested to pull the cpuset info into
> the single line output.
>
> What about the following?
> "
> OOM report contains several sections. The first one is the allocation
> context that has triggered the OOM. Then we have cpuset context
> followed by the stack trace of the OOM path. Followed by the oom
> eligible tasks and the information about the chosen oom victim.
>
> One thing that makes parsing more awkward than necessary is that we do
> not have a single and easily parsable line about the oom context. This
> patch is reorganizing the oom report to
> 1) who invoked oom and what was the allocation request
> [ 126.168182] panic invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
>
> 2) OOM stack trace
> [ 126.169806] CPU: 23 PID: 8668 Comm: panic Not tainted 4.18.0-rc2+ #36
> [ 126.170494] Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016
> [ 126.171197] Call Trace:
> [ 126.171901] dump_stack+0x5a/0x73
> [ 126.172593] dump_header+0x58/0x2dc
> [ 126.173294] oom_kill_process+0x228/0x420
> [ 126.173999] ? oom_badness+0x2a/0x130
> [ 126.174705] out_of_memory+0x11a/0x4a0
> [ 126.175415] __alloc_pages_slowpath+0x7cc/0xa1e
> [ 126.176128] ? __alloc_pages_slowpath+0x194/0xa1e
> [ 126.176853] ? page_counter_try_charge+0x54/0xc0
> [ 126.177580] __alloc_pages_nodemask+0x277/0x290
> [ 126.178319] alloc_pages_vma+0x73/0x180
> [ 126.179058] do_anonymous_page+0xed/0x5a0
> [ 126.179825] __handle_mm_fault+0xbb3/0xe70
> [ 126.180566] handle_mm_fault+0xfa/0x210
> [ 126.181313] __do_page_fault+0x233/0x4c0
> [ 126.182063] do_page_fault+0x32/0x140
> [ 126.182812] ? page_fault+0x8/0x30
> [ 126.183560] page_fault+0x1e/0x30
>
> 3) oom context (contrains and the chosen victim)
> [ 126.190619] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,task=panic,pid= 8673,uid= 0
>
> An admin can easily get the full oom context at a single line which
> makes parsing much easier.
> "
> --
> Michal Hocko
> SUSE Labs
On Tue 03-07-18 18:57:14, 禹舟键 wrote:
> Hi Michal
> cpuset_print_current_mems_allowed is also invoked by
> warn_alloc(page_alloc.c). So, can I remove the current->comm output in
> the pr_info ?
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index d8b12e0d39cd..09b8ef6186c6 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2666,9 +2666,9 @@ void cpuset_print_current_mems_allowed(void)
> rcu_read_lock();
>
> cgrp = task_cs(current)->css.cgroup;
> - pr_info("%s cpuset=", current->comm);
> + pr_info(",cpuset=");
> pr_cont_cgroup_name(cgrp);
> - pr_cont(" mems_allowed=%*pbl\n",
> + pr_cont(",mems_allowed=%*pbl",
> nodemask_pr_args(&current->mems_allowed));
Yes, I think so. Just jam the cpuset info to the allocation context
warning like this
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..6bc7d5d4007a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3416,12 +3416,13 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
va_start(args, fmt);
vaf.fmt = fmt;
vaf.va = &args;
- pr_warn("%s: %pV, mode:%#x(%pGg), nodemask=%*pbl\n",
+ pr_warn("%s: %pV, mode:%#x(%pGg), nodemask=%*pbl",
current->comm, &vaf, gfp_mask, &gfp_mask,
nodemask_pr_args(nodemask));
va_end(args);
cpuset_print_current_mems_allowed();
+ pr_cont("\n");
dump_stack();
warn_alloc_show_mem(gfp_mask, nodemask);
--
Michal Hocko
SUSE Labs
On Sat, Jun 30, 2018 at 7:38 PM, <[email protected]> wrote:
> From: yuzhoujian <[email protected]>
>
> The current system wide oom report prints information about the victim
> and the allocation context and restrictions. It, however, doesn't
> provide any information about memory cgroup the victim belongs to. This
> information can be interesting for container users because they can find
> the victim's container much more easily.
>
> I follow the advices of David Rientjes and Michal Hocko, and refactor
> part of the oom report. After this patch, users can get the memcg's
> path from the oom report and check the certain container more quickly.
>
> The oom print info after this patch:
> oom-kill:constraint=<constraint>,nodemask=<nodemask>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
> +static const char * const oom_constraint_text[] = {
> + [CONSTRAINT_NONE] = "CONSTRAINT_NONE",
> + [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
> + [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
> + [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
> +};
I'm not sure why we have this in the header.
This produces a lot of noise when W=1.
In file included from
/home/andy/prj/linux-topic-mfld/include/linux/memcontrol.h:31:0,
from /home/andy/prj/linux-topic-mfld/include/net/sock.h:58,
from /home/andy/prj/linux-topic-mfld/include/linux/tcp.h:23,
from /home/andy/prj/linux-topic-mfld/include/linux/ipv6.h:87,
from /home/andy/prj/linux-topic-mfld/include/net/ipv6.h:16,
from
/home/andy/prj/linux-topic-mfld/net/ipv4/netfilter/nf_log_ipv4.c:17:
/home/andy/prj/linux-topic-mfld/include/linux/oom.h:32:27: warning:
‘oom_constraint_text’ defined but not used [-W
unused-const-variable=]
static const char * const oom_constraint_text[] = {
^~~~~~~~~~~~~~~~~~~
CC [M] net/ipv4/netfilter/iptable_nat.o
If you need (but looking at the code you actually don't if I didn't
miss anything) it in several places, just export.
Otherwise put it back to memcontrol.c.
--
With Best Regards,
Andy Shevchenko
Hi Andy
The const char array needs to be used by the new function
mem_cgroup_print_oom_context and some functions in oom_kill.c in the
second patch.
Thanks
>
> On Sat, Jun 30, 2018 at 7:38 PM, <[email protected]> wrote:
> > From: yuzhoujian <[email protected]>
> >
> > The current system wide oom report prints information about the victim
> > and the allocation context and restrictions. It, however, doesn't
> > provide any information about memory cgroup the victim belongs to. This
> > information can be interesting for container users because they can find
> > the victim's container much more easily.
> >
> > I follow the advices of David Rientjes and Michal Hocko, and refactor
> > part of the oom report. After this patch, users can get the memcg's
> > path from the oom report and check the certain container more quickly.
> >
> > The oom print info after this patch:
> > oom-kill:constraint=<constraint>,nodemask=<nodemask>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
>
>
> > +static const char * const oom_constraint_text[] = {
> > + [CONSTRAINT_NONE] = "CONSTRAINT_NONE",
> > + [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
> > + [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
> > + [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
> > +};
>
> I'm not sure why we have this in the header.
>
> This produces a lot of noise when W=1.
>
> In file included from
> /home/andy/prj/linux-topic-mfld/include/linux/memcontrol.h:31:0,
> from /home/andy/prj/linux-topic-mfld/include/net/sock.h:58,
> from /home/andy/prj/linux-topic-mfld/include/linux/tcp.h:23,
> from /home/andy/prj/linux-topic-mfld/include/linux/ipv6.h:87,
> from /home/andy/prj/linux-topic-mfld/include/net/ipv6.h:16,
> from
> /home/andy/prj/linux-topic-mfld/net/ipv4/netfilter/nf_log_ipv4.c:17:
> /home/andy/prj/linux-topic-mfld/include/linux/oom.h:32:27: warning:
> ‘oom_constraint_text’ defined but not used [-W
> unused-const-variable=]
> static const char * const oom_constraint_text[] = {
> ^~~~~~~~~~~~~~~~~~~
> CC [M] net/ipv4/netfilter/iptable_nat.o
>
>
> If you need (but looking at the code you actually don't if I didn't
> miss anything) it in several places, just export.
> Otherwise put it back to memcontrol.c.
>
> --
> With Best Regards,
> Andy Shevchenko
On Wed 04-07-18 10:25:30, 禹舟键 wrote:
> Hi Andy
> The const char array need to be used by the new func
> mem_cgroup_print_oom_context and some funcs in oom_kill.c in the
> second patch.
Just declare it in oom.h and define in oom.c
--
Michal Hocko
SUSE Labs
On Wed, Jul 4, 2018 at 5:25 AM, 禹舟键 <[email protected]> wrote:
> Hi Andy
> The const char array need to be used by the new func
> mem_cgroup_print_oom_context and some funcs in oom_kill.c in the
> second patch.
Did I understand correctly that the array is added solely by this patch?
Did I understand correctly that it's used only in one module
(oom_kill.c in the new version)?
If both are true, just move it to the C file.
If you need synchronization, a) put a comment, b) create another
enum item (like FOO_BAR_MAX) at the end and use it in the array as a
fixed size.
>
> Thanks
>
>>
>> On Sat, Jun 30, 2018 at 7:38 PM, <[email protected]> wrote:
>> > From: yuzhoujian <[email protected]>
>> >
>> > The current system wide oom report prints information about the victim
>> > and the allocation context and restrictions. It, however, doesn't
>> > provide any information about memory cgroup the victim belongs to. This
>> > information can be interesting for container users because they can find
>> > the victim's container much more easily.
>> >
>> > I follow the advices of David Rientjes and Michal Hocko, and refactor
>> > part of the oom report. After this patch, users can get the memcg's
>> > path from the oom report and check the certain container more quickly.
>> >
>> > The oom print info after this patch:
>> > oom-kill:constraint=<constraint>,nodemask=<nodemask>,oom_memcg=<memcg>,task_memcg=<memcg>,task=<comm>,pid=<pid>,uid=<uid>
>>
>>
>> > +static const char * const oom_constraint_text[] = {
>> > + [CONSTRAINT_NONE] = "CONSTRAINT_NONE",
>> > + [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
>> > + [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
>> > + [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
>> > +};
>>
>> I'm not sure why we have this in the header.
>>
>> This produces a lot of noise when W=1.
>>
>> In file included from
>> /home/andy/prj/linux-topic-mfld/include/linux/memcontrol.h:31:0,
>> from /home/andy/prj/linux-topic-mfld/include/net/sock.h:58,
>> from /home/andy/prj/linux-topic-mfld/include/linux/tcp.h:23,
>> from /home/andy/prj/linux-topic-mfld/include/linux/ipv6.h:87,
>> from /home/andy/prj/linux-topic-mfld/include/net/ipv6.h:16,
>> from
>> /home/andy/prj/linux-topic-mfld/net/ipv4/netfilter/nf_log_ipv4.c:17:
>> /home/andy/prj/linux-topic-mfld/include/linux/oom.h:32:27: warning:
>> ‘oom_constraint_text’ defined but not used [-W
>> unused-const-variable=]
>> static const char * const oom_constraint_text[] = {
>> ^~~~~~~~~~~~~~~~~~~
>> CC [M] net/ipv4/netfilter/iptable_nat.o
>>
>>
>> If you need (but looking at the code you actually don't if I didn't
>> miss anything) it in several places, just export.
>> Otherwise put it back to memcontrol.c.
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
--
With Best Regards,
Andy Shevchenko
FYI, we noticed the following commit (built with gcc-6):
commit: 3586e04c2954d48a690aee721a034c7867bb0fc1 ("[PATCH v11 2/2] Add the missing information in dump_header")
url: https://github.com/0day-ci/linux/commits/ufo19890607-gmail-com/Refactor-part-of-the-oom-report-in-dump_header/20180701-004229
in testcase: trinity
with following parameters:
runtime: 300s
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
on test machine: qemu-system-x86_64 -enable-kvm -cpu Westmere -m 512M
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
+------------------------------------------+------------+------------+
| | d1206092c9 | 3586e04c29 |
+------------------------------------------+------------+------------+
| boot_successes | 0 | 2 |
| boot_failures | 12 | 21 |
| invoked_oom-killer:gfp_mask=0x | 12 | 21 |
| BUG:KASAN:null-ptr-deref_in_d | 12 | |
| BUG:unable_to_handle_kernel | 12 | 21 |
| Oops:#[##] | 12 | 21 |
| RIP:dump_header | 12 | 21 |
| Kernel_panic-not_syncing:Fatal_exception | 12 | 21 |
| kernel_BUG_at_mm/usercopy.c | 1 | 2 |
| invalid_opcode:#[##] | 1 | 2 |
| RIP:usercopy_abort | 1 | 2 |
| BUG:KASAN:user-memory-access_in_d | 0 | 21 |
+------------------------------------------+------------+------------+
[ 8.645427] BUG: KASAN: user-memory-access in dump_header+0xf7/0x452
[ 8.646474] Read of size 8 at addr 0000000000001c58 by task swapper/0/1
[ 8.646692]
[ 8.646692] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G T 4.18.0-rc2-00225-g3586e04 #1
[ 8.646692] Call Trace:
[ 8.646692] dump_stack+0x8e/0xd5
[ 8.646692] kasan_report+0x245/0x28d
[ 8.646692] dump_header+0xf7/0x452
[ 8.646692] out_of_memory+0x7cb/0x86c
[ 8.646692] ? oom_killer_disable+0x1b7/0x1b7
[ 8.646692] __alloc_pages_slowpath+0xc9e/0xf35
[ 8.646692] ? gfp_pfmemalloc_allowed+0x10/0x10
[ 8.646692] ? sched_clock_local+0xa4/0xc0
[ 8.646692] ? check_chain_key+0xf4/0x14b
[ 8.646692] ? match_held_lock+0x2b/0xf8
[ 8.646692] ? match_held_lock+0x2b/0xf8
[ 8.646692] ? lock_is_held_type+0x80/0x90
[ 8.646692] __alloc_pages_nodemask+0x1b9/0x343
[ 8.646692] ? __alloc_pages_slowpath+0xf35/0xf35
[ 8.646692] ? find_first_bit+0x1b/0x4a
[ 8.646692] ? __next_node_in+0x39/0x46
[ 8.646692] alloc_page_interleave+0x12/0xba
[ 8.646692] pagecache_get_page+0x118/0x190
[ 8.646692] grab_cache_page_write_begin+0x37/0x50
[ 8.646692] simple_write_begin+0x26/0x79
[ 8.646692] generic_perform_write+0x163/0x2a2
[ 8.646692] ? fatal_signal_pending+0x34/0x34
[ 8.646692] ? file_update_time+0x132/0x21e
[ 8.646692] ? __insert_inode_hash+0xc7/0xc7
[ 8.646692] ? lock_acquired+0x3b0/0x429
[ 8.646692] ? generic_file_write_iter+0x4b/0xd0
[ 8.646692] ? lock_contended+0x46a/0x46a
[ 8.646692] ? lock_acquire+0x1d8/0x22c
[ 8.646692] __generic_file_write_iter+0x176/0x201
[ 8.646692] generic_file_write_iter+0x66/0xd0
[ 8.646692] __vfs_write+0x15b/0x1dd
[ 8.646692] ? kernel_read+0x6e/0x6e
[ 8.646692] ? lock_is_held_type+0x80/0x90
[ 8.646692] ? rcu_read_lock_sched_held+0x5d/0x74
[ 8.646692] ? rcu_sync_lockdep_assert+0x3d/0x63
[ 8.646692] ? __sb_start_write+0x188/0x1a3
[ 8.646692] ? vfs_write+0xb0/0xf2
[ 8.646692] vfs_write+0xce/0xf2
[ 8.646692] ksys_write+0xbb/0x133
[ 8.646692] ? __ia32_sys_read+0x41/0x41
[ 8.646692] ? trace_kmalloc+0xd8/0x123
[ 8.646692] ? do_name+0x22c/0x484
[ 8.646692] ? __kmalloc_track_caller+0x13f/0x167
[ 8.646692] xwrite+0x57/0x124
[ 8.646692] do_copy+0x52/0x172
[ 8.646692] write_buffer+0x61/0x9c
[ 8.646692] flush_buffer+0x10e/0x165
[ 8.646692] __gunzip+0x5d8/0x7ab
[ 8.646692] ? bunzip2+0x94d/0x94d
[ 8.646692] ? write_buffer+0x9c/0x9c
[ 8.646692] gunzip+0x39/0x3d
[ 8.646692] ? initrd_load+0xad/0xad
[ 8.646692] unpack_to_rootfs+0x2a4/0x526
[ 8.646692] ? initrd_load+0xad/0xad
[ 8.646692] ? do_symlink+0xe8/0xe8
[ 8.646692] ? __lock_is_held+0x72/0x87
[ 8.646692] ? do_header+0x1de/0x1de
[ 8.646692] populate_rootfs+0xd8/0x2cc
[ 8.646692] ? do_header+0x1de/0x1de
[ 8.646692] do_one_initcall+0x193/0x3c9
[ 8.646692] ? perf_trace_initcall_finish+0x1ef/0x1ef
[ 8.646692] ? __lock_is_held+0x72/0x87
[ 8.646692] ? lock_is_held_type+0x80/0x90
[ 8.646692] kernel_init_freeable+0x3ba/0x54d
[ 8.646692] ? start_kernel+0x8b8/0x8b8
[ 8.646692] ? mmdrop+0x19/0x2f
[ 8.646692] ? finish_task_switch+0x1bd/0x233
[ 8.646692] ? balance_callback+0x1f/0xa1
[ 8.646692] ? rest_init+0xd3/0xd3
[ 8.646692] ? rest_init+0xd3/0xd3
[ 8.646692] kernel_init+0xc/0x108
[ 8.646692] ? rest_init+0xd3/0xd3
[ 8.646692] ret_from_fork+0x3a/0x50
[ 8.646692] ==================================================================
[ 8.646692] Disabling lock debugging due to kernel taint
[ 8.701796] BUG: unable to handle kernel paging request at 0000000000001c58
[ 8.703542] PGD 0 P4D 0
[ 8.703995] Oops: 0000 [#1] SMP KASAN
[ 8.704606] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B T 4.18.0-rc2-00225-g3586e04 #1
[ 8.705771] RIP: 0010:dump_header+0xf7/0x452
[ 8.705771] Code: 8b 34 fd 80 ac ec 81 44 89 ea 4c 89 f1 48 c7 c7 a0 9a ec 81 e8 79 13 f1 ff e8 4f db ff ff 48 8d bb 58 1c 00 00 e8 3b f3 06 00 <4c> 8b ab 58 1c 00 00 e8 1b 33 f2 ff 85 c0 74 31 80 3d d4 ca 77 01
[ 8.705771] RSP: 0000:ffff880009907258 EFLAGS: 00010286
[ 8.705771] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff810f8551
[ 8.705771] RDX: 1ffffffff04d7600 RSI: 0000000000000003 RDI: 0000000000000296
[ 8.705771] RBP: ffff8800099074e0 R08: dffffc0000000000 R09: fffffbfff04d7620
[ 8.705771] R10: fffffbfff04d7620 R11: 0000000000000000 R12: ffff8800099074e8
[ 8.705771] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 8.705771] FS: 0000000000000000(0000) GS:ffff88000a200000(0000) knlGS:0000000000000000
[ 8.705771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.705771] CR2: 0000000000001c58 CR3: 0000000002415000 CR4: 00000000000006b0
[ 8.705771] Call Trace:
[ 8.705771] out_of_memory+0x7cb/0x86c
[ 8.705771] ? oom_killer_disable+0x1b7/0x1b7
[ 8.705771] __alloc_pages_slowpath+0xc9e/0xf35
[ 8.705771] ? gfp_pfmemalloc_allowed+0x10/0x10
[ 8.705771] ? sched_clock_local+0xa4/0xc0
[ 8.705771] ? check_chain_key+0xf4/0x14b
[ 8.705771] ? match_held_lock+0x2b/0xf8
[ 8.705771] ? match_held_lock+0x2b/0xf8
[ 8.705771] ? lock_is_held_type+0x80/0x90
[ 8.705771] __alloc_pages_nodemask+0x1b9/0x343
[ 8.705771] ? __alloc_pages_slowpath+0xf35/0xf35
[ 8.705771] ? find_first_bit+0x1b/0x4a
[ 8.705771] ? __next_node_in+0x39/0x46
[ 8.705771] alloc_page_interleave+0x12/0xba
[ 8.705771] pagecache_get_page+0x118/0x190
[ 8.705771] grab_cache_page_write_begin+0x37/0x50
[ 8.705771] simple_write_begin+0x26/0x79
[ 8.705771] generic_perform_write+0x163/0x2a2
[ 8.705771] ? fatal_signal_pending+0x34/0x34
[ 8.705771] ? file_update_time+0x132/0x21e
[ 8.705771] ? __insert_inode_hash+0xc7/0xc7
[ 8.705771] ? lock_acquired+0x3b0/0x429
[ 8.705771] ? generic_file_write_iter+0x4b/0xd0
[ 8.705771] ? lock_contended+0x46a/0x46a
[ 8.705771] ? lock_acquire+0x1d8/0x22c
[ 8.705771] __generic_file_write_iter+0x176/0x201
[ 8.705771] generic_file_write_iter+0x66/0xd0
[ 8.705771] __vfs_write+0x15b/0x1dd
[ 8.705771] ? kernel_read+0x6e/0x6e
[ 8.705771] ? lock_is_held_type+0x80/0x90
[ 8.705771] ? rcu_read_lock_sched_held+0x5d/0x74
[ 8.705771] ? rcu_sync_lockdep_assert+0x3d/0x63
[ 8.705771] ? __sb_start_write+0x188/0x1a3
[ 8.705771] ? vfs_write+0xb0/0xf2
[ 8.705771] vfs_write+0xce/0xf2
[ 8.705771] ksys_write+0xbb/0x133
[ 8.705771] ? __ia32_sys_read+0x41/0x41
[ 8.705771] ? trace_kmalloc+0xd8/0x123
[ 8.705771] ? do_name+0x22c/0x484
[ 8.705771] ? __kmalloc_track_caller+0x13f/0x167
[ 8.705771] xwrite+0x57/0x124
[ 8.705771] do_copy+0x52/0x172
[ 8.705771] write_buffer+0x61/0x9c
[ 8.705771] flush_buffer+0x10e/0x165
[ 8.705771] __gunzip+0x5d8/0x7ab
[ 8.705771] ? bunzip2+0x94d/0x94d
[ 8.705771] ? write_buffer+0x9c/0x9c
[ 8.705771] gunzip+0x39/0x3d
[ 8.705771] ? initrd_load+0xad/0xad
[ 8.705771] unpack_to_rootfs+0x2a4/0x526
[ 8.705771] ? initrd_load+0xad/0xad
[ 8.705771] ? do_symlink+0xe8/0xe8
[ 8.705771] ? __lock_is_held+0x72/0x87
[ 8.705771] ? do_header+0x1de/0x1de
[ 8.705771] populate_rootfs+0xd8/0x2cc
[ 8.705771] ? do_header+0x1de/0x1de
[ 8.705771] do_one_initcall+0x193/0x3c9
[ 8.705771] ? perf_trace_initcall_finish+0x1ef/0x1ef
[ 8.705771] ? __lock_is_held+0x72/0x87
[ 8.705771] ? lock_is_held_type+0x80/0x90
[ 8.705771] kernel_init_freeable+0x3ba/0x54d
[ 8.705771] ? start_kernel+0x8b8/0x8b8
[ 8.705771] ? mmdrop+0x19/0x2f
[ 8.705771] ? finish_task_switch+0x1bd/0x233
[ 8.705771] ? balance_callback+0x1f/0xa1
[ 8.705771] ? rest_init+0xd3/0xd3
[ 8.705771] ? rest_init+0xd3/0xd3
[ 8.705771] kernel_init+0xc/0x108
[ 8.705771] ? rest_init+0xd3/0xd3
[ 8.705771] ret_from_fork+0x3a/0x50
[ 8.705771] Modules linked in:
[ 8.705771] CR2: 0000000000001c58
[ 8.705771] ---[ end trace 414d7789c0d43a18 ]---
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
Thanks,
Xiaolong
Hi Michal and Andy,
The enum oom_constraint will be added to struct oom_control, so I still
think I should define it in oom.h.
Michal Hocko <[email protected]> 于2018年7月4日周三 下午4:17写道:
>
> On Wed 04-07-18 10:25:30, 禹舟键 wrote:
> > Hi Andy
> > The const char array needs to be used by the new function
> > mem_cgroup_print_oom_context and some functions in oom_kill.c in the
> > second patch.
>
> Just declare it in oom.h and define in oom.c
> --
> Michal Hocko
> SUSE Labs
On Thu, Jul 5, 2018 at 2:23 PM, 禹舟键 <[email protected]> wrote:
> Hi Michal and Andy,
> The enum oom_constraint will be added to struct oom_control, so I still
> think I should define it in oom.h.
You missed the point. I'm talking about an array of string literals.
Please check the warning I got from the compiler.
> Michal Hocko <[email protected]> 于2018年7月4日周三 下午4:17写道:
>>
>> On Wed 04-07-18 10:25:30, 禹舟键 wrote:
>> > Hi Andy
>> > The const char array needs to be used by the new function
>> > mem_cgroup_print_oom_context and some functions in oom_kill.c in the
>> > second patch.
>>
>> Just declare it in oom.h and define in oom.c
>> --
>> Michal Hocko
>> SUSE Labs
--
With Best Regards,
Andy Shevchenko