2018-05-21 02:42:07

by 禹舟键

Subject: [PATCH v4] Print the memcg's name when system-wide OOM happened

From: yuzhoujian <[email protected]>

The dump_header() does not print the memcg's name when a system-wide
oom happens, so users cannot locate the container whose task has been
killed by the oom killer.

After this patch the system oom report prints the memcg's name, so
users can get the memcg's path from the oom report and find the
affected container more quickly.

Changes since v3:
- rename the helper to mem_cgroup_print_oom_memcg_name.
- take the rcu read lock inside the helper.
- remove the printing of the memcg's name from mem_cgroup_print_oom_info.

Changes since v2:
- add the mem_cgroup_print_memcg_name helper to print the name of the
memcg which contains the task that will be killed by the oom-killer.

Changes since v1:
- replace adding mem_cgroup_print_oom_info with printing the memcg's
name only.

Signed-off-by: yuzhoujian <[email protected]>
---
include/linux/memcontrol.h | 9 +++++++++
mm/memcontrol.c | 27 +++++++++++++++++++--------
mm/oom_kill.c | 1 +
3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d99b71bc2c66..5fc58beae368 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -464,6 +464,9 @@ void mem_cgroup_handle_over_high(void);

unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg);

+void mem_cgroup_print_oom_memcg_name(struct mem_cgroup *memcg,
+ struct task_struct *p);
+
void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
struct task_struct *p);

@@ -858,6 +861,12 @@ static inline unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg)
return 0;
}

+static inline void
+mem_cgroup_print_oom_memcg_name(struct mem_cgroup *memcg,
+ struct task_struct *p)
+{
+}
+
static inline void
mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2bd3df3d101a..138a11edfacb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1118,19 +1118,15 @@ static const char *const memcg1_stat_names[] = {
};

#define K(x) ((x) << (PAGE_SHIFT-10))
+
/**
- * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
+ * mem_cgroup_print_oom_memcg_name: Print the memcg's name which contains the
+ * task that will be killed by the oom-killer.
* @memcg: The memory cgroup that went over limit
* @p: Task that is going to be killed
- *
- * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
- * enabled
*/
-void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+void mem_cgroup_print_oom_memcg_name(struct mem_cgroup *memcg, struct task_struct *p)
{
- struct mem_cgroup *iter;
- unsigned int i;
-
rcu_read_lock();

if (p) {
@@ -1145,7 +1141,22 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
pr_cont("\n");

rcu_read_unlock();
+}
+
+/**
+ * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
+ * @memcg: The memory cgroup that went over limit
+ * @p: Task that is going to be killed
+ *
+ * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
+ * enabled
+ */
+void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+{
+ struct mem_cgroup *iter;
+ unsigned int i;

+ mem_cgroup_print_oom_memcg_name(memcg, p);
pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
K((u64)page_counter_read(&memcg->memory)),
K((u64)memcg->memory.limit), memcg->memory.failcnt);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb88cf58..3e0b725fb877 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
if (is_memcg_oom(oc))
mem_cgroup_print_oom_info(oc->memcg, p);
else {
+ mem_cgroup_print_oom_memcg_name(oc->memcg, p);
show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
if (is_dump_unreclaim_slabs())
dump_unreclaimable_slab();
--
2.14.1



2018-05-22 07:30:28

by Michal Hocko

Subject: Re: [PATCH v4] Print the memcg's name when system-wide OOM happened

On Mon 21-05-18 03:39:46, ufo19890607 wrote:
> From: yuzhoujian <[email protected]>
>
> The dump_header does not print the memcg's name when the system
> oom happened. So users cannot locate the certain container which
> contains the task that has been killed by the oom killer.
>
> System oom report will print the memcg's name after this patch,
> so users can get the memcg's path from the oom report and check
> the certain container more quickly.
>
> Changes since v3:
> - rename the helper's name to mem_cgroup_print_oom_memcg_name.
> - add the rcu lock held to the helper.
> - remove the print info of memcg's name in mem_cgroup_print_oom_info.
>
> Changes since v2:
> - add the mem_cgroup_print_memcg_name helper to print the memcg's
> name which contains the task that will be killed by the oom-killer.
>
> Changes since v1:
> - replace adding mem_cgroup_print_oom_info with printing the memcg's
> name only.

This still has the part which is misleading in the global oom context.
So no, it seems that a helper will not do much good. Unless we can
squeeze everything into a single line as David proposed
(http://lkml.kernel.org/r/[email protected])
we should simply open code the relevant part in the global oom path.

> Signed-off-by: yuzhoujian <[email protected]>
> ---
> include/linux/memcontrol.h | 9 +++++++++
> mm/memcontrol.c | 27 +++++++++++++++++++--------
> mm/oom_kill.c | 1 +
> 3 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d99b71bc2c66..5fc58beae368 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -464,6 +464,9 @@ void mem_cgroup_handle_over_high(void);
>
> unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg);
>
> +void mem_cgroup_print_oom_memcg_name(struct mem_cgroup *memcg,
> + struct task_struct *p);
> +
> void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
> struct task_struct *p);
>
> @@ -858,6 +861,12 @@ static inline unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg)
> return 0;
> }
>
> +static inline void
> +mem_cgroup_print_oom_memcg_name(struct mem_cgroup *memcg,
> + struct task_struct *p)
> +{
> +}
> +
> static inline void
> mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2bd3df3d101a..138a11edfacb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1118,19 +1118,15 @@ static const char *const memcg1_stat_names[] = {
> };
>
> #define K(x) ((x) << (PAGE_SHIFT-10))
> +
> /**
> - * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
> + * mem_cgroup_print_oom_memcg_name: Print the memcg's name which contains the
> + * task that will be killed by the oom-killer.
> * @memcg: The memory cgroup that went over limit
> * @p: Task that is going to be killed
> - *
> - * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
> - * enabled
> */
> -void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +void mem_cgroup_print_oom_memcg_name(struct mem_cgroup *memcg, struct task_struct *p)
> {
> - struct mem_cgroup *iter;
> - unsigned int i;
> -
> rcu_read_lock();
>
> if (p) {
> @@ -1145,7 +1141,22 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> pr_cont("\n");
>
> rcu_read_unlock();
> +}
> +
> +/**
> + * mem_cgroup_print_oom_info: Print OOM information relevant to memory controller.
> + * @memcg: The memory cgroup that went over limit
> + * @p: Task that is going to be killed
> + *
> + * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
> + * enabled
> + */
> +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +{
> + struct mem_cgroup *iter;
> + unsigned int i;
>
> + mem_cgroup_print_oom_memcg_name(memcg, p);
> pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
> K((u64)page_counter_read(&memcg->memory)),
> K((u64)memcg->memory.limit), memcg->memory.failcnt);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8ba6cb88cf58..3e0b725fb877 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> if (is_memcg_oom(oc))
> mem_cgroup_print_oom_info(oc->memcg, p);
> else {
> + mem_cgroup_print_oom_memcg_name(oc->memcg, p);
> show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
> if (is_dump_unreclaim_slabs())
> dump_unreclaimable_slab();
> --
> 2.14.1

--
Michal Hocko
SUSE Labs

2018-05-30 07:56:53

by kernel test robot

Subject: [lkp-robot] [Print the memcg's name when system] c385a55f52: BUG:KASAN:null-ptr-deref_in_m


FYI, we noticed the following commit (built with gcc-6):

commit: c385a55f521e1649051d7f653bec9aa0ce711c9e ("Print the memcg's name when system-wide OOM happened")
url: https://github.com/0day-ci/linux/commits/ufo19890607/Print-the-memcg-s-name-when-system-wide-OOM-happened/20180522-033834


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -smp 2 -m 512M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------+------------+------------+
| | 6741c4bb38 | c385a55f52 |
+------------------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 12 | 30 |
| invoked_oom-killer:gfp_mask=0x | 12 | 29 |
| Mem-Info | 12 | |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 12 | |
| BUG:KASAN:null-ptr-deref_in_m | 0 | 29 |
| BUG:unable_to_handle_kernel | 0 | 29 |
| Oops:#[##] | 0 | 29 |
| RIP:mem_cgroup_print_oom_memcg_name | 0 | 29 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 29 |
| BUG:kernel_hang_in_boot_stage | 0 | 1 |
+------------------------------------------------------------------+------------+------------+



[ 5.366081] BUG: KASAN: null-ptr-deref in mem_cgroup_print_oom_memcg_name+0xdb/0x130
[ 5.366817] Read of size 8 at addr 0000000000000000 by task swapper/0/1
[ 5.366817]
[ 5.366817] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc6-00081-gc385a55 #2
[ 5.370063] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 5.370063] Call Trace:
[ 5.370063] dump_stack+0x137/0x1d5
[ 5.376789] ? mem_cgroup_print_oom_memcg_name+0xdb/0x130
[ 5.376789] kasan_report+0x330/0x3c0
[ 5.376789] __asan_load8+0x7d/0x80
[ 5.376789] mem_cgroup_print_oom_memcg_name+0xdb/0x130
[ 5.380065] dump_header+0x161/0x756
[ 5.380065] ? __asan_loadN+0xf/0x20
[ 5.380065] out_of_memory+0x69e/0x860
[ 5.380065] ? unregister_oom_notifier+0x20/0x20
[ 5.380065] __alloc_pages_slowpath+0x1399/0x1d20
[ 5.383398] ? fs_reclaim_release+0x60/0x60
[ 5.383398] ? __asan_loadN+0xf/0x20
[ 5.383398] ? ftrace_likely_update+0x8c/0xb0
[ 5.383398] ? __asan_loadN+0xf/0x20
[ 5.386811] __alloc_pages_nodemask+0x507/0x820
[ 5.386811] ? __alloc_pages_slowpath+0x1d20/0x1d20
[ 5.386811] ? __asan_loadN+0xf/0x20
[ 5.396789] cache_grow_begin+0x137/0x1260
[ 5.396789] ? fs_reclaim_release+0x3b/0x60
[ 5.403389] ? __asan_loadN+0xf/0x20
[ 5.403389] cache_alloc_refill+0x3c6/0x7d0
[ 5.403389] kmem_cache_alloc+0x1ba/0x540
[ 5.403389] getname_flags+0x7b/0x5c0
[ 5.406793] ? __asan_loadN+0xf/0x20
[ 5.410056] ? _parse_integer+0x1b3/0x1d0
[ 5.410056] user_path_at_empty+0x23/0x40
[ 5.410056] vfs_statx+0x191/0x250
[ 5.410056] ? __do_compat_sys_newfstat+0x100/0x100
[ 5.410056] clean_path+0x94/0x177
[ 5.416793] ? do_reset+0x85/0x85
[ 5.416793] ? __asan_loadN+0xf/0x20
[ 5.416793] ? trace_hardirqs_on+0x37/0x2c0
[ 5.416793] ? __asan_loadN+0xf/0x20
[ 5.416793] ? strcmp+0x5c/0xc0
[ 5.420054] do_name+0xc3/0x509
[ 5.420054] ? write_buffer+0x31/0x4c
[ 5.420054] write_buffer+0x39/0x4c
[ 5.423389] flush_buffer+0x110/0x140
[ 5.423389] __gunzip+0x667/0x842
[ 5.426788] ? bunzip2+0xa5b/0xa5b
[ 5.430063] ? error+0x51/0x51
[ 5.430063] ? __gunzip+0x842/0x842
[ 5.430063] gunzip+0x11/0x13
[ 5.430063] ? do_start+0x23/0x23
[ 5.430063] unpack_to_rootfs+0x355/0x645
[ 5.436806] ? do_start+0x23/0x23
[ 5.436806] ? kmsg_dump_rewind+0xd0/0xf3
[ 5.436806] ? do_collect+0xc9/0xc9
[ 5.436806] populate_rootfs+0xf4/0x308
[ 5.436806] ? unpack_to_rootfs+0x645/0x645
[ 5.443389] do_one_initcall+0x289/0x755
[ 5.443389] ? trace_event_raw_event_initcall_finish+0x270/0x270
[ 5.443389] ? kasan_check_write+0x20/0x20
[ 5.446790] ? ftrace_likely_update+0x8c/0xb0
[ 5.446790] ? do_early_param+0x11b/0x11b
[ 5.446790] ? cpumask_check+0x77/0x90
[ 5.446790] ? __asan_loadN+0xf/0x20
[ 5.453387] ? do_early_param+0x11b/0x11b
[ 5.453387] kernel_init_freeable+0x1c1/0x2e6
[ 5.453387] ? rest_init+0x110/0x110
[ 5.453387] kernel_init+0x11/0x200
[ 5.453387] ? rest_init+0x110/0x110
[ 5.453387] ret_from_fork+0x24/0x30
[ 5.460056] ==================================================================
[ 5.460056] Disabling lock debugging due to kernel taint
[ 5.464179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 5.465373] PGD 0 P4D 0
[ 5.467430] Oops: 0000 [#1] SMP KASAN
[ 5.467430] Modules linked in:
[ 5.470057] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G B 4.17.0-rc6-00081-gc385a55 #2
[ 5.470057] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 5.476808] RIP: 0010:mem_cgroup_print_oom_memcg_name+0xdb/0x130
[ 5.476808] RSP: 0000:ffff88000320f458 EFLAGS: 00010292
[ 5.476808] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffffffffb4449027
[ 5.483385] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000297
[ 5.483385] RBP: ffff88000320f470 R08: fffffbfff6f2126f R09: fffffbfff6f2126e
[ 5.490049] R10: ffffffffb7909377 R11: fffffbfff6f2126f R12: 0000000000000000
[ 5.490049] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88000320f6b0
[ 5.490049] FS: 0000000000000000(0000) GS:ffff880003700000(0000) knlGS:0000000000000000
[ 5.496794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.496794] CR2: 0000000000000000 CR3: 0000000013422000 CR4: 00000000000006e0
[ 5.496794] Call Trace:
[ 5.496794] dump_header+0x161/0x756
[ 5.500058] ? __asan_loadN+0xf/0x20
[ 5.500058] out_of_memory+0x69e/0x860
[ 5.500058] ? unregister_oom_notifier+0x20/0x20
[ 5.500058] __alloc_pages_slowpath+0x1399/0x1d20
[ 5.503391] ? fs_reclaim_release+0x60/0x60
[ 5.503391] ? __asan_loadN+0xf/0x20
[ 5.503391] ? ftrace_likely_update+0x8c/0xb0
[ 5.503391] ? __asan_loadN+0xf/0x20
[ 5.506791] __alloc_pages_nodemask+0x507/0x820
[ 5.506791] ? __alloc_pages_slowpath+0x1d20/0x1d20
[ 5.506791] ? __asan_loadN+0xf/0x20
[ 5.506791] cache_grow_begin+0x137/0x1260
[ 5.510059] ? fs_reclaim_release+0x3b/0x60
[ 5.510059] ? __asan_loadN+0xf/0x20
[ 5.510059] cache_alloc_refill+0x3c6/0x7d0
[ 5.510059] kmem_cache_alloc+0x1ba/0x540
[ 5.513390] getname_flags+0x7b/0x5c0
[ 5.513390] ? __asan_loadN+0xf/0x20
[ 5.513390] ? _parse_integer+0x1b3/0x1d0
[ 5.513390] user_path_at_empty+0x23/0x40
[ 5.513390] vfs_statx+0x191/0x250
[ 5.513390] ? __do_compat_sys_newfstat+0x100/0x100
[ 5.516775] clean_path+0x94/0x177
[ 5.516775] ? do_reset+0x85/0x85
[ 5.516775] ? __asan_loadN+0xf/0x20
[ 5.516775] ? trace_hardirqs_on+0x37/0x2c0
[ 5.516775] ? __asan_loadN+0xf/0x20
[ 5.520065] ? strcmp+0x5c/0xc0
[ 5.520065] do_name+0xc3/0x509
[ 5.520065] ? write_buffer+0x31/0x4c
[ 5.520065] write_buffer+0x39/0x4c
[ 5.520065] flush_buffer+0x110/0x140
[ 5.520065] __gunzip+0x667/0x842
[ 5.523384] ? bunzip2+0xa5b/0xa5b
[ 5.523384] ? error+0x51/0x51
[ 5.523384] ? __gunzip+0x842/0x842
[ 5.523384] gunzip+0x11/0x13
[ 5.523384] ? do_start+0x23/0x23
[ 5.523384] unpack_to_rootfs+0x355/0x645
[ 5.526774] ? do_start+0x23/0x23
[ 5.530049] ? kmsg_dump_rewind+0xd0/0xf3
[ 5.530049] ? do_collect+0xc9/0xc9
[ 5.530049] populate_rootfs+0xf4/0x308
[ 5.530049] ? unpack_to_rootfs+0x645/0x645
[ 5.530049] do_one_initcall+0x289/0x755
[ 5.533381] ? trace_event_raw_event_initcall_finish+0x270/0x270
[ 5.533381] ? kasan_check_write+0x20/0x20
[ 5.533381] ? ftrace_likely_update+0x8c/0xb0
[ 5.540051] ? do_early_param+0x11b/0x11b
[ 5.540051] ? cpumask_check+0x77/0x90
[ 5.543385] ? __asan_loadN+0xf/0x20
[ 5.543385] ? do_early_param+0x11b/0x11b
[ 5.543385] kernel_init_freeable+0x1c1/0x2e6
[ 5.543385] ? rest_init+0x110/0x110
[ 5.546774] kernel_init+0x11/0x200
[ 5.550058] ? rest_init+0x110/0x110
[ 5.550058] ret_from_fork+0x24/0x30
[ 5.550058] Code: 50 01 00 00 e8 b7 31 15 00 48 c7 c7 00 dc ff b5 e8 6e 2e d0 ff eb 0c 48 c7 c7 60 dc ff b5 e8 60 2e d0 ff 4c 89 ef e8 75 e8 fd ff <49> 8b 5d 00 48 8d bb 50 01 00 00 e8 65 e8 fd ff 48 8b bb 50 01
[ 5.553391] RIP: mem_cgroup_print_oom_memcg_name+0xdb/0x130 RSP: ffff88000320f458
[ 5.556791] CR2: 0000000000000000
[ 5.556791] _warn_unseeded_randomness: 6 callbacks suppressed
[ 5.556791] random: get_random_bytes called from init_oops_id+0x50/0x70 with crng_init=0
[ 5.560058] ---[ end trace 8cd4338bfad4c0db ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Xiaolong


Attachments:
(No filename) (9.54 kB)
config-4.17.0-rc6-00081-gc385a55 (117.60 kB)
job-script (4.09 kB)
dmesg.xz (10.27 kB)

2018-05-30 20:43:38

by Andrew Morton

Subject: Re: [PATCH v4] Print the memcg's name when system-wide OOM happened

On Mon, 21 May 2018 03:39:46 +0100 ufo19890607 <[email protected]> wrote:

> From: yuzhoujian <[email protected]>
>
> The dump_header does not print the memcg's name when the system
> oom happened. So users cannot locate the certain container which
> contains the task that has been killed by the oom killer.
>
> System oom report will print the memcg's name after this patch,
> so users can get the memcg's path from the oom report and check
> the certain container more quickly.

lkp-robot is reporting an oops.

> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> if (is_memcg_oom(oc))
> mem_cgroup_print_oom_info(oc->memcg, p);
> else {
> + mem_cgroup_print_oom_memcg_name(oc->memcg, p);
> show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
> if (is_dump_unreclaim_slabs())
> dump_unreclaimable_slab();

static inline bool is_memcg_oom(struct oom_control *oc)
{
return oc->memcg != NULL;
}

So in the mem_cgroup_print_oom_memcg_name() call which this patch adds,
oc->memcg is known to be NULL. How can this possibly work?

2018-05-31 06:50:34

by Michal Hocko

Subject: Re: [PATCH v4] Print the memcg's name when system-wide OOM happened

On Wed 30-05-18 13:42:56, Andrew Morton wrote:
> On Mon, 21 May 2018 03:39:46 +0100 ufo19890607 <[email protected]> wrote:
>
> > From: yuzhoujian <[email protected]>
> >
> > The dump_header does not print the memcg's name when the system
> > oom happened. So users cannot locate the certain container which
> > contains the task that has been killed by the oom killer.
> >
> > System oom report will print the memcg's name after this patch,
> > so users can get the memcg's path from the oom report and check
> > the certain container more quickly.
>
> lkp-robot is reporting an oops.
>
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> > if (is_memcg_oom(oc))
> > mem_cgroup_print_oom_info(oc->memcg, p);
> > else {
> > + mem_cgroup_print_oom_memcg_name(oc->memcg, p);
> > show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
> > if (is_dump_unreclaim_slabs())
> > dump_unreclaimable_slab();
>
> static inline bool is_memcg_oom(struct oom_control *oc)
> {
> return oc->memcg != NULL;
> }
>
> So in the mem_cgroup_print_oom_memcg_name() call which this patch adds,
> oc->memcg is known to be NULL. How can this possibly work?

This version is broken. The current version [1] seems to be doing the
right thing in that regard AFAICS. It has some other issues though.
Can we drop the current code from the mmotm tree and start over?

[1] http://lkml.kernel.org/r/[email protected]
--
Michal Hocko
SUSE Labs