If there is heavy memory pressure, page allocation with __GFP_NOWAIT
fails easily even though it's an order-0 request.
I got the warning below 9 times during a normal boot.
[ 17.072747] c0 0 <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
< snip >
[ 17.072789] c0 0 Call trace:
[ 17.072803] c0 0 [<ffffff8009914da4>] dump_backtrace+0x0/0x4
[ 17.072813] c0 0 [<ffffff80086bfb5c>] dump_stack+0xa4/0xc0
[ 17.072822] c0 0 [<ffffff800831a4f8>] warn_alloc+0xd4/0x15c
[ 17.072829] c0 0 [<ffffff8008318c3c>] __alloc_pages_nodemask+0xf88/0x10fc
[ 17.072838] c0 0 [<ffffff8008392b34>] alloc_slab_page+0x40/0x18c
[ 17.072843] c0 0 [<ffffff8008392acc>] new_slab+0x2b8/0x2e0
[ 17.072849] c0 0 [<ffffff800839220c>] ___slab_alloc+0x25c/0x464
[ 17.072858] c0 0 [<ffffff8008393dd0>] __kmalloc+0x394/0x498
[ 17.072865] c0 0 [<ffffff80083a658c>] memcg_kmem_get_cache+0x114/0x2b8
[ 17.072870] c0 0 [<ffffff8008392f38>] kmem_cache_alloc+0x98/0x3e8
[ 17.072878] c0 0 [<ffffff8008370be8>] mmap_region+0x3bc/0x8c0
[ 17.072884] c0 0 [<ffffff80083707fc>] do_mmap+0x40c/0x43c
[ 17.072890] c0 0 [<ffffff8008343598>] vm_mmap_pgoff+0x15c/0x1e4
[ 17.072898] c0 0 [<ffffff800814be28>] sys_mmap+0xb0/0xc8
[ 17.072904] c0 0 [<ffffff8008083730>] el0_svc_naked+0x24/0x28
[ 17.072908] c0 0 Mem-Info:
[ 17.072920] c0 0 active_anon:17124 inactive_anon:193 isolated_anon:0
[ 17.072920] c0 0 active_file:7898 inactive_file:712955 isolated_file:55
[ 17.072920] c0 0 unevictable:0 dirty:27 writeback:18 unstable:0
[ 17.072920] c0 0 slab_reclaimable:12250 slab_unreclaimable:23334
[ 17.072920] c0 0 mapped:19310 shmem:212 pagetables:816 bounce:0
[ 17.072920] c0 0 free:36561 free_pcp:1205 free_cma:35615
[ 17.072933] c0 0 Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 17.072945] c0 0 DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
[ 17.072949] c0 0 lowmem_reserve[]: 0 1842 1842
[ 17.072966] c0 0 Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
[ 17.072971] c0 0 lowmem_reserve[]: 0 0 0
[ 17.072982] c0 0 DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
[ 17.073024] c0 0 Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
[ 17.073069] c0 0 721350 total pagecache pages
[ 17.073073] c0 0 0 pages in swap cache
[ 17.073078] c0 0 Swap cache stats: add 0, delete 0, find 0/0
[ 17.073081] c0 0 Free swap = 0kB
[ 17.073085] c0 0 Total swap = 0kB
[ 17.073089] c0 0 945512 pages RAM
[ 17.073093] c0 0 0 pages HighMem/MovableOnly
[ 17.073097] c0 0 63408 pages reserved
[ 17.073100] c0 0 51200 pages cma reserved
Let's not scare users.
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 448db08d97a0..671d07e73a3b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2200,7 +2200,7 @@ static void __memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
{
struct memcg_kmem_cache_create_work *cw;
- cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
+ cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
if (!cw)
return;
--
2.17.0.484.g0c8726318c-goog
On Wed, 18 Apr 2018, Minchan Kim wrote:
> If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> fails easily even though it's an order-0 request.
> I got the warning below 9 times during a normal boot.
>
> [ 17.072747] c0 0 <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> < snip >
> [ 17.072789] c0 0 Call trace:
> [ 17.072803] c0 0 [<ffffff8009914da4>] dump_backtrace+0x0/0x4
> [ 17.072813] c0 0 [<ffffff80086bfb5c>] dump_stack+0xa4/0xc0
> [ 17.072822] c0 0 [<ffffff800831a4f8>] warn_alloc+0xd4/0x15c
> [ 17.072829] c0 0 [<ffffff8008318c3c>] __alloc_pages_nodemask+0xf88/0x10fc
> [ 17.072838] c0 0 [<ffffff8008392b34>] alloc_slab_page+0x40/0x18c
> [ 17.072843] c0 0 [<ffffff8008392acc>] new_slab+0x2b8/0x2e0
> [ 17.072849] c0 0 [<ffffff800839220c>] ___slab_alloc+0x25c/0x464
> [ 17.072858] c0 0 [<ffffff8008393dd0>] __kmalloc+0x394/0x498
> [ 17.072865] c0 0 [<ffffff80083a658c>] memcg_kmem_get_cache+0x114/0x2b8
> [ 17.072870] c0 0 [<ffffff8008392f38>] kmem_cache_alloc+0x98/0x3e8
> [ 17.072878] c0 0 [<ffffff8008370be8>] mmap_region+0x3bc/0x8c0
> [ 17.072884] c0 0 [<ffffff80083707fc>] do_mmap+0x40c/0x43c
> [ 17.072890] c0 0 [<ffffff8008343598>] vm_mmap_pgoff+0x15c/0x1e4
> [ 17.072898] c0 0 [<ffffff800814be28>] sys_mmap+0xb0/0xc8
> [ 17.072904] c0 0 [<ffffff8008083730>] el0_svc_naked+0x24/0x28
> [ 17.072908] c0 0 Mem-Info:
> [ 17.072920] c0 0 active_anon:17124 inactive_anon:193 isolated_anon:0
> [ 17.072920] c0 0 active_file:7898 inactive_file:712955 isolated_file:55
> [ 17.072920] c0 0 unevictable:0 dirty:27 writeback:18 unstable:0
> [ 17.072920] c0 0 slab_reclaimable:12250 slab_unreclaimable:23334
> [ 17.072920] c0 0 mapped:19310 shmem:212 pagetables:816 bounce:0
> [ 17.072920] c0 0 free:36561 free_pcp:1205 free_cma:35615
> [ 17.072933] c0 0 Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [ 17.072945] c0 0 DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
> [ 17.072949] c0 0 lowmem_reserve[]: 0 1842 1842
> [ 17.072966] c0 0 Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
> [ 17.072971] c0 0 lowmem_reserve[]: 0 0 0
> [ 17.072982] c0 0 DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
> [ 17.073024] c0 0 Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
> [ 17.073069] c0 0 721350 total pagecache pages
> [ 17.073073] c0 0 0 pages in swap cache
> [ 17.073078] c0 0 Swap cache stats: add 0, delete 0, find 0/0
> [ 17.073081] c0 0 Free swap = 0kB
> [ 17.073085] c0 0 Total swap = 0kB
> [ 17.073089] c0 0 945512 pages RAM
> [ 17.073093] c0 0 0 pages HighMem/MovableOnly
> [ 17.073097] c0 0 63408 pages reserved
> [ 17.073100] c0 0 51200 pages cma reserved
>
> Let's not scare users.
>
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Acked-by: David Rientjes <[email protected]>
On Wed, Apr 18, 2018 at 11:29:12AM +0900, Minchan Kim wrote:
> If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> fails easily even though it's an order-0 request.
> I got the warning below 9 times during a normal boot.
>
> [ 17.072747] c0 0 <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
>
> Let's not scare users.
>
> - cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
> + cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
> if (!cw)
Not arguing against this patch. But how many places do we want to use
GFP_NOWAIT without __GFP_NOWARN? Not many, and the few that do seem like
they simply haven't added it yet. Maybe this would be a good idea?
-#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
+#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
On Tue, 17 Apr 2018, Matthew Wilcox wrote:
> Not arguing against this patch. But how many places do we want to use
> GFP_NOWAIT without __GFP_NOWARN? Not many, and the few that do seem like
> they simply haven't added it yet. Maybe this would be a good idea?
>
> -#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
> +#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
>
I don't think that's a good idea; slab allocators use GFP_NOWAIT during
init, for example, followed by a BUG_ON() if it fails. With an
implicit __GFP_NOWARN we wouldn't be able to see the state of memory when
it crashes (likely memory that wasn't freed to the allocator). I think
whether the allocation failure should trigger a warning is up to the
caller.
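
As a rough sketch of the pattern being described (the function, size, and names
here are hypothetical, not lifted from the slab code), an init-time caller may
deliberately rely on the allocator's own failure report:

static void __init example_early_init(void)
{
	/*
	 * No __GFP_NOWARN on purpose: if this fails, warn_alloc() dumps
	 * the state of memory before the BUG_ON() below brings the
	 * system down.
	 */
	void *buf = kmalloc(PAGE_SIZE, GFP_NOWAIT);

	/* With GFP_NOWAIT implying __GFP_NOWARN, this would crash with
	 * no memory state logged at all. */
	BUG_ON(!buf);
}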
On Tue 17-04-18 20:08:24, Matthew Wilcox wrote:
> On Wed, Apr 18, 2018 at 11:29:12AM +0900, Minchan Kim wrote:
> > If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> > fails easily even though it's an order-0 request.
> > I got the warning below 9 times during a normal boot.
> >
> > [ 17.072747] c0 0 <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> >
> > Let's not scare users.
> >
> > - cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
> > + cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
> > if (!cw)
>
> Not arguing against this patch. But how many places do we want to use
> GFP_NOWAIT without __GFP_NOWARN? Not many, and the few that do seem like
> they simply haven't added it yet. Maybe this would be a good idea?
>
> -#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
> +#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
We have tried something like this in the past and Linus was strongly
against it. I do not have a reference handy, but his argument was that each
__GFP_NOWARN should be explicit rather than implicit, because it is
a deliberate decision to make.
--
Michal Hocko
SUSE Labs
On Wed 18-04-18 11:29:12, Minchan Kim wrote:
> If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> fails easily even though it's an order-0 request.
> I got the warning below 9 times during a normal boot.
>
> [ 17.072747] c0 0 <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> < snip >
> [ 17.072789] c0 0 Call trace:
> [ 17.072803] c0 0 [<ffffff8009914da4>] dump_backtrace+0x0/0x4
> [ 17.072813] c0 0 [<ffffff80086bfb5c>] dump_stack+0xa4/0xc0
> [ 17.072822] c0 0 [<ffffff800831a4f8>] warn_alloc+0xd4/0x15c
> [ 17.072829] c0 0 [<ffffff8008318c3c>] __alloc_pages_nodemask+0xf88/0x10fc
> [ 17.072838] c0 0 [<ffffff8008392b34>] alloc_slab_page+0x40/0x18c
> [ 17.072843] c0 0 [<ffffff8008392acc>] new_slab+0x2b8/0x2e0
> [ 17.072849] c0 0 [<ffffff800839220c>] ___slab_alloc+0x25c/0x464
> [ 17.072858] c0 0 [<ffffff8008393dd0>] __kmalloc+0x394/0x498
> [ 17.072865] c0 0 [<ffffff80083a658c>] memcg_kmem_get_cache+0x114/0x2b8
> [ 17.072870] c0 0 [<ffffff8008392f38>] kmem_cache_alloc+0x98/0x3e8
> [ 17.072878] c0 0 [<ffffff8008370be8>] mmap_region+0x3bc/0x8c0
> [ 17.072884] c0 0 [<ffffff80083707fc>] do_mmap+0x40c/0x43c
> [ 17.072890] c0 0 [<ffffff8008343598>] vm_mmap_pgoff+0x15c/0x1e4
> [ 17.072898] c0 0 [<ffffff800814be28>] sys_mmap+0xb0/0xc8
> [ 17.072904] c0 0 [<ffffff8008083730>] el0_svc_naked+0x24/0x28
> [ 17.072908] c0 0 Mem-Info:
> [ 17.072920] c0 0 active_anon:17124 inactive_anon:193 isolated_anon:0
> [ 17.072920] c0 0 active_file:7898 inactive_file:712955 isolated_file:55
> [ 17.072920] c0 0 unevictable:0 dirty:27 writeback:18 unstable:0
> [ 17.072920] c0 0 slab_reclaimable:12250 slab_unreclaimable:23334
> [ 17.072920] c0 0 mapped:19310 shmem:212 pagetables:816 bounce:0
> [ 17.072920] c0 0 free:36561 free_pcp:1205 free_cma:35615
> [ 17.072933] c0 0 Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [ 17.072945] c0 0 DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
> [ 17.072949] c0 0 lowmem_reserve[]: 0 1842 1842
> [ 17.072966] c0 0 Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
> [ 17.072971] c0 0 lowmem_reserve[]: 0 0 0
> [ 17.072982] c0 0 DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
> [ 17.073024] c0 0 Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
> [ 17.073069] c0 0 721350 total pagecache pages
> [ 17.073073] c0 0 0 pages in swap cache
> [ 17.073078] c0 0 Swap cache stats: add 0, delete 0, find 0/0
> [ 17.073081] c0 0 Free swap = 0kB
> [ 17.073085] c0 0 Total swap = 0kB
> [ 17.073089] c0 0 945512 pages RAM
> [ 17.073093] c0 0 0 pages HighMem/MovableOnly
> [ 17.073097] c0 0 63408 pages reserved
> [ 17.073100] c0 0 51200 pages cma reserved
>
> Let's not scare users.
This is not a proper explanation. So what exactly happens when this
allocation fails? I would suggest something like the following
"
__memcg_schedule_kmem_cache_create tries to create a shadow slab cache
and the worker allocation failure is not really critical because we will
retry on the next kmem charge. We might miss some charges but that
shouldn't be critical. The excessive allocation failure report is not
very helpful. Replace it with a rate-limited single-line output so
that we know that there are a lot of these failures and that we need to
do something about it in the future.
"
With the last part to be implemented of course.
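
For illustration, a rate-limited single-line report along those lines could look
roughly like the sketch below (the message text and the pr_warn_ratelimited()
choice are assumptions, not part of any posted patch):

	cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
	if (!cw) {
		/* One rate-limited line instead of the full warn_alloc() splat. */
		pr_warn_ratelimited("memcg: kmem cache create work allocation failed\n");
		return;
	}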
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 448db08d97a0..671d07e73a3b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2200,7 +2200,7 @@ static void __memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
> {
> struct memcg_kmem_cache_create_work *cw;
>
> - cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
> + cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
> if (!cw)
> return;
>
> --
> 2.17.0.484.g0c8726318c-goog
--
Michal Hocko
SUSE Labs
On Wed, Apr 18, 2018 at 09:20:02AM +0200, Michal Hocko wrote:
> On Wed 18-04-18 11:29:12, Minchan Kim wrote:
> > If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> > fails easily even though it's an order-0 request.
> > I got the warning below 9 times during a normal boot.
> >
> > [ 17.072747] c0 0 <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> > < snip >
> > [ 17.072789] c0 0 Call trace:
> > [ 17.072803] c0 0 [<ffffff8009914da4>] dump_backtrace+0x0/0x4
> > [ 17.072813] c0 0 [<ffffff80086bfb5c>] dump_stack+0xa4/0xc0
> > [ 17.072822] c0 0 [<ffffff800831a4f8>] warn_alloc+0xd4/0x15c
> > [ 17.072829] c0 0 [<ffffff8008318c3c>] __alloc_pages_nodemask+0xf88/0x10fc
> > [ 17.072838] c0 0 [<ffffff8008392b34>] alloc_slab_page+0x40/0x18c
> > [ 17.072843] c0 0 [<ffffff8008392acc>] new_slab+0x2b8/0x2e0
> > [ 17.072849] c0 0 [<ffffff800839220c>] ___slab_alloc+0x25c/0x464
> > [ 17.072858] c0 0 [<ffffff8008393dd0>] __kmalloc+0x394/0x498
> > [ 17.072865] c0 0 [<ffffff80083a658c>] memcg_kmem_get_cache+0x114/0x2b8
> > [ 17.072870] c0 0 [<ffffff8008392f38>] kmem_cache_alloc+0x98/0x3e8
> > [ 17.072878] c0 0 [<ffffff8008370be8>] mmap_region+0x3bc/0x8c0
> > [ 17.072884] c0 0 [<ffffff80083707fc>] do_mmap+0x40c/0x43c
> > [ 17.072890] c0 0 [<ffffff8008343598>] vm_mmap_pgoff+0x15c/0x1e4
> > [ 17.072898] c0 0 [<ffffff800814be28>] sys_mmap+0xb0/0xc8
> > [ 17.072904] c0 0 [<ffffff8008083730>] el0_svc_naked+0x24/0x28
> > [ 17.072908] c0 0 Mem-Info:
> > [ 17.072920] c0 0 active_anon:17124 inactive_anon:193 isolated_anon:0
> > [ 17.072920] c0 0 active_file:7898 inactive_file:712955 isolated_file:55
> > [ 17.072920] c0 0 unevictable:0 dirty:27 writeback:18 unstable:0
> > [ 17.072920] c0 0 slab_reclaimable:12250 slab_unreclaimable:23334
> > [ 17.072920] c0 0 mapped:19310 shmem:212 pagetables:816 bounce:0
> > [ 17.072920] c0 0 free:36561 free_pcp:1205 free_cma:35615
> > [ 17.072933] c0 0 Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > [ 17.072945] c0 0 DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
> > [ 17.072949] c0 0 lowmem_reserve[]: 0 1842 1842
> > [ 17.072966] c0 0 Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
> > [ 17.072971] c0 0 lowmem_reserve[]: 0 0 0
> > [ 17.072982] c0 0 DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
> > [ 17.073024] c0 0 Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
> > [ 17.073069] c0 0 721350 total pagecache pages
> > [ 17.073073] c0 0 0 pages in swap cache
> > [ 17.073078] c0 0 Swap cache stats: add 0, delete 0, find 0/0
> > [ 17.073081] c0 0 Free swap = 0kB
> > [ 17.073085] c0 0 Total swap = 0kB
> > [ 17.073089] c0 0 945512 pages RAM
> > [ 17.073093] c0 0 0 pages HighMem/MovableOnly
> > [ 17.073097] c0 0 63408 pages reserved
> > [ 17.073100] c0 0 51200 pages cma reserved
> >
> > Let's not scare users.
>
> This is not a proper explanation. So what exactly happens when this
> allocation fails? I would suggest something like the following
> "
> __memcg_schedule_kmem_cache_create tries to create a shadow slab cache
> and the worker allocation failure is not really critical because we will
> retry on the next kmem charge. We might miss some charges but that
> shouldn't be critical. The excessive allocation failure report is not
> very helpful. Replace it with a rate-limited single-line output so
> that we know that there are a lot of these failures and that we need to
> do something about it in the future.
> "
>
> With the last part to be implemented of course.
If you want to see the warning and catch it in the future, I don't see any reason
to change it, because I didn't see warning output excessive enough that it could
slow the system down without ratelimiting.
It was just a report from non-MM folks who were concerned that something
might be going wrong on the system. I just wanted to reassure them since it's not
critical.
On Wed 18-04-18 16:41:17, Minchan Kim wrote:
> On Wed, Apr 18, 2018 at 09:20:02AM +0200, Michal Hocko wrote:
> > On Wed 18-04-18 11:29:12, Minchan Kim wrote:
[...]
> > > Let's not scare users.
> >
> > This is not a proper explanation. So what exactly happens when this
> > allocation fails? I would suggest something like the following
> > "
> > __memcg_schedule_kmem_cache_create tries to create a shadow slab cache
> > and the worker allocation failure is not really critical because we will
> > retry on the next kmem charge. We might miss some charges but that
> > shouldn't be critical. The excessive allocation failure report is not
> > very helpful. Replace it with a rate-limited single-line output so
> > that we know that there are a lot of these failures and that we need to
> > do something about it in the future.
> > "
> >
> > With the last part to be implemented of course.
>
> If you want to see the warning and catch it in the future, I don't see any reason
> to change it, because I didn't see warning output excessive enough that it could
> slow the system down without ratelimiting.
Yeah, but a single line would be just as informative and less scary to
users.
> It was just a report from non-MM folks who were concerned that something
> might be going wrong on the system. I just wanted to reassure them since it's not
> critical.
I do agree with __GFP_NOWARN but I think a single line warning is due
and helpful for further debugging.
--
Michal Hocko
SUSE Labs
On Wed, Apr 18, 2018 at 09:54:37AM +0200, Michal Hocko wrote:
> On Wed 18-04-18 16:41:17, Minchan Kim wrote:
> > On Wed, Apr 18, 2018 at 09:20:02AM +0200, Michal Hocko wrote:
> > > On Wed 18-04-18 11:29:12, Minchan Kim wrote:
> [...]
> > > > Let's not scare users.
> > >
> > > This is not a proper explanation. So what exactly happens when this
> > > allocation fails? I would suggest something like the following
> > > "
> > > __memcg_schedule_kmem_cache_create tries to create a shadow slab cache
> > > and the worker allocation failure is not really critical because we will
> > > retry on the next kmem charge. We might miss some charges but that
> > > shouldn't be critical. The excessive allocation failure report is not
> > > very helpful. Replace it with a rate-limited single-line output so
> > > that we know that there are a lot of these failures and that we need to
> > > do something about it in the future.
> > > "
> > >
> > > With the last part to be implemented of course.
> >
> > If you want to see the warning and catch it in the future, I don't see any reason
> > to change it, because I didn't see warning output excessive enough that it could
> > slow the system down without ratelimiting.
>
> Yeah, but a single line would be just as informative and less scary to
> users.
>
> > It was just a report from non-MM folks who were concerned that something
> > might be going wrong on the system. I just wanted to reassure them since it's not
> > critical.
>
> I do agree with __GFP_NOWARN but I think a single line warning is due
> and helpful for further debugging.
Okay, no problem. However, I don't feel we need a ratelimit at this moment.
We can add one when we get a real report. Let's add just a one-line warning.
However, I have no talent for writing a poem that expresses this in one line.
Could you help me?
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 671d07e73a3b..e26f85cac63f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2201,8 +2201,11 @@ static void __memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
struct memcg_kmem_cache_create_work *cw;
cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
- if (!cw)
+ if (!cw) {
+ pr_warn("Fail to create shadow slab cache for memcg but it's not critical.\n");
+ pr_warn("If you see lots of this message, send an email to [email protected]\n");
return;
+ }
css_get(&memcg->css);
On Wed 18-04-18 22:23:28, Minchan Kim wrote:
> On Wed, Apr 18, 2018 at 09:54:37AM +0200, Michal Hocko wrote:
> > On Wed 18-04-18 16:41:17, Minchan Kim wrote:
> > > On Wed, Apr 18, 2018 at 09:20:02AM +0200, Michal Hocko wrote:
> > > > On Wed 18-04-18 11:29:12, Minchan Kim wrote:
> > [...]
> > > > > Let's not scare users.
> > > >
> > > > This is not a proper explanation. So what exactly happens when this
> > > > allocation fails? I would suggest something like the following
> > > > "
> > > > __memcg_schedule_kmem_cache_create tries to create a shadow slab cache
> > > > and the worker allocation failure is not really critical because we will
> > > > retry on the next kmem charge. We might miss some charges but that
> > > > shouldn't be critical. The excessive allocation failure report is not
> > > > very helpful. Replace it with a rate-limited single-line output so
> > > > that we know that there are a lot of these failures and that we need to
> > > > do something about it in the future.
> > > > "
> > > >
> > > > With the last part to be implemented of course.
> > >
> > > If you want to see the warning and catch it in the future, I don't see any reason
> > > to change it, because I didn't see warning output excessive enough that it could
> > > slow the system down without ratelimiting.
> >
> > Yeah, but a single line would be just as informative and less scary to
> > users.
> >
> > > It was just a report from non-MM folks who were concerned that something
> > > might be going wrong on the system. I just wanted to reassure them since it's not
> > > critical.
> >
> > I do agree with __GFP_NOWARN but I think a single line warning is due
> > and helpful for further debugging.
>
> Okay, no problem. However, I don't feel we need a ratelimit at this moment.
> We can add one when we get a real report. Let's add just a one-line warning.
> However, I have no talent for writing a poem that expresses this in one line.
> Could you help me?
What about
pr_info("Failed to create memcg slab cache. Report if you see floods of these\n");
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 671d07e73a3b..e26f85cac63f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2201,8 +2201,11 @@ static void __memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
> struct memcg_kmem_cache_create_work *cw;
>
> cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
> - if (!cw)
> + if (!cw) {
> + pr_warn("Fail to create shadow slab cache for memcg but it's not critical.\n");
> + pr_warn("If you see lots of this message, send an email to [email protected]\n");
> return;
> + }
>
> css_get(&memcg->css);
--
Michal Hocko
SUSE Labs
On Wed, Apr 18, 2018 at 11:29:12AM +0900, Minchan Kim wrote:
> If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> fails easily even though it's an order-0 request.
> I got the warning below 9 times during a normal boot.
>
> Let's not scare users.
Actually, can you explain why it's OK if this fails? As I understand this
code, we'll fail to create a kmalloc cache for this memcg. What problems
does that cause?
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 448db08d97a0..671d07e73a3b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2200,7 +2200,7 @@ static void __memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
> {
> struct memcg_kmem_cache_create_work *cw;
>
> - cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
> + cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
> if (!cw)
> return;
>
> --
> 2.17.0.484.g0c8726318c-goog
>
On Wed 18-04-18 06:31:39, Matthew Wilcox wrote:
> On Wed, Apr 18, 2018 at 11:29:12AM +0900, Minchan Kim wrote:
> > If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> > fails easily even though it's an order-0 request.
> > I got the warning below 9 times during a normal boot.
> >
> > Let's not scare users.
>
> Actually, can you explain why it's OK if this fails? As I understand this
> code, we'll fail to create a kmalloc cache for this memcg. What problems
> does that cause?
See http://lkml.kernel.org/r/[email protected]
--
Michal Hocko
SUSE Labs
On Wed, 18 Apr 2018, Michal Hocko wrote:
> > Okay, no problem. However, I don't feel we need a ratelimit at this moment.
> > We can add one when we get a real report. Let's add just a one-line warning.
> > However, I have no talent for writing a poem that expresses this in one line.
> > Could you help me?
>
> What about
> pr_info("Failed to create memcg slab cache. Report if you see floods of these\n");
>
Um, there's nothing actionable here for the user. Even if the message
directed them to a specific email address, what would you ask the user for
in response if they show a kernel log with 100 of these? Probably ask
them to use sysrq at the time it happens to get meminfo. But any
user-initiated sysrq is going to reveal a very different state of memory compared
to when the kmalloc() actually failed.
If this really needs a warning, I think it only needs to be done once and
reveal the state of memory, similar to how slub emits oom warnings. But as
the changelog indicates, the system is oom and we couldn't reclaim. We
can expect this to happen a lot on systems with memory pressure. What is
the warning revealing that would be actionable?
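
A once-only warning that also dumps the memory state, roughly in that spirit,
might look something like this sketch (the message text is made up, and it
assumes show_mem(0, NULL) is usable here):

	cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
	if (!cw) {
		static bool warned;

		/* Warn once and show the memory state, similar in spirit
		 * to the slub allocation failure reports. */
		if (!warned) {
			warned = true;
			pr_warn("memcg: kmem cache create work allocation failed\n");
			show_mem(0, NULL);
		}
		return;
	}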
On Wed, Apr 18, 2018 at 11:29:12AM +0900, Minchan Kim wrote:
> If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> fails easily even though it's an order-0 request.
> I got the warning below 9 times during a normal boot.
>
> [ 17.072747] c0 0 <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
> < snip >
> [ 17.072789] c0 0 Call trace:
> [ 17.072803] c0 0 [<ffffff8009914da4>] dump_backtrace+0x0/0x4
> [ 17.072813] c0 0 [<ffffff80086bfb5c>] dump_stack+0xa4/0xc0
> [ 17.072822] c0 0 [<ffffff800831a4f8>] warn_alloc+0xd4/0x15c
> [ 17.072829] c0 0 [<ffffff8008318c3c>] __alloc_pages_nodemask+0xf88/0x10fc
> [ 17.072838] c0 0 [<ffffff8008392b34>] alloc_slab_page+0x40/0x18c
> [ 17.072843] c0 0 [<ffffff8008392acc>] new_slab+0x2b8/0x2e0
> [ 17.072849] c0 0 [<ffffff800839220c>] ___slab_alloc+0x25c/0x464
> [ 17.072858] c0 0 [<ffffff8008393dd0>] __kmalloc+0x394/0x498
> [ 17.072865] c0 0 [<ffffff80083a658c>] memcg_kmem_get_cache+0x114/0x2b8
> [ 17.072870] c0 0 [<ffffff8008392f38>] kmem_cache_alloc+0x98/0x3e8
> [ 17.072878] c0 0 [<ffffff8008370be8>] mmap_region+0x3bc/0x8c0
> [ 17.072884] c0 0 [<ffffff80083707fc>] do_mmap+0x40c/0x43c
> [ 17.072890] c0 0 [<ffffff8008343598>] vm_mmap_pgoff+0x15c/0x1e4
> [ 17.072898] c0 0 [<ffffff800814be28>] sys_mmap+0xb0/0xc8
> [ 17.072904] c0 0 [<ffffff8008083730>] el0_svc_naked+0x24/0x28
> [ 17.072908] c0 0 Mem-Info:
> [ 17.072920] c0 0 active_anon:17124 inactive_anon:193 isolated_anon:0
> [ 17.072920] c0 0 active_file:7898 inactive_file:712955 isolated_file:55
> [ 17.072920] c0 0 unevictable:0 dirty:27 writeback:18 unstable:0
> [ 17.072920] c0 0 slab_reclaimable:12250 slab_unreclaimable:23334
> [ 17.072920] c0 0 mapped:19310 shmem:212 pagetables:816 bounce:0
> [ 17.072920] c0 0 free:36561 free_pcp:1205 free_cma:35615
> [ 17.072933] c0 0 Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [ 17.072945] c0 0 DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
> [ 17.072949] c0 0 lowmem_reserve[]: 0 1842 1842
> [ 17.072966] c0 0 Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
> [ 17.072971] c0 0 lowmem_reserve[]: 0 0 0
> [ 17.072982] c0 0 DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
> [ 17.073024] c0 0 Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
> [ 17.073069] c0 0 721350 total pagecache pages
> [ 17.073073] c0 0 0 pages in swap cache
> [ 17.073078] c0 0 Swap cache stats: add 0, delete 0, find 0/0
> [ 17.073081] c0 0 Free swap = 0kB
> [ 17.073085] c0 0 Total swap = 0kB
> [ 17.073089] c0 0 945512 pages RAM
> [ 17.073093] c0 0 0 pages HighMem/MovableOnly
> [ 17.073097] c0 0 63408 pages reserved
> [ 17.073100] c0 0 51200 pages cma reserved
>
> Let's not scare users.
>
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
On Wed 18-04-18 11:58:00, David Rientjes wrote:
> On Wed, 18 Apr 2018, Michal Hocko wrote:
>
> > > Okay, no problem. However, I don't feel we need a ratelimit at this moment.
> > > We can add one when we get a real report. Let's add just a one-line warning.
> > > However, I have no talent for writing a poem that expresses this in one line.
> > > Could you help me?
> >
> > What about
> > pr_info("Failed to create memcg slab cache. Report if you see floods of these\n");
> >
>
> Um, there's nothing actionable here for the user. Even if the message
> directed them to a specific email address, what would you ask the user for
> in response if they show a kernel log with 100 of these?
We would have to think of a better way to create shadow memcg caches.
> Probably ask
> them to use sysrq at the time it happens to get meminfo. But any
> user-initiated sysrq is going to reveal a very different state of memory compared
> to when the kmalloc() actually failed.
Not really.
> If this really needs a warning, I think it only needs to be done once and
> reveal the state of memory, similar to how slub emits oom warnings. But as
> the changelog indicates, the system is oom and we couldn't reclaim. We
> can expect this to happen a lot on systems with memory pressure. What is
> the warning revealing that would be actionable?
That it actually happens in real workloads and we want to know what
those workloads are. This code is quite old, and yet this is the first
time somebody has complained. So it is most probably rare. Maybe because most
workloads don't create many memcgs dynamically while low on memory.
And maybe that will change in the future. In any case, having a large splat
of meminfo for GFP_NOWAIT is not really helpful. It will tell us what we
know already - the memory is low and the reclaim was prohibited. We just
need to know that this happens out there.
--
Michal Hocko
SUSE Labs
On Thu, Apr 19, 2018 at 08:40:05AM +0200, Michal Hocko wrote:
> On Wed 18-04-18 11:58:00, David Rientjes wrote:
> > On Wed, 18 Apr 2018, Michal Hocko wrote:
> >
> > > > Okay, no problem. However, I don't feel we need a ratelimit at this moment.
> > > > We can add one when we get a real report. Let's add just a one-line warning.
> > > > However, I have no talent for writing a poem that expresses this in one line.
> > > > Could you help me?
> > >
> > > What about
> > > pr_info("Failed to create memcg slab cache. Report if you see floods of these\n");
> > >
Thank you, Michal. However, hmm, "floods" is very vague to me. 100 times per second?
10 times per hour? I guess we need more of a guideline to trigger users' reports
if we really want to do this.
> >
> > Um, there's nothing actionable here for the user. Even if the message
> > directed them to a specific email address, what would you ask the user for
> > in response if they show a kernel log with 100 of these?
>
> We would have to think of a better way to create shadow memcg caches.
>
> > Probably ask
> > them to use sysrq at the time it happens to get meminfo. But any
> > user-initiated sysrq is going to reveal a very different state of memory compared
> > to when the kmalloc() actually failed.
>
> Not really.
>
> > If this really needs a warning, I think it only needs to be done once and
> > reveal the state of memory, similar to how slub emits oom warnings. But as
> > the changelog indicates, the system is oom and we couldn't reclaim. We
> > can expect this to happen a lot on systems with memory pressure. What is
> > the warning revealing that would be actionable?
>
> That it actually happens in real workloads and we want to know what
> those workloads are. This code is quite old, and yet this is the first
> time somebody has complained. So it is most probably rare. Maybe because most
> workloads don't create many memcgs dynamically while low on memory.
> And maybe that will change in the future. In any case, having a large splat
> of meminfo for GFP_NOWAIT is not really helpful. It will tell us what we
> know already - the memory is low and the reclaim was prohibited. We just
> need to know that this happens out there.
The workload was an experiment creating a memcg per app on an embedded device,
but I'm not considering kmemcg at this moment, so I can even live
with disabling kmemcg. Based on that, I cannot say whether it's a real
workload or not.
Looking at the replies in this thread, it's arguable whether to add such a one-line
warning, so if you want it strongly, could you handle it yourself?
Sorry, but I have no interest in arguing about it.
Thanks.
On Fri 20-04-18 14:42:39, Minchan Kim wrote:
[...]
> Looking at the replies in this thread, it's arguable whether to add such a one-line
> warning, so if you want it strongly, could you handle it yourself?
I do not feel strongly enough about it to argue either. So the patch Andrew
added, with a better explanation, is sufficient from my POV.
--
Michal Hocko
SUSE Labs