Date: Tue, 20 Mar 2018 16:00:16 +0100
From: Michal Hocko
To: Andrey Ryabinin
Cc: Andrew Morton, stable@vger.kernel.org, Mel Gorman, Tejun Heo,
    Johannes Weiner, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org
Subject: Re: [PATCH 1/6] mm/vmscan: Wake up flushers for legacy cgroups too
Message-ID: <20180320150016.GW23100@dhcp22.suse.cz>
References: <20180315164553.17856-1-aryabinin@virtuozzo.com>
In-Reply-To: <20180315164553.17856-1-aryabinin@virtuozzo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 15-03-18 19:45:48, Andrey Ryabinin wrote:
> Commit 726d061fbd36 ("mm: vmscan: kick flushers when we encounter
> dirty pages on the LRU") added flusher invocation to
> shrink_inactive_list() when many dirty pages on the LRU are encountered.
>
> However, shrink_inactive_list() doesn't wake up flushers for legacy
> cgroup reclaim, so the next commit bbef938429f5 ("mm: vmscan: remove
> old flusher wakeup from direct reclaim path") removed the only source
> of flusher's wake up in legacy mem cgroup reclaim path.
>
> This leads to premature OOM if there is too many dirty pages in cgroup:
>     # mkdir /sys/fs/cgroup/memory/test
>     # echo $$ > /sys/fs/cgroup/memory/test/tasks
>     # echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>     # dd if=/dev/zero of=tmp_file bs=1M count=100
>     Killed
>
>     dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
>
>     Call Trace:
>      dump_stack+0x46/0x65
>      dump_header+0x6b/0x2ac
>      oom_kill_process+0x21c/0x4a0
>      out_of_memory+0x2a5/0x4b0
>      mem_cgroup_out_of_memory+0x3b/0x60
>      mem_cgroup_oom_synchronize+0x2ed/0x330
>      pagefault_out_of_memory+0x24/0x54
>      __do_page_fault+0x521/0x540
>      page_fault+0x45/0x50
>
>     Task in /test killed as a result of limit of /test
>     memory: usage 51200kB, limit 51200kB, failcnt 73
>     memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
>     kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
>     Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
>             mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
>             active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
>     Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
>     Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
>     oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> Wake up flushers in legacy cgroup reclaim too.
>
> Fixes: bbef938429f5 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
> Signed-off-by: Andrey Ryabinin
> Cc:

Yes, this makes sense. We are stalling on writeback pages for the legacy
memcg but we do not have any way to throttle dirty pages before the
writeback kicks in

Acked-by: Michal Hocko

> ---
>  mm/vmscan.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8fcd9f8d7390..4390a8d5be41 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1771,6 +1771,20 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  	if (stat.nr_writeback && stat.nr_writeback == nr_taken)
>  		set_bit(PGDAT_WRITEBACK, &pgdat->flags);
>
> +	/*
> +	 * If dirty pages are scanned that are not queued for IO, it
> +	 * implies that flushers are not doing their job. This can
> +	 * happen when memory pressure pushes dirty pages to the end of
> +	 * the LRU before the dirty limits are breached and the dirty
> +	 * data has expired. It can also happen when the proportion of
> +	 * dirty pages grows not through writes but through memory
> +	 * pressure reclaiming all the clean cache. And in some cases,
> +	 * the flushers simply cannot keep up with the allocation
> +	 * rate. Nudge the flusher threads in case they are asleep.
> +	 */
> +	if (stat.nr_unqueued_dirty == nr_taken)
> +		wakeup_flusher_threads(WB_REASON_VMSCAN);
> +
>  	/*
>  	 * Legacy memcg will stall in page writeback so avoid forcibly
>  	 * stalling here.
> @@ -1783,22 +1797,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  		if (stat.nr_dirty && stat.nr_dirty == stat.nr_congested)
>  			set_bit(PGDAT_CONGESTED, &pgdat->flags);
>
> -		/*
> -		 * If dirty pages are scanned that are not queued for IO, it
> -		 * implies that flushers are not doing their job. This can
> -		 * happen when memory pressure pushes dirty pages to the end of
> -		 * the LRU before the dirty limits are breached and the dirty
> -		 * data has expired. It can also happen when the proportion of
> -		 * dirty pages grows not through writes but through memory
> -		 * pressure reclaiming all the clean cache. And in some cases,
> -		 * the flushers simply cannot keep up with the allocation
> -		 * rate. Nudge the flusher threads in case they are asleep, but
> -		 * also allow kswapd to start writing pages during reclaim.
> -		 */
> -		if (stat.nr_unqueued_dirty == nr_taken) {
> -			wakeup_flusher_threads(WB_REASON_VMSCAN);
> +		/* Allow kswapd to start writing pages during reclaim. */
> +		if (stat.nr_unqueued_dirty == nr_taken)
>  			set_bit(PGDAT_DIRTY, &pgdat->flags);
> -		}
>
>  		/*
>  		 * If kswapd scans pages marked marked for immediate
> --
> 2.16.1

-- 
Michal Hocko
SUSE Labs
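
For anyone who wants to see the effect of the reordering without booting a
kernel, here is a rough userspace model of the decision the hunks change. It
is only a sketch and not kernel code: struct scan_stat, struct scan_result,
decide_before() and decide_after() are names made up for illustration, and
the legacy_memcg flag stands in for whatever condition guards the block that
starts at the "Legacy memcg will stall in page writeback" comment (the quoted
hunks do not show that condition).

```c
#include <stdbool.h>

/* Hypothetical subset of the per-scan counters the hunks read from `stat`. */
struct scan_stat {
	unsigned long nr_taken;          /* pages isolated for this scan */
	unsigned long nr_dirty;          /* dirty pages among them */
	unsigned long nr_congested;      /* dirty pages on a congested device */
	unsigned long nr_unqueued_dirty; /* dirty pages not yet queued for IO */
};

/* Hypothetical record of what a single scan decided to do. */
struct scan_result {
	bool wake_flushers; /* models wakeup_flusher_threads(WB_REASON_VMSCAN) */
	bool set_congested; /* models set_bit(PGDAT_CONGESTED, ...) */
	bool set_dirty;     /* models set_bit(PGDAT_DIRTY, ...) */
};

/*
 * Ordering before the patch: the wakeup sits inside the block that is
 * assumed here to be skipped for legacy memcg reclaim.
 */
static struct scan_result decide_before(const struct scan_stat *st,
					bool legacy_memcg)
{
	struct scan_result r = { false, false, false };

	if (!legacy_memcg) {
		if (st->nr_dirty && st->nr_dirty == st->nr_congested)
			r.set_congested = true;
		if (st->nr_unqueued_dirty == st->nr_taken) {
			r.wake_flushers = true;
			r.set_dirty = true;
		}
	}
	return r;
}

/*
 * Ordering after the patch: the wakeup is hoisted above the guarded
 * block, so it no longer depends on the kind of reclaim.
 */
static struct scan_result decide_after(const struct scan_stat *st,
				       bool legacy_memcg)
{
	struct scan_result r = { false, false, false };

	if (st->nr_unqueued_dirty == st->nr_taken)
		r.wake_flushers = true;

	if (!legacy_memcg) {
		if (st->nr_dirty && st->nr_dirty == st->nr_congested)
			r.set_congested = true;
		if (st->nr_unqueued_dirty == st->nr_taken)
			r.set_dirty = true;
	}
	return r;
}
```

Feeding both functions a scan in which every isolated page is dirty and
unqueued, with legacy_memcg set, gives wake_flushers == false from
decide_before() and true from decide_after(), while set_dirty stays false in
both. That matches the intent of the patch: the flushers get nudged, but the
PGDAT_DIRTY handling is left to the non-legacy path.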