From: Shakeel Butt
Date: Thu, 15 Mar 2018 11:57:44 -0700
Subject: Re: [PATCH 1/6] mm/vmscan: Wake up flushers for legacy cgroups too
To: Andrey Ryabinin
Cc: Andrew Morton, stable@vger.kernel.org, Mel Gorman, Tejun Heo,
    Johannes Weiner, Michal Hocko, Linux MM, LKML, Cgroups
In-Reply-To: <20180315164553.17856-1-aryabinin@virtuozzo.com>
References: <20180315164553.17856-1-aryabinin@virtuozzo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 15, 2018 at 9:45 AM, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> Commit 726d061fbd36 ("mm: vmscan: kick flushers when we encounter
> dirty pages on the LRU") added flusher invocation to
> shrink_inactive_list() when many dirty pages on the LRU are encountered.
>
> However, shrink_inactive_list() doesn't wake up flushers for legacy
> cgroup reclaim, so the next commit bbef938429f5 ("mm: vmscan: remove
> old flusher wakeup from direct reclaim path") removed the only source
> of flusher wakeup in the legacy memcg reclaim path.
>
> This leads to premature OOM if there are too many dirty pages in the cgroup:
>
> # mkdir /sys/fs/cgroup/memory/test
> # echo $$ > /sys/fs/cgroup/memory/test/tasks
> # echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> # dd if=/dev/zero of=tmp_file bs=1M count=100
> Killed
>
> dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
>
> Call Trace:
>  dump_stack+0x46/0x65
>  dump_header+0x6b/0x2ac
>  oom_kill_process+0x21c/0x4a0
>  out_of_memory+0x2a5/0x4b0
>  mem_cgroup_out_of_memory+0x3b/0x60
>  mem_cgroup_oom_synchronize+0x2ed/0x330
>  pagefault_out_of_memory+0x24/0x54
>  __do_page_fault+0x521/0x540
>  page_fault+0x45/0x50
>
> Task in /test killed as a result of limit of /test
> memory: usage 51200kB, limit 51200kB, failcnt 73
> memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
> kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
> Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
> mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
> active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
> Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
> Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
> oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> Wake up flushers in legacy cgroup reclaim too.
>
> Fixes: bbef938429f5 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>

Indeed I am seeing oom-kills without the patch.

Tested-by: Shakeel Butt

> Cc: <stable@vger.kernel.org>
> ---
>  mm/vmscan.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8fcd9f8d7390..4390a8d5be41 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1771,6 +1771,20 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>         if (stat.nr_writeback && stat.nr_writeback == nr_taken)
>                 set_bit(PGDAT_WRITEBACK, &pgdat->flags);
>
> +       /*
> +        * If dirty pages are scanned that are not queued for IO, it
> +        * implies that flushers are not doing their job. This can
> +        * happen when memory pressure pushes dirty pages to the end of
> +        * the LRU before the dirty limits are breached and the dirty
> +        * data has expired. It can also happen when the proportion of
> +        * dirty pages grows not through writes but through memory
> +        * pressure reclaiming all the clean cache. And in some cases,
> +        * the flushers simply cannot keep up with the allocation
> +        * rate. Nudge the flusher threads in case they are asleep.
> +        */
> +       if (stat.nr_unqueued_dirty == nr_taken)
> +               wakeup_flusher_threads(WB_REASON_VMSCAN);
> +
>         /*
>          * Legacy memcg will stall in page writeback so avoid forcibly
>          * stalling here.
> @@ -1783,22 +1797,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>         if (stat.nr_dirty && stat.nr_dirty == stat.nr_congested)
>                 set_bit(PGDAT_CONGESTED, &pgdat->flags);
>
> -       /*
> -        * If dirty pages are scanned that are not queued for IO, it
> -        * implies that flushers are not doing their job. This can
This can > - * happen when memory pressure pushes dirty pages to the end of > - * the LRU before the dirty limits are breached and the dirty > - * data has expired. It can also happen when the proportion of > - * dirty pages grows not through writes but through memory > - * pressure reclaiming all the clean cache. And in some cases, > - * the flushers simply cannot keep up with the allocation > - * rate. Nudge the flusher threads in case they are asleep, but > - * also allow kswapd to start writing pages during reclaim. > - */ > - if (stat.nr_unqueued_dirty == nr_taken) { > - wakeup_flusher_threads(WB_REASON_VMSCAN); > + /* Allow kswapd to start writing pages during reclaim. */ > + if (stat.nr_unqueued_dirty == nr_taken) > set_bit(PGDAT_DIRTY, &pgdat->flags); > - } > > /* > * If kswapd scans pages marked marked for immediate > -- > 2.16.1 > > -- > To unsubscribe from this list: send the line "unsubscribe cgroups" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
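
For anyone else trying this, a rough way to watch the failure mode
while the reproducer's dd runs (this assumes cgroup v1 mounted at
/sys/fs/cgroup/memory and the "test" cgroup from the steps above; the
1-second interval is arbitrary):

    # In a second shell, while dd is running:
    while true; do
            # memory.stat reports the memcg's dirty/writeback bytes
            grep -E '^(dirty|writeback) ' /sys/fs/cgroup/memory/test/memory.stat
            sleep 1
    done

Without the patch I'd expect dirty to stay pinned near the 50M limit
with writeback at 0 until the OOM killer fires; with it applied,
writeback should go non-zero soon after reclaim starts and dd should
run to completion.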