Date: Fri, 22 Feb 2019 08:10:01 +0100
From: Michal Hocko
To: Junil Lee
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
 willy@infradead.org, pasha.tatashin@oracle.com, kirill.shutemov@linux.intel.com,
 jrdr.linux@gmail.com, dan.j.williams@intel.com, alexander.h.duyck@linux.intel.com,
 andreyknvl@google.com, arunks@codeaurora.org, keith.busch@intel.com, guro@fb.com,
 hannes@cmpxchg.org, rientjes@google.com, penguin-kernel@I-love.SAKURA.ne.jp,
 shakeelb@google.com, yuzhoujian@didichuxing.com
Subject: Re: [PATCH] mm, oom: OOM killer use rss size without shmem
Message-ID: <20190222071001.GA10588@dhcp22.suse.cz>
References: <1550810253-152925-1-git-send-email-junil0814.lee@lge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1550810253-152925-1-git-send-email-junil0814.lee@lge.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 22-02-19 13:37:33, Junil Lee wrote:
> The oom killer use get_mm_rss() function to estimate how free memory
> will be reclaimed when the oom killer select victim task.
>
> However, the returned rss size by get_mm_rss() function was changed from
> "mm, shmem: add internal shmem resident memory accounting" commit.
> This commit makes the get_mm_rss() return size including SHMEM pages.

This was actually the case even before eca56ff906bdd, because SHMEM was
simply accounted to MM_FILEPAGES, so this commit hasn't really changed
much.

Besides that, we cannot simply rule out SHMEM pages. They are backing
MAP_ANON|MAP_SHARED mappings, which might be unmapped and freed during
the oom victim exit. Moreover, this is essentially the same as file
backed pages or even MAP_PRIVATE|MAP_ANON pages. Both can be pinned by
other processes, e.g. private pages via CoW mappings and file pages by
the filesystem, or simply mlocked by another process. So this really
rough evaluation will never be perfect. We would basically have to do an
exact calculation of the freeable memory of each process, and that is
just not feasible.

That being said, I do not think the patch is an improvement in that
direction. It just replaces one fuzzy evaluation with another that
potentially even misses a lot of memory.

> The oom killer can't get free memory from SHMEM pages directly after
> kill victim process, it leads to mis-calculate victim points.
>
> Therefore, make new API as get_mm_rss_wo_shmem() which returns the rss
> value excluding SHMEM_PAGES.
>
> Signed-off-by: Junil Lee
> ---
>  include/linux/mm.h | 6 ++++++
>  mm/oom_kill.c      | 4 ++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2d483db..bca3acc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1701,6 +1701,12 @@ static inline int mm_counter(struct page *page)
>      return mm_counter_file(page);
>  }
>
> +static inline unsigned long get_mm_rss_wo_shmem(struct mm_struct *mm)
> +{
> +    return get_mm_counter(mm, MM_FILEPAGES) +
> +        get_mm_counter(mm, MM_ANONPAGES);
> +}
> +
>  static inline unsigned long get_mm_rss(struct mm_struct *mm)
>  {
>      return get_mm_counter(mm, MM_FILEPAGES) +
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 3a24848..e569737 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -230,7 +230,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>       * The baseline for the badness score is the proportion of RAM that each
>       * task's rss, pagetable and swap space use.
>       */
> -    points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
> +    points = get_mm_rss_wo_shmem(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
>          mm_pgtables_bytes(p->mm) / PAGE_SIZE;
>      task_unlock(p);
>
> @@ -419,7 +419,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>
>          pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu %5hd %s\n",
>              task->pid, from_kuid(&init_user_ns, task_uid(task)),
> -            task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
> +            task->tgid, task->mm->total_vm, get_mm_rss_wo_shmem(task->mm),
>              mm_pgtables_bytes(task->mm),
>              get_mm_counter(task->mm, MM_SWAPENTS),
>              task->signal->oom_score_adj, task->comm);
> --
> 2.6.2
>

-- 
Michal Hocko
SUSE Labs
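
For illustration only, here is a minimal userspace sketch of the two
baselines discussed in this thread. It is not kernel code: struct
mm_model, MODEL_PAGE_SIZE and the model_*() helpers are hypothetical
names, and only the counter names (MM_FILEPAGES, MM_ANONPAGES,
MM_SWAPENTS, MM_SHMEMPAGES) are taken from the kernel. It assumes that
get_mm_rss() sums the file, anon and shmem counters, as the quoted
changelog describes, and that the proposed helper drops the shmem term.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; counter names mirror the kernel's per-mm counters. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES, NR_MM_COUNTERS };

#define MODEL_PAGE_SIZE 4096UL /* assumed page size for this sketch */

/* Hypothetical stand-in for the fields of mm_struct that oom_badness() reads. */
struct mm_model {
    unsigned long counters[NR_MM_COUNTERS]; /* in pages */
    unsigned long pgtables_bytes;           /* page table memory, in bytes */
};

static unsigned long model_counter(const struct mm_model *mm, int member)
{
    assert(mm != NULL && member >= 0 && member < NR_MM_COUNTERS);
    return mm->counters[member];
}

/* RSS as get_mm_rss() reports it: file + anon + shmem pages. */
static unsigned long model_rss(const struct mm_model *mm)
{
    return model_counter(mm, MM_FILEPAGES) +
           model_counter(mm, MM_ANONPAGES) +
           model_counter(mm, MM_SHMEMPAGES);
}

/* RSS as the proposed get_mm_rss_wo_shmem() would report it: file + anon. */
static unsigned long model_rss_wo_shmem(const struct mm_model *mm)
{
    return model_counter(mm, MM_FILEPAGES) +
           model_counter(mm, MM_ANONPAGES);
}

/* Badness baseline: rss + swap entries + page table pages. */
static unsigned long model_badness(const struct mm_model *mm, int exclude_shmem)
{
    unsigned long rss = exclude_shmem ? model_rss_wo_shmem(mm) : model_rss(mm);

    return rss + model_counter(mm, MM_SWAPENTS) +
           mm->pgtables_bytes / MODEL_PAGE_SIZE;
}
```

With made-up numbers (100 file pages, 200 anon pages, 50 swap entries,
1000 shmem pages, 8192 bytes of page tables) the baseline is 1352 pages
with shmem and 352 without. A task whose footprint is mostly
MAP_ANON|MAP_SHARED memory would therefore look almost harmless to the
patched scoring, even though that memory might be freed when the task
exits.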