Date: Fri, 08 May 2015 08:01:38 +0000 (GMT)
From: Yogesh Narayan Gaur
Subject: Re: Re: [EDT] oom_killer: find bulkiest task based on pss value
To: yalin wang
Cc: "akpm@linux-foundation.org", "linux-kernel@vger.kernel.org", AJEET YADAV, Amit Arora
Reply-to: yn.gaur@samsung.com
Message-id: <2137546797.259031431072097606.JavaMail.weblogic@epmlwas09d>
X-Mailing-List: linux-kernel@vger.kernel.org

------- Original Message -------
Sender : yalin wang
Date : May 08, 2015 13:17 (GMT+05:30)
Title : Re: [EDT] oom_killer: find bulkiest task based on pss value

2015-05-08 13:29 GMT+08:00 Yogesh Narayan Gaur:
>>
>> Hi Andrew,
>>
>> Presently in oom_kill.c we calculate the badness score of the victim task from the task's current RSS counter value.
>> The RSS counter value for a task is essentially [Private (Dirty/Clean)] + [Shared (Dirty/Clean)].
>> We have encountered a situation where the Private values are small but the Shared values are large, making the total RSS counter value large. In an OOM situation the task with the highest RSS value is killed, but because the Private portion is small, the memory regained by killing that process falls short of expectations.
>>
>> For example, take the following use case, in which 3 processes are running in the system.
>> All of these processes mmap a file in the current directory and then copy data from it into locally allocated buffers in a while(1) loop with some sleep. Two of the 3 processes have mmaped the file with MAP_SHARED and one with MAP_PRIVATE.
>> All 3 processes run in the background while RSS/PSS values are checked from a user-space utility (a wrapper over cat /proc/pid/smaps).
>> Before OOM, the memory consumed by these 3 processes is as follows (all processes run with oom_score_adj = 0):
>> ====================================================
>> Comm : 1prg, Pid : 213 (values in kB)
>>              Rss      Shared   Private  Pss
>> Process :    375764   194596   181168   278460
>> ====================================================
>> Comm : 3prg, Pid : 217 (values in kB)
>>              Rss      Shared   Private  Pss
>> Process :    305760   32       305728   305738
>> ====================================================
>> Comm : 2prg, Pid : 218 (values in kB)
>>              Rss      Shared   Private  Pss
>> Process :    389980   194596   195384   292676
>> ====================================================
>>
>> With the present design, process [2prg : 218] would be selected first as the bulkiest process, since its RSS value is the highest. But killing this process frees only ~195 MB, as compared to the expected ~389 MB.
>> Thus identifying the victim task by its RSS value is not accurate, and killing the process identified that way doesn't release the expected memory back to the system.
>>
>> We need to select the victim task based on PSS instead of RSS, as the PSS value is calculated as:
>> PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of tasks sharing]
>> For the above use case it can also be checked that process [3prg : 217] has the largest PSS value, and by killing this process we regain the maximum memory (~305 MB), as compared to killing the process identified by RSS value.
>>
>> --
>> Regards,
>> Yogesh Gaur.
>
> Great,
>
> In fact, I have also encountered this scenario.
> I used USS (pages with map count == 1) to decide which process should be
> killed, which seems to give the same result as using PSS.
> But PSS is better: it also considers shared pages, for the case where a
> process has a large shared mapping but little private mapping.
>
> BRs,
> Yalin

I have made a patch which identifies the bulkiest task on the basis of its PSS value. Please check the patch below; it corrects the way the victim task gets identified in an OOM condition.
==================
From 1c3d7f552f696bdbc0126c8e23beabedbd80e423 Mon Sep 17 00:00:00 2001
From: Yogesh Gaur
Date: Thu, 7 May 2015 01:52:13 +0530
Subject: [PATCH] oom: find victim task based on pss

This patch identifies the bulkiest task for the OOM killer on the basis
of its PSS value instead of its RSS value. There can be a scenario where
the task with the highest RSS counter is consuming a lot of shared
memory, and killing that task does not release the expected amount of
memory to the system.

PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of tasks sharing]
RSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean)]

Thus, use the PSS value instead of the RSS value, as PSS closely matches
the actual memory usage of the task. This patch uses the
smaps_pte_range() interface defined under CONFIG_PROC_PAGE_MONITOR.
When CONFIG_PROC_PAGE_MONITOR is disabled, the badness calculation
simply falls back to the RSS value.
Signed-off-by: Yogesh Gaur
Signed-off-by: Amit Arora
Reviewed-by: Ajeet Yadav
---
 fs/proc/task_mmu.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |  9 +++++++++
 mm/oom_kill.c      |  9 +++++++--
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 956b75d..dd962ff 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -964,6 +964,53 @@ struct pagemapread {
 	bool v2;
 };
 
+/**
+ * get_mm_pss - determine the PSS count of the pages in use by process p
+ * PSS value=[Private(Dirty/Clean)] + [Shared(Dirty/Clean)/no. of tasks sharing]
+ * @p: task struct of the task to calculate for
+ * @mm: mm struct of the task.
+ *
+ * This function needs to be called under task_lock for task 'p'.
+ */
+long get_mm_pss(struct task_struct *p, struct mm_struct *mm)
+{
+	long pss = 0;
+	struct vm_area_struct *vma = NULL;
+	struct mem_size_stats mss;
+	struct mm_walk smaps_walk = {
+		.pmd_entry = smaps_pte_range,
+		.private = &mss,
+	};
+
+	if (mm == NULL)
+		return 0;
+
+	/* task_lock held in oom_badness */
+	smaps_walk.mm = mm;
+
+	if (!down_read_trylock(&mm->mmap_sem)) {
+		pr_warn("Skipping task:%s\n", p->comm);
+		return 0;
+	}
+
+	vma = mm->mmap;
+	if (!vma) {
+		up_read(&mm->mmap_sem);
+		return 0;
+	}
+
+	while (vma) {
+		memset(&mss, 0, sizeof(struct mem_size_stats));
+		walk_page_vma(vma, &smaps_walk);
+		pss += (long)(mss.pss >> (PAGE_SHIFT + PSS_SHIFT)); /* PSS in pages */
+
+		/* Check next vma in list */
+		vma = vma->vm_next;
+	}
+	up_read(&mm->mmap_sem);
+	return pss;
+}
+
 #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
 #define PAGEMAP_WALK_MASK	(PMD_MASK)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a9392..b6bb521 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1423,6 +1423,15 @@ static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
 		*maxrss = hiwater_rss;
 }
 
+#ifdef CONFIG_PROC_PAGE_MONITOR
+long get_mm_pss(struct task_struct *p, struct mm_struct *mm);
+#else
+static inline long get_mm_pss(struct task_struct *p, struct mm_struct *mm)
+{
+	return 0;
+}
+#endif
+
 #if defined(SPLIT_RSS_COUNTING)
 void sync_mm_rss(struct mm_struct *mm);
 #else

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 642f38c..537eb4c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -151,6 +151,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 {
 	long points;
 	long adj;
+	long pss = 0;
 
 	if (oom_unkillable_task(p, memcg, nodemask))
 		return 0;
@@ -167,9 +168,13 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	/*
 	 * The baseline for the badness score is the proportion of RAM that each
-	 * task's rss, pagetable and swap space use.
+	 * task's pss, pagetable and swap space use.
 	 */
-	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
+	pss = get_mm_pss(p, p->mm);
+	if (pss == 0) /* make pss equal to rss, pseudo-pss */
+		pss = get_mm_rss(p->mm);
+
+	points = pss + get_mm_counter(p->mm, MM_SWAPENTS) +
 		atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);
 	task_unlock(p);
--
1.7.1

--
BRs,
Yogesh Gaur.