Date: Tue, 15 Aug 2017 15:13:50 +0100
From: Roman Gushchin
To: David Rientjes
CC: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tetsuo Handa, Tejun Heo
Subject: Re: [v5 4/4] mm, oom, docs: describe the cgroup-aware OOM killer
Message-ID: <20170815141350.GA4510@castle.DHCP.thefacebook.com>
References: <20170814183213.12319-1-guro@fb.com> <20170814183213.12319-5-guro@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 14, 2017 at 03:52:26PM -0700, David Rientjes wrote:
> On Mon, 14 Aug 2017, Roman Gushchin wrote:
>
> > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> > index dec5afdaa36d..22108f31e09d 100644
> > --- a/Documentation/cgroup-v2.txt
> > +++ b/Documentation/cgroup-v2.txt
> > @@ -48,6 +48,7 @@ v1 is available under Documentation/cgroup-v1/.
> > 5-2-1. Memory Interface Files
> > 5-2-2. Usage Guidelines
> > 5-2-3. Memory Ownership
> > + 5-2-4. Cgroup-aware OOM Killer
>
> Random curiousness, why cgroup-aware oom killer and not memcg-aware oom
> killer?

I don't think we use the term "memcg" anywhere in the v2 docs.
Do you think that "Memory cgroup-aware OOM killer" is better?

>
> > 5-3. IO
> > 5-3-1. IO Interface Files
> > 5-3-2. Writeback
> > @@ -1002,6 +1003,37 @@ PAGE_SIZE multiple when read back.
> > high limit is used and monitored properly, this limit's
> > utility is limited to providing the final safety net.
> >
> > + memory.oom_kill_all_tasks
> > +
> > + A read-write single value file which exits on non-root
>
> s/exits/exists/

Fixed. Thanks!

>
> > + cgroups. The default is "0".
> > +
> > + Defines whether the OOM killer should treat the cgroup
> > + as a single entity during the victim selection.
>
> Isn't this true independent of the memory.oom_kill_all_tasks setting?
> The cgroup aware oom killer will consider memcg's as logical units when
> deciding what to kill with or without memory.oom_kill_all_tasks, right?
>
> I think you cover this fact in the cgroup aware oom killer section below
> so this might result in confusion if described alongside a setting of
> memory.oom_kill_all_tasks.
>
> > +
> > + If set, OOM killer will kill all belonging tasks in
> > + corresponding cgroup is selected as an OOM victim.
>
> Maybe
>
> "If set, the OOM killer will kill all threads attached to the memcg if
> selected as an OOM victim."
>
> is better?

Fixed to the following (to conform with core v2 concepts):
  If set, OOM killer will kill all processes attached to the cgroup
  if selected as an OOM victim.

>
> > +
> > + Be default, OOM killer respect /proc/pid/oom_score_adj value
> > + -1000, and will never kill the task, unless oom_kill_all_tasks
> > + is set.
> > +
> > + memory.oom_priority
> > +
> > + A read-write single value file which exits on non-root
>
> s/exits/exists/

Fixed.

>
> > + cgroups. The default is "0".
> > +
> > + An integer number within the [-10000, 10000] range,
> > + which defines the order in which the OOM killer selects victim
> > + memory cgroups.
> > +
> > + OOM killer prefers memory cgroups with larger priority if they
> > + are populated with elegible tasks.
>
> s/elegible/eligible/

Fixed.

>
> > +
> > + The oom_priority value is compared within sibling cgroups.
> > +
> > + The root cgroup has the oom_priority 0, which cannot be changed.
> > +
> > memory.events
> > A read-only flat-keyed file which exists on non-root cgroups.
> > The following entries are defined. Unless specified
> > @@ -1206,6 +1238,36 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
> > belonging to the affected files to ensure correct memory ownership.
> >
> >
> > +Cgroup-aware OOM Killer
> > +~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Cgroup v2 memory controller implements a cgroup-aware OOM killer.
> > +It means that it treats memory cgroups as first class OOM entities.
> > +
> > +Under OOM conditions the memory controller tries to make the best
> > +choise of a victim, hierarchically looking for the largest memory
> > +consumer. By default, it will look for the biggest task in the
> > +biggest leaf cgroup.
> > +
> > +Be default, all cgroups have oom_priority 0, and OOM killer will
> > +chose the largest cgroup recursively on each level. For non-root
> > +cgroups it's possible to change the oom_priority, and it will cause
> > +the OOM killer to look athe the priority value first, and compare
> > +sizes only of cgroups with equal priority.
>
> Maybe some description of "largest" would be helpful here? I think you
> could briefly describe what is accounted for in the decisionmaking.

I'm afraid that it's too implementation-defined to be described.
Do you have an idea how to describe it without going into too much detail?

> s/athe/at the/

Fixed.

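The selection order described in the quoted "Cgroup-aware OOM Killer"
section can be sketched roughly as below. This is only an illustration
under stated assumptions, not the kernel implementation: the Cgroup class,
its fields, and select_victim() are hypothetical names, tasks are assumed
to live only in leaf cgroups, and "size" is assumed to be a plain sum of
per-task numbers, since the real accounting is implementation-defined.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Cgroup:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    oom_priority: int = 0  # documented range: [-10000, 10000]
    tasks: Dict[str, int] = field(default_factory=dict)  # task name -> size (assumed metric)
    children: List["Cgroup"] = field(default_factory=list)

    def size(self) -> int:
        # Assumption: a cgroup's size is the sum of task sizes in its subtree.
        return sum(self.tasks.values()) + sum(c.size() for c in self.children)

    def populated(self) -> bool:
        # A cgroup is a candidate only if its subtree holds at least one task.
        return bool(self.tasks) or any(c.populated() for c in self.children)


def select_victim(root: Cgroup) -> Optional[Tuple[str, str]]:
    """Return (cgroup name, task name) of the chosen victim, or None."""
    node = root
    while True:
        candidates = [c for c in node.children if c.populated()]
        if not candidates:
            break
        # Priority is compared first, among siblings only; size is compared
        # only between cgroups of equal priority.
        node = max(candidates, key=lambda c: (c.oom_priority, c.size()))
    if not node.tasks:
        return None
    # Within the chosen leaf, pick the biggest task.
    biggest = max(node.tasks, key=lambda name: node.tasks[name])
    return node.name, biggest
```

With two sibling cgroups at the default priority 0, the larger one is
chosen; raising oom_priority on the smaller sibling makes it the preferred
victim regardless of size. The sketch leaves out oom_kill_all_tasks and
the oom_score_adj -1000 exemption.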
>
> Reading through this, it makes me wonder if doing s/cgroup/memcg/ over
> most of it would be better.

I don't think memcg is a good user term, but I agree that it's necessary
to highlight the fact that a user should enable the memory controller
to get this functionality.
Added a corresponding note.

Thanks!

Roman