Date: Tue, 26 Sep 2017 13:13:00 +0100
From: Roman Gushchin
To: Michal Hocko
CC: Johannes Weiner, Tejun Heo, David Rientjes, Vladimir Davydov, Tetsuo Handa, Andrew Morton
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Message-ID: <20170926121300.GB23139@castle.dhcp.TheFacebook.com>
In-Reply-To: <20170926112134.r5eunanjy7ogjg5n@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 26, 2017 at 01:21:34PM +0200, Michal Hocko wrote:
> On Tue 26-09-17 11:59:25, Roman Gushchin wrote:
> > On Mon, Sep 25, 2017 at 10:25:21PM +0200, Michal Hocko wrote:
> > > On Mon 25-09-17 19:15:33, Roman Gushchin wrote:
> > > [...]
> > > > I'm not against this model, as I've said before. It feels logical,
> > > > and will work fine in most cases.
> > > >
> > > > In this case we can drop any mount/boot options, because it preserves
> > > > the existing behavior in the default configuration. A big advantage.
> > >
> > > I am not sure about this. We still need an opt-in, regardless, because
> > > selecting the largest process from the largest memcg != selecting the
> > > largest task (just consider memcgs with many processes example).
> >
> > As I understand Johannes, he suggested to compare individual processes with
> > group_oom mem cgroups. In other words, always select a killable entity with
> > the biggest memory footprint.
> >
> > This is slightly different from my v8 approach, where I treat leaf memcgs
> > as indivisible memory consumers independent on group_oom setting, so
> > by default I'm selecting the biggest task in the biggest memcg.
>
> My reading is that he is actually proposing the same thing I've been
> mentioning. Simply select the biggest killable entity (leaf memcg or
> group_oom hierarchy) and either kill the largest task in that entity
> (for !group_oom) or the whole memcg/hierarchy otherwise.

He wrote the following:
"So I'm leaning toward the second model: compare all oomgroups and
standalone tasks in the system with each other, independent of the
failed hierarchical control structure. Then kill the biggest of them."

>
> > While the approach suggested by Johannes looks clear and reasonable,
> > I'm slightly concerned about possible implementation issues,
> > which I've described below:
> >
> > >
> > > > The only thing, I'm slightly concerned, that due to the way how we calculate
> > > > the memory footprint for tasks and memory cgroups, we will have a number
> > > > of weird edge cases. For instance, when putting a single process into
> > > > the group_oom memcg will alter the oom_score significantly and result
> > > > in significantly different chances to be killed. An obvious example will
> > > > be a task with oom_score_adj set to any non-extreme (other than 0 and -1000)
> > > > value, but it can also happen in case of constrained alloc, for instance.
> > >
> > > I am not sure I understand. Are you talking about root memcg comparing
> > > to other memcgs?
> >
> > Not only, but root memcg in this case will be another complication. We can
> > also use the same trick for all memcg (define memcg oom_score as maximum oom_score
> > of the belonging tasks), it will turn group_oom into pure container cleanup
> > solution, without changing victim selection algorithm
>
> I fail to see the problem to be honest. Simply evaluate the memcg_score
> you have so far with one minor detail. You only check memcgs which have
> tasks (rather than check for leaf node check) or it is group_oom. An
> intermediate memcg will get a cumulative size of the whole subhierarchy
> and then you know you can skip the subtree because any subtree can be larger.
>
> > But, again, I'm not against approach suggested by Johannes. I think that overall
> > it's the best possible semantics, if we're not taking some implementation details
> > into account.
>
> I do not see those implementation details issues and let me repeat do
> not develop a semantic based on implementation details.

There are no problems with the "select the biggest leaf or group_oom memcg,
then kill the biggest task or all tasks depending on group_oom" approach
that you're describing. Comparing tasks and memcgs directly (which is what
Johannes is suggesting) may have some issues.

Thanks!
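
To make the difference between the two selection models concrete, here is a
rough user-space sketch in Python. It is not kernel code and not the v8
implementation: the Task and Memcg classes, the integer "size" field and both
function names are made up for illustration, the hierarchy is assumed to be
flat, and oom_score_adj is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    size: int  # hypothetical memory footprint, arbitrary units


@dataclass
class Memcg:
    name: str
    tasks: list
    group_oom: bool = False

    @property
    def size(self):
        # Assumption: a memcg's footprint is just the sum of its tasks.
        return sum(t.size for t in self.tasks)


def select_memcg_first(memcgs):
    """Pick the biggest memcg that has tasks, then choose victims inside it."""
    victim = max((m for m in memcgs if m.tasks), key=lambda m: m.size)
    if victim.group_oom:
        # group_oom: the whole memcg is killed.
        return [t.name for t in victim.tasks]
    # !group_oom: only the largest task in that memcg is killed.
    return [max(victim.tasks, key=lambda t: t.size).name]


def select_mixed(memcgs):
    """Compare group_oom memcgs and standalone tasks with each other."""
    entities = []  # (size, names of tasks to kill)
    for m in memcgs:
        if not m.tasks:
            continue
        if m.group_oom:
            entities.append((m.size, [t.name for t in m.tasks]))
        else:
            # Tasks outside group_oom memcgs compete individually.
            entities.extend((t.size, [t.name]) for t in m.tasks)
    return max(entities, key=lambda e: e[0])[1]


if __name__ == "__main__":
    a = Memcg("A", [Task("a0", 30), Task("a1", 25), Task("a2", 25), Task("a3", 20)])
    b = Memcg("B", [Task("b0", 60)])
    print(select_memcg_first([a, b]))
    print(select_mixed([a, b]))
```

With memcg A holding tasks of 30, 25, 25 and 20 units and memcg B holding a
single task of 60, the first function picks a0 and the second picks b0. That
is the "largest process from the largest memcg != largest task" case quoted
above. Setting group_oom on A makes both functions return all four tasks of A.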