Date: Wed, 27 Sep 2017 11:19:13 +0100
From: Roman Gushchin
To: Michal Hocko
CC: Tim Hockin, Johannes Weiner, Tejun Heo, David Rientjes,
    Vladimir Davydov, Tetsuo Handa, Andrew Morton, Cgroups,
    "linux-kernel@vger.kernel.org"
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Message-ID: <20170927101913.GB4159@castle>
In-Reply-To: <20170927074319.o3k26kja43rfqmvb@dhcp22.suse.cz>
References: <20170925170004.GA22704@cmpxchg.org>
    <20170925181533.GA15918@castle>
    <20170925202442.lmcmvqwy2jj2tr5h@dhcp22.suse.cz>
    <20170926105925.GA23139@castle.dhcp.TheFacebook.com>
    <20170926112134.r5eunanjy7ogjg5n@dhcp22.suse.cz>
    <20170926121300.GB23139@castle.dhcp.TheFacebook.com>
    <20170926133040.uupv3ibkt3jtbotf@dhcp22.suse.cz>
    <20170926172610.GA26694@cmpxchg.org>
    <20170927074319.o3k26kja43rfqmvb@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 27, 2017 at 09:43:19AM +0200, Michal Hocko wrote:
> On Tue 26-09-17 20:37:37, Tim Hockin wrote:
> [...]
> > I feel like David has offered examples here, and many of us at Google
> > have offered examples as long ago as 2013 (if I recall) of cases where
> > the proposed heuristic is EXACTLY WRONG.
>
> I do not think we have discussed anything resembling the current
> approach. And I would really appreciate some more examples where
> decisions based on leaf nodes would be EXACTLY WRONG.
>

I would agree here.

The two-step approach under discussion (select the biggest leaf or
oom_group memcg, then select the largest process inside it) really does
look like the way to go. It should work well in practice, and it allows
further development. By default it will catch workloads which are
leaking child processes, which is an advantage in comparison to the
existing algorithm.

Both the strictly hierarchical approach (as in v8) and the purely flat
one (by Johannes) are more limiting. In the first case, deep hierarchies
are affected (as Michal mentioned) and we are stuck with a
tree-traversal policy (Tejun's point). In the second case, further
development is in question: any new idea (say, oom_priorities, or a new
useful memcg metric) would have to be applied to processes and memcgs
simultaneously. Also, we drop any idea of memcg-level fairness and run
into some implementation issues (which I mentioned earlier).

The idea of mixing tasks and memcgs leads to much hairier code, and the
OOM code is already quite hairy. The idea of comparing killable entities
is a leaky abstraction, as we can't predict how much memory killing a
single process will release (say, for example, the process is the init
in a pid namespace).

> > We need OOM behavior to kill in a deterministic order configured by
> > policy.
>
> And nobody is objecting to this usecase. I think we can build a priority
> policy on top of leaf-based decision as well. The main point we are
> trying to sort out here is a reasonable semantic that would work for
> most workloads. Sibling based selection will simply not work on those
> that have to use deeper hierarchies for organizational purposes. I
> haven't heard a counter argument for that example yet.

Yes, implementing oom_priorities is a ~15-line patch on top of the
approach under discussion. David can use this small out-of-tree patch
for now; in any case it's a step forward in comparison to the existing
state.

Overall, do we have any open questions left? Does anyone have any strong
arguments against the design under discussion?

Thanks!
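
For readers following the thread, the two-step selection described above
(pick the biggest leaf or oom_group memcg, then the largest process
inside it) can be illustrated with a small userspace model. This is only
a sketch under stated assumptions, not kernel code and not the
patchset's implementation: the Memcg and Task classes, the rss field,
the usage() metric, and select_victim() are hypothetical names invented
for illustration, and tasks attached directly to an inner memcg without
oom_group are ignored for simplicity.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Task:
    pid: int
    rss: int  # hypothetical memory footprint, arbitrary units


@dataclass
class Memcg:
    name: str
    oom_group: bool = False  # assumed flag: treat the whole subtree as one unit
    tasks: List[Task] = field(default_factory=list)
    children: List["Memcg"] = field(default_factory=list)

    def usage(self) -> int:
        # Hierarchical usage: own tasks plus everything below.
        return sum(t.rss for t in self.tasks) + sum(c.usage() for c in self.children)


def all_tasks(cg: Memcg) -> Iterator[Task]:
    # Every task in the subtree rooted at cg.
    yield from cg.tasks
    for child in cg.children:
        yield from all_tasks(child)


def candidates(root: Memcg) -> Iterator[Memcg]:
    # Step 1 candidates: leaf memcgs and oom_group memcgs.
    # The walk does not descend below an oom_group memcg.
    stack = [root]
    while stack:
        cg = stack.pop()
        if cg.oom_group or not cg.children:
            yield cg
        else:
            stack.extend(cg.children)


def select_victim(root: Memcg) -> Optional[Tuple[Memcg, Task]]:
    # Step 1: biggest candidate memcg. Step 2: largest task inside it.
    pool = [cg for cg in candidates(root) if cg.usage() > 0]
    if not pool:
        return None
    victim_cg = max(pool, key=lambda cg: cg.usage())
    victim_task = max(all_tasks(victim_cg), key=lambda t: t.rss)
    return victim_cg, victim_task


def demo_tree() -> Memcg:
    # One big process in "db" versus many small leaked children in "leaky".
    leaky = Memcg("leaky", tasks=[Task(100 + i, 80 + i) for i in range(10)])
    small = Memcg("small", tasks=[Task(50, 100)])
    workers = Memcg("workers", children=[leaky, small])
    db = Memcg("db", tasks=[Task(1, 500)])
    return Memcg("root", children=[db, workers])


if __name__ == "__main__":
    cg, task = select_victim(demo_tree())
    print(cg.name, task.pid, task.rss)
```

In the demo tree, a flat per-process comparison would pick the single
500-unit task in "db", while the two-step selection picks "leaky", whose
many small tasks together use more memory. That is the "leaking child
processes" case mentioned above. A priority policy could, under the same
assumptions, be modelled as an extra key in the step 1 comparison.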