Date: Wed, 20 Sep 2017 15:24:03 -0700
From: Roman Gushchin
To: David Rientjes
CC: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tetsuo Handa, Andrew Morton, Tejun Heo
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Message-ID: <20170920222403.GA4729@castle>
References: <20170913122914.5gdksbmkolum7ita@dhcp22.suse.cz>
 <20170913215607.GA19259@castle>
 <20170914134014.wqemev2kgychv7m5@dhcp22.suse.cz>
 <20170914160548.GA30441@castle>
 <20170915105826.hq5afcu2ij7hevb4@dhcp22.suse.cz>
 <20170915152301.GA29379@castle>
 <20170915210807.GA5238@castle>
User-Agent: Mutt/1.9.0 (2017-09-02)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 19, 2017 at 01:54:48PM -0700, David Rientjes wrote:
> On Fri, 15 Sep 2017, Roman Gushchin wrote:
>
> > > > > But then you just enforce a structural restriction on your configuration
> > > > > because
> > > > >       root
> > > > >       /  \
> > > > >      A    D
> > > > >     /\
> > > > >    B  C
> > > > >
> > > > > is a different thing than
> > > > >       root
> > > > >      / | \
> > > > >     B  C  D
> > > > >
> > > >
> > > > I actually don't have a strong argument against an approach to select
> > > > largest leaf or kill-all-set memcg. I think, in practice there will be
> > > > not much difference.
> > > >
> > > > The only real concern I have is that then we have to do the same with
> > > > oom_priorities (select largest priority tree-wide), and this will limit
> > > > the ability to enforce the priority by parent cgroup.
> > > >
> > >
> > > Yes, oom_priority cannot select the largest priority tree-wide for exactly
> > > that reason.  We need the ability to control from which subtree the kill
> > > occurs in ancestor cgroups.  If multiple jobs are allocated their own
> > > cgroups and they can own memory.oom_priority for their own subcontainers,
> > > this becomes quite powerful so they can define their own oom priorities.
> > > Otherwise, they can easily override the oom priorities of other cgroups.
> >
> > I believe, it's a solvable problem: we can require CAP_SYS_RESOURCE to set
> > the oom_priority below parent's value, or something like this.
> >
> > But it looks more complex, and I'm not sure there are real examples,
> > when we have to compare memcgs, which are on different levels
> > (or in different subtrees).
> >
>
> It's actually much more complex because in our environment we'd need an
> "activity manager" with CAP_SYS_RESOURCE to control oom priorities of user
> subcontainers when today it need only be concerned with top-level memory
> cgroups.  Users can create their own hierarchies with their own oom
> priorities at will, it doesn't alter the selection heuristic for any
> other user running on the same system and gives them full control over the
> selection in their own subtree.  We shouldn't need to have a system-wide
> daemon with CAP_SYS_RESOURCE be required to manage subcontainers when
> nothing else requires it.  I believe it's also much easier to document:
> oom_priority is considered for all sibling cgroups at each level of the
> hierarchy and the cgroup with the lowest priority value gets iterated.

I do agree, actually. System-wide OOM priorities make no sense.

Always comparing sibling cgroups, either by priority or by size, seems to be
simple, clear and powerful enough for all reasonable use cases.

Am I right that this is exactly what you've used internally?
That would be a perfect confirmation, I believe.

Thanks!
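
To make the sibling-only comparison concrete, here is a minimal sketch in Python. It is not the patchset's code and only models the idea discussed in this thread: at each level, compare the children of the current cgroup and descend into one of them until a leaf is reached. The `Cgroup` class, the `select_victim` helper, the memory sizes, and the tie-break by largest size are all assumptions made for illustration. The "lowest priority value is descended into" rule follows the wording quoted above and may not match the final semantics of oom_priority.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cgroup:
    """Hypothetical model of a memory cgroup (not a kernel structure)."""
    name: str
    usage: int = 0            # memory charged directly to this cgroup, arbitrary units
    oom_priority: int = 0     # assumed knob; lower value is descended into first
    children: List["Cgroup"] = field(default_factory=list)

    def size(self) -> int:
        # Total usage of the subtree rooted at this cgroup.
        return self.usage + sum(child.size() for child in self.children)


def select_victim(root: Cgroup) -> Cgroup:
    """Walk down from root, comparing only sibling cgroups at each level."""
    node = root
    while node.children:
        # Assumption: lowest oom_priority wins; ties go to the largest subtree.
        node = min(node.children, key=lambda c: (c.oom_priority, -c.size()))
    return node


# The two hierarchies from the quoted diagram, with made-up sizes.
nested = Cgroup("root", children=[
    Cgroup("A", children=[Cgroup("B", usage=30), Cgroup("C", usage=40)]),
    Cgroup("D", usage=50),
])
flat = Cgroup("root", children=[
    Cgroup("B", usage=30), Cgroup("C", usage=40), Cgroup("D", usage=50),
])

print(select_victim(nested).name)
print(select_victim(flat).name)
```

With these made-up sizes the nested layout ends up at C, because A (70 in total) is compared against D (50) first, while the flat layout ends up at D. That is the structural dependence pointed out in the quoted diagram. Lowering oom_priority on D in the nested layout steers the walk to D without any tree-wide comparison.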