Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp3082214ybl; Fri, 20 Dec 2019 03:38:48 -0800 (PST) X-Google-Smtp-Source: APXvYqyKU2Eba2CgvXxei7PL4aNo1svW7SwV4VyfCK3O+wXNrt0e3jpuamzkE8DsW51lV8qCvKQ3 X-Received: by 2002:a9d:66c1:: with SMTP id t1mr12933222otm.73.1576841928348; Fri, 20 Dec 2019 03:38:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1576841928; cv=none; d=google.com; s=arc-20160816; b=WIhDw/QvaMj0grlm4aiicVjGSkpcZmzP4RnK0cA9zm5thiKSF1HIoVsuq/ezvpD6p7 iwRxjIoYyhyo0/IEvlsRrHzrRc4moi4bpMbllUexvc+WV9jvpYoA+RrsReRisr1YuE+E IMDYccmGdkgByFtoMTvFocKRGALrT+92stF/JRr8mbglN+kC9xTFMSF8OmKC3Rn83VCF BYWGutHG4cmPkIBgANgfGKK6RUtP/n+UQgMETl/nYUZzDwXfCHZOUrxd9NUA9i5Ei4kc Wqv0gEABhz2uyapv669SZ1+ruPC2Wao7TAmFOuOuNPqX9INKqchmyfMkfpyvE1jozZ8y kuXw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=+xqrG/Fr+lk5K0m1pptDnTQOO1AdoG749ulUXApgoJo=; b=OKeeG0fGVy7bN2bZQHCO/QAcyGu2iIPiOA/ROmKDGHMYSSRM8x3u4sNLHA8O6XmyDj y6UV5z4/PTkMOotkuvK+w3Mhj/7qrq0y4p1gULy3gt6Z27hIQVvRE2Nn1VP+0GWxK7T4 fhKHkbd2miebW/+eU/+R7KeTM4k+fKNKS5SVZof3MWtJryvT6uXrVpy+Nk8hXEnM+YXZ ovf3E1OwlWOOHysdMMcwitgijU7mGa3phHngYS3NeCkKtBbmzWTe/fY1dtY2gRGe8sjb yicH58jF7V4MY6cHG6pVJTB0qEthA0wccg/cbbwYx0bSIYgr4WSpd/euyHQx9C8HDfMe NeHQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i20si4627791otk.270.2019.12.20.03.38.36; Fri, 20 Dec 2019 03:38:48 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727341AbfLTLht (ORCPT + 99 others); Fri, 20 Dec 2019 06:37:49 -0500 Received: from mail-wm1-f68.google.com ([209.85.128.68]:53923 "EHLO mail-wm1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727184AbfLTLht (ORCPT ); Fri, 20 Dec 2019 06:37:49 -0500 Received: by mail-wm1-f68.google.com with SMTP id m24so8595226wmc.3; Fri, 20 Dec 2019 03:37:47 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=+xqrG/Fr+lk5K0m1pptDnTQOO1AdoG749ulUXApgoJo=; b=PjsD4G79liFs7IK7ZIxgjW2sIjJcOIfF+7D7HWaxyXYhsvxdmoVfdm6JcF9Nx73Z1R TyohPsE7Cu/VZJ6zF2X2we0ZT2jojRde9jJfztrNF1L9GVPHyFe3RYwUudRcgyekELW7 WWXZIM1jZeqvL8Yy6TCLojYuNcVEnswyssEMe20oFdgX42rcvIczqRv9XBzTHRxT83u5 hhIWjkdJM2TGxu0+lr6k7jao1sKPUuH89boBbj0Xkz65zGTtkIfasMH1zfZFFOYiPnhr OoDuK7ugzv4yViuSre3YsmfCaAtgKUuU6yhXNxzqdhXHOoPZtudqwCZgY5BTrZQvfn/u qm4w== X-Gm-Message-State: APjAAAVe85pwyqnDUtNdqIXch42djNI2d3K2fmoy/mJ61rsZto6pDFNk HUWg7i/y0H0UrHFADaryFlc= X-Received: by 2002:a1c:638a:: with SMTP id x132mr16983341wmb.43.1576841866625; Fri, 20 Dec 2019 03:37:46 -0800 (PST) Received: from localhost (prg-ext-pat.suse.com. [213.151.95.130]) by smtp.gmail.com with ESMTPSA id f1sm9955905wrp.93.2019.12.20.03.37.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2019 03:37:45 -0800 (PST) Date: Fri, 20 Dec 2019 12:37:44 +0100 From: Michal Hocko To: =?utf-8?B?5b2t5b+X5Yia?= Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, vdavydov.dev@gmail.com, Shakeel Butt , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, zgpeng Subject: Re: [PATCH] oom: choose a more suitable process to kill while all processes are not killable Message-ID: <20191220113744.GF20332@dhcp22.suse.cz> References: <1576823172-25943-1-git-send-email-zgpeng.linux@gmail.com> <20191220071334.GB20332@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.12.2 (2019-09-21) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [Please do not top post] On Fri 20-12-19 17:56:20, 彭志刚 wrote: > certainly. > > Steps to reproduce: > (1)Create a mm cgroup and set memory.limit_in_bytes > (2)Move the bash process to the newly created cgroup, and set the > oom_score_adj of the bash process to -998. > (3)In bash, start multiple processes, each process consumes different > memory until cgroup oom is triggered. > > The triggered phenomenon is shown below. We can see that when cgroup oom > happened, process 23777 was killed, but in fact, 23772 consumes more memory; > > [ 591.000970] Tasks state (memory values in pages): > [ 591.000970] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name > [ 591.000973] [ 23344] 0 23344 2863 923 61440 0 -998 bash > [ 591.000975] [ 23714] 0 23714 27522 25935 258048 0 -998 test > [ 591.000976] [ 23772] 0 23772 104622 103032 876544 0 -998 test points = 103032 + 0 + 876544/4096 = 103246 > [ 591.000978] [ 23777] 0 23777 78922 77335 667648 0 -998 test points = 77335 + 0 + 667648/4096 = 77498 It is not clear what is the actual hard limit but let's assume that rss+page_tables is the only charged memory (or at least the majority of it). That would be 207680 so the normalized oom_score_adj would be -206586 which is way too big for both tasks so from the OOM killer perspective both tasks are equal. The question is whether this is a bug or a (mis)feature. The oom_score_adj je documented as follows: Documentation/filesystems/proc.txt : Consequently, it is very simple for userspace to define the amount of memory to : consider for each task. Setting a /proc//oom_score_adj value of +500, for : example, is roughly equivalent to allowing the remainder of tasks sharing the : same system, cpuset, mempolicy, or memory controller resources to use at least : 50% more memory. A value of -500, on the other hand, would be roughly : equivalent to discounting 50% of the task's allowed memory from being considered : as scoring against the task. Which implies that we are talking about the budget based on a usable memory (aka hard limit in this case). I do agree that the semantic is awkward. I know there are usecases which try to use the existing scheme for oom_score_adj to fine tune oom decisions and I am worried your patch might break those. That being said, I am not sure this change is safe wrt. to backward compatibility. I would rather recommend to not using oom_score_adj for anything but OOM_SCORE_ADJ_MIN resp OOM_SCORE_ADJ_MAX. -- Michal Hocko SUSE Labs