Date: Tue, 13 Mar 2018 14:37:32 +0100
From: Michal Hocko
To: Tetsuo Handa
Cc: gkohli@codeaurora.org, rientjes@google.com, akpm@linux-foundation.org,
	kirill.shutemov@linux.intel.com, aarcange@redhat.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org
Subject: Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
Message-ID: <20180313133732.GS12772@dhcp22.suse.cz>
References: <1520427454-22813-1-git-send-email-gkohli@codeaurora.org>
	<22ebd655-ece4-37e5-5a98-e9750cb20665@codeaurora.org>
	<14ba6c44-d444-bd0a-0bac-0c6851b19344@codeaurora.org>
	<201803091948.FBC21396.LHOMSFFOVFtQJO@I-love.SAKURA.ne.jp>
In-Reply-To: <201803091948.FBC21396.LHOMSFFOVFtQJO@I-love.SAKURA.ne.jp>
User-Agent: Mutt/1.9.3 (2018-01-21)
X-Mailing-List: linux-kernel@vger.kernel.org

[Sorry about the slow response but I was offline for almost two weeks
and catching up with a tsunami in my inbox now]

On Fri 09-03-18 19:48:46, Tetsuo Handa wrote:
> Kohli, Gaurav wrote:
> > > t->alloc_lock is still held when leaving find_lock_task_mm(), which means
> > > that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
> > > exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
> > > after task_unlock(t) is called. Seems difficult to trigger race window. Maybe
> > > something has preempted because oom_badness() becomes outside of RCU grace
> > > period upon leaving find_lock_task_mm() when called from proc_oom_score().
> >
> > Hi Tetsuo,
> >
> > Yes it is not easy to reproduce seen twice till now and i agree with
> > your analysis. But David has already fixing this in different way,
> > So that also looks better to me:
> >
> > https://patchwork.kernel.org/patch/10265641/
> >
>
> Yes, I'm aware of that patch.
>
> > But if need to keep that code, So we have to bump up the task
> > reference that's only i can think of now.
>
> I don't think so, for I think it is safe to call
> has_capability_noaudit(p) with p->alloc_lock held.

This however adds a subtle assumption on locking here and we should
rather not do so. The scope of alloc_lock is quite messy already and
adding on top is definitely not an improvement.
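
For reference, the "bump up the task reference" idea mentioned above can be
modelled in userspace roughly as follows. This is only a toy sketch and not
a proposed patch: every toy_* name is hypothetical, nothing here is a kernel
API, the lock is a plain flag, and the racing exit is simulated by a
callback that runs in the window right after the unlock. It merely shows why
a pinned reference keeps the credentials alive for the capability check,
while the unpinned path finds them already gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel structures. */
struct toy_cred {
	bool sys_admin;			/* models CAP_SYS_ADMIN */
};

struct toy_task {
	int refcount;			/* models the task usage count */
	bool locked;			/* models alloc_lock (single-threaded toy) */
	void *mm;			/* non-NULL while the task still has an mm */
	struct toy_cred *cred;		/* released with the last reference */
};

void toy_task_init(struct toy_task *t, bool sys_admin)
{
	t->refcount = 1;		/* the task's own reference */
	t->locked = false;
	t->mm = t;			/* any non-NULL value will do */
	t->cred = malloc(sizeof(*t->cred));
	assert(t->cred);
	t->cred->sys_admin = sys_admin;
}

void toy_get_task(struct toy_task *t)
{
	t->refcount++;
}

/* Dropping the last reference releases the credentials. */
void toy_put_task(struct toy_task *t)
{
	assert(t->refcount > 0);
	if (--t->refcount == 0) {
		free(t->cred);
		t->cred = NULL;
	}
}

/* Models the exit path: clear mm, then drop the task's own reference. */
void toy_exit(struct toy_task *t)
{
	assert(!t->locked);		/* exit cannot proceed under the lock */
	t->mm = NULL;
	toy_put_task(t);
}

/*
 * Returns the adjusted points, 0 if the task has no mm, or -1 if the
 * credentials vanished before the capability check (the race).
 */
long toy_badness(struct toy_task *t, long points, bool pin,
		 void (*racer)(struct toy_task *))
{
	long result;

	t->locked = true;
	if (!t->mm) {
		t->locked = false;
		return 0;
	}
	if (pin)
		toy_get_task(t);	/* pin before dropping the lock */
	t->locked = false;

	if (racer)
		racer(t);		/* exit runs in the unlocked window */

	if (!t->cred) {
		result = -1;		/* real code would touch freed memory */
	} else {
		if (t->cred->sys_admin)
			points -= (points * 3) / 100;	/* 3% root bonus */
		result = points;
	}
	if (pin)
		toy_put_task(t);
	return result;
}
```

Calling toy_badness(&t, 1000, false, toy_exit) on a freshly initialised task
exercises the racy path, and passing true for the pin argument exercises the
pinned one. Note that this variant does not extend the scope of the lock at
all, which is the point of contention above.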

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index f2e7dfb..4efcfb8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -222,7 +222,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
>  		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
> -	task_unlock(p);
>  
>  	/*
>  	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> @@ -230,6 +229,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 */
>  	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
>  		points -= (points * 3) / 100;
> +	task_unlock(p);
>  
>  	/* Normalize to oom_score_adj units */
>  	adj *= totalpages / 1000;

-- 
Michal Hocko
SUSE Labs