Subject: Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
To: "Kohli, Gaurav", David Rientjes
Cc: akpm@linux-foundation.org, mhocko@suse.com, kirill.shutemov@linux.intel.com,
 aarcange@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-arm-msm@vger.kernel.org
References:
 <1520427454-22813-1-git-send-email-gkohli@codeaurora.org>
 <22ebd655-ece4-37e5-5a98-e9750cb20665@codeaurora.org>
From: Tetsuo Handa
Date: Thu, 8 Mar 2018 23:05:33 +0900
In-Reply-To: <22ebd655-ece4-37e5-5a98-e9750cb20665@codeaurora.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018/03/08 13:51, Kohli, Gaurav wrote:
> On 3/8/2018 2:26 AM, David Rientjes wrote:
>
>> On Wed, 7 Mar 2018, Gaurav Kohli wrote:
>>
>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>> index 6fd9773..5f4cc4b 100644
>>> --- a/mm/oom_kill.c
>>> +++ b/mm/oom_kill.c
>>> @@ -114,9 +114,11 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
>>>         for_each_thread(p, t) {
>>>           task_lock(t);
>>> +        get_task_struct(t);
>>>           if (likely(t->mm))
>>>               goto found;
>>>           task_unlock(t);
>>> +        put_task_struct(t);
>>>       }
>>>       t = NULL;
>>>   found:
>> We hold rcu_read_lock() here, so perhaps only do get_task_struct() before
>> doing rcu_read_unlock() and we have a non-NULL t?
>
> Here rcu_read_lock will not help, as our task may change due to below algo:
>
> for_each_thread(p, t) {
>          task_lock(t);
> +        get_task_struct(t);
>          if (likely(t->mm))
>              goto found;
>          task_unlock(t);
> +        put_task_struct(t)
>
>
> So only we can increase usage counter here only at the current task.

static int proc_single_show(struct seq_file *m, void *v)
{
	struct inode *inode = m->private;
	struct pid_namespace *ns;
	struct pid *pid;
	struct task_struct *task;
	int ret;

	ns = inode->i_sb->s_fs_info;
	pid = proc_pid(inode);
	task = get_pid_task(pid, PIDTYPE_PID); /* get_task_struct() is called upon success. */
	if (!task)
		return -ESRCH;

	ret = PROC_I(inode)->op.proc_show(m, ns, pid, task);

	put_task_struct(task);
	return ret;
}

static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
			  struct pid *pid, struct task_struct *task)
{
	unsigned long totalpages = totalram_pages + total_swap_pages;
	unsigned long points = 0;

	/* task->usage > 0 due to proc_single_show() */
	points = oom_badness(task, NULL, NULL, totalpages) * 1000 / totalpages;
	seq_printf(m, "%lu\n", points);

	return 0;
}

struct task_struct *find_lock_task_mm(struct task_struct *p) /* p->usage > 0 */
{
	struct task_struct *t;

	rcu_read_lock();

	for_each_thread(p, t) {
		task_lock(t);
		if (likely(t->mm))
			goto found;
		task_unlock(t);
	}
	t = NULL;
found:
	rcu_read_unlock();

	return t; /* t->usage > 0 even if t != p because t->mm != NULL */
}

t->alloc_lock is still held when leaving find_lock_task_mm(), which means
that t->mm != NULL. But nothing prevents t from setting t->mm = NULL at
exit_mm() from do_exit() and calling exit_creds() from __put_task_struct(t)
after task_unlock(t) is called. The race window seems difficult to trigger.
Maybe something preempted the caller, because oom_badness() is no longer
inside an RCU read-side critical section after leaving find_lock_task_mm()
when called from proc_oom_score().
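
To make the reference-counting argument above concrete, here is a rough
userspace model of it. This is only a sketch, not kernel code: struct
model_task and every model_*() helper are hypothetical names invented for
illustration, the task starts with a single reference (the real usage
accounting is more involved), and everything runs sequentially, so the
"race" is simulated by calling the exit path by hand at the point where the
window would open.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct task_struct. */
struct model_task {
	atomic_int usage;	/* models task->usage */
	void *mm;		/* models task->mm; NULL after the exit path ran */
	bool freed;		/* set when the last reference is dropped */
};

static void model_task_init(struct model_task *t)
{
	atomic_init(&t->usage, 1);	/* assumption: one reference owned by the task */
	t->mm = t;			/* any non-NULL value stands for a live mm */
	t->freed = false;
}

static void model_get_task(struct model_task *t)
{
	atomic_fetch_add(&t->usage, 1);
}

static void model_put_task(struct model_task *t)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	int old = atomic_fetch_sub(&t->usage, 1);

	assert(old > 0);
	if (old == 1)
		t->freed = true;	/* stands in for __put_task_struct() */
}

/* Models the exiting thread: clear mm, then drop the task's own reference. */
static void model_exit(struct model_task *t)
{
	t->mm = NULL;
	model_put_task(t);
}

/*
 * The caller saw t->mm != NULL but took no reference of its own.
 * Returns true if t was freed while the caller would still be using it.
 */
static bool model_unpinned_lookup(struct model_task *t)
{
	model_exit(t);			/* t exits inside the window */
	return t->freed;
}

/*
 * The caller pins t before leaving the protected region.
 * Returns true if t was freed while the caller would still be using it.
 */
static bool model_pinned_lookup(struct model_task *t)
{
	bool freed_in_window;

	model_get_task(t);		/* pin first */
	model_exit(t);			/* t exits inside the window */
	freed_in_window = t->freed;
	model_put_task(t);		/* caller is done; t may be freed now */
	return freed_in_window;
}
```

In this model model_unpinned_lookup() reports that the task was freed inside
the window, while model_pinned_lookup() reports that it survived until the
caller dropped its own reference. Whether an extra reference is the right fix
for find_lock_task_mm() is exactly what this thread is still discussing.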