Date: Fri, 13 Jul 2018 16:26:12 +0200
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Tetsuo Handa, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch -mm] mm, oom: remove oom_lock from exit_mmap
Message-ID: <20180713142612.GD19960@dhcp22.suse.cz>

On Thu 12-07-18 14:34:00, David Rientjes wrote:
[...]
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 0fe4087d5151..e6328cef090f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -488,9 +488,11 @@ void __oom_reap_task_mm(struct mm_struct *mm)
>  	 * Tell all users of get_user/copy_from_user etc... that the content
>  	 * is no longer stable. No barriers really needed because unmapping
>  	 * should imply barriers already and the reader would hit a page fault
> -	 * if it stumbled over a reaped memory.
> +	 * if it stumbled over a reaped memory. If MMF_UNSTABLE is already set,
> +	 * reaping as already occurred so nothing left to do.
>  	 */
> -	set_bit(MMF_UNSTABLE, &mm->flags);
> +	if (test_and_set_bit(MMF_UNSTABLE, &mm->flags))
> +		return;

This could lead to premature oom victim selection:

oom_reaper			exiting victim
oom_reap_task			exit_mmap
  __oom_reap_task_mm		  __oom_reap_task_mm
				    test_and_set_bit(MMF_UNSTABLE) # wins the race
  test_and_set_bit(MMF_UNSTABLE)
set_bit(MMF_OOM_SKIP) # new victim can be selected now.

Besides that, why should we back off in the first place? We can
race the two without any problems AFAICS. We already do have proper
synchronization between the two due to mmap_sem and MMF_OOM_SKIP.

diff --git a/mm/mmap.c b/mm/mmap.c
index fc41c0543d7f..4642964f7741 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3073,9 +3073,7 @@ void exit_mmap(struct mm_struct *mm)
 		 * which clears VM_LOCKED, otherwise the oom reaper cannot
 		 * reliably test it.
 		 */
-		mutex_lock(&oom_lock);
 		__oom_reap_task_mm(mm);
-		mutex_unlock(&oom_lock);
 
 		set_bit(MMF_OOM_SKIP, &mm->flags);
 		down_write(&mm->mmap_sem);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 32e6f7becb40..f11108af122d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -529,28 +529,9 @@ void __oom_reap_task_mm(struct mm_struct *mm)
 
 static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
-	bool ret = true;
-
-	/*
-	 * We have to make sure to not race with the victim exit path
-	 * and cause premature new oom victim selection:
-	 * oom_reap_task_mm		exit_mm
-	 *   mmget_not_zero
-	 *				  mmput
-	 *				    atomic_dec_and_test
-	 *				  exit_oom_victim
-	 *				[...]
-	 *				out_of_memory
-	 *				  select_bad_process
-	 *				    # no TIF_MEMDIE task selects new victim
-	 *  unmap_page_range # frees some memory
-	 */
-	mutex_lock(&oom_lock);
-
 	if (!down_read_trylock(&mm->mmap_sem)) {
-		ret = false;
 		trace_skip_task_reaping(tsk->pid);
-		goto unlock_oom;
+		return false;
 	}
 
 	/*
@@ -562,7 +543,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	if (mm_has_blockable_invalidate_notifiers(mm)) {
 		up_read(&mm->mmap_sem);
 		schedule_timeout_idle(HZ);
-		goto unlock_oom;
+		return true;
 	}
 
 	/*
@@ -589,9 +570,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	up_read(&mm->mmap_sem);
 
 	trace_finish_task_reaping(tsk->pid);
-unlock_oom:
-	mutex_unlock(&oom_lock);
-	return ret;
+	return true;
 }
 
 #define MAX_OOM_REAP_RETRIES 10
-- 
Michal Hocko
SUSE Labs
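
For illustration only, the interleaving described in this mail can be replayed
in userspace. The sketch below is not kernel code and makes several
assumptions: struct mm_model, the MODEL_* bits, model_test_and_set() and
replay_race() are hypothetical names, C11 atomic_fetch_or() stands in for
test_and_set_bit(), and "reaping" is reduced to zeroing a page counter. It
runs the steps in one fixed order rather than in real threads, so it only
shows what MMF_OOM_SKIP would advertise in that one ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's mm->flags bits. */
enum { MODEL_MMF_UNSTABLE = 1u << 0, MODEL_MMF_OOM_SKIP = 1u << 1 };

struct mm_model {
	atomic_uint flags;
	int pages_mapped;	/* memory the victim has not given back yet */
};

/* Userspace analogue of test_and_set_bit(): sets the bit, returns its old value. */
static bool model_test_and_set(struct mm_model *mm, unsigned int bit)
{
	return (atomic_fetch_or(&mm->flags, bit) & bit) != 0;
}

/*
 * Replay the interleaving in a fixed order and return how many pages were
 * still mapped when MMF_OOM_SKIP became visible.
 */
static int replay_race(bool reaper_backs_off)
{
	struct mm_model mm = { .pages_mapped = 100 };

	atomic_init(&mm.flags, 0);

	/* exiting victim: exit_mmap -> __oom_reap_task_mm, wins the race ... */
	bool victim_lost = model_test_and_set(&mm, MODEL_MMF_UNSTABLE);
	assert(!victim_lost);
	/* ... but has not unmapped anything yet. */

	/* oom_reaper: oom_reap_task -> __oom_reap_task_mm */
	bool already_set = model_test_and_set(&mm, MODEL_MMF_UNSTABLE);
	if (!(reaper_backs_off && already_set))
		mm.pages_mapped = 0;	/* the reaper unmaps the range itself */

	/* oom_reaper: set_bit(MMF_OOM_SKIP), a new victim may be picked now */
	atomic_fetch_or(&mm.flags, MODEL_MMF_OOM_SKIP);
	return mm.pages_mapped;
}
```

In this model, replay_race(true) returns 100 (MMF_OOM_SKIP is set while the
victim's memory is still mapped), whereas replay_race(false) returns 0
because the reaper does the unmapping itself before setting the flag.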