Date: Tue, 17 Apr 2018 15:46:54 -0700 (PDT)
From: David Rientjes
To: Andrew Morton
cc: Michal Hocko, Andrea Arcangeli, Tetsuo Handa, Roman Gushchin, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [patch] mm, oom: fix concurrent munlock and oom reaper unmap

Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.

This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma.  Since munlock_vma_pages_range() depends on
clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.

This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires kick_all_cpus_sync().
If the pmd is zapped by the oom reaper during follow_page_mask() after
the check for pmd_none() is bypassed, this ends up dereferencing a NULL
ptl.

Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
reaped.  This prevents munlock_vma_pages_range() and unmap_page_range()
from running concurrently.  The oom reaper will simply not operate on an
mm that has the bit set and will leave the unmapping to exit_mmap().

Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Cc: stable@vger.kernel.org [4.14+]
Signed-off-by: David Rientjes
---
 mm/mmap.c     | 38 ++++++++++++++++++++------------------
 mm/oom_kill.c | 19 ++++++++-----------
 2 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3015,6 +3015,25 @@ void exit_mmap(struct mm_struct *mm)
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
 
+	if (unlikely(mm_is_oom_victim(mm))) {
+		/*
+		 * Wait for oom_reap_task() to stop working on this mm.  Because
+		 * MMF_UNSTABLE is already set before calling down_read(),
+		 * oom_reap_task() will not run on this mm after up_write().
+		 * oom_reap_task() also depends on a stable VM_LOCKED flag to
+		 * indicate it should not unmap during munlock_vma_pages_all().
+		 *
+		 * mm_is_oom_victim() cannot be set from under us because
+		 * victim->mm is already set to NULL under task_lock before
+		 * calling mmput() and victim->signal->oom_mm is set by the oom
+		 * killer only if victim->mm is non-NULL while holding
+		 * task_lock().
+		 */
+		set_bit(MMF_UNSTABLE, &mm->flags);
+		down_write(&mm->mmap_sem);
+		up_write(&mm->mmap_sem);
+	}
+
 	if (mm->locked_vm) {
 		vma = mm->mmap;
 		while (vma) {
@@ -3036,26 +3055,9 @@ void exit_mmap(struct mm_struct *mm)
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
-
-	if (unlikely(mm_is_oom_victim(mm))) {
-		/*
-		 * Wait for oom_reap_task() to stop working on this
-		 * mm. Because MMF_OOM_SKIP is already set before
-		 * calling down_read(), oom_reap_task() will not run
-		 * on this "mm" post up_write().
-		 *
-		 * mm_is_oom_victim() cannot be set from under us
-		 * either because victim->mm is already set to NULL
-		 * under task_lock before calling mmput and oom_mm is
-		 * set not NULL by the OOM killer only if victim->mm
-		 * is found not NULL while holding the task_lock.
-		 */
-		set_bit(MMF_OOM_SKIP, &mm->flags);
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
-	}
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
+	set_bit(MMF_OOM_SKIP, &mm->flags);
 
 	/*
 	 * Walk the list again, actually closing and freeing it,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -521,12 +521,17 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	}
 
 	/*
-	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
+	 * Tell all users of get_user/copy_from_user etc... that the content
+	 * is no longer stable. No barriers really needed because unmapping
+	 * should imply barriers already and the reader would hit a page fault
+	 * if it stumbled over reaped memory.
+	 *
+	 * MMF_UNSTABLE is also set by exit_mmap when the OOM reaper shouldn't
+	 * work on the mm anymore. The check for MMF_UNSTABLE must run
 	 * under mmap_sem for reading because it serializes against the
 	 * down_write();up_write() cycle in exit_mmap().
 	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+	if (test_and_set_bit(MMF_UNSTABLE, &mm->flags)) {
 		up_read(&mm->mmap_sem);
 		trace_skip_task_reaping(tsk->pid);
 		goto unlock_oom;
@@ -534,14 +539,6 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 
 	trace_start_task_reaping(tsk->pid);
 
-	/*
-	 * Tell all users of get_user/copy_from_user etc... that the content
-	 * is no longer stable. No barriers really needed because unmapping
-	 * should imply barriers already and the reader would hit a page fault
-	 * if it stumbled over a reaped memory.
-	 */
-	set_bit(MMF_UNSTABLE, &mm->flags);
-
 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
 		if (!can_madv_dontneed_vma(vma))
 			continue;