Date: Tue, 17 Apr 2018 19:39:28 -0700 (PDT)
From: David Rientjes
To: Tetsuo Handa
cc: Andrew Morton, Michal Hocko, Andrea Arcangeli, Roman Gushchin,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, oom: fix concurrent munlock and oom reaper unmap
In-Reply-To: <201804180057.w3I0vieV034949@www262.sakura.ne.jp>

On Wed, 18 Apr 2018, Tetsuo Handa wrote:

> > Since exit_mmap() is done without the protection of mm->mmap_sem, it is
> > possible for the oom reaper to concurrently operate on an mm until
> > MMF_OOM_SKIP is set.
> >
> > This allows munlock_vma_pages_all() to concurrently run while the oom
> > reaper is operating on a vma.  Since munlock_vma_pages_range() depends on
> > clearing VM_LOCKED from vm_flags before actually doing the munlock to
> > determine if any other vmas are locking the same memory, the check for
> > VM_LOCKED in the oom reaper is racy.
> >
> > This is especially noticeable on architectures such as powerpc where
> > clearing a huge pmd requires kick_all_cpus_sync().  If the pmd is zapped
> > by the oom reaper during follow_page_mask() after the check for pmd_none()
> > is bypassed, this ends up dereferencing a NULL ptl.
>
> I don't know whether the explanation above is correct.
> Did you actually see a crash caused by this race?
>

Yes, it's trivially reproducible on power by simply mlocking a ton of
memory and triggering oom kill.

> > Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
> > reaped.  This prevents the concurrent munlock_vma_pages_range() and
> > unmap_page_range().  The oom reaper will simply not operate on an mm that
> > has the bit set and leave the unmapping to exit_mmap().
>
> But this patch is setting MMF_OOM_SKIP without reaping any memory as soon as
> MMF_UNSTABLE is set, which is the situation described in 212925802454:
>

Oh, you're referring to __oom_reap_task_mm() returning true because of
MMF_UNSTABLE and then setting MMF_OOM_SKIP itself?  Yes, that is dumb.  We
could change __oom_reap_task_mm() to only set MMF_OOM_SKIP if MMF_UNSTABLE
hasn't been set.

I'll send a v2, which I needed to do anyway to do
s/kick_all_cpus_sync/serialize_against_pte_lookup/ in the changelog (power
only does it for the needed cpus).
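
For readers following the thread, the flag handling discussed above can be pictured with a small userspace model. This is only a sketch and not the kernel patch or the eventual v2: the `model_*` names, the bit values, the page counters, and the assumption that the exit path sets the skip flag once it has finished unmapping are all hypothetical, and the model is single-threaded, so it shows the flag logic rather than the race itself.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real flags are defined in the kernel headers. */
enum {
    MODEL_MMF_OOM_SKIP = 1u << 0, /* the oom killer may move on past this mm */
    MODEL_MMF_UNSTABLE = 1u << 1, /* the exit path owns the unmapping */
};

/* Hypothetical stand-in for the few fields of an mm that matter here. */
struct model_mm {
    unsigned int flags;
    unsigned long mapped_pages;
    unsigned long reaped_pages;
};

/* Models the exit path marking the mm before it starts munlocking. */
static void model_exit_mmap_begin(struct model_mm *mm)
{
    mm->flags |= MODEL_MMF_UNSTABLE;
}

/* Models the exit path finishing: memory is gone, so skipping is safe.
 * (Assumption: the exit path is what sets the skip flag in this case.) */
static void model_exit_mmap_finish(struct model_mm *mm)
{
    mm->mapped_pages = 0;
    mm->flags |= MODEL_MMF_OOM_SKIP;
}

/* Models the reaper: returns true when there is nothing left to retry. */
static bool model_reap(struct model_mm *mm)
{
    if (mm->flags & MODEL_MMF_UNSTABLE)
        return true; /* do not touch the mm, leave it to the exit path */

    mm->reaped_pages += mm->mapped_pages;
    mm->mapped_pages = 0;
    return true;
}

/* The adjustment under discussion: only set the skip flag when the reaper
 * was actually allowed to operate on the mm. */
static void model_oom_reap_task(struct model_mm *mm)
{
    bool done = model_reap(mm);

    if (done && !(mm->flags & MODEL_MMF_UNSTABLE))
        mm->flags |= MODEL_MMF_OOM_SKIP;
}
```

In this model, an mm already marked unstable keeps the skip flag clear after the reaper runs and only gains it in `model_exit_mmap_finish()`, while an unmarked mm is reaped and flagged by the reaper as before.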