Date: Fri, 8 Apr 2022 10:15:49 +0200
From: Peter Zijlstra
To: Nico Pache
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Rafael Aquini,
    Waiman Long, Baoquan He, Christoph von Recklinghausen, Don Dutile,
    "Herton R . Krzesinski", David Rientjes, Michal Hocko,
    Andrea Arcangeli, Andrew Morton, Davidlohr Bueso, Thomas Gleixner,
    Ingo Molnar, Joel Savitz, Darren Hart, stable@kernel.org
Subject: Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing the robust_list_head
Message-ID: <20220408081549.GM2731@worktop.programming.kicks-ass.net>
References: <20220408032809.3696798-1-npache@redhat.com>
In-Reply-To: <20220408032809.3696798-1-npache@redhat.com>

On Thu, Apr 07, 2022 at 11:28:09PM -0400, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper. This mapping is used to store the futex
> robust list head; the kernel does not keep a copy of the robust list and
> instead references a userspace address to maintain the robustness during
> a process death. A race can occur between exit_mm and the oom reaper that
> allows the oom reaper to free the memory of the futex robust list before
> the exit path has handled the futex death:
>
>     CPU1                               CPU2
> ------------------------------------------------------------------------
>     page_fault
>     do_exit "signal"
>     wake_oom_reaper
>                                         oom_reaper
>                                         oom_reap_task_mm (invalidates mm)
>     exit_mm
>     exit_mm_release
>     futex_exit_release
>     futex_cleanup
>     exit_robust_list
>     get_user (EFAULT- can't access memory)
>
> If the get_user EFAULT's, the kernel will be unable to recover the
> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
>
> Use the robust_list address stored in the kernel to skip the VMA that holds
> it, allowing a successful futex_cleanup.
>
> Theoretically a failure can still occur if there are locks mapped as
> PRIVATE|ANON; however, the robust futexes are a best-effort approach.
> This patch only strengthens that best-effort.
>
> The following case can still fail:
> robust head (skipped) -> private lock (reaped) -> shared lock (skipped)

This is still all sorts of confused.. it's a list head, the entries can
be in any random other VMA. You must not remove *any* user memory before
doing the robust thing. Not removing the VMA that contains the head is
pointless in the extreme.

Did you not read the previous discussion?
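
To make the objection concrete, here is a minimal userspace sketch in C. It is a simulation, not kernel code: every name in it (sim_vma, sim_entry, sim_head, sim_walk, sim_case) is hypothetical, a "reaped" flag stands in for memory the oom reaper has freed, and the list is NULL-terminated for simplicity, whereas the real robust list is circular and is read with get_user. It models the failing case quoted above, with the head's VMA preserved, a private lock reaped, and a shared lock left intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: only records whether it was reaped. */
struct sim_vma {
    bool reaped;
};

/* Simulated robust-list entry: lives in some VMA and links to the next lock. */
struct sim_entry {
    struct sim_vma *vma;
    struct sim_entry *next;   /* NULL ends this simplified list */
};

/* Simulated list head: lives in its own VMA and points at the first entry. */
struct sim_head {
    struct sim_vma *vma;
    struct sim_entry *first;
};

/*
 * Walk the list the way exit-time cleanup would, stopping at the first
 * access that lands in reaped memory (the simulated EFAULT). Returns the
 * number of entries that were reached and could be handled.
 */
int sim_walk(const struct sim_head *head)
{
    int handled = 0;

    assert(head != NULL);
    if (head->vma->reaped)
        return 0;   /* cannot even read the head */

    for (const struct sim_entry *e = head->first; e != NULL; e = e->next) {
        if (e->vma->reaped)
            break;  /* entry unreadable: everything after it is lost */
        handled++;
    }
    return handled;
}

/*
 * Build: head (never reaped) -> private lock -> shared lock, then walk it.
 * reap_private selects whether the private lock's VMA was reaped first.
 */
int sim_case(bool reap_private)
{
    struct sim_vma head_vma   = { .reaped = false };
    struct sim_vma priv_vma   = { .reaped = reap_private };
    struct sim_vma shared_vma = { .reaped = false };

    struct sim_entry shared_lock  = { &shared_vma, NULL };
    struct sim_entry private_lock = { &priv_vma, &shared_lock };
    struct sim_head head          = { &head_vma, &private_lock };

    return sim_walk(&head);
}
```

In this model sim_case(true) handles no entries even though the head's VMA survived, because the walk stops at the reaped private lock and never reaches the shared one. sim_case(false) handles both. Under these assumptions, preserving only the head's VMA gains nothing once any entry's VMA is gone.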