Date: Wed, 9 Mar 2022 14:09:59 +0100
From: Michal Hocko
To: Nico Pache
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Rafael Aquini, Waiman Long, Baoquan He, Christoph von Recklinghausen, Don Dutile, "Herton R . Krzesinski", David Rientjes, Andrea Arcangeli, Andrew Morton, tglx@linutronix.de, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@collabora.com, peterz@infradead.org, Joel Savitz
Subject: Re: [PATCH v4] mm/oom_kill.c: futex: Don't OOM reap a process with a futex robust list
In-Reply-To: <20220309002550.103786-1-npache@redhat.com>

On Tue 08-03-22 17:25:50, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper. This mapping is also used to store the futex
> robust list; the kernel does not keep a copy of the robust list and instead
> references a userspace address to maintain the robustness during a process
> death. A race can occur between exit_mm and the oom reaper that allows
> the oom reaper to clear the memory of the futex robust list before the
> exit path has handled the futex death.

The above is missing the important part of the problem description. So
the oom_reaper frees the memory which is backing the robust list. It
would be useful to link that to the lockup on the futex.

> Prevent the OOM reaper from concurrently reaping the mappings if the dying
> process contains a robust_list. If the dying task_struct does not contain
> a pointer in tsk->robust_list, we can assume there was either never one
> setup for this task struct, or futex_cleanup has properly handled the
> futex death and we can safely reap this memory.

I do agree with Waiman that this should go into a helper function. This
would be a quick workaround but I believe that it would be much better
to either do the futex cleanup in the oom_reaper context if that could
be done without blocking. If that is really not feasible for some reason
then we could skip over vmas which are backing the robust list. Have you
considered any of those solutions?

-- 
Michal Hocko
SUSE Labs
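
For illustration only, the helper idea and the "skip over vmas which are
backing the robust list" alternative could be modeled in plain userspace C
roughly as below. This is a sketch under stated assumptions, not kernel
code and not the patch under review: struct task_sketch, struct vma_sketch
and all three function names are hypothetical stand-ins, and only the
address of the robust list head is checked, so list entries living in
other mappings are not covered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for task_struct and vm_area_struct; only the
 * fields this sketch needs are modeled. */
struct task_sketch {
    uintptr_t robust_list;  /* userspace address of the list head, 0 if none */
};

struct vma_sketch {
    uintptr_t vm_start;     /* first address of the mapping (inclusive) */
    uintptr_t vm_end;       /* end of the mapping (exclusive) */
};

/* Helper form of the check described in the patch: does the task still
 * have a robust list registered? */
static bool task_has_robust_list(const struct task_sketch *tsk)
{
    return tsk->robust_list != 0;
}

/* Alternative from the review: identify only the mapping that contains
 * the robust list head, so the rest can still be reaped. */
static bool vma_backs_robust_list(const struct vma_sketch *vma,
                                  const struct task_sketch *tsk)
{
    return task_has_robust_list(tsk) &&
           tsk->robust_list >= vma->vm_start &&
           tsk->robust_list < vma->vm_end;
}

/* Count the mappings a reaper following the second approach would touch. */
static size_t count_reapable(const struct vma_sketch *vmas, size_t n,
                             const struct task_sketch *tsk)
{
    size_t reapable = 0;

    for (size_t i = 0; i < n; i++) {
        if (!vma_backs_robust_list(&vmas[i], tsk))
            reapable++;
    }
    return reapable;
}
```

With two mappings [0x1000, 0x2000) and [0x2000, 0x3000) and a list head at
0x2000, only the second mapping is skipped; with no list registered, both
remain reapable.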