From: Nico Pache
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Rafael Aquini, Waiman Long, Baoquan He, Christoph von Recklinghausen,
    Don Dutile, "Herton R. Krzesinski", David Rientjes, Michal Hocko,
    Andrea Arcangeli, Andrew Morton, tglx@linutronix.de, mingo@redhat.com,
    dvhart@infradead.org, dave@stgolabs.net, andrealmeid@collabora.com,
    peterz@infradead.org, Joel Savitz
Subject: [PATCH v4] mm/oom_kill.c: futex: Don't OOM reap a process with a futex robust list
Date: Tue, 8 Mar 2022 17:25:50 -0700
Message-Id: <20220309002550.103786-1-npache@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper.
This mapping is also used to store the futex robust list; the kernel does
not keep a copy of the robust list and instead references a userspace
address to maintain the robustness during a process death. A race can occur
between exit_mm and the oom reaper that allows the oom reaper to clear the
memory of the futex robust list before the exit path has handled the futex
death.

Prevent the OOM reaper from concurrently reaping the mappings if the dying
process contains a robust_list. If the dying task_struct does not contain a
pointer in tsk->robust_list, we can assume there was either never one set up
for this task struct, or futex_cleanup has properly handled the futex death
and we can safely reap this memory.

Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer

[1] https://elixir.bootlin.com/glibc/latest/source/nptl/allocatestack.c#L370

Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Cc: Rafael Aquini
Cc: Waiman Long
Cc: Baoquan He
Cc: Christoph von Recklinghausen
Cc: Don Dutile
Cc: Herton R. Krzesinski
Cc: David Rientjes
Cc: Michal Hocko
Cc: Andrea Arcangeli
Cc: Andrew Morton
Co-developed-by: Joel Savitz
Signed-off-by: Joel Savitz
Signed-off-by: Nico Pache
---
 mm/oom_kill.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 989f35a2bbb1..37af902494d8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -587,6 +587,25 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 		goto out_unlock;
 	}
 
+	/* Don't reap a process holding a robust_list as the pthread
+	 * struct is allocated in userspace using PRIVATE | ANONYMOUS
+	 * memory which when reaped before futex_cleanup() can leave
+	 * the waiting process stuck.
+	 */
+#ifdef CONFIG_FUTEX
+	bool robust = false;
+
+	robust = tsk->robust_list != NULL;
+#ifdef CONFIG_COMPAT
+	robust |= tsk->compat_robust_list != NULL;
+#endif
+	if (robust) {
+		trace_skip_task_reaping(tsk->pid);
+		pr_info("oom_reaper: skipping task as it contains a robust list");
+		goto out_finish;
+	}
+#endif
+
 	trace_start_task_reaping(tsk->pid);
 
 	/* failed to reap part of the address space. Try again later */
-- 
2.35.1
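
For readers less familiar with the userspace side of robust futexes, the following is a minimal sketch of the guarantee that the robust list provides when the exit path gets to walk it undisturbed. It is not part of the patch and does not involve the OOM reaper. It uses only the POSIX robust mutex API; the names demo_lock, die_holding_lock and lock_after_owner_death are hypothetical and chosen for illustration. A thread takes a robust mutex and exits without unlocking it; the next locker is expected to see EOWNERDEAD instead of blocking forever.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical demo mutex; initialised as robust below. */
static pthread_mutex_t demo_lock;

/* Thread body: take the robust mutex and exit without unlocking it. */
static void *die_holding_lock(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&demo_lock);
	return NULL;
}

/*
 * Lock a robust mutex whose owner has already exited and return the
 * result of that lock attempt. Returns -1 if the setup itself fails.
 */
static int lock_after_owner_death(void)
{
	pthread_mutexattr_t attr;
	pthread_t owner;
	int ret;

	if (pthread_mutexattr_init(&attr) != 0)
		return -1;
	if (pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0)
		return -1;
	if (pthread_mutex_init(&demo_lock, &attr) != 0)
		return -1;
	pthread_mutexattr_destroy(&attr);

	if (pthread_create(&owner, NULL, die_holding_lock, NULL) != 0)
		return -1;
	/* Wait until the owner thread has fully exited. */
	pthread_join(owner, NULL);

	ret = pthread_mutex_lock(&demo_lock);
	if (ret == EOWNERDEAD)
		pthread_mutex_consistent(&demo_lock); /* mark state as recovered */
	if (ret == 0 || ret == EOWNERDEAD)
		pthread_mutex_unlock(&demo_lock);
	pthread_mutex_destroy(&demo_lock);
	return ret;
}
```

Calling lock_after_owner_death() from main() and checking for EOWNERDEAD shows the behaviour that waiters depend on. If the memory holding the robust list were cleared before the exit path processed it, as in the race described above, a waiter would have no such notification, which matches the "stuck" waiter mentioned in the patch comment.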