From: Nico Pache <npache@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
        jsavitz@redhat.com, mhocko@suse.com
Cc: peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com,
        dvhart@infradead.org, dave@stgolabs.net, andrealmeid@collabora.com,
        longman@redhat.com
Subject: [PATCH v3] mm/oom: do not oom reap task with an unresolved robust futex
Date: Fri, 14 Jan 2022 13:01:35 -0500
Message-Id: <20220114180135.83308-1-npache@redhat.com>

In the case that two or more processes share a futex located within a
shared mmaped region, such as a process that shares a lock between
itself and child processes, we have observed that when a process
holding the lock is oom killed, at least one waiter is never alerted
to this new development and simply continues to wait.

This is visible via pthreads by checking the __owner field of the
pthread_mutex_t structure within a waiting process, perhaps with gdb.

We identify reproduction of this issue by checking a waiting process
of a test program, viewing the contents of the pthread_mutex_t and
taking note of the value in the owner field, and then checking dmesg
to see whether that owner has already been killed.

As mentioned by Michal in his patchset introducing the oom reaper,
commit aac4536355496 ("mm, oom: introduce oom reaper"), the purpose of
the oom reaper is to try and free memory more quickly; however, in the
case that a robust futex is being used, we want to avoid utilizing the
concurrent oom reaper. This is due to a race that can occur between
the SIGKILL handling the robust futex and the oom reaper freeing the
memory needed to maintain the robust list.

In the case that the oom victim is utilizing a robust futex, and the
SIGKILL has not yet handled the futex death, tsk->robust_list should
be non-NULL. This issue can be tricky to reproduce, but with the
modifications of this patch, we have found it impossible to reproduce.

Add a check in wake_oom_reaper() that returns early when
tsk->robust_list is non-NULL, so the oom reaper is not woken for such
a task.

Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer

Co-developed-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/oom_kill.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1ddabefcfb5a..3cdaac9c7de5 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -667,6 +667,21 @@ static void wake_oom_reaper(struct task_struct *tsk)
 	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
 		return;
 
+#ifdef CONFIG_FUTEX
+	/*
+	 * If the ooming task's SIGKILL has not finished handling the
+	 * robust futex it is not correct to reap the mm concurrently.
+	 * Do not wake the oom reaper when the task still contains a
+	 * robust list.
+	 */
+	if (tsk->robust_list)
+		return;
+#ifdef CONFIG_COMPAT
+	if (tsk->compat_robust_list)
+		return;
+#endif
+#endif
+
 	get_task_struct(tsk);
 
 	spin_lock(&oom_reaper_lock);
-- 
2.33.1
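
For reference, the locking arrangement described above (a robust,
process-shared mutex placed in a MAP_SHARED mapping and contended
across fork()) looks roughly like the sketch below. This is only an
illustration, not the linked reproducer; the allocation loop used to
draw the oom killer toward the child and the sleep-based timing are
simplified assumptions.

/*
 * Minimal sketch of a shared robust futex between parent and child.
 * Build with: cc -pthread robust_sketch.c
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	/* Put the mutex in a MAP_SHARED region so both processes use
	 * the same futex word. */
	pthread_mutex_t *lock = mmap(NULL, sizeof(*lock),
				     PROT_READ | PROT_WRITE,
				     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	pthread_mutexattr_t attr;
	int ret;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	pthread_mutex_init(lock, &attr);

	if (fork() == 0) {
		/* Child: take the lock, then allocate and touch memory
		 * until the oom killer (assumed memory pressure) picks
		 * this task. */
		pthread_mutex_lock(lock);
		for (;;) {
			void *p = malloc(1 << 20);
			if (p)
				memset(p, 1, 1 << 20);
		}
	}

	sleep(1);	/* crude: let the child take the lock first */

	/*
	 * Parent: blocks on the futex. When robust-list handling runs
	 * before the mm is reaped, the owner's death is reported as
	 * EOWNERDEAD; in the racy case this patch avoids, the wait can
	 * persist indefinitely.
	 */
	ret = pthread_mutex_lock(lock);
	if (ret == EOWNERDEAD) {
		pthread_mutex_consistent(lock);
		printf("owner died, lock recovered\n");
	}
	wait(NULL);
	return 0;
}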