From: Joel Savitz
Date: Fri, 14 Jan 2022 09:39:55 -0500
Subject: Re: [PATCH] mm/oom_kill: wake futex waiters before annihilating victim shared mutex
To: Michal Hocko
Cc: Andrew Morton, linux-kernel, Waiman Long, linux-mm@kvack.org, Nico Pache, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso, André Almeida

> What has happened to the oom victim and why it has never exited?

What appears to happen is that the oom victim is sent SIGKILL by the process that triggers the oom, while also being marked as an oom victim.
As you mention in your patchset introducing the oom reaper in commit aac4536355496 ("mm, oom: introduce oom reaper"), the purpose of the oom reaper is to free the victim's memory sooner than the victim itself would, on the assumption that anonymous or swapped-out pages won't be needed in the exit path because the owner is already dying.

However, this assumption is violated by the futex_cleanup() path, which needs access to userspace memory in fetch_robust_entry() when it is called from exit_robust_list(). trace_printk() instrumentation of this failure path reveals an apparent race between the oom reaper thread reaping the victim's mm and the futex_cleanup() path. There may be other ways this race manifests, but this is the one we have been able to trace most consistently.

Since the core assumption of the oom reaper is violated when an oom victim uses robust futexes, we propose to solve this problem by either canceling or delaying the waking of the oom reaper thread in wake_oom_reaper() when tsk->robust_list (or, with CONFIG_COMPAT, tsk->compat_robust_list) is non-NULL.

For example, the bug does not reproduce with this patch (from npache@redhat.com):

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 989f35a2bbb1..b8c518fdcf4d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -665,6 +665,19 @@ static void wake_oom_reaper(struct task_struct *tsk)
 	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
 		return;
 
+#ifdef CONFIG_FUTEX
+	/*
+	 * don't wake the oom_reaper thread if we still have a robust list to handle
+	 * This will then rely on the sigkill to handle the cleanup of memory
+	 */
+	if(tsk->robust_list)
+		return;
+#ifdef CONFIG_COMPAT
+	if(tsk->compat_robust_list)
+		return;
+#endif
+#endif
+
 	get_task_struct(tsk);
 
 	spin_lock(&oom_reaper_lock);

Best,
Joel Savitz