Date: Fri, 14 Jan 2022 09:58:35 -0500
Subject: Re: [PATCH] mm/oom_kill: wake futex waiters before annihilating victim shared mutex
From: Waiman Long
To: Joel Savitz, Michal Hocko
Cc: Andrew Morton, linux-kernel, linux-mm@kvack.org, Nico Pache, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso, André Almeida
References: <20211207214902.772614-1-jsavitz@redhat.com> <20211207154759.3f3fe272349c77e0c4aca36f@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/14/22 09:55, Waiman Long wrote:
> On 1/14/22 09:39, Joel Savitz wrote:
>>> What has happened to the oom victim and why it has never exited?
>> What appears to happen is that the oom victim is sent SIGKILL by the
>> process that triggers the oom while also being marked as an oom
>> victim.
>>
>> As you mention in your patchset introducing the oom reaper in commit
>> aac4536355496 ("mm, oom: introduce oom reaper"), the purpose of the
>> oom reaper is to try and free more memory more quickly than it
>> otherwise would have been by assuming anonymous or swapped out pages
>> won't be needed in the exit path as the owner is already dying.
>> However, this assumption is violated by the futex_cleanup() path,
>> which needs access to userspace in fetch_robust_entry() when it is
>> called in exit_robust_list(). Trace_printk()s in this failure path
>> reveal an apparent race between the oom reaper thread reaping the
>> victim's mm and the futex_cleanup() path. There may be other ways that
>> this race manifests but we have been most consistently able to trace
>> that one.
>>
>> Since in the case of an oom victim using robust futexes the core
>> assumption of the oom reaper is violated, we propose to solve this
>> problem by either canceling or delaying the waking of the oom reaper
>> thread by wake_oom_reaper in the case that tsk->robust_list is
>> non-NULL.
>>
>> e.g. the bug does not reproduce with this patch (from
>> npache@redhat.com):
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 989f35a2bbb1..b8c518fdcf4d 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -665,6 +665,19 @@ static void wake_oom_reaper(struct task_struct *tsk)
>>          if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
>>                  return;
>>
>> +#ifdef CONFIG_FUTEX
>> +       /*
>> +        * don't wake the oom_reaper thread if we still have a robust list to handle
>> +        * This will then rely on the sigkill to handle the cleanup of memory
>> +        */
>> +       if(tsk->robust_list)
>> +               return;
>> +#ifdef CONFIG_COMPAT
>> +       if(tsk->compat_robust_list)
>> +               return;
>> +#endif
>> +#endif
>> +
>>          get_task_struct(tsk);
>>
>>          spin_lock(&oom_reaper_lock);
>
> OK, that can explain why the robust futex is not properly cleaned up.
> Could you post a more formal v2 patch with description about the
> possible race condition?
>
It should be v3. Sorry for the mix-up.

Cheers,
Longman
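
For readers following the race described in the quoted explanation, here is a rough userspace model of why the exit-time robust-list walk has to read the victim's own memory. It is only a sketch under stated assumptions, not the kernel's exit_robust_list(): the struct layouts are simplified mirrors of the robust-list ABI, and `demo_lock`, `model_exit_robust_list()`, `demo_walk()` and `WALK_LIMIT` are hypothetical names made up for illustration. The three FUTEX_* values are the usual futex flag bits. PI futexes, the pending-operation slot, the atomic update and the actual wakeup are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Futex word flag bits; everything else below is a simplified model. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu
#define WALK_LIMIT       2048 /* hypothetical bound so a corrupt list cannot loop forever */

/* Simplified mirror of the per-thread robust list that lives in user memory. */
struct robust_list {
    struct robust_list *next;
};

struct robust_list_head {
    struct robust_list list; /* circular: empty when list.next == &list */
    long futex_offset;       /* bytes from a list entry to its futex word */
};

/* Hypothetical lock layout for the demo: list entry plus futex word. */
struct demo_lock {
    struct robust_list entry;
    uint32_t futex;
};

/*
 * Model of the exit-time walk. Every step dereferences pointers stored in
 * the (modelled) user address space: the next pointer and the futex word.
 * If those pages had already been reaped, the walk would not see the
 * list it expects, which is the race discussed in this thread.
 * Returns the number of locks marked as owner-died.
 */
static int model_exit_robust_list(struct robust_list_head *head, uint32_t tid)
{
    int marked = 0;
    int limit = WALK_LIMIT;
    struct robust_list *entry = head->list.next;

    while (entry != &head->list && limit-- > 0) {
        uint32_t *uaddr = (uint32_t *)((char *)entry + head->futex_offset);
        struct robust_list *next = entry->next; /* fetch before updating the lock */

        if ((*uaddr & FUTEX_TID_MASK) == tid) {
            /* Keep the WAITERS bit, drop the owner TID, flag the dead owner. */
            *uaddr = (*uaddr & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
            marked++;
        }
        entry = next;
    }
    return marked;
}

/* Builds a two-entry list owned by TID 42 and runs the model walk over it. */
static int demo_walk(void)
{
    struct robust_list_head head;
    struct demo_lock a, b;

    head.futex_offset = (long)offsetof(struct demo_lock, futex)
                      - (long)offsetof(struct demo_lock, entry);
    head.list.next = &a.entry;
    a.entry.next = &b.entry;
    b.entry.next = &head.list;

    a.futex = 42u;                 /* held, no waiters */
    b.futex = 42u | FUTEX_WAITERS; /* held, with a waiter blocked */

    int marked = model_exit_robust_list(&head, 42u);
    assert(a.futex == FUTEX_OWNER_DIED);
    assert(b.futex == (FUTEX_WAITERS | FUTEX_OWNER_DIED));
    return marked;
}
```

Calling `demo_walk()` from a small `main()` exercises the walk. The point of the model is only that both the list links and the futex words are read from the dying task's address space, so any cleanup that frees those pages first leaves waiters without the owner-died notification.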