From: Joel Savitz
Date: Tue, 7 Dec 2021 22:38:00 -0500
Subject: Re: [PATCH] mm/oom_kill: wake futex waiters before annihilating victim shared mutex
To: Andrew Morton
Cc: Nico Pache, linux-kernel, Waiman Long, linux-mm@kvack.org, Peter Zijlstra, Michal Hocko
References: <20211207214902.772614-1-jsavitz@redhat.com> <20211207154759.3f3fe272349c77e0c4aca36f@linux-foundation.org> <20211207175816.8c45ff5b82cb964ade82d6f1@linux-foundation.org>
In-Reply-To: <20211207175816.8c45ff5b82cb964ade82d6f1@linux-foundation.org>
List-ID: linux-kernel@vger.kernel.org

On Tue, Dec 7, 2021 at 8:58 PM Andrew Morton wrote:
>
> On Tue, 7 Dec 2021 19:46:57 -0500 Nico Pache wrote:
>
> > On 12/7/21 18:47, Andrew Morton wrote:
> > > (cc's added)
> > >
> > > On Tue, 7 Dec 2021 16:49:02 -0500 Joel Savitz wrote:
> > >
> > >> In the case that two or more processes share a futex located within
> > >> a shared mmapped region, such as a process that shares a lock between
> > >> itself and a number of child processes, we have observed that when
> > >> a process holding the lock is oom killed, at least one waiter is never
> > >> alerted to this new development and simply continues to wait.
> > >
> > > Well dang. Is there any way of killing off that waiting process, or do
> > > we have a resource leak here?
> >
> > If I understood your question correctly, there is a way to recover the
> > system by killing the process that is using the futex; however, the
> > purpose of robust futexes is to avoid having to do this.
>
> OK. My concern was whether we have a way in which userspace can
> permanently leak memory, which opens a (lame) form of denial-of-service
> attack.

I believe the resources are freed when the process is killed, so to my
knowledge there is no resource leak in the case we were investigating.

> > From my work with Joel on this, it seems like a race is occurring
> > between the oom_reaper and the exit signal sent to the OOM'd process.
> > By calling futex_exit_release() before these signals are sent, we
> > avoid this.
>
> OK. It would be nice if the patch had some comments explaining *why*
> we're doing this strange futex thing here. Although that wouldn't be
> necessary if futex_exit_release() was documented...

Sounds good, will send a v2 tomorrow.

Best,
Joel Savitz
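
P.S. For anyone who wants to see the setup in miniature, below is a
simplified illustration I put together for this thread (not our actual
reproducer; the details are mine): a robust, process-shared pthread
mutex in an anonymous MAP_SHARED mapping, with the holder SIGKILLed as
a stand-in for the OOM killer. When the robust-futex handoff works,
the survivor's lock attempt returns EOWNERDEAD instead of blocking
forever; the bug described above is that an OOM kill of the holder can
leave waiters blocked. Build with "cc -pthread".

#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	/* The lock lives in memory shared between parent and child. */
	pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Robust + process-shared: the kernel tracks the owner's robust
	 * list and hands the lock off if the owner dies holding it. */
	pthread_mutexattr_t attr;
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	pthread_mutex_init(m, &attr);

	pid_t pid = fork();
	if (pid == 0) {
		pthread_mutex_lock(m);	/* child takes the lock... */
		pause();		/* ...and never releases it */
		_exit(0);
	}

	sleep(1);			/* let the child acquire the lock */
	kill(pid, SIGKILL);		/* stand-in for the OOM killer */
	waitpid(pid, NULL, 0);

	/* Expected robust behavior: EOWNERDEAD, not an endless wait. */
	int ret = pthread_mutex_lock(m);
	if (ret == EOWNERDEAD) {
		pthread_mutex_consistent(m);
		puts("recovered lock from dead owner (EOWNERDEAD)");
	} else {
		printf("pthread_mutex_lock returned %d\n", ret);
	}
	return 0;
}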
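
P.P.S. On the "explain why" comment: conceptually the change is just
the following, shown here as a sketch of the idea rather than the
literal diff (the exact call site in the OOM kill path, e.g.
__oom_kill_process(), may differ in v2), with the kind of comment I
plan to add:

/*
 * The victim may never run its own exit path once it is killed and
 * the oom_reaper starts tearing down its address space, so release
 * its robust-futex state up front; otherwise waiters on a lock the
 * victim holds in a shared mapping are never woken.
 */
futex_exit_release(victim);
do_send_sig_info(SIGKILL, SEND_SIG_PRIV, victim, PIDTYPE_TGID);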