Message-ID: <0991b55e-3d69-a591-9bf4-26013b6ba843@redhat.com>
Date: Wed, 30 Mar 2022 15:36:25 -0600
Subject: Re: [PATCH v5] mm/oom_kill.c: futex: Close a race between do_exit and the oom_reaper
From: Nico Pache
To: Michal Hocko, Thomas Gleixner
Cc: Davidlohr Bueso, linux-mm@kvack.org, Andrea Arcangeli, Joel Savitz,
 Andrew Morton, linux-kernel@vger.kernel.org, Rafael Aquini, Waiman Long,
 Baoquan He, Christoph von Recklinghausen, Don Dutile,
 "Herton R. Krzesinski", Ingo Molnar, Peter Zijlstra, Darren Hart,
 Andre Almeida, David Rientjes
References: <20220318033621.626006-1-npache@redhat.com>
 <20220322004231.rwmnbjpq4ms6fnbi@offworld>
 <20220322025724.j3japdo5qocwgchz@offworld>
 <87bkxyaufi.ffs@tglx> <87zglha9rt.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/30/22 12:18, Nico Pache wrote:
>
>
> On 3/30/22 03:18, Michal Hocko wrote:
>> Nico,
>>
>> On Wed 23-03-22 10:17:29, Michal Hocko wrote:
>>> Let me skip over futex part which I need to digest and only focus on the
>>> oom side of the things for clarification.
>>>
>>> On Tue 22-03-22 23:43:18, Thomas Gleixner wrote:
>> [...]
>>>> You can easily validate that by doing:
>>>>
>>>> wake_oom_reaper(task)
>>>>     task->reap_time = jiffies + HZ;
>>>>     queue_task(task);
>>>>     wakeup(reaper);
>>>>
>>>> and then:
>>>>
>>>> oom_reap_task(task)
>>>>     now = READ_ONCE(jiffies);
>>>>     if (time_before(now, task->reap_time))
>>>>         schedule_timeout_idle(task->reap_time - now);
>>>>
>>>> before trying to actually reap the mm.
>>>>
>>>> That will prevent the enforced race in most cases and allow the exiting
>>>> and/or killed processes to cleanup themself. Not pretty, but it should
>>>> reduce the chance of the reaper to win the race with the exiting and/or
>>>> killed process significantly.
>>>>
>>>> It's not going to work when the problem is combined with a heavy VM
>>>> overload situation which keeps a guest (or one/some its vCPUs) away
>>>> from being scheduled. See below for a discussion of guarantees.
>>>>
>>>> If it failed to do so when the sleep returns, then you still can reap
>>>> it.
>>>
>>> Yes, this is certainly an option. Please note that the oom_reaper is not
>>> the only way to trigger this. process_mrelease syscall performs the same
>>> operation from the userspace. Arguably process_mrelease could be used
>>> sanely/correctly because the userspace oom killer can do pro-cleanup
>>> steps before going to final SIGKILL & process_mrelease. One way would be
>>> to send SIGTERM in the first step and allow the victim to perform its
>>> cleanup.
>>
>> are you working on another version of the fix/workaround based on the
>> discussion so far?
>
> We are indeed! Sorry for the delay; we've been taking the time to do our due
> diligence on some of the claims made. We are also spending time rewriting the
> reproducer to include more test cases that Thomas brought up.
>
> I'll summarize here, and reply to the original emails in more detail....
>
> Firstly, we have implemented & tested the VMA skipping... it does fix our case.
> Thomas brought up a few good points about the robust list head and the potential
> waiters being in different VMAs; however, I think it's a moot point, given that
> the locks will only be reaped if allocated as ((private|anon)|| !shared).

Sorry... not completely moot. As Thomas pointed out, a robust list with the
following structure will probably fail to recover its waiters:

TLS (robust head, skip)* --> private lock (reaped) --> shared lock (not reaped)

We are working on getting a test case with multiple locks and mixed mapping
types to prove this.

Skipping the robust list head VMA will be beneficial in cases where the robust
list is full of shared locks:

TLS (robust head, skip)* --> shared lock (not reaped) --> shared lock (not reaped)

-- Nico
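
To make the two layouts above concrete, here is a minimal userspace sketch in
C. It is not kernel code and it is not the reproducer mentioned above; every
name in it (model_lock, model_reap, model_walk, model_scenario) is
hypothetical. It assumes the reaper zaps every private mapping and leaves
shared ones alone, that the robust list head survives because its VMA is
skipped, and that the exit-time walk cannot continue past a reaped entry
because that entry's next pointer is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_LOCKS 8

/* Hypothetical mapping type of the memory that holds each lock. */
enum map_kind { MAP_KIND_PRIVATE, MAP_KIND_SHARED };

/* One entry of the modeled robust list. */
struct model_lock {
    enum map_kind kind;       /* mapping type backing this lock */
    bool reaped;              /* set once the modeled reaper zapped it */
    struct model_lock *next;  /* next entry, NULL at the end of the list */
};

/* Modeled reaper (assumption): private mappings are zapped, shared survive. */
static void model_reap(struct model_lock *locks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (locks[i].kind == MAP_KIND_PRIVATE)
            locks[i].reaped = true;
}

/*
 * Modeled exit-time walk. It starts at the first entry (the head itself is
 * assumed intact because its VMA is skipped) and stops at the first reaped
 * entry, since the link to the rest of the list is lost there.
 * Returns how many locks were reached and could be recovered.
 */
static size_t model_walk(const struct model_lock *first)
{
    size_t recovered = 0;

    for (const struct model_lock *l = first; l != NULL; l = l->next) {
        if (l->reaped)
            break;
        recovered++;
    }
    return recovered;
}

/* Build a list with the given mapping types, reap it, then walk it. */
static size_t model_scenario(const enum map_kind *kinds, size_t n)
{
    struct model_lock locks[MODEL_MAX_LOCKS];

    assert(n <= MODEL_MAX_LOCKS);
    for (size_t i = 0; i < n; i++) {
        locks[i].kind = kinds[i];
        locks[i].reaped = false;
        locks[i].next = (i + 1 < n) ? &locks[i + 1] : NULL;
    }
    model_reap(locks, n);
    return model_walk(n ? &locks[0] : NULL);
}
```

Under these assumptions, model_scenario() on { private, shared } returns 0:
the shared lock is never reached, so its waiters are not recovered. On
{ shared, shared } it returns 2, which is the case where skipping the head
VMA is enough.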