Date: Thu, 7 Nov 2019 09:41:36 +0100
From: Ingo Molnar
To: Thomas Gleixner
Cc: LKML, Peter Zijlstra, Darren Hart, Yi Wang, Yang Tao, Oleg Nesterov,
    Florian Weimer, Carlos O'Donell, Alexander Viro
Subject: Re: [patch 00/12] futex: Cure robust/PI futex exit races
Message-ID: <20191107084136.GH30739@gmail.com>
References: <20191106215534.241796846@linutronix.de>
In-Reply-To: <20191106215534.241796846@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

* Thomas Gleixner wrote:

> This series addresses a couple of robust/PI futex exit races:
>
>  1) The unlock races debugged and fixed by Yi and Yang
>
>     These races are really subtle and I'm still puzzled how to trigger them
>     reliably enough to decode them.
>
>     The basic issue is that:
>
>       A) An unlocking task can be killed between clearing the user space
>          futex value and calling futex(FUTEX_WAKE).
>
>       B) A woken up waiter can be killed before it can acquire the futex
>          after returning to user space.
>
>     In both cases the futex value is 0 and due to that the robust list exit
>     code refuses to wake up waiters as the futex is not owned by the
>     exiting task. As a consequence all other waiters might be blocked
>     forever.
>
>  2) Oleg provided a test case which causes an infinite loop in the
>     futex_lock_pi() code.
>
>     The problem there is that an exiting task might be preempted by a
>     waiter in a state which makes the waiter busy wait for the exiting task
>     to complete the robust/PI exit cleanup code.
>
>     That's obviously impossible when the waiter has higher priority than
>     the exiting task and both are pinned on the same CPU resulting in a
>     live lock.
>
> #1 is a straight forward and simple fix
>
>    The solution Yi and Yang provided looks solid and in the worst case
>    causes a spurious wakeup of a waiter which is nothing to worry about
>    as all waiter code has to be prepared for that anyway.
>
> #2 is more complex
>
>    In the current implementation there is no way to block until the exiting
>    task has finished the cleanup.
>
>    To fix this there is quite some code reshuffling required which at the
>    same time is a valuable cleanup.
>
>    The final solution is to guard the futex exit handling with a per task
>    mutex and make the waiter block on that mutex until the exiting task has
>    the cleanup completed.
>
>    Details why a simpler solution is not feasible can be found here:
>
>      https://lore.kernel.org/r/20191105152728.GA5666@redhat.com
>
>    Ignore my confusion of fork vs. vfork at the beginning of the thread.
>    Futexes do that to human brains. :)
>
> The following series addresses both issues.
>
> Patch 1 is a slightly polished version of the original Yi and Yang
> submission. It is included for completeness sake and because it
> creates conflicts with the larger surgery which fixes issue #2.
>
> Aside of that a few eyeballs more on that subtlety are definitely not
> a bad thing especially as this has a user space component in it.
>
> The rest of the series addresses issue #2 which is more or less a kernel
> only problem, but extra eyeballs are appreciated.
>
> I'm certainly not proud about the solution for #2 but it's the best I could
> come up with without violating the user/kernel state consistency
> constraints.

I really like the whole series - this is how it should have been
implemented originally, but the exit scenarios 'looked' so simple so it
was just open-coded ... Mea culpa. :-)

As to ->futex_exit_mutex: that's really just a consequence of the ABI,
and a lot cleaner than all the previous pretense that these exit ops are
atomic - which they fundamentally aren't.

Haven't tested the series beyond build coverage, but the high level
principles behind the whole series look very sound to me:

  Reviewed-by: Ingo Molnar

Thanks,

	Ingo
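
For readers following issue #1, here is a toy user-space model of the
exit-time decision described in the quoted text. It is only a sketch and
not the kernel code: exit_cleanup(), its pending_op argument (standing in
for "an unlock was in flight when the task died") and the fixed switch are
hypothetical names chosen for illustration, and only the non-PI case is
modelled. The bit masks are the ones from the futex UAPI header.

```python
# Toy model of the robust-list exit decision for ONE non-PI futex word.
# Not kernel code: function and parameter names are illustrative only.

FUTEX_WAITERS = 0x80000000     # some task is blocked on this futex
FUTEX_OWNER_DIED = 0x40000000  # set by the exit code for the next owner
FUTEX_TID_MASK = 0x3FFFFFFF    # low bits hold the owner's TID


def exit_cleanup(uval, exiting_tid, pending_op, fixed):
    """Return (new_uval, wake_one_waiter) for an exiting task.

    uval        -- current user-space futex value
    exiting_tid -- TID of the task that is exiting
    pending_op  -- assumed flag: an unlock was in flight on this futex
    fixed       -- False models the old behaviour, True the cured one
    """
    if fixed and pending_op and uval == 0:
        # Cases A and B from the cover letter: the value is already 0, so
        # ownership cannot be shown. Wake one waiter anyway; the worst
        # case is a spurious wakeup, which waiters must tolerate.
        return uval, True

    if (uval & FUTEX_TID_MASK) != exiting_tid:
        # Not owned by the exiting task: leave the futex alone.
        return uval, False

    # Owned by the exiting task: flag the death, keep the waiters bit,
    # and wake a waiter if one is queued.
    new_uval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED
    return new_uval, bool(uval & FUTEX_WAITERS)


if __name__ == "__main__":
    # Task 42 died right after storing 0 but before FUTEX_WAKE.
    print("old:", exit_cleanup(0, 42, pending_op=True, fixed=False))
    print("new:", exit_cleanup(0, 42, pending_op=True, fixed=True))
```

With fixed=False the model refuses to wake anybody when the value is 0,
which is the "waiters might be blocked forever" outcome; with fixed=True
one waiter is woken and can retry the lock.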