Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp177991ybx; Wed, 6 Nov 2019 15:13:11 -0800 (PST) X-Google-Smtp-Source: APXvYqxugHBrf8mZhgjJTBoG1jput+L0+xCmo8n9PuqJrVSzaF/MMHNztoCMRbY06E+yRc9L4ucG X-Received: by 2002:a05:6402:1250:: with SMTP id l16mr276068edw.219.1573081991140; Wed, 06 Nov 2019 15:13:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573081991; cv=none; d=google.com; s=arc-20160816; b=KtvLJznlraf/ybQclfjBGaimCY/uPxzklEAir+b3mB+eYGy3OHvfoWycc96oftsdyZ 7Z0sCecQAlgqfHV3cI/JUdFGsfZnM8G/lhO+3sr08i8vO9TkMzFpfjh1q3jahHagQ5je Ev3ZqcUUlgUWsjZeLbj52CsHLH9P+DufwTEJmIfLJH72kSdXrYjxg9pLSm08Pmul0vHZ x8viMy00xg/AxXqQaJ+ULGu3zJy2kzk6xGUqjdubaOebaM9DnLYgLcaSeJOE0NYM6nsy H4UjwKVh4biRJqAkcJUXzf/rw3AfAGEHbOsXUPdMzWJmlQdyU4XLdAPXiEyvhttzO5ek Atsw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:subject:cc:to:from:date:user-agent :message-id; bh=RavJKQLrLFTpk663plfLi3KpzKeNyDGbbFojoW99T3s=; b=RhGX7CQMbr46AOPj4bYaAdtu/NrpnnTarilturZ5ODp97OfZ4kZ6cGqtthJwozNYD1 6AV5URre3c/y4JlO2so9iDdEdxsD2l8kWxcDvSY/vqsWM3R6s8ASyKbWXWfQqWf/dG/q GxJztNweGf+16y1N0wx8OFwv/OKolIMoV8wSN/k+ox42d7XcC3FmGGbb+I/jAZW0xvDb j3J4SvQQFjo0epo/Wv3t+lhhX/dyUc4hx1q+HCnkvhRcJxpzsi5MEtRSjmj7QcPgvG7F BUEc7HbL5pYGN1bw0zwtFeKUrWzX7rB+CrrqeYoHW7TZQtWK1zBjP+RlERkUxuDYtD0/ lvMg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y3si45490edv.423.2019.11.06.15.12.48; Wed, 06 Nov 2019 15:13:11 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732930AbfKFXIn (ORCPT + 99 others); Wed, 6 Nov 2019 18:08:43 -0500 Received: from Galois.linutronix.de ([193.142.43.55]:45503 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727656AbfKFXIH (ORCPT ); Wed, 6 Nov 2019 18:08:07 -0500 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iSUP8-0004ik-9K; Thu, 07 Nov 2019 00:07:59 +0100 Message-Id: <20191106215534.241796846@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 06 Nov 2019 22:55:34 +0100 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Ingo Molnar , Darren Hart , Yi Wang , Yang Tao , Oleg Nesterov , Florian Weimer , Carlos O'Donell , Alexander Viro Subject: [patch 00/12] futex: Cure robust/PI futex exit races Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This series addresses a couple of robust/PI futex exit races: 1) The unlock races debugged and fixed by Yi and Yang These races are really subtle and I'm still puzzled how to trigger them reliably enough to decode them. The basic issue is that: A) An unlocking task can be killed between clearing the user space futex value and calling futex(FUTEX_WAKE). B) A woken up waiter can be killed before it can acquire the futex after returning to user space. In both cases the futex value is 0 and due to that the robust list exit code refuses to wake up waiters as the futex is not owned by the exiting task. As a consequence all other waiters might be blocked forever. 2) Oleg provided a test case which causes an infinite loop in the futex_lock_pi() code. The problem there is that an exiting task might be preempted by a waiter in a state which makes the waiter busy wait for the exiting task to complete the robust/PI exit cleanup code. That's obviously impossible when the waiter has higher priority than the exiting task and both are pinned on the same CPU resulting in a live lock. #1 is a straight forward and simple fix The solution Yi and Yang provided looks solid and in the worst case causes a spurious wakeup of a waiter which is nothing to worry about as all waiter code has to be prepared for that anyway. #2 is more complex In the current implementation there is no way to block until the exiting task has finished the cleanup. To fix this there is quite some code reshuffling required which at the same time is a valuable cleanup. The final solution is to guard the futex exit handling with a per task mutex and make the waiter block on that mutex until the exiting task has the cleanup completed. Details why a simpler solution is not feasible can be found here: https://lore.kernel.org/r/20191105152728.GA5666@redhat.com Ignore my confusion of fork vs. vfork at the beginning of the thread. Futexes do that to human brains. :) The following series addresses both issues. Patch 1 is a slightly polished version of the original Yi and Yang submission. It is included for completeness sake and because it creates conflicts with the larger surgery which fixes issue #2. Aside of that a few eyeballs more on that subtlety are definitely not a bad thing especially as this has a user space component in it. The rest of the series addresses issue #2 which is more or less a kernel only problem, but extra eyeballs are appreciated. I'm certainly not proud about the solution for #2 but it's the best I could come up with without violating the user/kernel state consistency constraints. Rusty Russell was definitely right when he said that futexes are cursed, but as Peter Zijlstra pointed out he should have named them SNAFUtex right away. The series is also available from git: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.locking/futex Thanks, tglx 8<------------- fs/exec.c | 2 include/linux/compat.h | 2 include/linux/futex.h | 38 +++-- include/linux/sched.h | 3 include/linux/sched/mm.h | 6 kernel/exit.c | 30 ---- kernel/fork.c | 40 ++--- kernel/futex.c | 324 ++++++++++++++++++++++++++++++++++++++++------- 8 files changed, 330 insertions(+), 115 deletions(-)