Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp370364ybx; Wed, 6 Nov 2019 01:56:12 -0800 (PST) X-Google-Smtp-Source: APXvYqxspytk+M7SpxqcnDMC5TtrvnEsTzZ0hshCIi3TOsVNaIP0XDOn2WCMljNDe0Lkwj6kmcJ5 X-Received: by 2002:a17:906:22c9:: with SMTP id q9mr8812817eja.198.1573034172600; Wed, 06 Nov 2019 01:56:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573034172; cv=none; d=google.com; s=arc-20160816; b=fuUs6qNkL60RRT3WOYWM6K7934+kUTf98unERO2kQGBAvrNzjHCGjMie/1oLKh3KP4 04y3UTSAVQ96dys5oSPgc1XlRHqduYPEYALAOhJVCAnYENKtKpr9d+/s56Zx7JmQXsnG AQbc8+2KwmIelLSyACRK2HV62colZ8tutdoc2EbhzAigznHqThhml+LTahEzg0TIYtf9 fyTZBYWB2hqJnxvxtLfD1POwOSKYU6n7ny+0Np9FBa3wWMRgV9S9WPGjjNdRhZyf3hfx ShEPmLhGX9CNPhuLRSGEqtz1qWIJY+yBv36b8Jowfloo2CKsNzTiIOJKgyNStyCynEnM y59Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date; bh=Aa7hKSMaXNt4X3HmYUgBWN45wXLWXioElGlE/cR8ZoY=; b=TK23kFN0z3pvCbkXr0xoLXkpGX2EFnAPAell0BEAVmHpypeLX95U5FF4UBdUkeYAzL vr873Z1WJ/IyOug61D8IzmNFgyaDINsnnqBAG+s3VrWizotu2OeX0dIE/Xq1YeHysp60 w9ZwIy3BnjUlA/3aL2RQmSLYqmwgibw5fPmC6x8PxpdVfegVakYLbLMDoAVkLSlJqhLX G62bmw0W7XXCgSWdCQDTqbe1MZFuozTpQ2WIn0dFg89dsAP1uguck7kSn7bObtG/tAv4 BnLJnxuCALxpoodEnFRxcIQoX8x4GjFREwpcgoOhynuzf+T3R+MzZ/Nrsyod6/bI8hX5 GC+g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a11si11231637ejt.202.2019.11.06.01.55.48; Wed, 06 Nov 2019 01:56:12 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728469AbfKFJxN (ORCPT + 99 others); Wed, 6 Nov 2019 04:53:13 -0500 Received: from Galois.linutronix.de ([193.142.43.55]:43761 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726143AbfKFJxM (ORCPT ); Wed, 6 Nov 2019 04:53:12 -0500 Received: from p5b06da22.dip0.t-ipconnect.de ([91.6.218.34] helo=nanos) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1iSHzq-0006eR-Rf; Wed, 06 Nov 2019 10:53:03 +0100 Date: Wed, 6 Nov 2019 10:53:01 +0100 (CET) From: Thomas Gleixner To: Oleg Nesterov cc: Florian Weimer , Shawn Landden , libc-alpha@sourceware.org, linux-api@vger.kernel.org, LKML , Arnd Bergmann , Deepa Dinamani , Andrew Morton , Catalin Marinas , Keith Packard , Peter Zijlstra Subject: Re: handle_exit_race && PF_EXITING In-Reply-To: <20191106085529.GA12575@redhat.com> Message-ID: References: <20191104002909.25783-1-shawn@git.icu> <87woceslfs.fsf@oldenburg2.str.redhat.com> <20191105152728.GA5666@redhat.com> <20191106085529.GA12575@redhat.com> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Oleg, On Wed, 6 Nov 2019, Oleg Nesterov wrote: > I have found the fix I sent in 2015, attached below. I forgot everything > I knew about futex.c, so I need some time to adapt it to the current code. > > But I think it is clear what this patch tries to do, do you see any hole? > @@ -716,11 +716,13 @@ void exit_pi_state_list(struct task_struct *curr) > > if (!futex_cmpxchg_enabled) > return; > + > /* > - * We are a ZOMBIE and nobody can enqueue itself on > - * pi_state_list anymore, but we have to be careful > - * versus waiters unqueueing themselves: > + * attach_to_pi_owner() can no longer add the new entry. But > + * we have to be careful versus waiters unqueueing themselves. > */ > + curr->flags |= PF_EXITPIDONE; This obviously would need a barrier or would have to be moved inside of the pi_lock region. > raw_spin_lock_irq(&curr->pi_lock); > while (!list_empty(head)) { > > @@ -905,24 +907,12 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key, > return -EPERM; > } > > - /* > - * We need to look at the task state flags to figure out, > - * whether the task is exiting. To protect against the do_exit > - * change of the task flags, we do this protected by > - * p->pi_lock: > - */ > raw_spin_lock_irq(&p->pi_lock); > - if (unlikely(p->flags & PF_EXITING)) { > - /* > - * The task is on the way out. When PF_EXITPIDONE is > - * set, we know that the task has finished the > - * cleanup: > - */ > - int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN; > - > + if (unlikely(p->flags & PF_EXITPIDONE)) { > + /* exit_pi_state_list() was already called */ > raw_spin_unlock_irq(&p->pi_lock); > put_task_struct(p); > - return ret; > + return -ESRCH; But, this is incorrect because we'd return -ESRCH to user space while the futex value still has the TID of the exiting task set which will subsequently cleanout the futex and set the owner died bit. The result is inconsistent state and will trigger the asserts in the futex test suite and in the pthread_mutex implementation. The only reason why -ESRCH can be returned is when the user space value of the futex contains garbage. But in this case it does not contain garbage and returning -ESRCH violates the implicit robustness guarantee of PI futexes and causes unexpected havoc. See da791a667536 ("futex: Cure exit race") for example. The futex PI contract between kernel and user space relies on consistent state. Guess why that code has more corner case handling than actual functionality. :) Thanks, tglx