Date: Tue, 5 Nov 2019 20:19:15 +0100 (CET)
From: Thomas Gleixner
To: Oleg Nesterov
Cc: Florian Weimer, Shawn Landden, libc-alpha@sourceware.org, linux-api@vger.kernel.org, LKML, Arnd Bergmann, Deepa Dinamani, Andrew Morton, Catalin Marinas, Keith Packard, Peter Zijlstra
Subject: Re: handle_exit_race && PF_EXITING
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 5 Nov 2019, Thomas Gleixner wrote:
>
> I'm a moron. It's vfork() not fork() so the behaviour is expected.
>
> Staring more at the trace which shows me where this goes down the drain.

parent                          child

 set FIFO prio 2

 vfork()                 ->     set FIFO prio 1
  implies wait_for_child()      sched_setscheduler(...)
                                exit()
                                do_exit()
                                tsk->flags |= PF_EXITING;
                                ....
                                mm_release()
                                  exit_futex(); (NOOP in this case)
                                  complete() --> wakes parent
 sys_futex()
    loop infinite because
         PF_EXITING is set,
         but PF_EXITPIDONE not

So the obvious question is why PF_EXITPIDONE is set way after the futex exit
cleanup has run, but moving this right after exit_futex() would not solve
the exit race completely because the code after setting PF_EXITING is
preemptible. So the same crap could happen just by preemption:

  task holds futex
  ...
  do_exit()
    tsk->flags |= PF_EXITING;

preemption (unrelated wakeup of some other higher prio task, e.g. timer)

  switch_to(other_task)

  return to user
  sys_futex()
        loop infinite as above

And just for the fun of it the futex exit cleanup could trigger the wakeup
itself before PF_EXITPIDONE is set.

There is some other issue which I need to lookup again. That's a slightly
different problem but related to futex exit race conditions.

The way we can deal with that is:

    do_exit()
    tsk->flags |= PF_EXITING;
    ...
    mutex_lock(&tsk->futex_exit_mutex);
    futex_exit();
    tsk->flags |= PF_EXITPIDONE;
    mutex_unlock(&tsk->futex_exit_mutex);

and on the futex lock_pi side:

    if (!(tsk->flags & PF_EXITING))
        return 0;               <- All good

    if (tsk->flags & PF_EXITPIDONE)
        return -EOWNERDEAD;     <- Locker can take over

    mutex_lock(&tsk->futex_exit_mutex);

    if (tsk->flags & PF_EXITPIDONE) {
        mutex_unlock(&tsk->futex_exit_mutex);
        return -EOWNERDEAD;     <- Locker can take over
    }

    queue_futex();
    mutex_unlock(&tsk->futex_exit_mutex);

Not that I think it's pretty, but it plugs all holes AFAICT.

Thanks,

        tglx
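
For anyone who wants to poke at the ordering outside the kernel, here is a rough userspace sketch of the handshake described above. It is not kernel code and not the eventual patch: `struct task_model`, the `model_*` function names, the flag values and the `queued_waiters` counter are all made up for illustration, a pthread mutex stands in for the proposed `futex_exit_mutex`, and the real futex cleanup and queueing are reduced to a comment and a counter.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Names follow the mail; the numeric values are arbitrary for this model. */
#define PF_EXITING    0x1u
#define PF_EXITPIDONE 0x2u

/* Hypothetical stand-in for the few task_struct fields involved. */
struct task_model {
    atomic_uint flags;
    pthread_mutex_t futex_exit_mutex;
    int queued_waiters;          /* modified only under futex_exit_mutex */
};

void task_model_init(struct task_model *tsk)
{
    atomic_init(&tsk->flags, 0u);
    pthread_mutex_init(&tsk->futex_exit_mutex, NULL);
    tsk->queued_waiters = 0;
}

/* First step of the exit path: only PF_EXITING is visible afterwards. */
void model_exit_begin(struct task_model *tsk)
{
    atomic_fetch_or(&tsk->flags, PF_EXITING);
}

/* Second step: cleanup and PF_EXITPIDONE happen under the mutex. */
void model_exit_futex_cleanup(struct task_model *tsk)
{
    assert(atomic_load(&tsk->flags) & PF_EXITING);

    pthread_mutex_lock(&tsk->futex_exit_mutex);
    /* The futex exit cleanup would run here. */
    atomic_fetch_or(&tsk->flags, PF_EXITPIDONE);
    pthread_mutex_unlock(&tsk->futex_exit_mutex);
}

/*
 * lock_pi side: returns 0 when the owner is alive or the waiter was
 * queued before the cleanup, -EOWNERDEAD when the cleanup already ran.
 */
int model_lock_pi_attach(struct task_model *tsk)
{
    unsigned int flags = atomic_load(&tsk->flags);

    if (!(flags & PF_EXITING))
        return 0;                        /* all good */

    if (flags & PF_EXITPIDONE)
        return -EOWNERDEAD;              /* locker can take over */

    /* Owner is mid-exit: serialize against its cleanup. */
    pthread_mutex_lock(&tsk->futex_exit_mutex);

    if (atomic_load(&tsk->flags) & PF_EXITPIDONE) {
        pthread_mutex_unlock(&tsk->futex_exit_mutex);
        return -EOWNERDEAD;              /* cleanup finished while we waited */
    }

    tsk->queued_waiters++;               /* stands in for queue_futex() */
    pthread_mutex_unlock(&tsk->futex_exit_mutex);
    return 0;
}
```

Calling model_exit_begin() and then model_lock_pi_attach() reproduces the window from the trace (PF_EXITING set, PF_EXITPIDONE not): in this model the waiter gets queued under the mutex instead of spinning, and any attach after model_exit_futex_cleanup() gets -EOWNERDEAD.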