Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp1088207ybx; Thu, 7 Nov 2019 07:06:59 -0800 (PST) X-Google-Smtp-Source: APXvYqx42lgQZCG27m7rMlSbGOH0u3aFcgca/5ro6Se8ApNBHTVOfZiu1JqfV0d++C7qD0RHxiu6 X-Received: by 2002:aa7:c4c8:: with SMTP id p8mr4063946edr.121.1573139219491; Thu, 07 Nov 2019 07:06:59 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573139219; cv=none; d=google.com; s=arc-20160816; b=KADRtID84075Y3nk9XyK8apYZZCzFxC1lBvkrG+QRYevTKOhgYiMFGq6UzHSLXAk8/ 9NmzHJA16Ht6O+wruwjqDXukPX7nbuL+yv/lgpGlLvSSIw19N0LCWcoYyg50D1GwUmtP lFeBD6IbgNLx+k++28wtk7rARuE4Y75M0wgQwCMtUCYIw7CcbqabmJ177WDs2Ycjx+6A Z7+FDGXAf7z2/z8kU6U/AJKEWZdHt3opKmdZh/Ft4DFpx40rrvuqVmNTz8Zv2SHnOJdl /PQmvodd5Wg6jNgCAvH/ydThvGsZ32dbA8bTiqdetItlp3vAUbqcir3M2Yi1Ex0ULZbF wC+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-disposition :content-transfer-encoding:user-agent:in-reply-to:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=rz7Flxivq/MsSwcTJWISr8XdYwP7g8dNxx0v7qQboaY=; b=ccE/M/kcmPFbmf1jqVXT9KWIcH/fFVbUrbBea64e3v/dpUcXJsYV1RrNa08b4HZCnk xhB0VNBFzgLj7uam7bUYl6vdj9wcteU20bbz0MYu6CPV8HaJ2YDMUraKELntI9KYXvLS fWrwEXK0+SsbBjKN+9AwJnpvB4+zKnbDrHPoEbbvwYYNQRR9nfOObXGSt3rBNDKi8Vl7 t5AcFLR7eQ/1q5y1YnxCH2Z1KcdkwgmrxNUjaHnucGfHtMwTZqLworPiA7ioIxkg0Q4w oV8FYB7U7HTitvegEhr4teFPiGOjjxak1eWfSVlhZ9ZNHkovuG+KwdgJRp4JyG09jtFW /NGQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aeNxoh4+; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 17si1552296ejt.419.2019.11.07.07.06.35; Thu, 07 Nov 2019 07:06:59 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aeNxoh4+; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388664AbfKGPDh (ORCPT + 99 others); Thu, 7 Nov 2019 10:03:37 -0500 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:57625 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726810AbfKGPDh (ORCPT ); Thu, 7 Nov 2019 10:03:37 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1573139015; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rz7Flxivq/MsSwcTJWISr8XdYwP7g8dNxx0v7qQboaY=; b=aeNxoh4+5saKvptwaMqT8MgK1nhBU9UINOftBAIPVTja1PRHv6/1iwBD3XLSMBMYgCTIrw erez4VFhcRQavrvJKF+tzLDrTls0S6q+wQDOtOO5YI2x9PvPsf4veKt2/CtsRtyLPlyYiS tAK5KIyKvnfH7YqsuqeOydxcSouds1o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-184-RVqa4xSAOlWCaLzfVm6j3A-1; Thu, 07 Nov 2019 10:03:32 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 66FAA1800D6B; Thu, 7 Nov 2019 15:03:30 +0000 (UTC) Received: from dhcp-27-174.brq.redhat.com (unknown [10.43.17.44]) by smtp.corp.redhat.com (Postfix) with SMTP id 7E71F5D6A0; Thu, 7 Nov 2019 15:03:24 +0000 (UTC) Received: by dhcp-27-174.brq.redhat.com (nbSMTP-1.00) for uid 1000 oleg@redhat.com; Thu, 7 Nov 2019 16:03:30 +0100 (CET) Date: Thu, 7 Nov 2019 16:03:23 +0100 From: Oleg Nesterov To: Thomas Gleixner Cc: LKML , Peter Zijlstra , Ingo Molnar , Darren Hart , Yi Wang , Yang Tao , Florian Weimer , Carlos O'Donell , Alexander Viro Subject: Re: [patch 00/12] futex: Cure robust/PI futex exit races Message-ID: <20191107150323.GA24042@redhat.com> References: <20191106215534.241796846@linutronix.de> MIME-Version: 1.0 In-Reply-To: <20191106215534.241796846@linutronix.de> User-Agent: Mutt/1.5.24 (2015-08-30) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-MC-Unique: RVqa4xSAOlWCaLzfVm6j3A-1 X-Mimecast-Spam-Score: 0 Content-Type: text/plain; charset=WINDOWS-1252 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/06, Thomas Gleixner wrote: > > fs/exec.c | 2=20 > include/linux/compat.h | 2=20 > include/linux/futex.h | 38 +++-- > include/linux/sched.h | 3=20 > include/linux/sched/mm.h | 6=20 > kernel/exit.c | 30 ---- > kernel/fork.c | 40 ++--- > kernel/futex.c | 324 ++++++++++++++++++++++++++++++++++++++++= ------- > 8 files changed, 330 insertions(+), 115 deletions(-) The whole series looks good to me. But I am just curious, what do you all think about the patch below instead of 10/12 and 12/12 ? Oleg. diff --git a/include/linux/sched.h b/include/linux/sched.h index 9e0de08..ad18433 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -621,6 +621,11 @@ struct wake_q_node { =09struct wake_q_node *next; }; =20 +struct wake_q_head { +=09struct wake_q_node *first; +=09struct wake_q_node **lastp; +}; + struct task_struct { #ifdef CONFIG_THREAD_INFO_IN_TASK =09/* @@ -1055,6 +1060,7 @@ struct task_struct { =09struct list_head=09=09pi_state_list; =09struct futex_pi_state=09=09*pi_state_cache; =09unsigned int=09=09=09futex_state; +=09struct wake_q_head=09=09futex_exit_q; #endif #ifdef CONFIG_PERF_EVENTS =09struct perf_event_context=09*perf_event_ctxp[perf_nr_task_contexts]; diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h index 26a2013..62805b5 100644 --- a/include/linux/sched/wake_q.h +++ b/include/linux/sched/wake_q.h @@ -35,11 +35,6 @@ =20 #include =20 -struct wake_q_head { -=09struct wake_q_node *first; -=09struct wake_q_node **lastp; -}; - #define WAKE_Q_TAIL ((struct wake_q_node *) 0x01) =20 #define DEFINE_WAKE_Q(name)=09=09=09=09\ diff --git a/kernel/futex.c b/kernel/futex.c index 4b36bc8..87763c7 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1176,6 +1176,24 @@ static int attach_to_pi_state(u32 __user *uaddr, u32= uval, =09return ret; } =20 +static void wait_for_owner_exiting(int ret) +{ +=09struct wake_q_node *node =3D ¤t->wake_q; + +=09if (ret !=3D -EBUSY) { +=09=09WARN_ON_ONCE(node->next); // XXX not really correct ... +=09=09return; +=09} + +=09for (;;) { +=09=09set_current_state(TASK_UNINTERRUPTIBLE); +=09=09if (!READ_ONCE(node->next)) +=09=09=09break; +=09=09schedule(); +=09} +=09__set_current_state(TASK_RUNNING); +} + static int handle_exit_race(u32 __user *uaddr, u32 uval, =09=09=09 struct task_struct *tsk) { @@ -1185,8 +1203,10 @@ static int handle_exit_race(u32 __user *uaddr, u32 u= val, =09 * If the futex exit state is not yet FUTEX_STATE_DEAD, tell the =09 * caller that the alleged owner is busy. =09 */ -=09if (tsk && tsk->futex_state !=3D FUTEX_STATE_DEAD) +=09if (tsk && tsk->futex_state !=3D FUTEX_STATE_DEAD) { +=09=09wake_q_add(&tsk->futex_exit_q, current); =09=09return -EBUSY; +=09} =20 =09/* =09 * Reread the user space value to handle the following situation: @@ -2104,6 +2124,7 @@ static int futex_requeue(u32 __user *uaddr1, unsigned= int flags, =09=09=09hb_waiters_dec(hb2); =09=09=09put_futex_key(&key2); =09=09=09put_futex_key(&key1); +=09=09=09wait_for_owner_exiting(ret); =09=09=09cond_resched(); =09=09=09goto retry; =09=09default: @@ -2855,6 +2876,7 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned = int flags, =09=09=09queue_unlock(hb); =09=09=09put_futex_key(&q.key); =09=09=09cond_resched(); +=09=09=09wait_for_owner_exiting(ret); =09=09=09goto retry; =09=09default: =09=09=09goto out_unlock_put_key; @@ -3701,6 +3723,7 @@ static void futex_cleanup(struct task_struct *tsk) void futex_exit_recursive(struct task_struct *tsk) { =09tsk->futex_state =3D FUTEX_STATE_DEAD; +=09wake_up_q(&tsk->futex_exit_q); } =20 static void futex_cleanup_begin(struct task_struct *tsk) @@ -3718,16 +3741,17 @@ static void futex_cleanup_begin(struct task_struct = *tsk) =09 */ =09raw_spin_lock_irq(&tsk->pi_lock); =09tsk->futex_state =3D FUTEX_STATE_EXITING; +=09wake_q_init(&tsk->futex_exit_q); =09raw_spin_unlock_irq(&tsk->pi_lock); } =20 static void futex_cleanup_end(struct task_struct *tsk, int state) { -=09/* -=09 * Lockless store. The only side effect is that an observer might -=09 * take another loop until it becomes visible. -=09 */ +=09raw_spin_lock_irq(&tsk->pi_lock); =09tsk->futex_state =3D state; +=09raw_spin_unlock_irq(&tsk->pi_lock); + +=09wake_up_q(&tsk->futex_exit_q); } =20 void futex_exec_release(struct task_struct *tsk)