From: Yi Wang <wang.yi59@zte.com.cn>
To: tglx@linutronix.de
Cc: mingo@redhat.com, peterz@infradead.org, dvhart@infradead.org,
    linux-kernel@vger.kernel.org, xue.zhihong@zte.com.cn,
    wang.yi59@zte.com.cn, jiang.xuexin@zte.com.cn, Yang Tao
Subject: [PATCH] futex: Fix missed wakeups of robust futex waiters
Date: Wed, 6 Nov 2019 11:23:02 +0800
Message-Id: <1573010582-35297-1-git-send-email-wang.yi59@zte.com.cn>
From: Yang Tao

We found two scenarios in which waiters on a robust, non-PI mutex are
never woken up:

(1) The owner of the mutex releases it with pthread_mutex_unlock() and
is killed after setting mutex->__data.__lock = 0 but before waking the
waiters. The kernel's robust-list handling then runs, but it does not
wake any waiters because mutex->__data.__lock is already 0:

        OWNER

        pthread_mutex_unlock()
          |
          V
        atomic_exchange_rel (&mutex->__data.__lock, 0)
                        <------------------------ killed
        lll_futex_wake ()
          |
          |(__lock = 0)
          |(enter kernel)
          |
          V
        do_exit()
          exit_mm()
            mm_release()
              exit_robust_list()
                handle_futex_death()
                  |
                  |(__lock = 0)
                  |(uval = 0)
                  |
                  V
        if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                return 0;       <------ the waiters are never woken

(2) A waiter is woken by futex_wake(), returns to user space and is
killed before it can take the lock (before modifying
mutex->__data.__lock), so the remaining waiters are never woken either:

        OWNER                           WAITER

        pthread_mutex_unlock()
          |
          |(__lock = 0)
          |
          V
        futex_wake()                    futex_wait() // woken
                                          |
                                          |(enter userspace)
                                          |(__lock = 0)
                                          |
                                          V
                                        oldval = mutex->__data.__lock
                                                <------------- killed
                                        atomic_compare_and_exchange_val_acq
                                          (&mutex->__data.__lock,
                                           id | assume_other_futex_waiters, 0)
                                          |
                                          |(enter kernel)
                                          |
                                          V
                                        do_exit()
                                          |
                                          V
                                        handle_futex_death()
                                          |
                                          |(__lock = 0)
                                          |(uval = 0)
                                          |
                                          V
        if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                return 0;       <------ the waiters are never woken

In both scenarios the dying task fails to wake the remaining waiters.
We observed that in both cases the task is killed while
task->robust_list->list_op_pending points to
&mutex->__data.__list.__next, so the kernel can detect the situation
from the robust list.

Wake up one waiter when all of the following conditions are met:

(1) task->robust_list->list_op_pending != NULL;
(2) mutex->__data.__lock = 0;
(3) the futex is not a PI futex (no_pi).

In rare cases this causes a spurious wakeup, which costs a little
efficiency but does not affect correctness, and it is very unlikely
that all three conditions hold at the same time.

Only wake a waiter; do not set the owner-died bit. At this point
mutex->__data.__lock = 0 and mutex->__data.__owner = 0, i.e. the lock
has no owner, so the woken waiter can simply take it. Setting the died
bit here could cause misbehaviour: for example, if a waiter is killed
while the owner is releasing the lock, the lock would wrongly be marked
as having a dead owner. There is also no need to touch
mutex->__data.__count, so recursive (repeated) locking is unaffected.
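For reference, a minimal user-space sketch of the unlock path that opens
the window in scenario (1) follows. It is modelled loosely on the glibc
robust-mutex code rather than copied from it; struct robust_mutex,
lock_word, futex_wake_one() and robust_mutex_unlock() are illustrative
names, and the set_robust_list(2) registration that every thread
performs at start-up is omitted.

/*
 * Illustrative only: a simplified robust-mutex unlock, loosely modelled
 * on the glibc implementation.  Error handling and the lock/wait side
 * are omitted.
 */
#include <linux/futex.h>        /* FUTEX_WAKE, struct robust_list_head */
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

struct robust_mutex {
	uint32_t lock_word;             /* 0 = free, otherwise TID of the owner */
	struct robust_list list;        /* linkage for the kernel's robust list */
};

/* Per-thread list head, registered with the kernel via set_robust_list(2). */
static __thread struct robust_list_head robust_head;

static long futex_wake_one(uint32_t *uaddr)
{
	/* Wake at most one task blocked in FUTEX_WAIT on *uaddr. */
	return syscall(SYS_futex, uaddr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

static void robust_mutex_unlock(struct robust_mutex *m)
{
	/*
	 * Tell the kernel which entry we are operating on, so that
	 * exit_robust_list()/handle_futex_death() can find it if we die
	 * in the middle of the unlock.
	 */
	robust_head.list_op_pending = &m->list;

	/* Release the lock: from here on everyone sees the mutex as free. */
	__atomic_store_n(&m->lock_word, 0, __ATOMIC_RELEASE);

	/*
	 * <-- scenario (1): if the thread is killed exactly here, the lock
	 *     word is already 0 but no waiter has been woken.  Without the
	 *     change below, handle_futex_death() returns early because the
	 *     TID check fails on uval == 0, and the waiters sleep forever.
	 */

	futex_wake_one(&m->lock_word);

	/* The list operation is complete. */
	robust_head.list_op_pending = NULL;
}

If the thread dies at the marked point, the kernel walks robust_head in
exit_robust_list(); the change below makes handle_futex_death() wake one
waiter in exactly that case.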
Signed-off-by: Yang Tao
---
 kernel/futex.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index bd18f60..8511dad 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3456,7 +3456,9 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
  * Process a futex-list entry, check whether it's owned by the
  * dying task, and do notification if so:
  */
-static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi)
+static int handle_futex_death(u32 __user *uaddr,
+			      struct task_struct *curr, int pi,
+			      bool pending)
 {
 	u32 uval, uninitialized_var(nval), mval;
 	int err;
@@ -3469,6 +3471,35 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int p
 	if (get_user(uval, uaddr))
 		return -1;
 
+	/*
+	 * When uval is 0, there may be waiters blocking on the lock:
+	 *
+	 * (1) When the owner of a mutex with the robust attribute
+	 * and no_pi releases the lock and executes
+	 * pthread_mutex_unlock(), it may be killed after setting
+	 * mutex->__data.__lock = 0 (uval = 0) and before waking
+	 * other tasks. It then enters the robust exit path, but it
+	 * will not wake up waiters because uval = 0.
+	 *
+	 * (2) When a waiter wakes up and returns to the user space,
+	 * it may be killed before getting the lock (before modifying
+	 * mutex->__data.__lock), and other waiters will not wake up
+	 * because uval = 0.
+	 *
+	 * In both scenarios the task is killed with
+	 * task->robust_list->list_op_pending != NULL. Therefore,
+	 * it can be judged that the task was killed while releasing
+	 * or acquiring the lock.
+	 *
+	 * We should wake up one waiter when the following conditions
+	 * are met:
+	 * 1) task->robust_list->list_op_pending != NULL
+	 * 2) mutex->__data.__lock = 0 (uval = 0)
+	 * 3) no_pi
+	 */
+	if (pending && !pi && uval == 0)
+		futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+
 	if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
 		return 0;
 
@@ -3590,7 +3621,7 @@ void exit_robust_list(struct task_struct *curr)
 		 */
 		if (entry != pending)
 			if (handle_futex_death((void __user *)entry + futex_offset,
-						curr, pi))
+						curr, pi, 0))
 				return;
 		if (rc)
 			return;
@@ -3607,7 +3638,7 @@ void exit_robust_list(struct task_struct *curr)
 
 	if (pending)
 		handle_futex_death((void __user *)pending + futex_offset,
-				   curr, pip);
+				   curr, pip, 1);
 }
 
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
@@ -3784,7 +3815,7 @@ void compat_exit_robust_list(struct task_struct *curr)
 		if (entry != pending) {
 			void __user *uaddr = futex_uaddr(entry, futex_offset);
 
-			if (handle_futex_death(uaddr, curr, pi))
+			if (handle_futex_death(uaddr, curr, pi, 0))
 				return;
 		}
 		if (rc)
@@ -3803,7 +3834,7 @@ void compat_exit_robust_list(struct task_struct *curr)
 
 	if (pending) {
 		void __user *uaddr = futex_uaddr(pending, futex_offset);
 
-		handle_futex_death(uaddr, curr, pip);
+		handle_futex_death(uaddr, curr, pip, 1);
 	}
 }
-- 
2.15.2