Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754461AbdLGO6n (ORCPT ); Thu, 7 Dec 2017 09:58:43 -0500 Received: from mx0b-00010702.pphosted.com ([148.163.158.57]:49914 "EHLO mx0b-00010702.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754355AbdLGO6i (ORCPT ); Thu, 7 Dec 2017 09:58:38 -0500 References: <20171129175605.GA863@jcartwri.amer.corp.natinst.com> <20171206234622.GZ3326@worktop> <87y3mf8f1j.fsf@ni.com> <20171207104516.ljmivyqx7yrthflu@hirez.programming.kicks-ass.net> <20171207142648.n4h3vzyajw2zlxv2@hirez.programming.kicks-ass.net> User-agent: mu4e 0.9.19; emacs 25.3.1 From: Gratian Crisan To: Peter Zijlstra Cc: Gratian Crisan , Julia Cartwright , Thomas Gleixner , linux-kernel@vger.kernel.org, Darren Hart , Ingo Molnar Subject: Re: PI futexes + lock stealing woes In-reply-to: <20171207142648.n4h3vzyajw2zlxv2@hirez.programming.kicks-ass.net> Date: Thu, 07 Dec 2017 08:57:59 -0600 Message-ID: <87mv2ur3ew.fsf@kerf.amer.corp.natinst.com> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [130.164.62.230] X-ClientProxiedBy: DM3PR12CA0076.namprd12.prod.outlook.com (2603:10b6:0:57::20) To DM5PR0401MB3591.namprd04.prod.outlook.com (2603:10b6:4:78::35) X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: f9bb2176-f2d8-4376-7384-08d53d82ea2d X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:(5600026)(4604075)(4534020)(4602075)(4627115)(201703031133081)(201702281549075)(2017052603286);SRVR:DM5PR0401MB3591; X-Microsoft-Exchange-Diagnostics: 1;DM5PR0401MB3591;3:Ep1LYTdpOxGHoVqYRsk0nqrx7qgRllz3BxqB+OwAU23rZuLsnSkaFHTs2XM2itJeD8J81KCYqqSYLaqbxebyyJu6C7JpyNEZuve8I3kv9q5SK8Hs/Gx9lbPONJxVhwrKlMTHl05i6RGRKktKtD5iAi2xDNiNaHHGqqaGpM3DY3BrUTjiPFAxmb+uIn5Xr9cDwm+bckKNTMoV73YYD+56v+0gX5pPjavdOPiU4ns/PGyYWTV3vcKYPIwqaDZPZxk9;25:+OdGD++4WoVNtl6p4Hj8/veISOnbu/gMpAgAUv7e1T/nF98/3D+GNNhM3pwRVDpe4cBpzDe0MnTP2EBlCZnQYFL0KLdi9Frb5g070xCypdC4LrHMjQIDXtTZ84Sw+3O5Kx19tfgO3bxKM69IU22Ez6uejcxnZ1b966PXuaM7t942vd2s3tXN5Du5sdex20Q3Aitm4B9zUm1b6vz5u922r4Lbtb/vvCbLoPjbOrKs7zTqtdJIzZ0VQmbr8yx+bBHQC6NIXimeZIrUKpE8jiU+w8mzrT7OzShopkiTQI0wtvqWJ8JCgT+bXhHaNWcR2Al1HUax5Q4+uwS4/+Uf2Z4lGw==;31:NeE78H1kVWtKXteDDKgPCl/KZSMREV23L5OhUShu6t6jZj2DTk5jhpMU8ej+30U4c5m/GLdnA36UVBcTqpOMOICFGHiXSSGcD7hmzpsq+2do/Qfmg0DNp7noTnyRV759CVb6FBgZVQm4yCStU0Twqa8QDE87zJqh6C96nW36iYNPaRgHc75k0rbiVovdV7P7HNQ2wRYMHCtLzwGGgjIHfIcmkwk1WEPaGb2dcy+Mbig= X-MS-TrafficTypeDiagnostic: DM5PR0401MB3591: X-Microsoft-Exchange-Diagnostics: 1;DM5PR0401MB3591;20:a+B16+XGk9V6wNN12owDFxviurmJMZShbWqHZpyYXdnYIi2MCFuCNAjXDj8wRlsuPLAhsnvXKwD/P+Km/2y9qZgqVbmy8vh/GhKSBbWWXVzy5n1G892hxRliicBOz4aHULLFePrXUi1VZjrcTdWZUkgsgOUOFuOOMG/yCT/1LqhReU4kjVBbhRNeHm5CciYmM9KbYTXo04oBFvIFna7h7PYhouJI6iF8MOgyKOz7nAM7mg1zBbBEJa7OghX/G0yumuecFxEyxexHFg4vECY7YaA5k9VW/DMk0zRBj0kW5kmMAkXRPMA48I575+WP+LAB+0+73xC3cgLVr67pbDET+h3iDiZBv7BUuUbkeyE+7H3niqfJZ3k5MapGteF2/B6ci+lZNcwQav4ziz+hz6I4zFShjh9kHMrNAI52SFT3tA2DxfxDqZSkO0jDkD3W8ymtsqOyIGSQligrrrdyrgeG4EMmBZixHLOK/pyTUmEZWBvlHNWm6ObxFyKRq8Wk3ji4Uh4MljuBNYISdNaGGXiNZgigiEW2s4iXC4FJ4moRXPjU5gWM3zrRfRnunH7fXkWIwo3lNrl2hKhL8PsQudnVeod+TmvkDUWWftdIeKL7H4g=;4:Iwe+KuLucx6By9Zh8DxXbaj0ivonybDUW9ze2WvLO2+qiGjJasLnpYLSkm+/8+wLtc4Ypo0VvAVE9RULR12k4FTK4o2LOGwrM2iFfIt3m+xsOd70kZKQ9PRgAaLPMu+viLKCgQKtsg8FLUBrGs84I31XCs1tY0e56i7PfqpX2fLa7k3cMUBcJx5pU4WAT+qWj5KeXbXnnRxKP22CJv/OVM+nSgCi/ZQN9vUdNweffjNgcefhWwSydpii6CLhGM0pnkH6CXW1RpvRfbcralkO8B1xcnTGsc07Q99wONUHrmymG4lMLQ2zpaSuTTZb5Bb1 X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(788757137089); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(6040450)(2401047)(8121501046)(5005006)(3231022)(3002001)(10201501046)(93006095)(93001095)(6041248)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(20161123564025)(20161123562025)(20161123558100)(20161123560025)(20161123555025)(6072148)(201708071742011);SRVR:DM5PR0401MB3591;BCL:0;PCL:0;RULEID:(100000803101)(100110400095);SRVR:DM5PR0401MB3591; X-Forefront-PRVS: 05143A8241 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(376002)(346002)(366004)(199004)(189003)(8676002)(81156014)(575784001)(478600001)(81166006)(2906002)(86362001)(48376002)(6506006)(50466002)(25786009)(229853002)(55016002)(97736004)(6246003)(8936002)(53936002)(305945005)(5660300001)(6116002)(47776003)(66066001)(6916009)(9686003)(83506002)(6666003)(2950100002)(68736007)(52116002)(7696005)(4326008)(106356001)(54906003)(16526018)(76176011)(33646002)(51416003)(93886005)(101416001)(16586007)(3846002)(105586002)(316002)(58126008)(7736002);DIR:OUT;SFP:1102;SCL:1;SRVR:DM5PR0401MB3591;H:kerf.amer.corp.natinst.com;FPR:;SPF:None;PTR:InfoNoRecords;A:1;MX:1;LANG:en; X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1;DM5PR0401MB3591;23:RzqNUWV1604zUPilEUsxYIAnJrjFFCfs8Y9Gopi?= =?us-ascii?Q?xkoQ4+fo8Y6exdSTdhs8CctfAzbwDophfd2YE1s/CxJ8dTv7PLzhxA5aMgvn?= =?us-ascii?Q?vWjy9HfKXKM5TJPX4WpGW0VKR2Bc/hEns9T5uEG4hS/JhyC+fqffzJYeEEjP?= =?us-ascii?Q?FuT1BhuRnEHEESNqMLHDcMhKLhYkuLz1wFCbQKMBXeuNsqvkg/sQBVTtJT/t?= =?us-ascii?Q?K5OEhnv9ca0uhhd99FSA/fAwxQ9FUOkqm0QQ9xJDDzBEX8glFzIKeuILbYIY?= =?us-ascii?Q?I0X2k0TML7wnIXE+GwZ4u2IVOZC5/qv5BDM3s1lTrKfWoRJH/zjNST98E1II?= =?us-ascii?Q?KCWp/zrAnCxVbybeBIMPcH7qm+nEpdWJ22IhIWxkauxCXkMrgRrEnTX/2Oyq?= =?us-ascii?Q?w8SR3Vpyk+0IebXYo7a6zWyJn6RI8Y4N//C6iz3G6ailCQ+PbJ1H3AN20O/V?= =?us-ascii?Q?4VRQGU9DihD6ROzOEEftdwsPevbOeZPnEYiN9WDwCT56e3zgZpqZ22OtVIH+?= =?us-ascii?Q?xN+cSDmQ5L4xcX2VpfNuYZw4616/o/G5EwThWhjAcAGSjFdKdjnSDdBQIjkS?= =?us-ascii?Q?Hzxy4O9d3fJAzZlQUvdr7IByuSWf/KQoDfvB7fS9v9L1GktL/FhL2gzzQ6gd?= =?us-ascii?Q?eStv5zzKMBpO3DJpYS4rEcoyQsBtz6doGQjfXKgTCE8T+58j4yhG7BiEipq7?= =?us-ascii?Q?CnKa1S5zRdNofzzzoAs0ine0NOf0deCnnWo7TnThMDl02qhytgIRRTd/Tl0f?= =?us-ascii?Q?f2DUatpNnzknBeLpl51ZfMVAaYVFi1Rp52VA08niVAlReZQozgjsy1RiEAg8?= =?us-ascii?Q?jyYcpCc0uc/epCqN77kFq5lte861CJB7S3Wh+Lsury+e2Bp1DkiR852rRExS?= =?us-ascii?Q?YwlWY+HWpOpfaK29W+DoRCqyldLzYZQH0LbadZnFDPHbsjoW2K/nHr5rDBwL?= =?us-ascii?Q?FwpR+SyQP6yJO/sAmB6MG2n6Avh5dv6YLrrWaACKjuyGUgfP8DLln3HeOkEF?= =?us-ascii?Q?JzjoxBetSFIaeBC+Foot6su8WNpKkk+skTCRX7aSDVkusHkzNG2hVxbVQtI6?= =?us-ascii?Q?u1yWaK0h1Kp1G0ImiTqtHuyoPuyuVFXH8WxCkvLGIwzVTnrxjVE3BCHXBI7V?= =?us-ascii?Q?FZX7Lqta8pWyiJBIQRrmRg3yGC+S1r06we0o/kKwxwFziIesXu/euyQ=3D?= =?us-ascii?Q?=3D?= X-Microsoft-Exchange-Diagnostics: 1;DM5PR0401MB3591;6:xBgBCVAwIOThQp4b21wLJxIsEvNT/l3DqDBboYenjSHXIsHesv4E9urtrtHhcSXV1ghekhUEwJ4wNHj0XQ/LkSvHNU2XdbJdpckYohrHsuL9v/0lVZnnbLiSSjgm+mNqaZgDpMrNBZR3YmvQOeY/ggllJ9ZsRuMHvBWESuCM/cpMncyrRBMylr+Jag7r4sOJ+uq7fqIQuwPyvI06Oa4KNDjweqcwnwW2ztte6F3qYFaWaKlEVZlhyvdNEVRETm5OdViFDP3zcyvRsRsyMk+78TitLLModO5L5JeIDM6xrhOk+N6+VJ18y1itaSRaqlObx2FWM3ocmW5zofKNnRp8bouXXVFT22vOUHRufILfQ3k=;5:VHO/5ZzchATfuEmixxYesY8hQU8WESkTM1l1mVzGSA9iNhoaqw2kQKdT3Zquq/bO0OuTvp9WcO4U9C/Gm0qQSfTxRXjigBfKI8czIuKPKq9eioKHt7U19JHnU38Hq/SKJH475tMjsLgfryfb473IlS6a0PDaDs420KhzJHtEklU=;24:i/Zt9I01/rvtiUqGGH5h+VdmgfWlMo6E/OYD6pm04c8XUCsyJk4U2qY7LoHayn+9ahuX/xh3b331HVca5fqipDt+U5eu6h1AXMLvaLnTmjg=;7:ebd/fATrHfSnqTFLKEMODiHX7kPXB44wI8/WNa/O0b0bshCkH99Kh7ErN/ChLmtRE+hVJEzxDlKlY2UPY5lw3REewBPqPbHyNtwO0Cl+E59ZVf5g6mwtPz3Z6a2lHXU2SbBEUrKk9h/q0jtgCiXGK3FijcRS/TXmnKxlmzso6hGEZnt5+mBOX45EEhByoUCoUxrXyXlqR8psV+IAgt9Uj33+M6nbW99q0I+iSbXZm4+DdSmqiT1Rxp5TWMll7msC SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: ni.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 07 Dec 2017 14:58:02.4495 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f9bb2176-f2d8-4376-7384-08d53d82ea2d X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 87ba1f9a-44cd-43a6-b008-6fdb45a5204e X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM5PR0401MB3591 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-12-07_06:,, signatures=0 X-Proofpoint-Spam-Details: rule=inbound_policy_notspam policy=inbound_policy score=30 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=30 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1712070220 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6701 Lines: 206 Peter Zijlstra writes: > The below compiles and boots, but is otherwise untested. Could you give > it a spin? Thank you! Yes, I'll start a test now. -Gratian > --- > kernel/futex.c | 83 +++++++++++++++++++++++++++++++++-------- > kernel/locking/rtmutex.c | 26 +++++++++---- > kernel/locking/rtmutex_common.h | 1 + > 3 files changed, 87 insertions(+), 23 deletions(-) > > diff --git a/kernel/futex.c b/kernel/futex.c > index 76ed5921117a..29ac5b64e7c7 100644 > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q *q) > spin_unlock(q->lock_ptr); > } > > -/* > - * Fixup the pi_state owner with the new owner. > - * > - * Must be called with hash bucket lock held and mm->sem held for non > - * private futexes. > - */ > static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, > - struct task_struct *newowner) > + struct task_struct *argowner) > { > - u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS; > struct futex_pi_state *pi_state = q->pi_state; > u32 uval, uninitialized_var(curval), newval; > - struct task_struct *oldowner; > + struct task_struct *oldowner, *newowner; > + u32 newtid; > int ret; > > + lockdep_assert_held(q->lock_ptr); > + > raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); > > oldowner = pi_state->owner; > @@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, > newtid |= FUTEX_OWNER_DIED; > > /* > - * We are here either because we stole the rtmutex from the > - * previous highest priority waiter or we are the highest priority > - * waiter but have failed to get the rtmutex the first time. > + * We are here because either: > + * > + * - we stole the lock and pi_state->owner needs updating to reflect > + * that (@argowner == current), > * > - * We have to replace the newowner TID in the user space variable. > + * or: > + * > + * - someone stole our lock and we need to fix things to point to the > + * new owner (@argowner == NULL). > + * > + * Either way, we have to replace the TID in the user space variable. > * This must be atomic as we have to preserve the owner died bit here. > * > * Note: We write the user space value _before_ changing the pi_state > @@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, > * in the PID check in lookup_pi_state. > */ > retry: > + if (!argowner) { > + if (oldowner != current) { > + /* > + * We raced against a concurrent self; things are > + * already fixed up. Nothing to do. > + */ > + ret = 0; > + goto out_unlock; > + } > + > + if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) { > + /* We got the lock after all, nothing to fix. */ > + ret = 0; > + goto out_unlock; > + } > + > + /* > + * Since we just failed the trylock; there must be an owner. > + */ > + newowner = rt_mutex_owner(&pi_state->pi_mutex); > + BUG_ON(!newowner); > + } else { > + WARN_ON_ONCE(argowner != current); > + if (oldowner == current) { > + /* > + * We raced against a concurrent self; things are > + * already fixed up. Nothing to do. > + */ > + ret = 0; > + goto out_unlock; > + } > + newowner = argowner; > + } > + > + newtid = task_pid_vnr(newowner) | FUTEX_WAITERS; > + > if (get_futex_value_locked(&uval, uaddr)) > goto handle_fault; > > @@ -2434,15 +2472,28 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) > * Got the lock. We might not be the anticipated owner if we > * did a lock-steal - fix up the PI-state in that case: > * > - * We can safely read pi_state->owner without holding wait_lock > - * because we now own the rt_mutex, only the owner will attempt > - * to change it. > + * Speculative pi_state->owner read (we don't hold wait_lock); > + * since we own the lock pi_state->owner == current is the > + * stable state, anything else needs more attention. > */ > if (q->pi_state->owner != current) > ret = fixup_pi_state_owner(uaddr, q, current); > goto out; > } > > + /* > + * If we didn't get the lock; check if anybody stole it from us. In > + * that case, we need to fix up the uval to point to them instead of > + * us, otherwise bad things happen. [10] > + * > + * Another speculative read; pi_state->owner == current is unstable > + * but needs our attention. > + */ > + if (q->pi_state->owner == current) { > + ret = fixup_pi_state_owner(uaddr, q, NULL); > + goto out; > + } > + > /* > * Paranoia check. If we did not take the lock, then we should not be > * the owner of the rt_mutex. > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c > index 6f3dba6e4e9e..65cc0cb984e6 100644 > --- a/kernel/locking/rtmutex.c > +++ b/kernel/locking/rtmutex.c > @@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state, > return ret; > } > > +static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock) > +{ > + int ret = try_to_take_rt_mutex(lock, current, NULL); > + > + /* > + * try_to_take_rt_mutex() sets the lock waiters bit > + * unconditionally. Clean this up. > + */ > + fixup_rt_mutex_waiters(lock); > + > + return ret; > +} > + > /* > * Slow path try-lock function: > */ > @@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) > */ > raw_spin_lock_irqsave(&lock->wait_lock, flags); > > - ret = try_to_take_rt_mutex(lock, current, NULL); > - > - /* > - * try_to_take_rt_mutex() sets the lock waiters bit > - * unconditionally. Clean this up. > - */ > - fixup_rt_mutex_waiters(lock); > + ret = __rt_mutex_slowtrylock(lock); > > raw_spin_unlock_irqrestore(&lock->wait_lock, flags); > > @@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock) > return rt_mutex_slowtrylock(lock); > } > > +int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock) > +{ > + return __rt_mutex_slowtrylock(lock); > +} > + > /** > * rt_mutex_timed_lock - lock a rt_mutex interruptible > * the timeout structure is provided > diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h > index 124e98ca0b17..68686b3ec3c1 100644 > --- a/kernel/locking/rtmutex_common.h > +++ b/kernel/locking/rtmutex_common.h > @@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock, > struct rt_mutex_waiter *waiter); > > extern int rt_mutex_futex_trylock(struct rt_mutex *l); > +extern int __rt_mutex_futex_trylock(struct rt_mutex *l); > > extern void rt_mutex_futex_unlock(struct rt_mutex *lock); > extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,