Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759077AbcLAT3x (ORCPT ); Thu, 1 Dec 2016 14:29:53 -0500 Received: from mail-bn3nam01on0087.outbound.protection.outlook.com ([104.47.33.87]:35520 "EHLO NAM01-BN3-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1754235AbcLAT3s (ORCPT ); Thu, 1 Dec 2016 14:29:48 -0500 Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=David.Daney@cavium.com; Message-ID: <58406458.7060606@caviumnetworks.com> Date: Thu, 1 Dec 2016 09:56:40 -0800 From: David Daney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7 MIME-Version: 1.0 To: Thomas Gleixner CC: LKML , David Daney , Peter Zijlstra , "Ingo Molnar" , Steven Rostedt , "Sebastian Siewior" , Will Deacon , "Mark Rutland" , Subject: Re: [patch 1/4] rtmutex: Prevent dequeue vs. unlock race References: <20161130205431.629977871@linutronix.de> <20161130210030.351136722@linutronix.de> In-Reply-To: <20161130210030.351136722@linutronix.de> Content-Type: text/plain; charset="ISO-8859-15"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [50.233.148.156] X-ClientProxiedBy: CO2PR07CA0032.namprd07.prod.outlook.com (10.141.194.170) To SN1PR07MB2144.namprd07.prod.outlook.com (10.164.47.14) X-MS-Office365-Filtering-Correlation-Id: baadf46d-e3d9-4327-4f49-08d41a136a36 X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:(22001);SRVR:SN1PR07MB2144; X-Microsoft-Exchange-Diagnostics: 1;SN1PR07MB2144;3:WngriAMocQzfH27ilURHcz/vrSvUKp2TX4xK6Ilsxx5TA+g+WS42IWUrv5CajBYDFSKPzcBYzohsYedvLPcm/LxV08KL1k8hwaU65vmyxdX4dvA1p8dAxq+SEKGA2FFtGC8SJVk2Dergg9v6q9uBERReeXXDtSQG0hB5doTUkehUpmnD4Z6YSW/dan7YuGimtJ2gs+REgZ0GcBcQWY2+Qazwkid/H9rCyIui7BfXdMKE5fHlpAhCir7vtzlQvIImgjPB6ThJssWuWHRf5TsEdQ==;25:LuKonvugUbXWG//3ntYq8zIlVQrUW894wTOixeDXhZd/tUXojWC4JkkW0jH0MowjaOPKpxAELjsgMFvhW8A9PRVcKGaQd6dVgv+WnWv3bQbEe7ahp3z14p6tlyX0HCu8kIXVmcZ0uXj6FoYLSllpGtZUyPWo0gYazqpsTwGj7+6Ivf4x37LX3SExHZBO08gZnRB21u8IMU0h1+4ZL97BSDWl46ftDTtvLPgm4gq7MCmbL8HsI7+18MZd6xlS82Jrimcm2UoH4QG4zIYzNPmaUfqgZxL/UvkDwYI0tqCIEOdTOCBI9l7VdlJ09ouHqJcwvAALXP3PKyo2wuE/ePeYNT0JoO6AHMVav3MZ+IVx4tyxsAz95D9cUlwoudxFA1DHPyd+7Qibu3rwOqNldcIHYSlEjSd6b2rUJ9jrns2aD2LKXMlo+kTKAamai2zfoTpqvFguj6Ekch7rS8gD16hDpg== X-Microsoft-Exchange-Diagnostics: 1;SN1PR07MB2144;31:jez0/xgus5NUpSQ5v7GS0XPetIzxKUgs4UK0rBrOTvmaAhiP4dA/cdyMPDngFhQBFQBVf1YYUjdEGzMM+Ue8gyIe2Cvrvm+lHnp88mxpsj80Dzl3P6x1SqAJHQixHC8uYyojERfekF2WqWKbgKCpyrl7nK8RPcifSx5aDgCBIkFKitdHOYp9gEEYOChMqV8dyGi5JGnSz53nzk4Y/ytnX9rcrHlSWBcByGO/eUKP9g9+YNtFc0YpMjXA97Y8yBJRULCXfB6NqNDKsCWu6CmqAg==;20:c15NLJieBcmbz+lEv58keoTu9/qz4ICEBITX1Unn/P9tg0PBmPxTevodtbizbLOd7778xmsONSx9DOc7sPU8JSBqDO3sptynnN9m2wYHfjMch2RX3zWmhSSivRfJmtv0QugTaP7LnpGRoXLSJRp+I6O+lbbpessLRwRlxQLSUkqjGi56+mrsmxKJaY2IvxsaPmxwvdPYUiz0/pLuZrqxau1Ob4RVm9El2hCIg2wOHK2EMsRWhswCCfdJreo7O9gmprBbPUF5gDN9Y+zdMWL2LyQieUkdjImmqsoN7yDg4OBfjRvXz90Qt8L3WJnm1UAs4JW7Wtrv8OkCKLHKL2awFNfbQ/8oP+OfcBJOoOTSZgKjU0LuY2/Dq/2iTBVth9CcLSA69CjSh6veYGQffwqpJq2F3C8BqdyyAmLXaVCE2tWczthri1DIeczOE1Rz6X570kCAFtUFMdWnU6wr1j+8HrNTlmekZ497TPIeh9k4bh1/9LEl7VtIPpiL8An1TIa0oXe4tzoIwJ2LVbYu+QbP0cegeOC81TMc5oF0ZfvEtMIynbzmIBFeRsXTgV51qP+8LZHYbOSdAQVvl1biks+SHECypM7noF8pBrR3GhSyFBE= X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(9452136761055); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(6040375)(601004)(2401047)(5005006)(8121501046)(3002001)(10201501046)(6041248)(20161123560025)(20161123564025)(20161123562025)(20161123555025)(20161123558021)(6072148);SRVR:SN1PR07MB2144;BCL:0;PCL:0;RULEID:;SRVR:SN1PR07MB2144; X-Microsoft-Exchange-Diagnostics: 1;SN1PR07MB2144;4:3SYrApicBWAiMCoX5Zc67KXup5/GAEhRMg7UTCnqIs+0a3Qj+PLAyX9c50d8irjjhMGalZyhnzg/GbM34NEx95d/Y4H9kt5qZTs+sCxY8fdCX8ZWoUSJL21bxJ9pOcDCoE3sb5cfVqXzFJvTAtqJ0K9MsixjK+IhSXwnFC0QIG2/8pW68vT1CGnV/p6wpkvVgPnSzTy/DBd3N4aA3FBFZwP5JmE8B9LzFwDgY7+330oEin4Co8trf1phpF+u2JcnNvFUySGPvL6BkhCZcl7BnD+ZsnusDC82KovL5RXWdL7dq8pGIdIFRHTK9FUgKioT8WN6rPx/7c+H8jILYFqTYaEa/CdmD4e/sSIdlX7rfxwrrF9wtxNGqDOFmlrSeXCX2G1JQmcV9e2CE8ak/s0U4v5fLMVXzH+Jqxk6AHJwxIX+eyGCCf0AdBHILIpIZVcXTETWjjFZZUHuNj/5naBEnxAAZYP5KwtjsNSYGT1iUEyMhNmQI2e8Nopr5LCeei+CYeveexH4EazNIDflWom9Gqzf204W98B7P1N5wTOQR1XC1TMHqFUt54yMpCsqPDedAUk+X0YEDXKl1mhDtxYnYRJ9AFCTy3Ij+wFCNeUZNuTcLBm59pZzl3Wv7bSapvQr9imH83RO6iAC7tgkwGi68g== X-Forefront-PRVS: 014304E855 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10009020)(4630300001)(6009001)(7916002)(24454002)(377454003)(189002)(199003)(101416001)(50466002)(64126003)(92566002)(36756003)(47776003)(4001350100001)(39410400001)(6486002)(6666003)(733004)(38730400001)(23756003)(39450400002)(8676002)(3846002)(6116002)(97736004)(50986999)(76176999)(54356999)(69596002)(99136001)(87266999)(65816999)(4326007)(189998001)(7846002)(2950100002)(305945005)(66066001)(65956001)(106356001)(65806001)(230700001)(110136003)(105586002)(117636001)(6916009)(59896002)(53416004)(7736002)(42186005)(2906002)(229853002)(5660300001)(42882006)(68736007)(81166006)(81156014)(83506001)(122286003)(62816006);DIR:OUT;SFP:1101;SCL:1;SRVR:SN1PR07MB2144;H:dl.caveonetworks.com;FPR:;SPF:None;PTR:InfoNoRecords;MX:1;A:1;LANG:en; X-Microsoft-Exchange-Diagnostics: =?iso-8859-15?Q?1;SN1PR07MB2144;23:W03Thxqqmgq/ZofTgjqYczewiGJcZSUh4ptlEq?= =?iso-8859-15?Q?6UlHNQzF7vPl1upGatjYyMYSstDHQEJAfvm1/aZ2SExw0yPIxMDjO5Y4/?= =?iso-8859-15?Q?XE1rFKfymzbp45kuAPYAzQ/HlvLXZZO2xjc+3arpqzZZbZ+puwS6m75s+?= =?iso-8859-15?Q?+JNEtKQ69lx+SysQTjuGfJPXhDkkdJTG2QMp4EGWjL8dIpnjYiAJs1gHL?= =?iso-8859-15?Q?p3S+wUBCvHlmqQwAQO9QpfW3mgQTauyKKd9++aUsAypsyEh6MfQ8J7BnQ?= =?iso-8859-15?Q?YKHBT8zBtvNa9wnBhdQLjYnBZI1li40gVV0Cx8toScdAnAKu1J1dUjlLz?= =?iso-8859-15?Q?Rjf/LH9GvFjQwUXvYZb6aKMvdRLVec9UE96nmSHpVS0+nsk03aLV7GIzx?= =?iso-8859-15?Q?31XHUUaDfmARjerZ+RZ6sxRXFra9kzwyylsw9ky0ZqJ7xwdI17e54gUhO?= =?iso-8859-15?Q?qU/K0FKVV+a3VhvMRXQkiEaGBAnVptpH04u67ox4ju3aCzzkfyt+Jkhry?= =?iso-8859-15?Q?HtZBF0rfMJit6BnVp+3q4khq0/S7LszwQhk625NpYOe+XBHH0C1zAmajn?= =?iso-8859-15?Q?xwpFqv7QzdnLQk6B65z9diKV+R8XZvvOrfnd1s9e9OFVAkKiUy+8JBYku?= =?iso-8859-15?Q?97hsPIxocU/JRaxjZ8BL+8PbBANWwgJtFP3ynw67YN8OV4B1FJITT6n2Q?= =?iso-8859-15?Q?doVgt9Rc7cL7ODlYdQzu4IvqlrCWtbONV1ETrrpzSAoGezbGBZ/eYcOII?= =?iso-8859-15?Q?ovM+3/q8OZo0xi/p4Mc+dD6CZeuvi33D8bSxkt6neJi3BqNzowZHqeQca?= =?iso-8859-15?Q?uHsoDmNSF1Qw6aRhAP5yMpLL1zRfrBMujyHnJ1FDiY/GnZ5XHGkJCK+m2?= =?iso-8859-15?Q?uGOkJCqNQVQYaabwn6AucPozv1JWaETZsRdQa1uoIoPjCuGup2NVkRHhf?= =?iso-8859-15?Q?JMTLpoxw2RA34pzTblQ4nkpQodndX5BPK+vjSuK/4Paz96BRJ4qE8UpoB?= =?iso-8859-15?Q?hLsObDzQeDgrbgW9yPDZf4kZGnLx4T+3I2YP6a29zGXCymwG26D4M78yV?= =?iso-8859-15?Q?BFtqsoXQxTrkZJ4Ae6EJd52hVlCxaFMfu4mXcoNHFr3HT2uTumgb1NxHR?= =?iso-8859-15?Q?+ylGBYSyFHvByfM+S9A9Wb3YQ8JgzV1A0ihegpULDN0K5sffVzk6rAdlZ?= =?iso-8859-15?Q?FBO5H6W+dMaFx7/47m4cGxY6L4ummmpz7odnOVEC9G6KiCxtT1oL32JBs?= =?iso-8859-15?Q?dSHN0xckuXEmKJ73cCIEbMLBm3lwZRfh7cpWrYOaMUrGev8f9W15HGQOe?= =?iso-8859-15?Q?eAGSBENK5Y4LdtUjpifzn8pEmLZrXHSP0JeKMqjSjBY234aFjBV7XIBOA?= =?iso-8859-15?Q?fDsxqnSo/CJkPWRTYwTkEbRxRR83xe5v1NuS5GYt8KrAun3ZxTnTqCDR0?= =?iso-8859-15?Q?aJbJu/Wzsf6CWpZqPg7/YFH3JcT5PeeFouzHx02ycXCEkz3y0lEiVwDnr?= =?iso-8859-15?Q?EGhhXh02GtcMTXHJi/KJsWWw=3D=3D?= X-Microsoft-Exchange-Diagnostics: 1;SN1PR07MB2144;6:ULwDiynL6y8wsO8aJevwXKgnVREyyVY35SqcBXrVlNfnfeUoorbooPodFQznvwm/kxsvK6c0Q+UW4wM+Dsb5JxaKesec8r5m+p3IKreN9gG4PvrqYP1SvGR1RNdcVQqTmXASve9XiLSvXFL93sGi+UjpJk1ajdYTDq9gnfv3V9VN0BOaWvpIIs8CydlvGrS1BFXUw5/ra1Yr39qBsLgrFPT6yBL8RmG2UWRBuASzbRIJnJXJqU8mo0nh70nZjnJfhIonnmVrd7R1TX5ZH9qy6sIcHYVYu2POIBLM78jvwv6WWh2ReB3V3NfpxVyywKNUXn3/CI7ThYyI8fH7ibyFu5XoHeoTVkyS7zaC/5nE44eJ9GjmNVSmywqN2YB9SVdZQUIQBI1d17/6tdawCSkt5MpoNry/NaNVozdtFyJk2gso3O0Hn4iyEDs0MbzUMo/hq6JQ8wJXWCOUqqP/tezlzg==;5:8lq4BH43CIYWlqzahsewJ8oo4HyuDaWj/zlTB7znNCyV3HJlL5O/s2eBggLxXHC+A7B5eGol2QNL6emCJbAglhxYsxSrVNBdA9dfjtZ2OVetWr+q+inMrL9LMM62xmvfxItdLuHm7TKeM/co0pPtIVeuSIYymNo/FsODVM4hEoA=;24:Lx2RRiCofzxiY3cncy4XaLJWxt+nggTCkX14HcKYJxFJNColi7mJI1cam9J8Sjh7iV+yAAk1SV0b/Rco3VcJ/ukCRmsQ9dzi6ruQqZd/kfk= SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1;SN1PR07MB2144;7:yOziZ2AfLorhTRPPBZE2GC4dKZsIr/b87MiqTRx/1+lXDrG4nrf4PRK9iDCjxHxeaBP3VX75jNH/IwgvktQ5SQi0Sj/2sGi1CBEerNJquYqNi5ybGasWrNpxPZuS3XrECJXl3w+jarO8E02j9SQZOD9ow22uUG2iZClmKBJU1ouSK7j6iMgStxzJ26r+4+PLrSQ671aTgAUof0J7sLabQYVnwU/bdW+5YMPa7FZu9FkWrBVX8xizhw7heU7twfj19+SEgBNRI25ObneIrBODTer5Y9veYTxX5nUlIi37qunvu0mWRQ5BQC2y6x0ZbaJA5oXkmZMbMai2dLdm7IAJewVom5iA+IrH3jX0sWTftMDp038QuhLd88mVnTAZ/OhMcW7xFyUCk3uHori+r3ibb4z+M+ZE4xnbU86SKKSFEr/rNpZopaIuK9roytutVsKWmWk9HwywjCCRn1RZaO/hYw== X-OriginatorOrg: caviumnetworks.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Dec 2016 17:56:44.5770 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN1PR07MB2144 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5511 Lines: 177 On 11/30/2016 01:04 PM, Thomas Gleixner wrote: > David reported a futex/rtmutex state corruption. It's caused by the > following problem: > > CPU0 CPU1 CPU2 > > l->owner=T1 > rt_mutex_lock(l) > lock(l->wait_lock) > l->owner = T1 | HAS_WAITERS; > enqueue(T2) > boost() > unlock(l->wait_lock) > schedule() > > rt_mutex_lock(l) > lock(l->wait_lock) > l->owner = T1 | HAS_WAITERS; > enqueue(T3) > boost() > unlock(l->wait_lock) > schedule() > signal(->T2) signal(->T3) > lock(l->wait_lock) > dequeue(T2) > deboost() > unlock(l->wait_lock) > lock(l->wait_lock) > dequeue(T3) > ===> wait list is now empty > deboost() > unlock(l->wait_lock) > lock(l->wait_lock) > fixup_rt_mutex_waiters() > if (wait_list_empty(l)) { > owner = l->owner & ~HAS_WAITERS; > l->owner = owner > ==> l->owner = T1 > } > > lock(l->wait_lock) > rt_mutex_unlock(l) fixup_rt_mutex_waiters() > if (wait_list_empty(l)) { > owner = l->owner & ~HAS_WAITERS; > cmpxchg(l->owner, T1, NULL) > ===> Success (l->owner = NULL) > l->owner = owner > ==> l->owner = T1 > } > > That means the problem is caused by fixup_rt_mutex_waiters() which does the > RMW to clear the waiters bit unconditionally when there are no waiters in > the rtmutexes rbtree. > > This can be fatal: A concurrent unlock can release the rtmutex in the > fastpath because the waiters bit is not set. If the cmpxchg() gets in the > middle of the RMW operation then the previous owner, which just unlocked > the rtmutex is set as the owner again when the write takes place after the > successfull cmpxchg(). > > The solution is rather trivial: Verify that the owner member of the rtmutex > has the waiters bit set before clearing it. This does not require a > cmpxchg() or other atomic operations because the waiters bit can only be > set and cleared with the rtmutex wait_lock held. It's also safe against the > fast path unlock attempt. The unlock attempt via cmpxchg() will either see > the bit set and take the slowpath or see the bit cleared and release it > atomically in the fastpath. > > It's remarkable that the test program provided by David triggers on ARM64 > and MIPS64 really quick, but it refuses to reproduce on x8664, while the > problem exists there as well. That refusal might explain that this got not > discovered earlier despite the bug existing from day one of the rtmutex > implementation more than 10 years ago. > > Thanks to David for meticulously instrumenting the code and providing the > information which allowed to decode this subtle problem. > > Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core") > Reported-by: David Daney > Signed-off-by: Thomas Gleixner > Cc:stable@vger.kernel.org FWIW: Tested-by: David Daney ... on arm64 and mips64 where it fixes the failures we were seeing. Thanks to Thomas for taking the time to work through this thing. David Daney > --- > kernel/locking/rtmutex.c | 68 +++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 66 insertions(+), 2 deletions(-) > > --- a/kernel/locking/rtmutex.c > +++ b/kernel/locking/rtmutex.c > @@ -65,8 +65,72 @@ static inline void clear_rt_mutex_waiter > > static void fixup_rt_mutex_waiters(struct rt_mutex *lock) > { > - if (!rt_mutex_has_waiters(lock)) > - clear_rt_mutex_waiters(lock); > + unsigned long owner, *p = (unsigned long *) &lock->owner; > + > + if (rt_mutex_has_waiters(lock)) > + return; > + > + /* > + * The rbtree has no waiters enqueued, now make sure that the > + * lock->owner still has the waiters bit set, otherwise the > + * following can happen: > + * > + * CPU 0 CPU 1 CPU2 > + * l->owner=T1 > + * rt_mutex_lock(l) > + * lock(l->lock) > + * l->owner = T1 | HAS_WAITERS; > + * enqueue(T2) > + * boost() > + * unlock(l->lock) > + * block() > + * > + * rt_mutex_lock(l) > + * lock(l->lock) > + * l->owner = T1 | HAS_WAITERS; > + * enqueue(T3) > + * boost() > + * unlock(l->lock) > + * block() > + * signal(->T2) signal(->T3) > + * lock(l->lock) > + * dequeue(T2) > + * deboost() > + * unlock(l->lock) > + * lock(l->lock) > + * dequeue(T3) > + * ==> wait list is empty > + * deboost() > + * unlock(l->lock) > + * lock(l->lock) > + * fixup_rt_mutex_waiters() > + * if (wait_list_empty(l) { > + * l->owner = owner > + * owner = l->owner & ~HAS_WAITERS; > + * ==> l->owner = T1 > + * } > + * lock(l->lock) > + * rt_mutex_unlock(l) fixup_rt_mutex_waiters() > + * if (wait_list_empty(l) { > + * owner = l->owner & ~HAS_WAITERS; > + * cmpxchg(l->owner, T1, NULL) > + * ===> Success (l->owner = NULL) > + * > + * l->owner = owner > + * ==> l->owner = T1 > + * } > + * > + * With the check for the waiter bit in place T3 on CPU2 will not > + * overwrite. All tasks fiddling with the waiters bit are > + * serialized by l->lock, so nothing else can modify the waiters > + * bit. If the bit is set then nothing can change l->owner either > + * so the simple RMW is safe. The cmpxchg() will simply fail if it > + * happens in the middle of the RMW because the waiters bit is > + * still set. > + */ > + owner = READ_ONCE(*p); > + if (owner & RT_MUTEX_HAS_WAITERS) > + WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS); > } > > /* > >