Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753112AbdHJMO4 (ORCPT ); Thu, 10 Aug 2017 08:14:56 -0400 Received: from terminus.zytor.com ([65.50.211.136]:55881 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752537AbdHJMOw (ORCPT ); Thu, 10 Aug 2017 08:14:52 -0400 Date: Thu, 10 Aug 2017 05:11:19 -0700 From: tip-bot for Prateek Sood Message-ID: Cc: peterz@infradead.org, hpa@zytor.com, tglx@linutronix.de, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, prsood@codeaurora.org, mingo@kernel.org Reply-To: tglx@linutronix.de, hpa@zytor.com, peterz@infradead.org, mingo@kernel.org, prsood@codeaurora.org, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org In-Reply-To: <1500040076-27626-1-git-send-email-prsood@codeaurora.org> References: <1500040076-27626-1-git-send-email-prsood@codeaurora.org> To: linux-tip-commits@vger.kernel.org Subject: [tip:locking/core] locking/osq_lock: Fix osq_lock queue corruption Git-Commit-ID: 50972fe78f24f1cd0b9d7bbf1f87d2be9e4f412e X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3444 Lines: 131 Commit-ID: 50972fe78f24f1cd0b9d7bbf1f87d2be9e4f412e Gitweb: http://git.kernel.org/tip/50972fe78f24f1cd0b9d7bbf1f87d2be9e4f412e Author: Prateek Sood AuthorDate: Fri, 14 Jul 2017 19:17:56 +0530 Committer: Ingo Molnar CommitDate: Thu, 10 Aug 2017 12:28:54 +0200 locking/osq_lock: Fix osq_lock queue corruption Fix ordering of link creation between node->prev and prev->next in osq_lock(). A case in which the status of optimistic spin queue is CPU6->CPU2 in which CPU6 has acquired the lock. tail v ,-. <- ,-. |6| |2| `-' -> `-' At this point if CPU0 comes in to acquire osq_lock, it will update the tail count. CPU2 CPU0 ---------------------------------- tail v ,-. <- ,-. ,-. |6| |2| |0| `-' -> `-' `-' After tail count update if CPU2 starts to unqueue itself from optimistic spin queue, it will find an updated tail count with CPU0 and update CPU2 node->next to NULL in osq_wait_next(). unqueue-A tail v ,-. <- ,-. ,-. |6| |2| |0| `-' `-' `-' unqueue-B ->tail != curr && !node->next If reordering of following stores happen then prev->next where prev being CPU2 would be updated to point to CPU0 node: tail v ,-. <- ,-. ,-. |6| |2| |0| `-' `-' -> `-' osq_wait_next() node->next <- 0 xchg(node->next, NULL) tail v ,-. <- ,-. ,-. |6| |2| |0| `-' `-' `-' unqueue-C At this point if next instruction WRITE_ONCE(next->prev, prev); in CPU2 path is committed before the update of CPU0 node->prev = prev then CPU0 node->prev will point to CPU6 node. tail v----------. v ,-. <- ,-. ,-. |6| |2| |0| `-' `-' `-' `----------^ At this point if CPU0 path's node->prev = prev is committed resulting in change of CPU0 prev back to CPU2 node. CPU2 node->next is NULL currently, tail v ,-. <- ,-. <- ,-. |6| |2| |0| `-' `-' `-' `----------^ so if CPU0 gets into unqueue path of osq_lock it will keep spinning in infinite loop as condition prev->next == node will never be true. Signed-off-by: Prateek Sood [ Added pictures, rewrote comments. ] Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: sramana@codeaurora.org Link: http://lkml.kernel.org/r/1500040076-27626-1-git-send-email-prsood@codeaurora.org Signed-off-by: Ingo Molnar --- kernel/locking/osq_lock.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index a316794..a74ee6a 100644 --- a/kernel/locking/osq_lock.c +++ b/kernel/locking/osq_lock.c @@ -109,6 +109,19 @@ bool osq_lock(struct optimistic_spin_queue *lock) prev = decode_cpu(old); node->prev = prev; + + /* + * osq_lock() unqueue + * + * node->prev = prev osq_wait_next() + * WMB MB + * prev->next = node next->prev = prev // unqueue-C + * + * Here 'node->prev' and 'next->prev' are the same variable and we need + * to ensure these stores happen in-order to avoid corrupting the list. + */ + smp_wmb(); + WRITE_ONCE(prev->next, node); /*