Subject: Re: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX)
From: Andrew Pinski
To: Will Deacon, dave@stgolabs.net
Cc: Peter Zijlstra, Andrew, Davidlohr Bueso, Thomas Gleixner, "Paul E. McKenney", Ingo Molnar, Linux Kernel Mailing List, "linux-arm-kernel@lists.infradead.org", David Daney
Date: Fri, 11 Dec 2015 09:43:54 -0800

On Fri, Dec 11, 2015 at 6:18 AM, Davidlohr Bueso wrote:
>
> On Fri, 11 Dec 2015, Will Deacon wrote:
>
>>I think Andrew meant the atomic_xchg_acquire at the start of osq_lock,
>>as opposed to "compare and swap". In which case, it does look like
>>there's a bug here because there is nothing to order the initialisation
>>of the node fields with publishing of the node, whether that's
>>indirectly as a result of setting the tail to the current CPU or
>>directly as a result of the WRITE_ONCE.
>
> Sorry I'm late to the party.
>
> Duh yes this is obviously bogus, and worse I recall triggering a similar
> tail initialization issue in osq_lock on some experimental work on x86,
> so this is very much a point of failure. Ack.
>
>>
>>Andrew, David: does making that atomic_xchg_acquire an atomic_xchg fix
>>things for you?

Yes, that works for me. And yes, that looks like the correct fix.

>>
>>I don't fully grok what 81a43adae3b9 has to do with any of this, so
>>maybe there's another bug too.
>
> I think this is mainly because mutex_optimistic_spin is where the stack
> shows the lockup, which really translates to c55a6ffa62.

Yes, as mutex_optimistic_spin calls into osq_lock/osq_unlock. And
81a43adae3b9 changed mutex.c, which is where David thought the issue was
located, rather than in what mutex_optimistic_spin calls.

Thanks,
Andrew Pinski

>
> Thanks,
> Davidlohr
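
For illustration, the ordering problem discussed in this thread can be
sketched with portable C11 atomics instead of the kernel's own primitives.
This is only a minimal sketch under stated assumptions, not the osq_lock
code: struct spin_node, struct spin_queue and queue_publish() are
hypothetical names, and memory_order_seq_cst is used as a rough stand-in
for the kernel's fully ordered atomic_xchg().

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-CPU spin-queue node. */
struct spin_node {
    struct spin_node *_Atomic next;
    struct spin_node *_Atomic prev;
    atomic_int locked;
};

/* Hypothetical queue head: just a tail pointer, NULL when empty. */
struct spin_queue {
    struct spin_node *_Atomic tail;
};

/*
 * Initialise `node`, then publish it as the new tail.
 * Returns the previous tail, or NULL if the queue was empty.
 */
static struct spin_node *queue_publish(struct spin_queue *q,
                                       struct spin_node *node)
{
    /* Node initialisation: these stores carry no ordering of their own. */
    atomic_store_explicit(&node->locked, 0, memory_order_relaxed);
    atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

    /*
     * Publication. With memory_order_acquire alone, nothing would order
     * the two stores above before the new tail becomes visible, so
     * another CPU could find the node and read stale fields. A release
     * (here: seq_cst, which includes release) closes that window.
     */
    struct spin_node *prev =
        atomic_exchange_explicit(&q->tail, node, memory_order_seq_cst);

    if (prev) {
        atomic_store_explicit(&node->prev, prev, memory_order_relaxed);
        /* Link in behind the predecessor so it can find this node. */
        atomic_store_explicit(&prev->next, node, memory_order_release);
    }
    return prev;
}
```

Changing memory_order_seq_cst to memory_order_acquire in the exchange
reproduces the shape of the problem Will describes: the node becomes
reachable through the tail before its fields are guaranteed to be
initialised.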