Date: Thu, 31 Jan 2019 00:13:51 +0100 (CET)
From: Thomas Gleixner
To: Sebastian Sewior
cc: Heiko Carstens, Peter Zijlstra, Ingo Molnar, Martin Schwidefsky, LKML,
    linux-s390@vger.kernel.org, Stefan Liebler, "Paul E.
McKenney"
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede
In-Reply-To: <20190130210733.mg6aascw2gzl3oqz@linutronix.de>
References: <20190129132303.GE26906@osiris> <20190129151058.GG26906@osiris>
    <20190129171653.ycl64psq2liy5o5c@linutronix.de> <20190130094913.GC5299@osiris>
    <20190130125955.GD5299@osiris> <20190130132420.spwrq2d4oxeydk5s@linutronix.de>
    <20190130210733.mg6aascw2gzl3oqz@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 30 Jan 2019, Sebastian Sewior wrote:

> On 2019-01-30 18:56:54 [+0100], Thomas Gleixner wrote:
> > TBH, no clue. Below are some more traceprintks which hopefully shed some
> > light on that mystery. See kernel/futex.c line 30 ...
>
> The robust list is somehow buggy. In the last trace we had the
> handle_futex_death() of uaddr 3ff9e880140 as the last action. That means
> it was an entry in 56496's ->list_op_pending entry. This makes sense
> because it tried to acquire the lock, failed, got killed.

The robust list of the failing task seems to be correct.

> According to uaddr pid 56956 is the owner. So 56956 invoked one of
> pthread_mutex_lock() / pthread_mutex_timedlock() /
> pthread_mutex_trylock() and should have obtained the lock in userland.
> Depending on where it got killed, that mutex should be either recorded in
> ->list_op_pending or the robust_list (or both if it didn't clear
> ->list_op_pending yet). But it is not.
> Similar for pthread_mutex_unlock().
> We don't have a trace_point if we abort processing the list.

The only reason why it would abort is a page fault, because that cannot
be handled in the exit code anymore.

> On the other hand, it didn't trigger on x86 for hours. Could the atomic

s/hours/days/ ....

> ops be the culprit?

The glibc code does:

     THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending,
                    (void *) (((uintptr_t) &mutex->__data.__list.__next) | 1));

     ....
     lock in user space

     or

     lock in kernel space

     ENQUEUE_MUTEX_PI (mutex);
     THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);

ENQUEUE_MUTEX_PI() resolves to a THREAD_GETMEM() which reads the list head
from TLS, some list manipulation operations and the final THREAD_SETMEM()
which stores the new list head.

Now on x86 THREAD_GETMEM() and THREAD_SETMEM() are resolving to

     asm volatile ("movX .....")

On s390 they are descr->member based operations.

Now the important part of the robust list is the store sequence, i.e. the
list manipulation and the final update of the list head in the TLS visible
part need to come _before_ list_op_pending is cleared.

I might be missing something, but there is no compiler barrier in that code
which would prevent the compiler from reordering the stores. It can
rightfully do so because there is no compiler visible dependency between
these two operations.

On x86_64 the asm volatile might prevent it by chance, but it does not have
a 'memory' clobber specified which would guarantee a compiler barrier.

On s390 there is certainly nothing.

So assuming that clearing list_op_pending comes before the list head
update, the robust exit code in the kernel will fail to see either of
them. FAIL.

I might be wrong as usual, but this would definitely explain the failure
very well.

Thanks,

	tglx
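
P.S. To illustrate the ordering requirement described above, here is a
minimal sketch in C. It is not the glibc code: struct robust_node, struct
robust_head, barrier() and enqueue_with_barrier() are hypothetical
stand-ins invented for this example, and the empty asm statement with a
"memory" clobber assumes GCC or Clang. It only shows where compiler
barriers would have to sit so that the list head update cannot be moved
after the clearing of list_op_pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the per-thread robust list data. */
struct robust_node {
	struct robust_node *next;
};

struct robust_head {
	struct robust_node list;	/* circular list of held robust mutexes */
	void *list_op_pending;		/* mutex currently being locked/unlocked */
};

/*
 * Compiler-only barrier (GCC/Clang syntax). The "memory" clobber tells the
 * compiler that memory may be read or written here, so it must not move
 * loads or stores across this point. No instruction is emitted.
 */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Sketch of the enqueue sequence with the store order made explicit. */
static void enqueue_with_barrier(struct robust_head *h, struct robust_node *n)
{
	/* The low bit is used as a flag below, so the pointer must be aligned. */
	assert(((uintptr_t)n & 1) == 0);

	/* 1. Announce the pending operation (low bit set as in the snippet above). */
	h->list_op_pending = (void *)((uintptr_t)n | 1);
	barrier();

	/* 2. The lock would be acquired here, in user space or in the kernel. */

	/* 3. Link the node and publish the new list head. */
	n->next = h->list.next;
	h->list.next = n;

	/* 4. Only after the list head is stored may list_op_pending be cleared. */
	barrier();
	h->list_op_pending = NULL;
}
```

Without the second barrier() the compiler sees no dependency between the
store to h->list.next and the store to h->list_op_pending and would be free
to emit them in either order, which is the window described above.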