Date: Sat, 2 Feb 2019 11:14:27 +0100 (CET)
From: Thomas Gleixner
To: Heiko Carstens
cc: Sebastian Sewior, "Paul E. McKenney", Peter Zijlstra, Ingo Molnar,
    Martin Schwidefsky, LKML, linux-s390@vger.kernel.org, Stefan Liebler
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede
In-Reply-To: <20190202091043.GA3381@osiris>
References: <20190130210733.mg6aascw2gzl3oqz@linutronix.de>
    <20190130233557.GA4240@linux.ibm.com> <20190131165228.GA32680@osiris>
    <20190131170653.spnrxsiblkssleyd@linutronix.de>
    <20190201161227.GG3770@osiris> <20190202091043.GA3381@osiris>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 2 Feb 2019, Heiko Carstens wrote:
> On Fri, Feb 01, 2019 at 10:59:08PM +0100, Thomas Gleixner wrote:
> > Were you able to capture a trace with the last set of additional trace
> > printks?
>
> Of course I forgot to collect that, sorry! But just reproduced; see
> log below (last 1000 lines) and attachment for full log.

The failing futex is here:

<...>-48786 [002] .... 337.231645: sys_futex(uaddr: 3ff90c00460, op: 6, val: 1, utime: 0, uaddr2: 4, val3: 0)
<...>-48786 [002] .... 337.231646: attach_to_pi_owner: Missing pid 49011
<...>-48786 [002] .... 337.231646: handle_exit_race: uval2 vs uval 8000bf73 vs 8000bf73 (-1)
<...>-48786 [002] .... 337.231741: sys_futex -> 0xfffffffffffffffd

Let's look where it was handled in the kernel right before that:

<...>-49014 [006] .... 337.215675: sys_futex(uaddr: 3ff90c00460, op: 7, val: 3ff00000007, utime: 3ff8d3f8910, uaddr2: 3ff8d3f8910, val3: 3ffc64fe8f7)
<...>-49014 [006] .... 337.215675: do_futex: uaddr: 3ff90c00460 cur: 8000bf76 new: 0

49014 unlocks the futex in the kernel and due to lack of waiters it sets it
to unlocked ---> new: 0.

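For readers following the raw trace values, here is a minimal sketch of how they decode. It is not part of the kernel or of the trace patch. The constant values mirror the Linux UAPI header <linux/futex.h>; the TRACE_ prefix and the helper function names are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the Linux UAPI <linux/futex.h>. They are redefined here
 * under a hypothetical TRACE_ prefix so the sketch builds on its own. */
#define TRACE_FUTEX_WAITERS    0x80000000u  /* bit 31 of the futex word */
#define TRACE_FUTEX_OWNER_DIED 0x40000000u  /* bit 30 of the futex word */
#define TRACE_FUTEX_TID_MASK   0x3fffffffu  /* owner TID, 0 if unlocked */
#define TRACE_FUTEX_LOCK_PI    6            /* "op: 6" in the trace */
#define TRACE_FUTEX_UNLOCK_PI  7            /* "op: 7" in the trace */

/* Owner TID stored in a PI futex word; 0 means the futex is unlocked. */
static uint32_t futex_owner_tid(uint32_t uval)
{
    return uval & TRACE_FUTEX_TID_MASK;
}

/* True if the FUTEX_WAITERS bit is set in the futex word. */
static bool futex_has_waiters(uint32_t uval)
{
    return (uval & TRACE_FUTEX_WAITERS) != 0;
}

/* The trace prints the raw 64-bit return register of sys_futex;
 * reinterpreting it as signed yields the negative errno value. */
static int64_t syscall_result(uint64_t raw)
{
    return (int64_t)raw;
}
```

With these helpers, the user space value 8000bf73 decodes to owner TID 49011 (the pid reported as missing) with the FUTEX_WAITERS bit set, 8000bf76 decodes to TID 49014 (the task doing the unlock), and the return value 0xfffffffffffffffd is -3, which is -ESRCH on Linux.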
Between this and the failing sys_futex() invocation, the missing task exits:

<...>-49011 [000] .... 337.221543: handle_futex_death: uaddr: 3ff90c00a00 pi: 1
...
<...>-49011 [000] .... 337.221547: handle_futex_death: uaddr: 3ff90c00488 success
<...>-49011 [000] .... 337.221548: sched_process_exit: comm=ld64.so.1 pid=49011 prio=120

but it does not have futex 3ff90c00460 in its robust list.

So after the unlock @timestamp 337.215675 the kernel does not deal with that
futex at all until the failed lock attempt where it rightfully rejects the
attempt due to the alleged owner being gone.

So this looks more like user space doing something stupid...

As we talked about the missing barriers before, I just looked at
pthread_mutex_trylock() and that does still:

	if (robust)
	  {
	    ENQUEUE_MUTEX_PI (mutex);
	    THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
	  }

So it's missing the barriers which pthread_mutex_lock() has. Grasping for
straws obviously....

Thanks,

	tglx
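
For reference, a minimal sketch of the store ordering that the trylock snippet above lacks. It assumes that the barriers in question are compiler barriers of the form `__asm ("" ::: "memory")` placed around the enqueue, which is an assumption about the pthread_mutex_lock() path and not a quote of glibc code. The struct layout, the function names and the `compiler_barrier()` macro are hypothetical stand-ins, and the real glibc structures differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for a list node embedded in a robust mutex. */
struct robust_node {
    struct robust_node *next;
};

/* Hypothetical stand-in for the per-thread robust list head that the
 * kernel walks when the thread exits. */
struct robust_head_sketch {
    struct robust_node list;              /* circular list of held mutexes */
    struct robust_node *list_op_pending;  /* mutex being acquired or released */
};

/* Compiler-only barrier: the compiler may not move memory accesses
 * across it. It emits no CPU instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static void robust_head_init(struct robust_head_sketch *h)
{
    h->list.next = &h->list;
    h->list_op_pending = NULL;
}

/* Step 1: publish the pending operation before attempting the acquire. */
static void robust_announce(struct robust_head_sketch *h,
                            struct robust_node *m)
{
    h->list_op_pending = m;
    compiler_barrier();  /* op_pending is stored before the acquire attempt */
}

/* Step 2, after the acquire succeeded: enqueue, then clear op_pending. */
static void robust_enqueue(struct robust_head_sketch *h,
                           struct robust_node *m)
{
    compiler_barrier();  /* the enqueue may not move before the acquire */
    m->next = h->list.next;
    h->list.next = m;
    compiler_barrier();  /* the enqueue is stored before op_pending is cleared */
    h->list_op_pending = NULL;
}
```

The point of the ordering is that at any instant the mutex is reachable through either list_op_pending or the list itself, so an exit-time walk of the robust list cannot miss a mutex the thread already owns. Without the barriers the compiler is free to reorder the two stores in the second step.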