Date: Tue, 29 Jan 2019 18:16:53 +0100
From: Sebastian Sewior
To: Heiko Carstens
Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Martin Schwidefsky, LKML, linux-s390@vger.kernel.org, Stefan Liebler
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered
Message-ID: <20190129171653.ycl64psq2liy5o5c@linutronix.de>
References: <20190128134410.GA28485@hirez.programming.kicks-ass.net> <20190128135804.GB28878@hirez.programming.kicks-ass.net> <20190129090108.GA26906@osiris> <20190129102409.GB26906@osiris> <20190129103557.GF28485@hirez.programming.kicks-ass.net> <20190129132303.GE26906@osiris> <20190129151058.GG26906@osiris>
In-Reply-To: <20190129151058.GG26906@osiris>
X-Mailing-List: linux-kernel@vger.kernel.org

On
2019-01-29 16:10:58 [+0100], Heiko Carstens wrote:
> Finally... the trace output is quite large with 26 MB... Therefore an
> xz compressed attachment. Hope that's ok.
>
> The kernel used was linux-next 20190129 + your patch.

| ld64.so.1-10237 [006] .... 14232.031726: sys_futex(uaddr: 3ff88e80618, op: 7, val: 3ff00000007, utime: 3ff88e7f910, uaddr2: 3ff88e7f910, val3: 3ffc167e8d7) FUTEX_UNLOCK_PI | SHARED
| ld64.so.1-10237 [006] .... 14232.031726: sys_futex -> 0x0
…
| ld64.so.1-10237 [006] .... 14232.051751: sched_process_exit: comm=ld64.so.1 pid=10237 prio=120
…
| ld64.so.1-10148 [006] .... 14232.061826: sys_futex(uaddr: 3ff88e80618, op: 6, val: 1, utime: 0, uaddr2: 2, val3: 0) FUTEX_LOCK_PI | SHARED
| ld64.so.1-10148 [006] .... 14232.061826: sys_futex -> 0xfffffffffffffffd

So there has to be another task that acquired the lock in userland and
exited since the last in-kernel user unlocked it. This might shed more
light on it:

diff --git a/kernel/futex.c b/kernel/futex.c
index 599da35c2768..aaa782a8a115 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1209,6 +1209,9 @@ static int handle_exit_race(u32 __user *uaddr, u32 uval,
 	 * corrupted or the user space value in *uaddr is simply bogus.
 	 * Give up and tell user space.
 	 */
+	trace_printk("uval2 vs uval %08x vs %08x (%d)\n", uval2, uval,
+		     tsk ? tsk->pid : -1);
+	__WARN();
 	return -ESRCH;
 }
 
@@ -1233,8 +1236,10 @@ static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
 	if (!pid)
 		return -EAGAIN;
 	p = find_get_task_by_vpid(pid);
-	if (!p)
+	if (!p) {
+		trace_printk("Missing pid %d\n", pid);
 		return handle_exit_race(uaddr, uval, NULL);
+	}
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		put_task_struct(p);
---

I am not sure, but isn't this the "known" issue where the kernel returns
ESRCH in a valid case and glibc upstream does not recognize it because it
is not a valid/POSIX-defined error code?
(I *think* the same is true for -ENOMEM.)
If it is, the following C snippet is a small test case:

---->8------
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static char nothing[4096];

int main(void)
{
	int fd;
	ssize_t wn;
	void *lockm;
	pid_t child;
	pthread_mutex_t *the_lock;
	pthread_mutexattr_t mutexattr;
	int ret;

	fd = open("/dev/shm/futex-test-lock", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		printf("Failed to create lock file: %m\n");
		return 1;
	}
	wn = write(fd, nothing, sizeof(nothing));
	if (wn != sizeof(nothing)) {
		printf("Failed to write to file: %m\n");
		goto out_unlink;
	}

	lockm = mmap(NULL, sizeof(nothing), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (lockm == MAP_FAILED) {
		printf("mmap() failed: %m\n");
		goto out_unlink;
	}
	close(fd);
	unlink("/dev/shm/futex-test-lock");
	the_lock = lockm;

	/* Process-shared, priority-inheriting, non-robust mutex. */
	ret = pthread_mutexattr_init(&mutexattr);
	ret |= pthread_mutexattr_setpshared(&mutexattr, PTHREAD_PROCESS_SHARED);
	ret |= pthread_mutexattr_setprotocol(&mutexattr, PTHREAD_PRIO_INHERIT);
	if (ret) {
		printf("Something went wrong during init\n");
		return 1;
	}

	ret = pthread_mutex_init(the_lock, &mutexattr);
	if (ret) {
		printf("Failed to init the lock\n");
		return 1;
	}

	child = fork();
	if (child < 0) {
		printf("fork(): %m\n");
		return 1;
	}
	if (!child) {
		/* The child takes the lock and exits while still holding it. */
		pthread_mutex_lock(the_lock);
		exit(2);
	}

	/* Give the child time to exit, then try to take the lock. */
	sleep(2);
	ret = pthread_mutex_lock(the_lock);
	printf("-> %x\n", ret);
	return 0;

out_unlink:
	unlink("/dev/shm/futex-test-lock");
	return 1;
}
---------------8<-----------------------

strace gives this:

|openat(AT_FDCWD, "/dev/shm/futex-test-lock", O_RDWR|O_CREAT|O_TRUNC, 0644) = 3
|write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
|mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f5e23e37000
|close(3) = 0
|unlink("/dev/shm/futex-test-lock") = 0
…
|clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5e23c1da10) = 25777
|strace: Process 25777 attached
|[pid 25776] nanosleep({tv_sec=2, tv_nsec=0}, <unfinished ...>
|[pid 25777] set_robust_list(0x7f5e23c1da20, 24) = 0
|[pid 25777] exit_group(2) = ?
|[pid 25777] +++ exited with 2 +++
|<... nanosleep resumed> {tv_sec=1, tv_nsec=999821679}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
|--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=25777, si_uid=1000, si_status=2, si_utime=0, si_stime=0} ---
|restart_syscall(<... resuming interrupted nanosleep ...>) = 0
|futex(0x7f5e23e37000, FUTEX_LOCK_PI, NULL) = -1 ESRCH (No such process)
|pause(^Cstrace: Process 25776 detached

and if I remember correctly, if asserts are not enabled we end up with a
pause loop instead.

Sebastian
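
For reading the trace quoted at the top of this mail: the op numbers and
the raw return value can be decoded by hand. Below is a minimal sketch of
that, nothing more. The helper names futex_cmd_name() and
futex_ret_to_errno() and the local MAX_ERRNO define are made up for
illustration (the define mirrors the kernel convention that the top 4095
values of the range are negative errno codes); only the FUTEX_* constants
come from <linux/futex.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <linux/futex.h>

/* Assumption: same bound the kernel uses for "this is an errno". */
#define MAX_ERRNO 4095

/* Hypothetical helper: name the futex command encoded in the op field. */
const char *futex_cmd_name(int op)
{
	/* Strip FUTEX_PRIVATE_FLAG / FUTEX_CLOCK_REALTIME before comparing. */
	switch (op & FUTEX_CMD_MASK) {
	case FUTEX_LOCK_PI:
		return "FUTEX_LOCK_PI";
	case FUTEX_UNLOCK_PI:
		return "FUTEX_UNLOCK_PI";
	default:
		return "other";
	}
}

/*
 * Hypothetical helper: the trace prints the syscall return value as an
 * unsigned 64-bit number. Return the positive errno it encodes, or 0 if
 * the value is not in the error range.
 */
int futex_ret_to_errno(uint64_t raw)
{
	if (raw >= (uint64_t)-MAX_ERRNO)
		return (int)(0 - raw);
	return 0;
}
```

With the values from the trace, futex_cmd_name(7) and futex_cmd_name(6)
give FUTEX_UNLOCK_PI and FUTEX_LOCK_PI, and
futex_ret_to_errno(0xfffffffffffffffd) gives 3, which is ESRCH on Linux,
the same error strace shows for the test case.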