Date: Fri, 8 Nov 2019 12:51:53 +0100 (CET)
From: Thomas Gleixner
To: Florian Weimer
Cc: LKML, Peter Zijlstra, Ingo Molnar, Darren Hart, Yi Wang, Yang Tao,
    Oleg Nesterov, Carlos O'Donell, Alexander Viro
Subject: Re: [patch 00/12] futex: Cure robust/PI futex exit races
References: <20191106215534.241796846@linutronix.de>
    <87zhh78gnf.fsf@oldenburg2.str.redhat.com>
    <87v9rv8g44.fsf@oldenburg2.str.redhat.com>
    <87o8xm95rt.fsf@oldenburg2.str.redhat.com>
    <87o8xm65ar.fsf@oldenburg2.str.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 8 Nov 2019, Thomas Gleixner wrote:
> On Fri, 8 Nov 2019, Florian Weimer wrote:
> > > On Fri, 8 Nov 2019, Florian Weimer wrote:
> > > Unpatched 5.4-rc6:
> > >
> > > FAIL: nptl/tst-thread-affinity-pthread
> > > original exit status 1
> > > info: Detected CPU set size (in bits): 225
> > > info: Maximum test CPU: 255
> > > error: pthread_create for thread 253 failed: Resource temporarily unavailable
> >
> > Huh. Reverting your patches (at commit 26bc672134241a080a83b2ab9aa8abede8d30e1c)
> > fixes the test for me.
> >
> > > TBH, the futex changes have absolutely nothing to do with that resource
> > fail.
> >
> > I suspect that there are some changes to task exit latency, which
> > triggers the latent resource management bug.
>
> Right, and depending on which hardware you run, this changes. On the big
> testbox I use the failure is also bouncing around between thread 252 and
> 254.

Which was just an assumption and is completely wrong.

The fail is expected and the failure output of that test is totally bonkers:

Tracing shows that clone is not failing at all:

  ld-linux.so.2-26694 [060] .... 6477.924785: sys_enter: NR 120 (3d0f00, f7cda424, f7cdaba8, ff819790, f7cdaba8, f7edd000)
  ld-linux.so.2-26694 [060] .... 6477.924867: sys_exit: NR 120 = 26695
  ...
  ld-linux.so.2-26694 [191] .... 6477.985139: sys_enter: NR 120 (3d0f00, fef27424, fef27ba8, ff819790, fef27ba8, f7edd000)
  ld-linux.so.2-26694 [191] .... 6477.985220: sys_exit: NR 120 = 27203

That's a total of 509 threads created. And then right after that:

  ld-linux.so.2-26694 [191] .... 6477.985221: sys_enter: NR 192 (0, 801000, 0, 20022, ffffffff, 0)
  ld-linux.so.2-26694 [191] .... 6477.985222: sys_exit: NR 192 = -12

mmap2 fails with ENOMEM which is not really surprising. The map length is
0x801000 which means that the already started threads have already consumed

    509 * 0x801000 == 4073.99 MB == 3.9785 GB

The next mmap2 fails for a 32bit process for pretty obvious reasons and
rightfully so.
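The arithmetic above can be re-checked with a small Python sketch. The
constants come straight from the trace; the 4 GiB ceiling (the theoretical
maximum for a 32bit process, before subtracting the executable, libraries,
heap and main stack) and the helper name consumed_bytes() are assumptions
made for illustration only.

```python
import errno

# Values read from the trace above.
MAP_LENGTH = 0x801000             # length argument of the failing mmap2 (NR 192)
FIRST_TID, LAST_TID = 26695, 27203
MMAP2_RETURN = -12                # return value of the failing mmap2

# Assumption: a 32bit process can address at most 4 GiB in total.
ADDRESS_SPACE = 1 << 32


def consumed_bytes(threads: int, map_length: int = MAP_LENGTH) -> int:
    """Address space taken by one map_length-sized mapping per thread."""
    return threads * map_length


threads_created = LAST_TID - FIRST_TID + 1
used = consumed_bytes(threads_created)
headroom = ADDRESS_SPACE - used   # upper bound, everything else must fit here too

print(f"threads created: {threads_created}")
print(f"consumed:        {used / 2**20:.2f} MB == {used / 2**30:.4f} GB")
print(f"headroom:        {headroom / 2**20:.2f} MB at most")
print(f"mmap2 error:     {errno.errorcode[-MMAP2_RETURN]}")
```

The headroom figure is only an upper bound: whatever is left below 4 GiB is
shared with every other mapping in the process, so a further contiguous
0x801000 hole is not to be expected.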
pthread_create() returns EAGAIN while the underlying problem is ENOMEM
which causes this bonkers output:

  error: pthread_create for thread 253 failed: Resource temporarily unavailable

There is nothing temporary about it. The process has exhausted its address
space.

That test's output is strange anyway:

  info: Detected CPU set size (in bits): 225
  info: Maximum test CPU: 255

Interesting how it fits 256 CPUs into a cpuset with a size of 225 bits.

/me goes back to stare into iopl().

Thanks,

	tglx