Date: Sun, 21 Jan 2024 13:07:54 +0100
From: Oleg Nesterov
To: Andrew Morton
Cc: Dylan Hatch, Kees Cook, Frederic Weisbecker, "Joel Fernandes (Google)", Ard Biesheuvel, "Matthew Wilcox (Oracle)", Thomas Gleixner, Sebastian Andrzej Siewior, "Eric W. Biederman", Vincent Whitchurch, Dmitry Vyukov, Luis Chamberlain, Mike Christie, David Hildenbrand, Catalin Marinas, Stefan Roesch, Joey Gouly, Josh Triplett, Helge Deller, Ondrej Mosnacek, Florent Revest, Miguel Ojeda, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] getrusage: use sig->stats_lock
Message-ID: <20240121120754.GA2814@redhat.com>
References: <20240117192534.1327608-1-dylanbhatch@google.com> <20240119141501.GA23739@redhat.com> <20240119141529.GB23739@redhat.com> <20240120204552.c0708fd10fc8e2442c447049@linux-foundation.org>
In-Reply-To: <20240120204552.c0708fd10fc8e2442c447049@linux-foundation.org>

On 01/20, Andrew Morton wrote:
>
> On Fri, 19 Jan 2024 19:27:49 -0800 Dylan Hatch wrote:
>
> > I applied these to a 5.10 kernel, and my repro (calling getrusage(RUSAGE_SELF)
> > from 200K threads) is no longer triggering a hard lockup.
>
> Thanks, but...
>
> The changelogs don't actually describe any hard lockup. [1/2] does
> mention "the deadlock" but that's all the info we have.

Sorry for the confusion. 1/2 tries to explain that this change is not
strictly necessary for 2/2: it is safe to call thread_group_cputime()
with sig->stats_lock held for writing even though thread_group_cputime()
takes the same lock, because in this case thread_group_cputime() can't
enter the slow mode (nobody can update the stats while we hold the lock,
so its lockless pass can't fail).

> So could we please have a suitable description of the bug which these are
> addressing? And a Reported-by:, a Closes: and a Fixes would be great too.

Yes, sorry, I forgot to add Reported-by. I'll update the changelog and
add Reported-and-tested-by.

But the problem is known and old.
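I think do_io_accounting() had the same problem until 1df4bd83cdfdbd0
("do_io_accounting: use sig->stats_lock"). And do_task_stat() ...

getrusage() takes siglock and does for_each_thread() twice. If NR_THREADS
threads call sys_getrusage() in an endless loop on NR_CPUS CPUs,
lock_task_sighand() can trigger a hard lockup: it spins with irqs disabled
waiting for the other NR_CPUS-1 CPUs, which need the same siglock. Each of
them holds siglock for O(NR_THREADS) while it walks the thread list, so the
time it spins with irqs disabled is O(NR_CPUS * NR_THREADS).

With this patch, all the threads can run lockless in parallel in the likely
case.

Dylan, do you have a better description? Can you share your repro?
Although I think that something simple like

	#include <pthread.h>
	#include <stdio.h>
	#include <sys/resource.h>

	/* BIG_NUMBER is a placeholder: pick a thread count the machine can
	 * sustain (Dylan's repro used 200K threads), e.g. -DBIG_NUMBER=200000 */
	#define NT BIG_NUMBER

	pthread_barrier_t barr;

	void *thread(void *arg)
	{
		struct rusage ru;

		/* start all threads together to maximize siglock contention */
		pthread_barrier_wait(&barr);
		for (;;)
			getrusage(RUSAGE_SELF, &ru);
		return NULL;
	}

	int main(void)
	{
		pthread_barrier_init(&barr, NULL, NT);

		for (int n = 0; n < NT-1; ++n) {
			pthread_t pt;

			/* a failed create would leave the barrier waiting forever */
			if (pthread_create(&pt, NULL, thread, NULL)) {
				fprintf(stderr, "pthread_create failed at %d\n", n);
				return 1;
			}
		}
		/* the main thread is the NT-th barrier participant */
		thread(NULL);
		return 0;
	}

should work if you have a machine with a lot of memory/CPUs.

Oleg.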
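"Use sig->stats_lock" here means a seqlock read in the read_seqbegin_or_lock() style: a lockless first pass, with the lock taken only if a writer raced with the reader. The userspace sketch below is a hypothetical model, not kernel code. struct seqlock, struct stats, stats_account() and stats_read_total() are illustrative names; the read-side helpers mirror the kernel helpers of the same name in include/linux/seqlock.h, with a pthread mutex standing in for the kernel spinlock. It also models the 1/2 argument: while one reader holds the lock exclusively, no writer can bump the sequence, so a nested lockless read always succeeds and never needs the (non-recursive) lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical model of seqlock_t: seq is even when no writer is active,
 * odd while a writer is updating the protected data. */
struct seqlock {
	atomic_uint seq;
	pthread_mutex_t lock;	/* serializes writers and "locked" readers */
};

/* Hypothetical stand-in for the counters guarded by sig->stats_lock.
 * Relaxed atomics avoid C11 data races on the lockless read path. */
struct stats {
	struct seqlock sl;
	atomic_ulong utime, stime;
};

static void write_seqlock(struct seqlock *sl)
{
	pthread_mutex_lock(&sl->lock);
	unsigned s = atomic_load_explicit(&sl->seq, memory_order_relaxed);
	atomic_store_explicit(&sl->seq, s + 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);
}

static void write_sequnlock(struct seqlock *sl)
{
	unsigned s = atomic_load_explicit(&sl->seq, memory_order_relaxed);
	atomic_store_explicit(&sl->seq, s + 1, memory_order_release); /* even */
	pthread_mutex_unlock(&sl->lock);
}

/* Wait out an in-progress writer, then sample the sequence. */
static unsigned read_seqbegin(struct seqlock *sl)
{
	unsigned s;

	while ((s = atomic_load_explicit(&sl->seq, memory_order_acquire)) & 1)
		;
	return s;
}

/* Nonzero if a writer ran since read_seqbegin() returned start. */
static int read_seqretry(struct seqlock *sl, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sl->seq, memory_order_relaxed) != start;
}

/* Even *seq: lockless pass. Odd *seq: take the lock exclusively without
 * touching the sequence (like read_seqlock_excl()), which blocks writers
 * but not lockless readers. */
static void read_seqbegin_or_lock(struct seqlock *sl, unsigned *seq)
{
	if (!(*seq & 1))
		*seq = read_seqbegin(sl);
	else
		pthread_mutex_lock(&sl->lock);
}

static int need_seqretry(struct seqlock *sl, unsigned seq)
{
	return !(seq & 1) && read_seqretry(sl, seq);
}

static void done_seqretry(struct seqlock *sl, unsigned seq)
{
	if (seq & 1)
		pthread_mutex_unlock(&sl->lock);
}

static void stats_account(struct stats *st, unsigned long ut, unsigned long sys)
{
	write_seqlock(&st->sl);
	atomic_fetch_add_explicit(&st->utime, ut, memory_order_relaxed);
	atomic_fetch_add_explicit(&st->stime, sys, memory_order_relaxed);
	write_sequnlock(&st->sl);
}

/* Lockless in the likely case; retries under the lock only if a writer
 * raced with us. *used_lock reports which path produced the result. */
static unsigned long stats_read_total(struct stats *st, int *used_lock)
{
	unsigned seq = 0;	/* even: try the lockless pass first */
	unsigned long total;
retry:
	read_seqbegin_or_lock(&st->sl, &seq);
	total = atomic_load_explicit(&st->utime, memory_order_relaxed) +
		atomic_load_explicit(&st->stime, memory_order_relaxed);
	if (need_seqretry(&st->sl, seq)) {
		seq = 1;	/* odd: redo the read with the lock held */
		goto retry;
	}
	*used_lock = seq & 1;
	done_seqretry(&st->sl, seq);
	return total;
}
```

The mutex is only a convenience for userspace; the property that matters is that uncontended readers take no lock at all, which is what lets many concurrent getrusage() callers proceed in parallel instead of queueing on one lock with irqs disabled.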