Date: Mon, 22 Jun 2020 16:48:40 +0100
From: willy@casper.infradead.org
To: "Eric W. Biederman"
Cc: Junxiao Bi, Matthew Wilcox, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Matthew Wilcox, Srinivas Eeda,
	"joe.jin@oracle.com", Wengang Wang
Subject: Re: [PATCH] proc: Avoid a thundering herd of threads freeing proc dentries
Message-ID: <20200622154840.GA13945@casper.infradead.org>
References: <877dw3apn8.fsf@x220.int.ebiederm.org>
	<2cf6af59-e86b-f6cc-06d3-84309425bd1d@oracle.com>
	<87bllf87ve.fsf_-_@x220.int.ebiederm.org>
	<87k1036k9y.fsf@x220.int.ebiederm.org>
	<68a1f51b-50bf-0770-2367-c3e1b38be535@oracle.com>
	<87blle4qze.fsf@x220.int.ebiederm.org>
	<20200620162752.GF8681@bombadil.infradead.org>
	<39e9f488-110c-588d-d977-413da3dc5dfa@oracle.com>
	<87d05r2kl3.fsf@x220.int.ebiederm.org>
In-Reply-To: <87d05r2kl3.fsf@x220.int.ebiederm.org>

On Mon, Jun 22, 2020 at 10:20:40AM -0500, Eric W. Biederman wrote:
> Junxiao Bi writes:
> > On 6/20/20 9:27 AM, Matthew Wilcox wrote:
> >> On Fri, Jun 19, 2020 at 05:42:45PM -0500, Eric W. Biederman wrote:
> >>> Junxiao Bi writes:
> >>>> Still high lock contention. Collect the following hot path.
> >>> A different location this time.
> >>>
> >>> I know of at least exit_signal and exit_notify that take thread wide
> >>> locks, and it looks like exit_mm is another.  Those don't use the same
> >>> locks as flushing proc.
> >>>
> >>> So I think you are simply seeing a result of the thundering herd of
> >>> threads shutting down at once.  Given that thread shutdown is
> >>> fundamentally a slow path, there is only so much that can be done.
> >>>
> >>> If you are up for a project to work through this thundering herd, I
> >>> expect I can help some.  It will be a long process of cleaning up
> >>> the entire thread exit process with an eye to performance.
> >> Wengang had some tests which produced wall-clock values for this
> >> problem, which I agree is more informative.
> >>
> >> I'm not entirely sure what the customer workload is that requires a
> >> highly threaded workload to also shut down quickly.  To my mind, an
> >> overall workload is normally composed of highly-threaded tasks that
> >> run for a long time and only shut down rarely (thus performance of
> >> shutdown is not important) and single-threaded tasks that run for a
> >> short time.
> >
> > The real workload is a Java application working in server-agent mode;
> > the issue happened on the agent side.  All the agent does is wait for
> > work dispatched from the server and execute it.  To execute one piece
> > of work, the agent starts lots of short-lived threads, so a lot of
> > threads can exit at the same time if there is a lot of work to
> > execute.  The contention on the exit path caused a high %sys time
> > which impacted other workloads.
>
> If I understand correctly, the Java VM is not exiting.  Just some of
> its threads.
>
> That is a very different problem to deal with.  There are many
> optimizations that are possible when _all_ of the threads are exiting
> that are not possible when _many_ threads are exiting.

Ah!  Now I get it.
This explains why the dput() lock contention was so important.  A new
thread starting would block on that lock as it tried to create its new
/proc/$pid/task/ directory.

Terminating thousands of threads, but not the entire process, isn't
going to hit many of the locks (eg exit_signal() and exit_mm() aren't
going to be called).

So we need a more sophisticated micro benchmark that is continually
starting threads while asking dozens-to-thousands of them to stop at
the same time (a rough sketch of what I mean is at the end of this
mail).  Otherwise we'll try to fix lots of scalability problems that
our customer doesn't care about.

> Do you know if it is simply the cpu time or if it is the lock contention
> that is the problem?  If it is simply the cpu time we should consider if
> some of the locks that can be highly contended should become mutexes.
> Or perhaps something like Matthew's cpu pinning idea.

If we're not trying to optimise for the entire process going down, then
we definitely don't want my CPU pinning idea.
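To make the shape of that benchmark concrete, here is a rough,
illustrative pthread sketch (not anything Wengang ran; BATCH and
ITERATIONS are arbitrary placeholders).  Each batch of threads exits
together at a barrier while the next batch is already being created, so
thread startup keeps racing against thread teardown inside one
long-lived process.  Build with gcc -O2 -pthread.

/*
 * Illustrative only: batches of BATCH threads exit together at a barrier
 * while the next batch is already being started, so thread creation keeps
 * racing against thread teardown within one long-lived process.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BATCH		1000	/* threads asked to stop at the same time */
#define ITERATIONS	100	/* batches to run back to back */

static pthread_barrier_t barrier[2];

static void *worker(void *arg)
{
	pthread_barrier_t *b = arg;

	/* Park here until the whole batch is released to exit together. */
	pthread_barrier_wait(b);
	return NULL;
}

static void spawn_batch(pthread_t *tids, pthread_barrier_t *b)
{
	for (int i = 0; i < BATCH; i++)
		if (pthread_create(&tids[i], NULL, worker, b))
			exit(EXIT_FAILURE);
}

int main(void)
{
	static pthread_t tids[2][BATCH];
	struct timespec start, end;

	/* Main joins each barrier too, so it decides when a batch exits. */
	pthread_barrier_init(&barrier[0], NULL, BATCH + 1);
	pthread_barrier_init(&barrier[1], NULL, BATCH + 1);

	clock_gettime(CLOCK_MONOTONIC, &start);
	spawn_batch(tids[0], &barrier[0]);

	for (int iter = 0; iter < ITERATIONS; iter++) {
		int cur = iter & 1, next = cur ^ 1;

		/* Release the current batch: BATCH threads tear down at once... */
		pthread_barrier_wait(&barrier[cur]);

		/* ...while a fresh batch of threads is being started. */
		if (iter < ITERATIONS - 1)
			spawn_batch(tids[next], &barrier[next]);

		for (int i = 0; i < BATCH; i++)
			pthread_join(tids[cur][i], NULL);
	}

	clock_gettime(CLOCK_MONOTONIC, &end);
	printf("wall time: %.3fs\n",
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}

Profiling a run (e.g. with perf record -g) while varying BATCH should
show whether the exit-path contention Junxiao reported reappears without
the whole process going down.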