Received: by 2002:ac8:6216:0:b0:41c:c224:f26f with SMTP id ks22csp81511qtb; Thu, 2 Nov 2023 12:08:23 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFpSeVraNBeEuLe9f4PkXBAGbLl3yxnLxtshPyzBluUcADHdEnmLXkRyR++uveJckkeydPu X-Received: by 2002:a17:90b:158:b0:280:3b05:3081 with SMTP id em24-20020a17090b015800b002803b053081mr12617517pjb.37.1698952103587; Thu, 02 Nov 2023 12:08:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698952103; cv=none; d=google.com; s=arc-20160816; b=yzVs9DfRKKOnIF7jGo3XVbuoYlt1Ro07Nx6EE5KtsIHiYTwqR5+VnMYfxYi6ULtJt5 0QwBNZLXZJTYnD+KQaHhpcUWdtHBEKlm5Ejb9UOKJeIQTB4mHJUHw+p1g27608MjuAY0 NPvq1bfvpZdGrI1MCgqLygtKgvu24vGOWJiPocpQsfrNZQ2HLIZxG98YcnHPRnwHFfhn 0dgb2L1uo9AcVRDmy0rKBWAYwi8f93hodrSZmefWrAb8zNeESknwxZLOKV0Xz3LEhlOv xNcZahPA1urR+HQhCyKgRwHtmCt17pARDIwCeqCGHm+o17EkSpnHdCJK2L/fZ2Ozhz9f Sacg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=z3K3OmvqH5lzaH4yBBJivGjTa0UMweaOBbHusKFdRvM=; fh=yo9k1oLGaSYccmaLgBzNsHY7VKt+226CJNwoSDLImVM=; b=gxhrMmzpV2pW8Z5Duo8HSWExj1vLKyz2iNO8vJFShFyORyNRskT8QkOkILSgyzs/qm 5eMlFib+05B2uhHj/35gK/mQrpjGZfw2F7ADECqdjQXDhl/znaqWE/WzxoEXKYklm/Nd INDEE+0s3Vk86PxMJYW2kaA0dYPW4rPhdeYcKbsE2nYRn1LrS1ivcSmkD81wdl66VEc4 +nvnOfu6+KEAym0mU3MzF1R95yh92YZeEpJLRtWh9L8BlyILaOArka10mx5dEoag7yFP B9iFmJed2c71RX82IBIadjQJXylzmXx6HxkkTDWXkZRJagfaDB0fDH4DNPf36iRPgfeq dEeQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=cg8eLvku; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id u9-20020a17090adb4900b00280294ec225si308306pjx.24.2023.11.02.12.08.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Nov 2023 12:08:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=cg8eLvku; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 1D3F081EEBCB; Thu, 2 Nov 2023 12:08:21 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229717AbjKBTIK (ORCPT + 99 others); Thu, 2 Nov 2023 15:08:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59212 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229594AbjKBTIJ (ORCPT ); Thu, 2 Nov 2023 15:08:09 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 237E318E for ; Thu, 2 Nov 2023 12:07:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1698952041; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=z3K3OmvqH5lzaH4yBBJivGjTa0UMweaOBbHusKFdRvM=; b=cg8eLvkuMY+AL/emCsWYjHa6ENOY/cLDuHQFD4eyiPt1K5ToKFBWwDhkpJ3yFQ3/E+0CDL 1GURV9e9D6Ze/6a2lO36rLhjnjmYE7HjmZ2O+wjlLZMfuzcc/mu9kh/H2l2/KmPfepyD09 mKjSs9cSIsdburDCnWmRv95njMXlxts= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-1-nZ-r8DN8NEyonLN5Q80LVQ-1; Thu, 02 Nov 2023 15:07:15 -0400 X-MC-Unique: nZ-r8DN8NEyonLN5Q80LVQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 19CAB2801A4A; Thu, 2 Nov 2023 19:07:15 +0000 (UTC) Received: from [10.22.17.8] (unknown [10.22.17.8]) by smtp.corp.redhat.com (Postfix) with ESMTP id 543B02026D4C; Thu, 2 Nov 2023 19:07:14 +0000 (UTC) Message-ID: <19251b0c-a6b6-ba5b-3b7b-620416165121@redhat.com> Date: Thu, 2 Nov 2023 15:07:14 -0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0 Subject: Re: [PATCH v2] cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked() Content-Language: en-US To: Yosry Ahmed Cc: Tejun Heo , Zefan Li , Johannes Weiner , cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Joe Mario , Sebastian Jug References: <20231102005310.439588-1-longman@redhat.com> From: Waiman Long In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.4 X-Spam-Status: No, score=-5.0 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Thu, 02 Nov 2023 12:08:21 -0700 (PDT) On 11/2/23 00:35, Yosry Ahmed wrote: > On Wed, Nov 1, 2023 at 5:53 PM Waiman Long wrote: >> When cgroup_rstat_updated() isn't being called concurrently with >> cgroup_rstat_flush_locked(), its run time is pretty short. When >> both are called concurrently, the cgroup_rstat_updated() run time >> can spike to a pretty high value due to high cpu_lock hold time in >> cgroup_rstat_flush_locked(). This can be problematic if the task calling >> cgroup_rstat_updated() is a realtime task running on an isolated CPU >> with a strict latency requirement. The cgroup_rstat_updated() call can >> happens when there is a page fault even though the task is running in > s/happens/happen > >> user space most of the time. >> >> The percpu cpu_lock is used to protect the update tree - >> updated_next and updated_children. This protection is only needed >> when cgroup_rstat_cpu_pop_updated() is being called. The subsequent >> flushing operation which can take a much longer time does not need >> that protection. > nit: add: as it is already protected by cgroup_rstat_lock. > >> To reduce the cpu_lock hold time, we need to perform all the >> cgroup_rstat_cpu_pop_updated() calls up front with the lock >> released afterward before doing any flushing. This patch adds a new >> cgroup_rstat_updated_list() function to return a singly linked list of >> cgroups to be flushed. >> >> By adding some instrumentation code to measure the maximum elapsed times >> of the new cgroup_rstat_updated_list() function and each cpu iteration of >> cgroup_rstat_updated_locked() around the old cpu_lock lock/unlock pair >> on a 2-socket x86-64 server running parallel kernel build, the maximum >> elapsed times are 27us and 88us respectively. The maximum cpu_lock hold >> time is now reduced to about 30% of the original. >> >> Below were the run time distribution of cgroup_rstat_updated_list() >> during the same period: >> >> Run time Count >> -------- ----- >> t <= 1us 12,574,302 >> 1us < t <= 5us 2,127,482 >> 5us < t <= 10us 8,445 >> 10us < t <= 20us 6,425 >> 20us < t <= 30us 50 >> >> Signed-off-by: Waiman Long > LGTM with some nits. > > Reviewed-by: Yosry Ahmed > >> --- >> include/linux/cgroup-defs.h | 6 +++++ >> kernel/cgroup/rstat.c | 45 ++++++++++++++++++++++++------------- >> 2 files changed, 36 insertions(+), 15 deletions(-) >> >> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h >> index 265da00a1a8b..daaf6d4eb8b6 100644 >> --- a/include/linux/cgroup-defs.h >> +++ b/include/linux/cgroup-defs.h >> @@ -491,6 +491,12 @@ struct cgroup { >> struct cgroup_rstat_cpu __percpu *rstat_cpu; >> struct list_head rstat_css_list; >> >> + /* >> + * A singly-linked list of cgroup structures to be rstat flushed. >> + * Protected by cgroup_rstat_lock. > Do you think we should mention that this is a scratch area for > cgroup_rstat_flush_locked()? IOW, this field will be invalid or may > contain garbage otherwise. I can certainly add that into the comment. > > It might be also useful to mention that the scope of usage for this is > for each percpu flushing iteration. The cgroup_rstat_lock can be > dropped between percpu flushing iterations, so different flushers can > reuse this field safely because it is re-initialized in every > iteration and only used there. > >> + */ >> + struct cgroup *rstat_flush_next; >> + >> /* cgroup basic resource statistics */ >> struct cgroup_base_stat last_bstat; >> struct cgroup_base_stat bstat; >> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c >> index d80d7a608141..a86d40ed8bda 100644 >> --- a/kernel/cgroup/rstat.c >> +++ b/kernel/cgroup/rstat.c >> @@ -145,6 +145,34 @@ static struct cgroup *cgroup_rstat_cpu_pop_updated(struct cgroup *pos, >> return pos; >> } >> >> +/* >> + * Return a list of updated cgroups to be flushed >> + */ > Why not just on a single line? > /* Return a list of updated cgroups to be flushed */ Yes, it can be compressed into a one liner. Thanks for the review and suggestion. Cheers, Longman