Received: by 2002:a05:7412:8521:b0:e2:908c:2ebd with SMTP id t33csp2407313rdf; Mon, 6 Nov 2023 13:18:24 -0800 (PST) X-Google-Smtp-Source: AGHT+IE7PdmVWG+e1joIDU1zphZ9haDqLDhkW9jW8R0J15NMXzJumXvk6h6MBxtOoESWr8eM+soW X-Received: by 2002:a05:6358:881f:b0:168:e7f4:8fc9 with SMTP id hv31-20020a056358881f00b00168e7f48fc9mr41816812rwb.24.1699305504477; Mon, 06 Nov 2023 13:18:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699305504; cv=none; d=google.com; s=arc-20160816; b=RhDL1R53q8R14Ibe0czkxrIxrvn2znNxtf2bkPkVE4sTpD6gTzDlMRkejCUdT3FK3J /tt5t1pgCQv1eV1Ped3ia3dQqaIX96Pe2cL5r9YR1okuBXCKteMsO3MxuYHuLPupE7kY +ngcJSCpmi57YHI56wRXldtM9c13kZY9N842gBiwT934N9p1F0CEqZGBRKrSce9deMQ7 SLyVjucSk8TG9W5f8momPz65ong/+EpZ0OXKx7HDpRiZoTG5K1XltG4+jUzhI9P7YfB1 +j2drrzipTx95UtsrE71ehp4uVL/MrUnonPtMh/8fVaw4d3lguAv8CIPJfSi1BE7mqMS Cu0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=AAnPepekn0x1vd4cfhce4CIyWaNk1MulfdxYt46UIgc=; fh=LQMTPyEs/tWarnekVK8AJSbwhZXl+zr61VpLUND6AQw=; b=ZhxG93+y8R8LFBHe4uuhXfJ5xh95RLPs803dLcFSBcfGpAdX1XstA2QIYRz8OlDBok jdlNz8yG8Ps04UpCB3RdtLM7mzWBO9JldeKJxFSg+h84py/hqS7q74+LeoWMSMUaTKEg PVl+RB5fzzbk/ujVwmUViYRnvofDWOQGfhnDST7li0+xkRyrmFlF+mMo+/vgXOR4qC+k plf2ZUIR9STs7xzw4jv5RhnnF2/j7w+GQiaS5F8rXaAXD0gKu3yxrgM8V1wH7znHL+d+ P3eyF0pWR4mwASOwHX2UX8rAN72FMnrgFaITRxH3FSHeXKWFsUP928ifv/tEprNgPly+ 4+iA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=PznbsALX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from groat.vger.email (groat.vger.email. [23.128.96.35]) by mx.google.com with ESMTPS id bw20-20020a056a02049400b005702257f32esi589534pgb.185.2023.11.06.13.18.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Nov 2023 13:18:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=PznbsALX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 35A29804B2BB; Mon, 6 Nov 2023 13:18:21 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232954AbjKFVR6 (ORCPT + 99 others); Mon, 6 Nov 2023 16:17:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231739AbjKFVR4 (ORCPT ); Mon, 6 Nov 2023 16:17:56 -0500 Received: from mail-ej1-x634.google.com (mail-ej1-x634.google.com [IPv6:2a00:1450:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DAC27AF for ; Mon, 6 Nov 2023 13:17:52 -0800 (PST) Received: by mail-ej1-x634.google.com with SMTP id a640c23a62f3a-9dd5879a126so495790366b.3 for ; Mon, 06 Nov 2023 13:17:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699305471; x=1699910271; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=AAnPepekn0x1vd4cfhce4CIyWaNk1MulfdxYt46UIgc=; b=PznbsALXkm9uyk2DK9Bo6VJORe44GAcpckaeSL0s/gISyJJCpflPlY/Q8OHZ2mmUNj /UMY8SCsGiPoKmi1eXiwgEQd3TY8/g2ohz0EFG6ZR5KCrw9TWjiYSi4jfgjm3eeIlfTO kPI8puQk6RMmQIGmy/Z26pK6+FtjVRrDv51kscsaWD/iq0VCC7XiI2DGpWD38Jv0O1r+ nTIGhPs92ipifQCWBlnSdkBaEtF5IqStVro4nRgV0qjhBhnOdEUknZvUnoblNgOiY6+k Dhh4CstswOIYvHmrOlt7E/YvZYiR5DSXvA4JHxAvZfArHi6vmGF48yVDU3kvh014grg9 2OBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699305471; x=1699910271; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=AAnPepekn0x1vd4cfhce4CIyWaNk1MulfdxYt46UIgc=; b=LjAheVYdmEeWASkvXMA6kYGW4W3CtnLGph0GEpiVf3pcYEBToiPTk9Ri1PnbzBmsoy XxjnfpaumnP9rNUJc8V8ft2RxBHlSdWYZUCQLm/crvJjW5npss27+B/P9ypjmojinPs8 Wr3sTNkzvrzhnze34lHzya9V4rI10gVlBfc9X/9lM8KCujE1yuP58a5MqtEp5gszpBjH 5fJpHHJZqGHv88GmBLYInyX/b8YZUvAtASYNoB97FhxnhSRlspFumAf2eJiMfQtndZsG +qa2mNigNfaMigEgIQdvzNuZeWsjliaAOYbWvF/usTtJyORHnvYADjBcW0kaTbefFS2Z fpjw== X-Gm-Message-State: AOJu0Yyj7OK7BUsLqTHd9nftaxzgFewe0aO5rAzfAOcQ1uIJ3Gor8sJ/ x/9VC/rEkStRmwf48T6/DYZ7Tm2olQppASm7vbrrdw== X-Received: by 2002:a17:907:7295:b0:9bf:30e8:5bf9 with SMTP id dt21-20020a170907729500b009bf30e85bf9mr14245983ejc.4.1699305471174; Mon, 06 Nov 2023 13:17:51 -0800 (PST) MIME-Version: 1.0 References: <20231104031303.592879-1-longman@redhat.com> <20231104031303.592879-3-longman@redhat.com> <2212f172-def9-3ec7-b3d7-732c2b2c365e@redhat.com> In-Reply-To: <2212f172-def9-3ec7-b3d7-732c2b2c365e@redhat.com> From: Yosry Ahmed Date: Mon, 6 Nov 2023 13:17:14 -0800 Message-ID: Subject: Re: [PATCH v3 2/3] cgroup/rstat: Optimize cgroup_rstat_updated_list() To: Waiman Long Cc: Tejun Heo , Zefan Li , Johannes Weiner , cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Joe Mario , Sebastian Jug Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Mon, 06 Nov 2023 13:18:21 -0800 (PST) On Mon, Nov 6, 2023 at 12:37=E2=80=AFPM Waiman Long wr= ote: > > On 11/6/23 15:07, Yosry Ahmed wrote: > > On Fri, Nov 3, 2023 at 8:13=E2=80=AFPM Waiman Long = wrote: > >> The current design of cgroup_rstat_cpu_pop_updated() is to traverse > >> the updated tree in a way to pop out the leaf nodes first before > >> their parents. This can cause traversal of multiple nodes before a > >> leaf node can be found and popped out. IOW, a given node in the tree > >> can be visited multiple times before the whole operation is done. So > >> it is not very efficient and the code can be hard to read. > >> > >> With the introduction of cgroup_rstat_updated_list() to build a list > >> of cgroups to be flushed first before any flushing operation is being > >> done, we can optimize the way the updated tree nodes are being popped > >> by pushing the parents first to the tail end of the list before their > >> children. In this way, most updated tree nodes will be visited only > >> once with the exception of the subtree root as we still need to go > >> back to its parent and popped it out of its updated_children list. > >> This also makes the code easier to read. > >> > >> A parallel kernel build on a 2-socket x86-64 server is used as the > >> benchmarking tool for measuring the lock hold time. Below were the loc= k > >> hold time frequency distribution before and after the patch: > >> > >> Hold time Before patch After patch > >> --------- ------------ ----------- > >> 0-01 us 13,738,708 14,594,545 > >> 01-05 us 1,177,194 439,926 > >> 05-10 us 4,984 5,960 > >> 10-15 us 3,562 3,543 > >> 15-20 us 1,314 1,397 > >> 20-25 us 18 25 > >> 25-30 us 12 12 > >> > >> It can be seen that the patch pushes the lock hold time towards the > >> lower end. > >> > >> Signed-off-by: Waiman Long > >> --- > > I don't know why git decided to show this diff in the most confusing > > way possible. > I agree. The diff is really hard to look at. It will be easier to apply > the patch & looks at the actual rstat.c file. Would the diff be simpler if patches 1 & 2 were squashed? [..] > > > >> * > >> * The only ordering guarantee is that, for a parent and a child pai= r > >> - * covered by a given traversal, if a child is visited, its parent is > >> - * guaranteed to be visited afterwards. > >> + * covered by a given traversal, the child is before its parent in > >> + * the list. > >> + * > >> + * Note that updated_children is self terminated while updated_next i= s > >> + * parent cgroup terminated except the cgroup root which can be self > >> + * terminated. > > IIUC updated_children and updated_next is the same list. > > updated_children is the head, and updated_next is how the list items > > are linked. This comment makes it seem like they are two different > > lists. > Thanks for the comment. I will rework the comment to clarify that a bit > more. > > > > I am actually wondering if it's worth using the singly linked list > > here. We are saving 8 bytes percpu, but the semantics are fairly > > confusing. Wouldn't this be easier to reason about if you just use > > list_head? > > > > updated_children would be replaced with LIST_HEAD (or similar), and > > the list would be NULL terminated instead of terminated by self/parent > > cgroup. IIUC the reason it's not NULL-terminated now is because we use > > cgroup->updated_next to check quickly if a cgroup is on the list or > > not. If we use list_heads, we can just use list_emtpy() IIUC. > > > > We can also simplify the semantics of unlinking @root from the updated > > tree below, it would just be list_del() IIUC, which is actually more > > performant as well. It seems like overall we would simplify a lot of > > things. When forming the updated_list, we can just walk the tree and > > splice the lists in the correct order. > > > > It seems to me that saving 8 bytes percpu is not worth the complexity > > of the custom list semantics here. Am I missing something here? > > It will cost an additional 16 bytes of percpu memory if converted to > list_heads. Like other lists, there will be sibling and children > list_heads. There are also 2 pointers to update instead of one. Anyway, > I don't have an objection to convert them to list_heads if agreed by Teju= n. Yes you are right. It's definitely not free, but it's also not super costly. It's just that every time I look at the rstat code I need to remind myself of how updated_next and updated_children work. I will let Tejun decide.