Date: Wed, 29 Nov 2023 18:26:13 +0300
From: Dmitry Rokosov
To: Michal Hocko
Subject: Re: [PATCH v3 2/2] mm: memcg: introduce new event to trace shrink_memcg
Message-ID: <20231129152613.6vfz4b675u7wbz25@CAB-WSD-L081021>
References: <20231123193937.11628-1-ddrokosov@salutedevices.com>
 <20231123193937.11628-3-ddrokosov@salutedevices.com>
 <20231127113644.btg2xrcpjhq4cdgu@CAB-WSD-L081021>
 <20231127161637.5eqxk7xjhhyr5tj4@CAB-WSD-L081021>
 <20231129152057.x7fhbcvwtsmkbdpb@CAB-WSD-L081021>
In-Reply-To: <20231129152057.x7fhbcvwtsmkbdpb@CAB-WSD-L081021>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 29, 2023 at 06:20:57PM +0300, Dmitry Rokosov wrote:
> On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote:
> > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> > > On Mon, Nov 27, 2023 at 01:50:22PM +0100, Michal Hocko wrote:
> > > > On Mon 27-11-23 14:36:44, Dmitry Rokosov wrote:
> > > > > On Mon, Nov 27, 2023 at 10:33:49AM +0100, Michal Hocko wrote:
> > > > > > On Thu 23-11-23 22:39:37, Dmitry Rokosov wrote:
> > > > > > > The shrink_memcg flow plays a crucial role in memcg reclamation.
> > > > > > > Currently, it is not possible to trace this point from non-direct
> > > > > > > reclaim paths. However, direct reclaim has its own tracepoint, so there
> > > > > > > is no issue there.
> > > > > > > In certain cases, when debugging memcg pressure,
> > > > > > > developers may need to identify all potential requests for memcg
> > > > > > > reclamation including kswapd(). The patchset introduces the tracepoints
> > > > > > > mm_vmscan_memcg_shrink_{begin|end}() to address this problem.
> > > > > > >
> > > > > > > Example of output in the kswapd context (non-direct reclaim):
> > > > > > >     kswapd0-39 [001] ..... 240.356378: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356396: mm_vmscan_memcg_shrink_end: nr_reclaimed=0 memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356420: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356454: mm_vmscan_memcg_shrink_end: nr_reclaimed=1 memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356479: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356506: mm_vmscan_memcg_shrink_end: nr_reclaimed=4 memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356525: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356593: mm_vmscan_memcg_shrink_end: nr_reclaimed=11 memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356614: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356738: mm_vmscan_memcg_shrink_end: nr_reclaimed=25 memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.356790: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39 [001] ..... 240.357125: mm_vmscan_memcg_shrink_end: nr_reclaimed=53 memcg=16
> > > > > >
> > > > > > In the previous version I have asked why do we need this specific
> > > > > > tracepoint when we already do have trace_mm_vmscan_lru_shrink_{in}active
> > > > > > which already give you a very good insight. That includes the number of
> > > > > > reclaimed pages but also more.
> > > > > > I do see that we do not include memcg id
> > > > > > of the reclaimed LRU, but that shouldn't be a big problem to add, no?
> > > > >
> > > > > From my point of view, memcg reclaim includes two points: LRU shrink and
> > > > > slab shrink, as mentioned in the vmscan.c file.
> > > > >
> > > > >
> > > > > static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
> > > > > ...
> > > > >         reclaimed = sc->nr_reclaimed;
> > > > >         scanned = sc->nr_scanned;
> > > > >
> > > > >         shrink_lruvec(lruvec, sc);
> > > > >
> > > > >         shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> > > > >                     sc->priority);
> > > > > ...
> > > > >
> > > > > So, both of these operations are important for understanding whether
> > > > > memcg reclaiming was successful or not, as well as its effectiveness. I
> > > > > believe it would be beneficial to summarize them, which is why I have
> > > > > created new tracepoints.
> > > >
> > > > This sounds like nice to have rather than must. Put it differently. If
> > > > you make existing reclaim trace points memcg aware (print memcg id) then
> > > > what prevents you from making analysis you need?
> > >
> > > You are right, nothing prevents me from making this analysis... but...
> > >
> > > This approach does have some disadvantages:
> > > 1) It requires more changes to vmscan. At the very least, the memcg
> > > object should be forwarded to all subfunctions for LRU and SLAB
> > > shrinkers.
> >
> > We should have lruvec or memcg available. lruvec_memcg() could be used
> > to get memcg from the lruvec. It might be more places to add the id but
> > arguably this would improve them to identify where the memory has been
> > scanned/reclaimed from.
> >
>
> Oh, thank you, didn't see this conversion function before...
>
> > > 2) With this approach, we will not have the ability to trace a situation
> > > where the kernel is requesting reclaim for a specific memcg, but due to
> > > limits issues, we are unable to run it.
> >
> > I do not follow.
> > Could you be more specific please?
> >
>
> I'm referring to a situation where kswapd() or another kernel mm code
> requests some reclaim pages from memcg, but memcg rejects it due to
> limits checkers. This occurs in the shrink_node_memcgs() function.
>
> ===
>         mem_cgroup_calculate_protection(target_memcg, memcg);
>
>         if (mem_cgroup_below_min(target_memcg, memcg)) {
>                 /*
>                  * Hard protection.
>                  * If there is no reclaimable memory, OOM.
>                  */
>                 continue;
>         } else if (mem_cgroup_below_low(target_memcg, memcg)) {
>                 /*
>                  * Soft protection.
>                  * Respect the protection only as long as
>                  * there is an unprotected supply
>                  * of reclaimable memory from other cgroups.
>                  */
>                 if (!sc->memcg_low_reclaim) {
>                         sc->memcg_low_skipped = 1;
>                         continue;
>                 }
>                 memcg_memory_event(memcg, MEMCG_LOW);
>         }
> ===
>
> With separate shrink begin()/end() tracepoints we can detect such
> problem.
>
>
> > > 3) LRU and SLAB shrinkers are too common places to handle memcg-related
> > > tasks. Additionally, memcg can be disabled in the kernel configuration.
> >
> > Right. This could be all hidden in the tracing code. You simply do not
> > print memcg id when the controller is disabled. Or just simply print 0.
> > I do not really see any major problems with that.
> >
> > I would really prefer to focus on that direction rather than adding
> > another begin/end tracepoint which overlaps with existing begin/end
> > traces and provides much more limited information because I would bet we
> > will have somebody complaining that mere nr_reclaimed is not sufficient.
>
> Okay, I will try to prepare a new patch version with memcg printing from
> lruvec and slab tracepoints.
>
> Then Andrew should drop the previous patchsets, I suppose. Please advise
> on the correct workflow steps here.

Actually, it has already been merged into linux-next... I just checked.

Maybe it would be better to prepare lruvec and slab memcg printing as a
separate patch series?

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=0e7f0c52a76cb22c8633f21bff6e48fabff6016e

--
Thank you,
Dmitry