From: Suren Baghdasaryan
Date: Mon, 5 Sep 2022 11:03:35 -0700
Subject: Re: [RFC PATCH 00/30] Code tagging framework and applications
To: Michal Hocko
Cc: Kent Overstreet, Mel Gorman, Peter Zijlstra, Andrew Morton,
    Vlastimil Babka, Johannes Weiner, Roman Gushchin, Davidlohr Bueso,
    Matthew Wilcox, "Liam R. Howlett", David Vernet, Juri Lelli,
    Laurent Dufour, Peter Xu, David Hildenbrand, Jens Axboe,
    mcgrof@kernel.org, masahiroy@kernel.org, nathan@kernel.org,
    changbin.du@intel.com, ytcoode@gmail.com, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Benjamin Segall,
    Daniel Bristot de Oliveira, Valentin Schneider, Christopher Lameter,
    Pekka Enberg, Joonsoo Kim, 42.hyeyoo@gmail.com, Alexander Potapenko,
    Marco Elver, Dmitry Vyukov, Shakeel Butt, Muchun Song, arnd@arndb.de,
    jbaron@akamai.com, David Rientjes, Minchan Kim, Kalesh Singh,
    kernel-team, linux-mm, iommu@lists.linux.dev,
    kasan-dev@googlegroups.com, io-uring@vger.kernel.org,
    linux-arch@vger.kernel.org, xen-devel@lists.xenproject.org,
    linux-bcache@vger.kernel.org, linux-modules@vger.kernel.org, LKML

On Mon, Sep 5, 2022 at 1:12 AM Michal Hocko wrote:
>
> On Sun 04-09-22 18:32:58, Suren Baghdasaryan wrote:
> > On Thu, Sep 1, 2022 at 12:15 PM Michal Hocko wrote:
> [...]
> > > Yes, tracking back the call trace would be really needed. The question
> > > is whether this is really prohibitively expensive. How much overhead are
> > > we talking about? There is no free lunch here, really. You either have
> > > the overhead during runtime when the feature is used or on the source
> > > code level for all the future development (with a maze of macros and
> > > wrappers).
> >
> > As promised, I profiled a simple test that repeatedly makes 10
> > allocations/frees in a loop and measured the overheads of code tagging,
> > call stack capturing and tracing+BPF for page and slab allocations.
> > Summary:
> >
> > Page allocations (overheads are compared to get_free_pages() duration):
> > 6.8%  Codetag counter manipulations (__lazy_percpu_counter_add + __alloc_tag_add)
> > 8.8%  lookup_page_ext
> > 1237% call stack capture
> > 139%  tracepoint with attached empty BPF program
>
> Yes, I am not surprised that the call stack capturing is really
> expensive compared to the allocator fast path (which is really highly
> optimized, and I suspect that with a 10 allocation/free loop you mostly
> get your memory from the pcp lists). Is this overhead still _that_
> visible for somewhat less micro-optimized workloads which have to take
> slow paths as well?

Correct, it's a comparison with the allocation fast path, so in a sense
it represents the worst-case scenario. However, at the same time the
measurements are fair because they measure the overheads against the
same meaningful baseline, and therefore can be used for comparison.

> Also, what kind of stack unwinder is configured (I guess ORC)? This is
> not my area, but from what I remember the unwinder overhead varies
> between ORC and FP.

I used whatever the default is and didn't try other mechanisms. I don't
think the difference would be orders of magnitude, though.

> And just to make it clear: I do realize that an overhead from stack
> unwinding is unavoidable. And code tagging would logically have lower
> overhead as it performs much less work. But the main point is whether
> our existing stack unwinding approach is really prohibitively expensive
> to be used for debugging purposes on production systems. I might
> misremember, but I recall people having bigger concerns with the
> page_owner memory footprint than with the actual stack unwinder
> overhead.
That's one of those questions that are very difficult to answer (if it's
possible at all), because it depends on the usage scenario. If the
workload allocates frequently, then the added overhead will likely affect
it; otherwise it might not even be noticeable. In general, in
pre-production testing we try to minimize the difference in performance
and memory profiles between the software we are testing and the
production one. From that point of view, the smaller the overhead, the
better. I know that's kind of obvious, but unfortunately I have no better
answer to that question.

For the memory overhead: in my early internal proposal, assuming 10000
instrumented allocation call sites, I made some calculations for an 8GB
8-core system (quite typical for Android) and ended up with the
following:

                          per-cpu counters    atomic counters
  page_ext references     16MB                16MB
  slab object references  10.5MB              10.5MB
  alloc_tags              900KB               312KB
  Total memory overhead   27.4MB              26.8MB

so, about 0.34% of total memory. Our implementation has changed since
then and the numbers might not be completely correct, but they should be
in the ballpark. I just checked the number of instrumented calls that we
currently have in 6.0-rc3 built with defconfig, and it's 165 page
allocation and 2684 slab allocation sites. I readily accept that we are
probably missing some allocations, and additional modules can also
contribute to these numbers, but my guess is it's still less than the
10000 I used in my calculations. I don't claim that 0.34% overhead is low
enough to always be acceptable; I'm just posting the numbers to provide
some reference points.

> --
> Michal Hocko
> SUSE Labs
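[Editor's note: the page_ext row in the table above is consistent with one tag reference per 4KiB page on an 8GB system. A quick back-of-envelope check of that arithmetic, where the 8-byte reference size is an assumption for illustration rather than something stated in the thread:]

```python
# Back-of-envelope check of the memory-overhead table for an 8GB system.
# Assumption (not stated in the thread): each page_ext reference is an
# 8-byte pointer to an alloc_tag.
TOTAL_MEM = 8 * 1024**3   # 8 GiB of RAM
PAGE_SIZE = 4096          # 4 KiB pages

pages = TOTAL_MEM // PAGE_SIZE          # 2,097,152 pages
page_ext_mb = pages * 8 / 2**20         # one 8-byte reference per page
print(page_ext_mb)                      # -> 16.0, matching the table

# Summing the per-cpu-counter column: 16MB + 10.5MB + 0.9MB = 27.4MB
total_mb = 16 + 10.5 + 0.9
percent = total_mb / (8 * 1024) * 100   # fraction of 8192 MB
print(round(percent, 2))                # -> 0.33, i.e. roughly the quoted 0.34%
```

The per-cpu and atomic columns differ only in the alloc_tags row, presumably because each tag's counters scale with the number of CPUs in the per-cpu case (roughly 90 vs. 31 bytes per call site at 10000 sites).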