References: <20220830214919.53220-1-surenb@google.com>
 <20220831084230.3ti3vitrzhzsu3fs@moria.home.lan>
 <20220831101948.f3etturccmp5ovkl@suse.de>
From: Suren Baghdasaryan
Date: Wed, 31 Aug 2022 09:48:17 -0700
Subject: Re: [RFC PATCH 00/30] Code tagging framework and applications
To: Michal Hocko
Cc: Mel Gorman, Kent Overstreet, Peter Zijlstra, Andrew Morton,
 Vlastimil Babka, Johannes Weiner, Roman Gushchin, Davidlohr Bueso,
 Matthew Wilcox, "Liam R. Howlett", David Vernet, Juri Lelli,
 Laurent Dufour, Peter Xu, David Hildenbrand, Jens Axboe,
 mcgrof@kernel.org, masahiroy@kernel.org, nathan@kernel.org,
 changbin.du@intel.com, ytcoode@gmail.com, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Benjamin Segall,
 Daniel Bristot de Oliveira, Valentin Schneider, Christopher Lameter,
 Pekka Enberg, Joonsoo Kim, 42.hyeyoo@gmail.com, Alexander Potapenko,
 Marco Elver, dvyukov@google.com, Shakeel Butt, Muchun Song,
 arnd@arndb.de, jbaron@akamai.com, David Rientjes, Minchan Kim,
 Kalesh Singh, kernel-team, linux-mm, iommu@lists.linux.dev,
 kasan-dev@googlegroups.com, io-uring@vger.kernel.org,
 linux-arch@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-bcache@vger.kernel.org, linux-modules@vger.kernel.org, LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 31, 2022 at 8:28 AM Suren Baghdasaryan wrote:
>
> On Wed, Aug 31, 2022 at 3:47 AM Michal Hocko wrote:
> >
> > On Wed 31-08-22 11:19:48, Mel Gorman wrote:
> > > On Wed, Aug 31, 2022 at 04:42:30AM -0400, Kent Overstreet wrote:
> > > > On Wed, Aug 31, 2022 at 09:38:27AM +0200, Peter Zijlstra wrote:
> > > > > On Tue, Aug 30, 2022 at 02:48:49PM -0700, Suren Baghdasaryan wrote:
> > > > > > ===========================
> > > > > > Code tagging framework
> > > > > > ===========================
> > > > > > Code tag is a structure identifying a specific location in the source code
> > > > > > which is generated at compile time and can be embedded in an application-
> > > > > > specific structure.
> > > > > > Several applications of code tagging are included in
> > > > > > this RFC, such as memory allocation tracking, dynamic fault injection,
> > > > > > latency tracking and improved error code reporting.
> > > > > > Basically, it takes the old trick of "define a special elf section for
> > > > > > objects of a given type so that we can iterate over them at runtime" and
> > > > > > creates a proper library for it.
> > > > >
> > > > > I might be super dense this morning, but what!? I've skimmed through the
> > > > > set and I don't think I get it.
> > > > >
> > > > > What does this provide that ftrace/kprobes don't already allow?
> > > >
> > > > You're kidding, right?
> > >
> > > It's a valid question. From the description, its main addition that would
> > > be hard to do with ftrace or probes is catching where an error code is
> > > returned. A secondary addition would be catching all historical state and
> > > not just state since the tracing started.
> > >
> > > It's also unclear *who* would enable this. It looks like it would mostly
> > > have value during the development stage of an embedded platform to track
> > > kernel memory usage on a per-application basis in an environment where it
> > > may be difficult to set up tracing and tracking. Would it ever be enabled
> > > in production? Would a distribution ever enable this? If it's enabled, any
> > > overhead cannot be disabled/enabled at run or boot time so anyone enabling
> > > this would carry the cost without ever necessarily consuming the data.
>
> Thank you for the question.
> For memory tracking my intent is to have a mechanism that can be enabled
> in field testing (pre-production testing on a large population of
> internal users).
> The issue we often face is that memory leaks happen in the field but are
> very hard to reproduce locally. We get a bugreport from the user which
> indicates a leak but often does not have enough information to track it
> down.
> Note that quite often these leaks/issues happen in the drivers, so even
> simply finding out where they came from is a big help.
> The way I envision this mechanism being used is to enable the basic
> memory tracking in the field tests and have a user space process collect
> the allocation statistics periodically (say once an hour). Once it
> detects some counter growing without bound or atypically (the definition
> of this is left to user space), it can enable context capturing only for
> that specific location, still keeping the overhead to a minimum but
> getting more information about potential issues. Collected stats and
> contexts are then attached to the bugreport and we get more visibility
> into the issue when we receive it.
> The goal is to provide a mechanism with low enough overhead that it can
> be enabled all the time during these field tests without affecting the
> device's performance profiles.
> Tracing is very cheap when it's disabled but having it enabled all the
> time would introduce higher overhead than the counter manipulations.
> My apologies, I should have clarified all this in the cover letter from
> the beginning.
>
> As for other applications, maybe I'm not such an advanced user of
> tracing, but I think only the latency tracking application might be done
> with tracing, assuming we have all the right tracepoints; I don't see
> how we would use tracing for fault injection and descriptive error
> codes. Again, I might be mistaken.

Sorry about the formatting of my reply. Forgot to reconfigure the editor
on the new machine.

>
> Thanks,
> Suren.
>
> > >
> > > It might be an ease-of-use thing. Gathering the information from traces
> > > is tricky and would need combining multiple different elements and that
> > > is development effort but not impossible.
> > >
> > > Whatever, asking for an explanation as to why equivalent functionality
> > > cannot be created from ftrace/kprobe/eBPF/whatever is reasonable.
> >
> > Fully agreed and this is especially true for a change this size
> > 77 files changed, 3406 insertions(+), 703 deletions(-)
> >
> > --
> > Michal Hocko
> > SUSE Labs