References: <20220830214919.53220-1-surenb@google.com> <20220831084230.3ti3vitrzhzsu3fs@moria.home.lan> <20220831101948.f3etturccmp5ovkl@suse.de>
From: Suren Baghdasaryan
Date: Wed, 31 Aug 2022 08:28:40 -0700
Subject: Re: [RFC PATCH 00/30] Code tagging framework and applications
To: Michal Hocko
Cc: Mel Gorman, Kent Overstreet, Peter Zijlstra, Andrew Morton,
 Vlastimil Babka, Johannes Weiner, Roman Gushchin, Davidlohr Bueso,
 Matthew Wilcox, "Liam R.
 Howlett", David Vernet, Juri Lelli, Laurent Dufour, Peter Xu,
 David Hildenbrand, Jens Axboe, mcgrof@kernel.org, masahiroy@kernel.org,
 nathan@kernel.org, changbin.du@intel.com, ytcoode@gmail.com,
 Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Benjamin Segall,
 Daniel Bristot de Oliveira, Valentin Schneider, Christopher Lameter,
 Pekka Enberg, Joonsoo Kim, 42.hyeyoo@gmail.com, Alexander Potapenko,
 Marco Elver, dvyukov@google.com, Shakeel Butt, Muchun Song, arnd@arndb.de,
 jbaron@akamai.com, David Rientjes, Minchan Kim, Kalesh Singh, kernel-team,
 linux-mm, iommu@lists.linux.dev, kasan-dev@googlegroups.com,
 io-uring@vger.kernel.org, linux-arch@vger.kernel.org,
 xen-devel@lists.xenproject.org, linux-bcache@vger.kernel.org,
 linux-modules@vger.kernel.org, LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 31, 2022 at 3:47 AM Michal Hocko wrote:
>
> On Wed 31-08-22 11:19:48, Mel Gorman wrote:
> > On Wed, Aug 31, 2022 at 04:42:30AM -0400, Kent Overstreet wrote:
> > > On Wed, Aug 31, 2022 at 09:38:27AM +0200, Peter Zijlstra wrote:
> > > > On Tue, Aug 30, 2022 at 02:48:49PM -0700, Suren Baghdasaryan wrote:
> > > > > ===========================
> > > > > Code tagging framework
> > > > > ===========================
> > > > > Code tag is a structure identifying a specific location in the source code
> > > > > which is generated at compile time and can be embedded in an application-
> > > > > specific structure.
> > > > > Several applications of code tagging are included in
> > > > > this RFC, such as memory allocation tracking, dynamic fault injection,
> > > > > latency tracking and improved error code reporting.
> > > > > Basically, it takes the old trick of "define a special elf section for
> > > > > objects of a given type so that we can iterate over them at runtime" and
> > > > > creates a proper library for it.
> > > >
> > > > I might be super dense this morning, but what!? I've skimmed through the
> > > > set and I don't think I get it.
> > > >
> > > > What does this provide that ftrace/kprobes don't already allow?
> > >
> > > You're kidding, right?
> >
> > It's a valid question. From the description, it main addition that would
> > be hard to do with ftrace or probes is catching where an error code is
> > returned. A secondary addition would be catching all historical state and
> > not just state since the tracing started.
> >
> > It's also unclear *who* would enable this. It looks like it would mostly
> > have value during the development stage of an embedded platform to track
> > kernel memory usage on a per-application basis in an environment where it
> > may be difficult to setup tracing and tracking. Would it ever be enabled
> > in production? Would a distribution ever enable this? If it's enabled, any
> > overhead cannot be disabled/enabled at run or boot time so anyone enabling
> > this would carry the cost without never necessarily consuming the data.

Thank you for the question.
For memory tracking, my intent is to have a mechanism that can be enabled
in field testing (pre-production testing on a large population of
internal users). The issue we often face is that memory leaks happen in
the field but are very hard to reproduce locally. We get a bugreport
from the user which indicates a leak but often does not have enough
information to track it down.
Note that quite often these leaks/issues happen in drivers, so even
simply finding out where they came from is a big help.

The way I envision this mechanism being used is to enable basic memory
tracking in the field tests and have a user space process collect the
allocation statistics periodically (say, once an hour). Once it detects
a counter growing without bound or atypically (the definition of this is
left to user space), it can enable context capturing only for that
specific location, keeping the overhead to a minimum while getting more
information about potential issues. The collected stats and contexts are
then attached to the bugreport, giving us more visibility into the issue
when we receive it.

The goal is to provide a mechanism with low enough overhead that it can
be enabled all the time during these field tests without affecting the
device's performance profiles. Tracing is very cheap when it's disabled,
but having it enabled all the time would introduce higher overhead than
the counter manipulations.
My apologies, I should have clarified all this in the cover letter from
the beginning.

As for other applications, maybe I'm not such an advanced user of
tracing, but I think only the latency tracking application might be done
with tracing, assuming we have all the right tracepoints. I don't see
how we would use tracing for fault injection and descriptive error
codes. Again, I might be mistaken.

Thanks,
Suren.

> >
> > It might be an ease-of-use thing. Gathering the information from traces
> > is tricky and would need combining multiple different elements and that
> > is development effort but not impossible.
> >
> > Whatever asking for an explanation as to why equivalent functionality
> > cannot not be created from ftrace/kprobe/eBPF/whatever is reasonable.
>
> Fully agreed and this is especially true for a change this size
> 77 files changed, 3406 insertions(+), 703 deletions(-)
>
> --
> Michal Hocko
> SUSE Labs
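
For readers unfamiliar with the "special elf section" trick mentioned in
the quoted cover letter, here is a minimal user-space sketch of the
general idea combined with a cheap per-call-site counter. It is not code
from the patch set: struct code_tag, CODE_TAG_HIT(), the "code_tags"
section name and the path_a()/path_b() helpers are hypothetical names
chosen for illustration. It assumes GCC or Clang with GNU ld on an ELF
target, where the linker defines __start_<section> and __stop_<section>
symbols for sections whose names are valid C identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag type: one static instance per instrumented call site. */
struct code_tag {
    const char *file;
    int line;
    unsigned long hits;   /* plain counter; not atomic in this sketch */
};

/* Section bounds provided by the linker (assumes GNU ld on ELF). */
extern struct code_tag __start_code_tags[];
extern struct code_tag __stop_code_tags[];

/*
 * Place a static tag in the "code_tags" section and bump its counter.
 * The explicit alignment keeps the compiler from over-aligning the
 * object, so the section can be walked as a plain array.
 */
#define CODE_TAG_HIT()                                                  \
    do {                                                                \
        static struct code_tag tag_                                     \
            __attribute__((section("code_tags"), used,                  \
                           aligned(__alignof__(struct code_tag)))) =    \
            { __FILE__, __LINE__, 0 };                                  \
        tag_.hits++;                                                    \
    } while (0)

/* Two hypothetical instrumented call sites. */
void path_a(void) { CODE_TAG_HIT(); }
void path_b(void) { CODE_TAG_HIT(); }

/* Number of tags the linker collected into the section. */
size_t code_tags_count(void)
{
    assert(__stop_code_tags >= __start_code_tags);
    return (size_t)(__stop_code_tags - __start_code_tags);
}

/* Walk every tag at runtime and sum the per-site counters. */
unsigned long code_tags_total_hits(void)
{
    unsigned long total = 0;

    for (struct code_tag *t = __start_code_tags; t < __stop_code_tags; t++)
        total += t->hits;
    return total;
}
```

A periodic collector in the spirit of the workflow described above could
walk the same range and compare each tag's counter against its previous
snapshot. The only per-call cost in this sketch is the single increment.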