References: <20230501165450.15352-1-surenb@google.com>
From: Suren Baghdasaryan
Date: Wed, 3 May 2023 10:42:11 -0700
Subject: Re: [PATCH 00/40] Memory allocation profiling
To: Tejun Heo
Cc: Kent Overstreet, Michal Hocko, akpm@linux-foundation.org,
    vbabka@suse.cz, hannes@cmpxchg.org, roman.gushchin@linux.dev,
    mgorman@suse.de, dave@stgolabs.net, willy@infradead.org,
    liam.howlett@oracle.com, corbet@lwn.net, void@manifault.com,
    peterz@infradead.org, juri.lelli@redhat.com, ldufour@linux.ibm.com,
    catalin.marinas@arm.com, will@kernel.org, arnd@arndb.de,
    tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com,
    x86@kernel.org, peterx@redhat.com, david@redhat.com, axboe@kernel.dk,
    mcgrof@kernel.org, masahiroy@kernel.org, nathan@kernel.org,
    dennis@kernel.org, muchun.song@linux.dev, rppt@kernel.org,
    paulmck@kernel.org, pasha.tatashin@soleen.com, yosryahmed@google.com,
    yuzhao@google.com,
    dhowells@redhat.com, hughd@google.com, andreyknvl@gmail.com,
    keescook@chromium.org, ndesaulniers@google.com,
    gregkh@linuxfoundation.org, ebiggers@google.com, ytcoode@gmail.com,
    vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
    rostedt@goodmis.org, bsegall@google.com, bristot@redhat.com,
    vschneid@redhat.com, cl@linux.com, penberg@kernel.org,
    iamjoonsoo.kim@lge.com, 42.hyeyoo@gmail.com, glider@google.com,
    elver@google.com, dvyukov@google.com, shakeelb@google.com,
    songmuchun@bytedance.com, jbaron@akamai.com, rientjes@google.com,
    minchan@google.com, kaleshsingh@google.com, kernel-team@android.com,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    iommu@lists.linux.dev, linux-arch@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-modules@vger.kernel.org, kasan-dev@googlegroups.com,
    cgroups@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 3, 2023 at 9:35 AM Tejun Heo wrote:
>
> Hello, Kent.
>
> On Wed, May 03, 2023 at 04:05:08AM -0400, Kent Overstreet wrote:
> > No, we're still waiting on the tracing people to _demonstrate_, not
> > claim, that this is at all possible in a comparable way with tracing.
>
> So, we (meta) happen to do stuff like this all the time in the fleet to hunt
> down tricky persistent problems like memory leaks, ref leaks, what-have-you.
> In recent kernels, with kprobe and BPF, our ability to debug these sorts of
> problems has improved a great deal.
> Below, I'm attaching a bcc script I used
> to hunt down, IIRC, a double vfree. It's not exactly for a leak but leaks
> can follow the same pattern.

Thanks for sharing, Tejun!

>
> There are of course some pros and cons to this approach:
>
> Pros:
>
> * The framework doesn't really have any runtime overhead, so we can have it
>   deployed in the entire fleet and debug wherever the problem is.

Do you mean it has no runtime overhead when disabled?
If so, do you know what the overhead is when enabled? I want to
understand if that's truly a viable solution to track all allocations
(including slab) all the time.
Thanks,
Suren.

>
> * It's fully flexible and programmable which enables non-trivial filtering
>   and summarizing to be done inside kernel w/ BPF as necessary, which is
>   pretty handy for tracking high frequency events.
>
> * BPF is pretty performant. Dedicated built-in kernel code can do better of
>   course but BPF's jit compiled code & its data structures are fast enough.
>   I don't remember any time this was a problem.
>
> Cons:
>
> * BPF has some learning curve. Also the fact that what it provides is a wide
>   open field rather than something scoped out for a specific problem can
>   make it seem a bit daunting at the beginning.
>
> * Because tracking starts when the script starts running, it doesn't know
>   anything which has happened up to that point, so you gotta pay attention
>   to e.g. handling frees which don't match allocs. It's kinda annoying
>   but not a huge problem usually. There are ways to build BPF progs into
>   the kernel and load them early but I haven't experimented with it yet
>   personally.
>
> I'm not necessarily against adding a dedicated memory debugging mechanism but
> do wonder whether the extra benefits would be enough to justify the code and
> maintenance overhead.
>
> Oh, a bit of delta but for anyone who's more interested in debugging
> problems like this: I tend to go for bcc
> (https://github.com/iovisor/bcc) for this sort of problem. Others prefer to
> write against libbpf directly or use bpftrace
> (https://github.com/iovisor/bpftrace).
>
> Thanks.
>
> #!/usr/bin/env bcc-py
>
> import bcc
> import time
> import datetime
> import argparse
> import os
> import sys
> import errno
>
> description = """
> Record vmalloc/vfrees and trigger on unmatched vfree
> """
>
> bpf_source = """
> #include
> #include
>
> struct vmalloc_rec {
>         unsigned long           ptr;
>         int                     last_alloc_stkid;
>         int                     last_free_stkid;
>         int                     this_stkid;
>         bool                    allocated;
> };
>
> BPF_STACK_TRACE(stacks, 8192);
> BPF_HASH(vmallocs, unsigned long, struct vmalloc_rec, 131072);
> BPF_ARRAY(dup_free, struct vmalloc_rec, 1);
>
> int kpret_vmalloc_node_range(struct pt_regs *ctx)
> {
>         unsigned long ptr = PT_REGS_RC(ctx);
>         uint32_t zkey = 0;
>         struct vmalloc_rec rec_init = { };
>         struct vmalloc_rec *rec;
>         int stkid;
>
>         if (!ptr)
>                 return 0;
>
>         stkid = stacks.get_stackid(ctx, 0);
>
>         rec_init.ptr = ptr;
>         rec_init.last_alloc_stkid = -1;
>         rec_init.last_free_stkid = -1;
>         rec_init.this_stkid = -1;
>
>         rec = vmallocs.lookup_or_init(&ptr, &rec_init);
>         rec->allocated = true;
>         rec->last_alloc_stkid = stkid;
>         return 0;
> }
>
> int kp_vfree(struct pt_regs *ctx, const void *addr)
> {
>         unsigned long ptr = (unsigned long)addr;
>         uint32_t zkey = 0;
>         struct vmalloc_rec rec_init = { };
>         struct vmalloc_rec *rec;
>         int stkid;
>
>         stkid = stacks.get_stackid(ctx, 0);
>
>         rec_init.ptr = ptr;
>         rec_init.last_alloc_stkid = -1;
>         rec_init.last_free_stkid = -1;
>         rec_init.this_stkid = -1;
>
>         rec = vmallocs.lookup_or_init(&ptr, &rec_init);
>         if (!rec->allocated && rec->last_alloc_stkid >= 0) {
>                 rec->this_stkid = stkid;
>                 dup_free.update(&zkey, rec);
>         }
>
>         rec->allocated = false;
>         rec->last_free_stkid = stkid;
>         return 0;
> }
> """
>
> bpf = bcc.BPF(text=bpf_source)
> bpf.attach_kretprobe(event="__vmalloc_node_range", fn_name="kpret_vmalloc_node_range");
> bpf.attach_kprobe(event="vfree", fn_name="kp_vfree");
> bpf.attach_kprobe(event="vfree_atomic", fn_name="kp_vfree");
>
> stacks = bpf["stacks"]
> vmallocs = bpf["vmallocs"]
> dup_free = bpf["dup_free"]
> last_dup_free_ptr = dup_free[0].ptr
>
> def print_stack(stkid):
>     for addr in stacks.walk(stkid):
>         sym = bpf.ksym(addr)
>         print(' {}'.format(sym))
>
> def print_dup(dup):
>     print('allocated={} ptr={}'.format(dup.allocated, hex(dup.ptr)))
>     if (dup.last_alloc_stkid >= 0):
>         print('last_alloc_stack: ')
>         print_stack(dup.last_alloc_stkid)
>     if (dup.last_free_stkid >= 0):
>         print('last_free_stack: ')
>         print_stack(dup.last_free_stkid)
>     if (dup.this_stkid >= 0):
>         print('this_stack: ')
>         print_stack(dup.this_stkid)
>
> while True:
>     time.sleep(1)
>
>     if dup_free[0].ptr != last_dup_free_ptr:
>         print('\nDUP_FREE:')
>         print_dup(dup_free[0])
>         last_dup_free_ptr = dup_free[0].ptr
>
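
For anyone following the thread without a bcc setup at hand, the per-pointer
bookkeeping in the quoted script can be sketched in plain Python. This is only
an illustrative model, not the script itself: the names VmallocRec,
DoubleFreeTracker, on_alloc and on_free are made up for this sketch, stack ids
are passed in as plain integers instead of coming from get_stackid(), and a
list stands in for the one-slot dup_free array. It assumes the same rule as
the quoted kp_vfree(): a free is reported only when the record is not
currently allocated and an allocation was seen at some point after tracking
started.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

NO_STACK = -1  # same "no stack recorded" sentinel the quoted script uses


@dataclass
class VmallocRec:
    """Hypothetical Python mirror of the script's struct vmalloc_rec."""
    ptr: int
    last_alloc_stkid: int = NO_STACK
    last_free_stkid: int = NO_STACK
    this_stkid: int = NO_STACK
    allocated: bool = False


class DoubleFreeTracker:
    """Userspace model of the alloc/free state machine (illustration only)."""

    def __init__(self) -> None:
        self.recs: Dict[int, VmallocRec] = {}   # plays the role of `vmallocs`
        self.dup_frees: List[VmallocRec] = []   # plays the role of `dup_free`

    def on_alloc(self, ptr: int, stkid: int) -> None:
        if ptr == 0:
            return  # failed allocation, nothing to record
        rec = self.recs.setdefault(ptr, VmallocRec(ptr))
        rec.allocated = True
        rec.last_alloc_stkid = stkid

    def on_free(self, ptr: int, stkid: int) -> bool:
        """Record a free; return True if it looks like a duplicate free."""
        rec = self.recs.setdefault(ptr, VmallocRec(ptr))
        # Only flag pointers whose allocation was observed: frees of memory
        # allocated before tracking started have last_alloc_stkid == -1.
        dup = (not rec.allocated) and rec.last_alloc_stkid >= 0
        if dup:
            rec.this_stkid = stkid
            # Snapshot before last_free_stkid is overwritten below, so the
            # report keeps the previous free's stack next to the current one.
            self.dup_frees.append(replace(rec))
        rec.allocated = False
        rec.last_free_stkid = stkid
        return dup


if __name__ == "__main__":
    tracker = DoubleFreeTracker()
    tracker.on_free(0x1000, stkid=7)         # allocated before tracking began
    tracker.on_alloc(0x2000, stkid=1)
    tracker.on_free(0x2000, stkid=2)         # matching free
    if tracker.on_free(0x2000, stkid=3):     # second free of the same pointer
        print("DUP_FREE:", tracker.dup_frees[-1])
```

One consequence of this rule, as far as the model goes: a double free of a
pointer that was allocated before tracking started is never reported, because
no allocation stack was ever recorded for it. That is the "frees which don't
match allocs" caveat from the cons list above.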