References: <20220329181935.2183-1-beaub@linux.microsoft.com> <20220329201057.GA2549@kbox> <20220329231137.GA3357@kbox> <20220330163411.GA1812@kbox> <20220330191551.GA2377@kbox>
In-Reply-To: <20220330191551.GA2377@kbox>
From: Alexei Starovoitov
Date: Wed, 30 Mar 2022 13:39:49 -0700
Subject: Re: [PATCH] tracing/user_events: Add eBPF interface for user_event created events
To: Beau Belgrave
Cc: Song Liu, Steven Rostedt, Masami Hiramatsu, linux-trace-devel, LKML, bpf, Network Development, linux-arch, Mathieu Desnoyers
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 30, 2022 at 12:15 PM Beau Belgrave wrote:
>
> On Wed, Mar 30, 2022 at 11:22:32AM -0700, Alexei Starovoitov wrote:
> > On Wed, Mar 30, 2022 at 9:34 AM Beau Belgrave wrote:
> > > > >
> > > > > But you are fine with uprobe costs? uprobes appear to be much more costly
> > > > > than a syscall approach on the hardware I've run on.
> >
> > Care to share the numbers?
> > uprobe over USDT is a single trap.
> > Not much slower compared to syscall with kpti.
> >
>
> Sure, these are the numbers we have from a production device.
>
> They are captured via perf via PERF_COUNT_HW_CPU_CYCLES.
> It's running a 20K loop emitting 4 bytes of data out.
> Each 4 byte event time is recorded via perf.
> At the end we have the total time and the max seen.
>
> null numbers represent a 20K loop with just perf start/stop ioctl costs.
>
> null: min=2863, avg=2953, max=30815
> uprobe: min=10994, avg=11376, max=146682

I suspect it's a 3 trap case of uprobe.
USDT is a nop. It's a 1 trap case.

> uevent: min=7043, avg=7320, max=95396
> lttng: min=6270, avg=6508, max=41951
>
> These costs include the data getting into a buffer, so they represent
> what we would see in production vs the trap cost alone. For uprobe this
> means we created a uprobe and attached it via tracefs to get the above
> numbers.
>
> There also seems to be some thinking around this as well from Song Liu.
> Link: https://lore.kernel.org/lkml/20200801084721.1812607-1-songliubraving@fb.com/
>
> From the link:
> 1. User programs are faster. The new selftest added in 5/5, shows that a
> simple uprobe program takes 1400 nanoseconds, while user program only
> takes 300 nanoseconds.

Take a look at Song's code. It's 2 trap case.
The USDT is a half of that. ~700ns.
Compared to 300ns of syscall that difference
could be acceptable.

>
> > > >
> > > > Can we achieve the same/similar performance with sys_bpf(BPF_PROG_RUN)?
> > > >
> > >
> > > I think so, the tough part is how do you let the user-space know which
> > > program is attached to run? In the current code this is done by the BPF
> > > program attaching to the event via perf and we run the one there if
> > > any when data is emitted out via write calls.
> > >
> > > I would want to make sure that operators can decide where the user-space
> > > data goes (perf/ftrace/eBPF) after the code has been written. With the
> > > current code this is done via the tracepoint callbacks that perf/ftrace
> > > hook up when operators enable recording via perf, tracefs, libbpf, etc.
> > >
> > > We have managed code (C#/Java) where we cannot utilize stubs or traps
> > > easily due to code movement. So we are limited in how we can approach
> > > this problem. Having the interface be mmap/write has enabled this
> > > for us, since it's easy to interact with in most languages and gives us
> > > lifetime management of the trace objects between user-space and the
> > > kernel.
> >
> > Then you should probably invest into making USDT work inside
> > java applications instead of reinventing the wheel.
> >
> > As an alternative you can do a dummy write or any other syscall
> > and attach bpf on the kernel side.
> > No kernel changes are necessary.
>
> We only want syscall/tracing overheads for the specific events that are
> hooked. I don't see how we could hook up a dummy write that is unique
> per-event without having a way to know when the event is being traced.

You're adding writev-s to user apps. Keep that writev without
any user_events on the kernel side and pass -1 as FD.
Hook bpf prog to sys_writev and filter by pid.
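
For illustration, the user-space half of the "pass -1 as FD" suggestion might look roughly like the sketch below. It is only a sketch under stated assumptions: emit_event() and struct demo_event are hypothetical names that do not come from the patch or from this thread, and the 4-byte payload simply mirrors the event size used in the benchmark quoted above. The kernel-side half, a BPF program attached to writev entry that filters by pid and reads the iovec from user memory, is not shown.

```c
/* Sketch only: emit_event() and struct demo_event are hypothetical names. */
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical 4-byte payload, matching the event size in the benchmark. */
struct demo_event {
    uint32_t value;
};

/*
 * Issue writev() on fd -1. The kernel rejects the call with EBADF, so no
 * data is written anywhere; the syscall entry itself is the only thing a
 * tracer attached on the kernel side could observe.
 * Returns 0 if the call failed with EBADF as expected, -1 otherwise.
 */
static int emit_event(uint32_t value)
{
    struct demo_event ev = { .value = value };
    struct iovec iov = { .iov_base = &ev, .iov_len = sizeof(ev) };

    errno = 0;
    ssize_t ret = writev(-1, &iov, 1);

    return (ret == -1 && errno == EBADF) ? 0 : -1;
}
```

With this shape the application pays the syscall cost on every call, whether or not anything is attached on the kernel side, which is the overhead concern raised in the quoted text above.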