Date: Wed, 30 Mar 2022 09:34:11 -0700
From: Beau Belgrave
To: Song Liu
Cc: Alexei Starovoitov, Steven Rostedt, Masami Hiramatsu, linux-trace-devel, LKML, bpf, Network Development, linux-arch, Mathieu Desnoyers
Subject: Re: [PATCH] tracing/user_events: Add eBPF interface for user_event created events
Message-ID: <20220330163411.GA1812@kbox>
References: <20220329181935.2183-1-beaub@linux.microsoft.com> <20220329201057.GA2549@kbox> <20220329231137.GA3357@kbox>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 30, 2022 at 09:06:24AM -0700, Song Liu wrote:
> On Tue, Mar 29, 2022 at 4:11 PM Beau Belgrave wrote:
> >
> > On Tue, Mar 29, 2022 at 03:31:31PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Mar 29, 2022 at 1:11 PM Beau Belgrave wrote:
> > > >
> > > > On Tue, Mar 29, 2022 at 12:50:40PM -0700, Alexei Starovoitov wrote:
> > > > > On Tue, Mar 29, 2022 at 11:19 AM Beau Belgrave wrote:
> > > > > >
> > > > > > Send user_event data to attached eBPF programs for user_event based perf
> > > > > > events.
> > > > > >
> > > > > > Add BPF_ITER flag to allow user_event data to have a zero copy path into
> > > > > > eBPF programs if required.
> > > > > >
> > > > > > Update documentation to describe new flags and structures for eBPF
> > > > > > integration.
> > > > > >
> > > > > > Signed-off-by: Beau Belgrave
> > > > >
> > > > > The commit describes _what_ it does, but says nothing about _why_.
> > > > > At present I see no use out of bpf and user_events connection.
> > > > > The whole user_events feature looks redundant to me.
> > > > > We have uprobes and usdt. It doesn't look to me that
> > > > > user_events provide anything new that wasn't available earlier.
> > > >
> > > > A lot of the why, in general, for user_events is covered in the first
> > > > change in the series.
> > > > Link: https://lore.kernel.org/all/20220118204326.2169-1-beaub@linux.microsoft.com/
> > > >
> > > > The why was also covered in Linux Plumbers Conference 2021 within the
> > > > tracing microconference.
> > > >
> > > > An example of why we want user_events:
> > > > Managed code running that emits data out via Open Telemetry.
> > > > Since it's managed there isn't a stub location to patch, it moves.
> > > > We watch the Open Telemetry spans in an eBPF program, when a span takes
> > > > too long we collect stack data and perform other actions.
> > > > With user_events and perf we can monitor the entire system from the root
> > > > container without having to have relay agents within each
> > > > cgroup/namespace taking up resources.
> > > > We do not need to enter each cgroup mnt space and determine the correct
> > > > patch location or the right version of each binary for processes that
> > > > use user_events.
> > > >
> > > > An example of why we want eBPF integration:
> > > > We also have scenarios where we are live decoding the data quickly.
> > > > Having user_data fed directly to eBPF lets us cast the data coming in to
> > > > a struct and decode very very quickly to determine if something is
> > > > wrong.
> > > > We can take that data quickly and put it into maps to perform further
> > > > aggregation as required.
> > > > We have scenarios that have "skid" problems, where we need to grab
> > > > further data exactly when the process that had the problem was running.
> > > > eBPF lets us do all of this that we cannot easily do otherwise.
> > > >
> > > > Another benefit from user_events is the tracing is much faster than
> > > > uprobes or others using int 3 traps. This is critical to us to enable on
> > > > production systems.
> > >
> > > None of it makes sense to me.
> >
> > Sorry.
> >
> > > To take advantage of user_events user space has to be modified
> > > and writev syscalls inserted.
> >
> > Yes, both user_events and lttng require user space modifications to do
> > tracing correctly. The syscall overheads are real, and the cost depends
> > on the mitigations around spectre/meltdown.
> >
> > > This is not cheap and I cannot see a production system using this interface.
> >
> > But you are fine with uprobe costs? uprobes appear to be much more costly
> > than a syscall approach on the hardware I've run on.
>
> Can we achieve the same/similar performance with sys_bpf(BPF_PROG_RUN)?
>

I think so, the tough part is how do you let the user-space know which
program is attached to run? In the current code this is done by the BPF
program attaching to the event via perf and we run the one there if
any when data is emitted out via write calls.

I would want to make sure that operators can decide where the user-space
data goes (perf/ftrace/eBPF) after the code has been written. With the
current code this is done via the tracepoint callbacks that perf/ftrace
hook up when operators enable recording via perf, tracefs, libbpf, etc.

We have managed code (C#/Java) where we cannot utilize stubs or traps
easily due to code movement. So we are limited in how we can approach
this problem. Having the interface be mmap/write has enabled this for
us, since it's easy to interact with in most languages and gives us
lifetime management of the trace objects between user-space and the
kernel.
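
To make the attach-later model described above a bit more concrete, here is a
toy user-space sketch in Python. It is not kernel code and it is not the
user_events ABI: `UserEvent`, `attach`, `detach`, `enabled`, and `write` are
hypothetical names picked only for illustration. It models one idea from the
paragraphs above: the writer always makes the same call, and operators decide
afterwards which consumers (perf/ftrace/eBPF) receive the data.

```python
from typing import Callable, Dict

# Hypothetical sink type: receives the raw payload of one event write.
Sink = Callable[[bytes], None]


class UserEvent:
    """Toy stand-in for one registered user event (NOT the kernel ABI)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Consumers keyed by kind, e.g. "perf", "ftrace", "bpf".
        self._sinks: Dict[str, Sink] = {}

    def attach(self, kind: str, sink: Sink) -> None:
        # Operator side: enable a consumer after the writer code shipped.
        self._sinks[kind] = sink

    def detach(self, kind: str) -> None:
        # Operator side: disable a consumer; unknown kinds are ignored.
        self._sinks.pop(kind, None)

    @property
    def enabled(self) -> bool:
        # Assumption of this sketch: a cheap "is anyone listening" check.
        return bool(self._sinks)

    def write(self, payload: bytes) -> int:
        # Writer side: identical call no matter which consumers exist.
        # Returns how many consumers saw the payload.
        for sink in list(self._sinks.values()):
            sink(payload)
        return len(self._sinks)
```

With nothing attached, `write()` delivers to nobody and returns 0. After an
operator calls `attach("bpf", handler)`, the same `write()` call reaches that
handler without any change on the writer side.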
> Thanks,
> Song
>
> >
> > > All you did is a poor man version of lttng that doesn't rely
> > > on such heavy instrumentation.
> >
> > Well I am a frugal person. :)
> >
> > This work has solved some critical issues we've been having, and I would
> > appreciate a review of the code if possible.
> >
> > Thanks,
> > -Beau

Thanks,
-Beau