Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp1100094rwe; Thu, 25 Aug 2022 15:42:36 -0700 (PDT) X-Google-Smtp-Source: AA6agR5cghD7dmjTsRWZs9EQ+CDTqKzhwIATM/FH8lZEwTSFhcOhTxW0/RSRGdsBJgGxZlhRmXNs X-Received: by 2002:a05:6402:5190:b0:447:284c:b3a9 with SMTP id q16-20020a056402519000b00447284cb3a9mr4706099edd.428.1661467355876; Thu, 25 Aug 2022 15:42:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661467355; cv=none; d=google.com; s=arc-20160816; b=PRiMJeCrooD9D2rl/uyBTNy88rJhSntC4zE6Zwa4sEJb/KFZurMN53hy8E03Qt07Gl DY3Dy1hMzXTeNuWUNqrJu0uIy66oKquM3LC5uIqNRnyPdrp3zhLYz7crAXPHFXyF1HBz reMwKyZoHLFPGfgSEgckcurwMR/8a1aXbhkiVcicca3IH6Xu8yhO2TATPW6qpqHqXz7S Yn3wKDSAhb0eiKP1bSSrwtsdgFpoqbI+cp+O0c3amUc1cGkSPK+1d3P9P6Plaa8hc0Mw mE+SqajasYysTUWoQ45LIPJQO61z6+s09HmFmzXY7EhKYtKMbGIDXLoZZiiNoUWxBqzd rsDg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version; bh=0yiLGAygSoo6Ig9sU43elYK3mW36FwAxOIDV175b8NU=; b=odf9SGUPVxm8jz9D+N6ClBeXpjRrtS/2wnWr4kqFNYQ4CPucjH7tsMIV4QfAf9F5Xe zyJNkvnQRbs+2EldFBM8vKbFdrorL969HdKNSXRenrtIv9aFlSZCyRVcPkdsYOAsRKd7 dHnRWR/pMGRSZCCcMSwiVqKgaT8Qcmrst5gigCExGLmzlqGkFgGKHvasYcqQu6gkr5ws PzGrxdnBANutO43eRkBmcmhcMfT9LeulG6YRNzA1zElRYHegtcrGIx1gQg7IWsQv8L3l F/9MjMivjtrkzM6xbhkR+Wx69YEdAdj8XHFyGfl775/aFpxHJWmhLwXshyZLhauLtmhg PpFw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hq17-20020a1709073f1100b0073046c15d8csi236410ejc.610.2022.08.25.15.42.09; Thu, 25 Aug 2022 15:42:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242855AbiHYWNp (ORCPT + 99 others); Thu, 25 Aug 2022 18:13:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44496 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237937AbiHYWNj (ORCPT ); Thu, 25 Aug 2022 18:13:39 -0400 Received: from mail-ot1-f53.google.com (mail-ot1-f53.google.com [209.85.210.53]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C360C3F61; Thu, 25 Aug 2022 15:13:38 -0700 (PDT) Received: by mail-ot1-f53.google.com with SMTP id q39-20020a056830442700b0063889adc0ddso14873457otv.1; Thu, 25 Aug 2022 15:13:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=0yiLGAygSoo6Ig9sU43elYK3mW36FwAxOIDV175b8NU=; b=k1PYE919Z7zpQDHLmPmmvphNiPAbxv8ZCESZGCQjHQdBbFzhnISeD+O8SDzwceLueZ DKJHhN699ZSh6+UVh/XpCoxcxerWN4ueN4O8Kv/Rnet5c6p9fGbIBxvV8qGnbSpNuOlz kHsc2o4YtZOsSUe9Jr5y11okQN5VQBx3hu8UZKrFubWuu8xMLBwEtAIJJC2Kkn6PpNTE KpAjUoVlPkkA1z71CvilacOKs0nR8YZ+rvilvJtcTtFjuhDNLr0PnK45K9rcW6nEDwZg 08gdDU2VzM5dkUHCpupntmnV0Nw53a5LTQBT58K+fhOcf+/SVZyX8z9N62WpmHEz5/Qk hvkw== X-Gm-Message-State: ACgBeo2/oSQlcR2vN7Cp29G5b8S66Yah5qMmK8bw6uEnQgTQgsYz81WX gBq+74gGGo0B5mbWX+4gCsEKx/ZJagEEuNYHuYOHg02A X-Received: by 2002:a9d:6f18:0:b0:638:b4aa:a546 with SMTP id n24-20020a9d6f18000000b00638b4aaa546mr409050otq.124.1661465617600; Thu, 25 Aug 2022 15:13:37 -0700 (PDT) MIME-Version: 1.0 References: <20220823210354.1407473-1-namhyung@kernel.org> In-Reply-To: From: Namhyung Kim Date: Thu, 25 Aug 2022 15:13:28 -0700 Message-ID: Subject: Re: [PATCH bpf-next] bpf: Add bpf_read_raw_record() helper To: Andrii Nakryiko Cc: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Steven Rostedt , Peter Zijlstra , Ingo Molnar , bpf , LKML Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On Thu, Aug 25, 2022 at 2:34 PM Andrii Nakryiko wrote: > > On Tue, Aug 23, 2022 at 2:04 PM Namhyung Kim wrote: > > > > The helper is for BPF programs attached to perf_event in order to read > > event-specific raw data. I followed the convention of the > > bpf_read_branch_records() helper so that it can tell the size of > > record using BPF_F_GET_RAW_RECORD flag. > > > > The use case is to filter perf event samples based on the HW provided > > data which have more detailed information about the sample. > > > > Note that it only reads the first fragment of the raw record. But it > > seems mostly ok since all the existing PMU raw data have only single > > fragment and the multi-fragment records are only for BPF output attached > > to sockets. So unless it's used with such an extreme case, it'd work > > for most of tracing use cases. > > > > Signed-off-by: Namhyung Kim > > --- > > I don't know how to test this. As the raw data is available on some > > hardware PMU only (e.g. AMD IBS). I tried a tracepoint event but it was > > rejected by the verifier. Actually it needs a bpf_perf_event_data > > context so that's not an option IIUC. > > > > include/uapi/linux/bpf.h | 23 ++++++++++++++++++++++ > > kernel/trace/bpf_trace.c | 41 ++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 64 insertions(+) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 934a2a8beb87..af7f70564819 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -5355,6 +5355,23 @@ union bpf_attr { > > * Return > > * Current *ktime*. > > * > > + * long bpf_read_raw_record(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 flags) > > + * Description > > + * For an eBPF program attached to a perf event, retrieve the > > + * raw record associated to *ctx* and store it in the buffer > > + * pointed by *buf* up to size *size* bytes. > > + * Return > > + * On success, number of bytes written to *buf*. On error, a > > + * negative value. > > + * > > + * The *flags* can be set to **BPF_F_GET_RAW_RECORD_SIZE** to > > + * instead return the number of bytes required to store the raw > > + * record. If this flag is set, *buf* may be NULL. > > It looks pretty ugly from a usability standpoint to have one helper > doing completely different things and returning two different values > based on BPF_F_GET_RAW_RECORD_SIZE. Agreed. > > I'm not sure what's best, but I have two alternative proposals: > > 1. Add two helpers: one to get perf record information (and size will > be one of them). Something like bpf_perf_record_query(ctx, flags) > where you pass perf ctx and what kind of information you want to read > (through flags), and u64 return result returns that (see > bpf_ringbuf_query() for such approach). And then have separate helper > to read data. I like this as I want to have more info for the perf event sample like instruction address or sample type. I know some of the info is available through the context but I think this is a better approach. > > 2. Keep one helper, but specify that it always returns record size, > even if user specified smaller size to read. And then allow passing > buf==NULL && size==0. So passing NULL, 0 -- you get record size. > Passing non-NULL buf -- you read data. > > > And also, "read_raw_record" is way too generic. We have > bpf_perf_prog_read_value(), let's use "bpf_perf_read_raw_record()" as > a name. We should have called bpf_read_branch_records() as > bpf_perf_read_branch_records(), probably, as well. But it's too late. Yeah, what about this? * bpf_perf_event_query(ctx, flag) * bpf_perf_event_get(ctx, flag, buf, size) Maybe we can use the same flag for both. Like BPF_PERF_RAW_RECORD can return the size (or -1 if not) on _query() and read the data on _get(). Or we can have a BPF_PERF_RAW_RECORD_SIZE only for _query(). It seems we don't need _get() for things like BPF_PERF_SAMPLE_IP. What do you think? Thanks, Namhyung