Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp1127671rwe; Thu, 25 Aug 2022 16:22:46 -0700 (PDT) X-Google-Smtp-Source: AA6agR6xM9ZiYNOn7+PmqpmeLEishlgOgr68c08NCAsuvzm/S0VcjadLUvcrNnBxAIOh/fL60Tcf X-Received: by 2002:a17:907:1de2:b0:73d:852:511a with SMTP id og34-20020a1709071de200b0073d0852511amr3673786ejc.316.1661469766610; Thu, 25 Aug 2022 16:22:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661469766; cv=none; d=google.com; s=arc-20160816; b=mUTgrNLevoSBC/N2t4iUtgfEXCFSppURKwlvyxMvQBHk4M3si+1YRx1o8vPwXrgSOQ YWrM0nbQ30wmA6CMSpgoh6FDMjFYKd5mAPi7reoCUrQ5cNRykweL5ndjphApdtnqX5LQ w7+yaINxfxQhV9ab34R32MT7t0Q/tKey0Sfe8eTDb2jEawDlynrJcS8592I11sTm2ium eGopQZYsYb19w/t2cdSG6RJx2x9r4p3FfkNGaGt+3GBpZYg2OFvpRRP1r5SnUhzeIQnR NtkQ8CXv7ap/GsYiaH9ueowcVWXYFx9/nfVpbbMarr8p7pdU1ksRT2zEDYAR7KCfuYQe Ohhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=uy4eH0cBiH6VU7iDFo4rGPAwd8NTm7jEGdIZq3tRJxM=; b=te4syBXEu9GpN4f3XYVV7BaaeTw4pzhnZNegx1IqBKDZ153ruLY/m6RPM6+un4iVqO pGTr1KK1vi0HMdIql3551onZNB3kahYGSwg8ooXcfMBFWEjYq0ytjLv1s3wIQVxRd7Af NyIi4CaGPuELnrM6RXoHcVZ6lkYvpmmiOI2hEwMQkPrdrs6YPVlmebfwmENrvtQ0DX6R kLFwZWCBBUdxFR2RaWVib9tqyOZHxk616CabC6uityV9Ld3lClEQUTvQhR9+CO+qYgAQ fWVvH6+ustLTssGtDjlQtXn6en59JPhrQh2lVzD3MEBts5RREu0d6yIwzmJ8s9xS/gTE sHLQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=d4FVXcH8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y3-20020a056402440300b00445e6a004bfsi523243eda.363.2022.08.25.16.22.21; Thu, 25 Aug 2022 16:22:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=d4FVXcH8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235138AbiHYXDn (ORCPT + 99 others); Thu, 25 Aug 2022 19:03:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34248 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229990AbiHYXDm (ORCPT ); Thu, 25 Aug 2022 19:03:42 -0400 Received: from mail-ed1-x531.google.com (mail-ed1-x531.google.com [IPv6:2a00:1450:4864:20::531]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A4902C2F87; Thu, 25 Aug 2022 16:03:37 -0700 (PDT) Received: by mail-ed1-x531.google.com with SMTP id z8so183754edb.0; Thu, 25 Aug 2022 16:03:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc; bh=uy4eH0cBiH6VU7iDFo4rGPAwd8NTm7jEGdIZq3tRJxM=; b=d4FVXcH8g22QdNUlTFXYjsz3gDlyRAh4ZnMhVQbhaR8KUSAMJ1t7SnnuMeH28v3bMy C/lwGiKZhGz9mAOTqe1lhnNUCOrfy89en1XSHEBwNPrMyd9IURQjIskBNV+5XNJTq+w6 zctifzwtpvMOmGlMv45gtFv8jmbLReyKdFUb65e+RXmZyUSdohxNOvDk13+1F0fBlB7p 3TYK+PuN51hFJICB4S4lAVJsRSHEnHACp82xGHDkmWgGN+iIA1gJvVBwAl2BSMpntqDq RaV9WFjttTSWOfvKfAis3/8AQKnAtaIidtIHWIJONlUcKfgH5U973PFdAC8YALVazAK3 7TTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=uy4eH0cBiH6VU7iDFo4rGPAwd8NTm7jEGdIZq3tRJxM=; b=Y53WNirwpIowSQ6rVN0t5Odqv5V3Uo2IAR1IlWLORiqOdhDFxGUEkxw+2c3Ie4nggz DOA39DCeR0DfznEafjy3TK5olGR15jkP+6uBTWO0ewyXw/HDGHFdvZ75poQdnoYKXdhr BYtjuw/Csz5olQ/XulbBNdgjEJI5YTThLjqeNLUjVNBu8z9bCDNgrIjNokPBZlPmdzJH 9Ii/gVIUqYSxG6lxtJQMbnYKzUOLYIPJjuBSzTZojgrgtJPTYy1HVpSi/ydeeqvEZtwm akOq21pOoTRAI0dcOWHikDvLiu2pu8MA962egm7DmLgSDgg1hcYA821M7YB57KRORqjX Myew== X-Gm-Message-State: ACgBeo2BHnhuoD7zccCRvbEtNwA0/Qi3kBPuRiT4vJrc8VJBqwC99rGn zkrCn7WY3mn1MtERpoF72y5CliIHqhGp63Or8cg= X-Received: by 2002:a05:6402:24a4:b0:440:8c0c:8d2b with SMTP id q36-20020a05640224a400b004408c0c8d2bmr4677996eda.311.1661468616191; Thu, 25 Aug 2022 16:03:36 -0700 (PDT) MIME-Version: 1.0 References: <20220823210354.1407473-1-namhyung@kernel.org> In-Reply-To: From: Andrii Nakryiko Date: Thu, 25 Aug 2022 16:03:24 -0700 Message-ID: Subject: Re: [PATCH bpf-next] bpf: Add bpf_read_raw_record() helper To: Song Liu Cc: Namhyung Kim , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin Lau , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Steven Rostedt , Peter Zijlstra , Ingo Molnar , "bpf@vger.kernel.org" , LKML Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 25, 2022 at 3:08 PM Song Liu wrote: > > > > > On Aug 25, 2022, at 2:33 PM, Andrii Nakryiko wrote: > > > > On Tue, Aug 23, 2022 at 2:04 PM Namhyung Kim wrote: > >> > >> The helper is for BPF programs attached to perf_event in order to read > >> event-specific raw data. I followed the convention of the > >> bpf_read_branch_records() helper so that it can tell the size of > >> record using BPF_F_GET_RAW_RECORD flag. > >> > >> The use case is to filter perf event samples based on the HW provided > >> data which have more detailed information about the sample. > >> > >> Note that it only reads the first fragment of the raw record. But it > >> seems mostly ok since all the existing PMU raw data have only single > >> fragment and the multi-fragment records are only for BPF output attached > >> to sockets. So unless it's used with such an extreme case, it'd work > >> for most of tracing use cases. > >> > >> Signed-off-by: Namhyung Kim > >> --- > >> I don't know how to test this. As the raw data is available on some > >> hardware PMU only (e.g. AMD IBS). I tried a tracepoint event but it was > >> rejected by the verifier. Actually it needs a bpf_perf_event_data > >> context so that's not an option IIUC. > >> > >> include/uapi/linux/bpf.h | 23 ++++++++++++++++++++++ > >> kernel/trace/bpf_trace.c | 41 ++++++++++++++++++++++++++++++++++++++++ > >> 2 files changed, 64 insertions(+) > >> > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> index 934a2a8beb87..af7f70564819 100644 > >> --- a/include/uapi/linux/bpf.h > >> +++ b/include/uapi/linux/bpf.h > >> @@ -5355,6 +5355,23 @@ union bpf_attr { > >> * Return > >> * Current *ktime*. > >> * > >> + * long bpf_read_raw_record(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 flags) > >> + * Description > >> + * For an eBPF program attached to a perf event, retrieve the > >> + * raw record associated to *ctx* and store it in the buffer > >> + * pointed by *buf* up to size *size* bytes. > >> + * Return > >> + * On success, number of bytes written to *buf*. On error, a > >> + * negative value. > >> + * > >> + * The *flags* can be set to **BPF_F_GET_RAW_RECORD_SIZE** to > >> + * instead return the number of bytes required to store the raw > >> + * record. If this flag is set, *buf* may be NULL. > > > > It looks pretty ugly from a usability standpoint to have one helper > > doing completely different things and returning two different values > > based on BPF_F_GET_RAW_RECORD_SIZE. > > Yeah, I had the same thought when I first looked at it. But that's the > exact syntax with bpf_read_branch_records(). Well, we still have time > to fix the new helper.. > > > > > I'm not sure what's best, but I have two alternative proposals: > > > > 1. Add two helpers: one to get perf record information (and size will > > be one of them). Something like bpf_perf_record_query(ctx, flags) > > where you pass perf ctx and what kind of information you want to read > > (through flags), and u64 return result returns that (see > > bpf_ringbuf_query() for such approach). And then have separate helper > > to read data. > > > > 2. Keep one helper, but specify that it always returns record size, > > even if user specified smaller size to read. And then allow passing > > buf==NULL && size==0. So passing NULL, 0 -- you get record size. > > Passing non-NULL buf -- you read data. > > AFAICT, this is also confusing. > this is analogous to snprintf() behavior, so not that new and surprising when you think about it. But if query + read makes more sense, then it's fine by me > Maybe we should use two kfuncs for this? > > Thanks, > Song > > > > > > > And also, "read_raw_record" is way too generic. We have > > bpf_perf_prog_read_value(), let's use "bpf_perf_read_raw_record()" as > > a name. We should have called bpf_read_branch_records() as > > bpf_perf_read_branch_records(), probably, as well. But it's too late. > > > >> + * > >> + * **-EINVAL** if arguments invalid or **size** not a multiple > >> + * of **sizeof**\ (u64\ ). > >> + * > >> + * **-ENOENT** if the event does not have raw records. > >> */ > >> #define __BPF_FUNC_MAPPER(FN) \ > >> FN(unspec), \ > >> @@ -5566,6 +5583,7 @@ union bpf_attr { > >> FN(tcp_raw_check_syncookie_ipv4), \ > >> FN(tcp_raw_check_syncookie_ipv6), \ > >> FN(ktime_get_tai_ns), \ > >> + FN(read_raw_record), \ > >> /* */ > >> > > > > [...] >