Date: Sun, 28 Nov 2021 23:34:13 +0100
From: Jiri Olsa
To: Masami Hiramatsu
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Arnaldo Carvalho de Melo, Peter Zijlstra, Steven Rostedt,
 netdev@vger.kernel.org, bpf@vger.kernel.org, lkml, Ingo Molnar,
 Mark Rutland, Martin KaFai Lau, Alexander Shishkin, Song Liu,
 Yonghong Song, John Fastabend, KP Singh, Ravi Bangoria
Subject: Re: [PATCH 1/8] perf/kprobe: Add support to create multiple probes
References: <20211124084119.260239-1-jolsa@kernel.org>
 <20211124084119.260239-2-jolsa@kernel.org>
 <20211128224954.11e8ac2a2ff1f45354c4a161@kernel.org>
In-Reply-To: <20211128224954.11e8ac2a2ff1f45354c4a161@kernel.org>

On Sun, Nov 28, 2021 at 10:49:54PM +0900, Masami Hiramatsu wrote:
> On Wed, 24 Nov 2021 09:41:12 +0100
> Jiri Olsa wrote:
> 
> > Adding support to create multiple probes within single perf event.
> > This way we can associate single bpf program with multiple kprobes,
> > because bpf program gets associated with the perf event.
> > 
> > The perf_event_attr is not extended, current fields for kprobe
> > attachment are used for multi attachment.
> > 
> > For current kprobe attachment we use either:
> > 
> >   kprobe_func (in config1) + probe_offset (in config2)
> > 
> > to define kprobe by function name with offset, or:
> > 
> >   kprobe_addr (in config2)
> > 
> > to define kprobe with direct address value.
> > 
> > For multi probe attach the same fields point to array of values
> > with the same semantic. Each probe is defined as set of values
> > with the same array index (idx) as:
> > 
> >   kprobe_func[idx] + probe_offset[idx]
> > 
> > to define kprobe by function name with offset, or:
> > 
> >   kprobe_addr[idx]
> > 
> > to define kprobe with direct address value.
> > 
> > The number of probes is passed in probe_cnt value, which shares
> > the union with wakeup_events/wakeup_watermark values which are
> > not used for kprobes.
> > 
> > Since [1] it's possible to stack multiple probes events under
> > one head event. Using the same code to allow that for probes
> > defined under perf kprobe interface.
> 
> OK, so you also want to add multi-probes on single event by
> single perf-event syscall. Not defining different events.

correct.. bpf program is then attached to perf event with ioctl call..
this way we can have multiple probes attached to single bpf program
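For illustration, a rough userspace sketch of how the interface described
above could be driven (untested; error handling trimmed; kprobe_pmu_type
would be read from /sys/bus/event_source/devices/kprobe/type, and
bpf_prog_fd is assumed to be an already loaded BPF_PROG_TYPE_KPROBE
program fd obtained elsewhere):

	#include <stdint.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <sys/types.h>
	#include <linux/perf_event.h>

	static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
				   int cpu, int group_fd, unsigned long flags)
	{
		return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
	}

	/* attach one bpf program to several kprobes via a single perf event */
	static int attach_multi_kprobe(int kprobe_pmu_type, int bpf_prog_fd)
	{
		/* each kprobe_func[] entry is a user pointer to a symbol name,
		 * each probe_offset[] entry is the offset within that symbol */
		const char *syms[2] = { "ksys_read", "ksys_write" };
		uint64_t funcs[2], offs[2] = { 0, 0 };
		struct perf_event_attr attr = {};
		int fd, i;

		for (i = 0; i < 2; i++)
			funcs[i] = (uint64_t)(uintptr_t)syms[i];

		attr.type = kprobe_pmu_type;	/* /sys/bus/event_source/devices/kprobe/type */
		attr.size = sizeof(attr);
		attr.kprobe_func = (uint64_t)(uintptr_t)funcs;	/* config1: array of name pointers */
		attr.probe_offset = (uint64_t)(uintptr_t)offs;	/* config2: array of offsets */
		attr.wakeup_events = 2;		/* attr.probe_cnt with this patch applied */

		fd = perf_event_open(&attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
		if (fd < 0)
			return -1;

		/* the single bpf program gets associated with the (multi-probe) event */
		if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0 ||
		    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
			close(fd);
			return -1;
		}

		return fd;
	}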
> 
> Those are a bit different, multi-probes on single event can
> invoke single event handler from different probe points. For
> example same user bpf handler will be invoked from different
> address.

right, that's the goal, having single bpf program executed from
multiple probes

> 
> > 
> > [1] https://lore.kernel.org/lkml/156095682948.28024.14190188071338900568.stgit@devnote2/
> > 
> > Signed-off-by: Jiri Olsa
> > ---
> >  include/uapi/linux/perf_event.h |   1 +
> >  kernel/trace/trace_event_perf.c | 106 ++++++++++++++++++++++++++++----
> >  kernel/trace/trace_kprobe.c     |  47 ++++++++++++--
> >  kernel/trace/trace_probe.c      |   2 +-
> >  kernel/trace/trace_probe.h      |   3 +-
> >  5 files changed, 138 insertions(+), 21 deletions(-)
> > 
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index bd8860eeb291..eea80709d1ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -414,6 +414,7 @@ struct perf_event_attr {
> >  	union {
> >  		__u32		wakeup_events;	  /* wakeup every n events */
> >  		__u32		wakeup_watermark; /* bytes before wakeup   */
> > +		__u32		probe_cnt;	  /* number of [k,u] probes */
> >  	};
> > 
> >  	__u32			bp_type;
> > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > index a114549720d6..26078e40c299 100644
> > --- a/kernel/trace/trace_event_perf.c
> > +++ b/kernel/trace/trace_event_perf.c
> > @@ -245,23 +245,27 @@ void perf_trace_destroy(struct perf_event *p_event)
> >  }
> > 
> >  #ifdef CONFIG_KPROBE_EVENTS
> > -int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> > +static struct trace_event_call*
> > +kprobe_init(bool is_retprobe, u64 kprobe_func, u64 kprobe_addr,
> > +	    u64 probe_offset, struct trace_event_call *old)
> >  {
> >  	int ret;
> >  	char *func = NULL;
> >  	struct trace_event_call *tp_event;
> > 
> > -	if (p_event->attr.kprobe_func) {
> > +	if (kprobe_func) {
> >  		func = kzalloc(KSYM_NAME_LEN, GFP_KERNEL);
> >  		if (!func)
> > -			return -ENOMEM;
> > +			return ERR_PTR(-ENOMEM);
> >  		ret = strncpy_from_user(
> > -			func, u64_to_user_ptr(p_event->attr.kprobe_func),
> > +			func, u64_to_user_ptr(kprobe_func),
> >  			KSYM_NAME_LEN);
> >  		if (ret == KSYM_NAME_LEN)
> >  			ret = -E2BIG;
> > -		if (ret < 0)
> > -			goto out;
> > +		if (ret < 0) {
> > +			kfree(func);
> > +			return ERR_PTR(ret);
> > +		}
> > 
> >  		if (func[0] == '\0') {
> >  			kfree(func);
> > @@ -270,20 +274,96 @@ int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> >  	}
> > 
> >  	tp_event = create_local_trace_kprobe(
> > -		func, (void *)(unsigned long)(p_event->attr.kprobe_addr),
> > -		p_event->attr.probe_offset, is_retprobe);
> > -	if (IS_ERR(tp_event)) {
> > -		ret = PTR_ERR(tp_event);
> > -		goto out;
> > +		func, (void *)(unsigned long)(kprobe_addr),
> > +		probe_offset, is_retprobe, old);
> 
> Hmm, here I have a concern (maybe no real issue is caused at this moment.)
> Since ftrace's multi-probe event has the same event/group name among the
> probes' internal event-calls. However, create_local_trace_kprobe()
> actually uses the "func" name for the event name.
> I think you should choose a random different "representative" event name
> for the event (not probe), and share it among the probes on the event,
> if the perf event has no event name.
> 
> (I'm not sure how the event names are used from inside of the BPF programs,
> but just for the consistency.)
ok, I don't think event names are used, I'll check

> 
> > +	kfree(func);
> > +	return tp_event;
> > +}
> > +
> > +static struct trace_event_call*
> > +kprobe_init_multi(struct perf_event *p_event, bool is_retprobe)
> > +{
> > +	void __user *kprobe_func = u64_to_user_ptr(p_event->attr.kprobe_func);
> > +	void __user *kprobe_addr = u64_to_user_ptr(p_event->attr.kprobe_addr);
> > +	u64 *funcs = NULL, *addrs = NULL, *offs = NULL;
> > +	struct trace_event_call *tp_event, *tp_old = NULL;
> > +	u32 i, cnt = p_event->attr.probe_cnt;
> > +	int ret = -EINVAL;
> > +	size_t size;
> > +
> > +	if (!cnt)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	size = cnt * sizeof(u64);
> > +	if (kprobe_func) {
> > +		ret = -ENOMEM;
> > +		funcs = kmalloc(size, GFP_KERNEL);
> > +		if (!funcs)
> > +			goto out;
> > +		ret = -EFAULT;
> > +		if (copy_from_user(funcs, kprobe_func, size))
> > +			goto out;
> > +	}
> > +
> > +	if (kprobe_addr) {
> > +		ret = -ENOMEM;
> > +		addrs = kmalloc(size, GFP_KERNEL);
> > +		if (!addrs)
> > +			goto out;
> > +		ret = -EFAULT;
> > +		if (copy_from_user(addrs, kprobe_addr, size))
> > +			goto out;
> > +		/* addresses and offsets share the same array */
> > +		offs = addrs;
> >  	}
> > 
> > +	for (i = 0; i < cnt; i++) {
> > +		tp_event = kprobe_init(is_retprobe, funcs ? funcs[i] : 0,
> > +				       addrs ? addrs[i] : 0, offs ? offs[i] : 0,
> > +				       tp_old);
> > +		if (IS_ERR(tp_event)) {
> > +			if (tp_old)
> > +				destroy_local_trace_kprobe(tp_old);
> > +			ret = PTR_ERR(tp_event);
> > +			goto out;
> > +		}
> > +		if (!tp_old)
> > +			tp_old = tp_event;
> > +	}
> > +	ret = 0;
> > +out:
> > +	kfree(funcs);
> > +	kfree(addrs);
> > +	return ret ? ERR_PTR(ret) : tp_old;
> > +}
> > +
> > +static struct trace_event_call*
> > +kprobe_init_single(struct perf_event *p_event, bool is_retprobe)
> > +{
> > +	struct perf_event_attr *attr = &p_event->attr;
> > +
> > +	return kprobe_init(is_retprobe, attr->kprobe_func, attr->kprobe_addr,
> > +			   attr->probe_offset, NULL);
> > +}
> > +
> > +int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> > +{
> > +	struct trace_event_call *tp_event;
> > +	int ret;
> > +
> > +	if (p_event->attr.probe_cnt)
> 
> isn't this "p_event->attr.probe_cnt > 1" ?
right, probe_cnt is just added by this patchset and used only when
there are multiple probes being attached, so that's why it works
even with 1 at the moment

will change

> 
> > +		tp_event = kprobe_init_multi(p_event, is_retprobe);
> > +	else
> > +		tp_event = kprobe_init_single(p_event, is_retprobe);
> > +
> > +	if (IS_ERR(tp_event))
> > +		return PTR_ERR(tp_event);
> > +
> >  	mutex_lock(&event_mutex);
> >  	ret = perf_trace_event_init(tp_event, p_event);
> >  	if (ret)
> >  		destroy_local_trace_kprobe(tp_event);
> >  	mutex_unlock(&event_mutex);
> > -out:
> > -	kfree(func);
> >  	return ret;
> >  }
> > 
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index 33272a7b6912..86a7aada853a 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -237,13 +237,18 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
> >  static int kretprobe_dispatcher(struct kretprobe_instance *ri,
> >  				struct pt_regs *regs);
> > 
> > +static void __free_trace_kprobe(struct trace_kprobe *tk)
> > +{
> > +	kfree(tk->symbol);
> > +	free_percpu(tk->nhit);
> > +	kfree(tk);
> > +}
> > +
> >  static void free_trace_kprobe(struct trace_kprobe *tk)
> >  {
> >  	if (tk) {
> >  		trace_probe_cleanup(&tk->tp);
> > -		kfree(tk->symbol);
> > -		free_percpu(tk->nhit);
> > -		kfree(tk);
> > +		__free_trace_kprobe(tk);
> 
> Why is this needed?

I needed some free function that does not call trace_probe_cleanup
and trace_probe_unlink.. I wrote details in the destroy_local_trace_kprobe
comment below

> 
> >  	}
> >  }
> > 
> > @@ -1796,7 +1801,7 @@ static int unregister_kprobe_event(struct trace_kprobe *tk)
> >  /* create a trace_kprobe, but don't add it to global lists */
> >  struct trace_event_call *
> >  create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> > -			  bool is_return)
> > +			  bool is_return, struct trace_event_call *old)
> >  {
> >  	enum probe_print_type ptype;
> >  	struct trace_kprobe *tk;
> > @@ -1820,6 +1825,28 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> >  		return ERR_CAST(tk);
> >  	}
> > 
> > +	if (old) {
> > +		struct trace_kprobe *tk_old;
> > +
> > +		tk_old = trace_kprobe_primary_from_call(old);
> 
> So, this will choose the first (primary) probe's function name as
> the representative event name. But other probes can have different
> event names.

ok

> 
> > +		if (!tk_old) {
> > +			ret = -EINVAL;
> > +			goto error;
> > +		}
> > +
> > +		/* Append to existing event */
> > +		ret = trace_probe_append(&tk->tp, &tk_old->tp);
> > +		if (ret)
> > +			goto error;
> > +
> > +		/* Register k*probe */
> > +		ret = __register_trace_kprobe(tk);
> > +		if (ret)
> > +			goto error;
> 
> If "appended" probe failed to register, it must be "unlinked" from
> the first one and goto error to free the trace_kprobe.
> 
> 	if (ret) {
> 		trace_probe_unlink(&tk->tp);
> 		goto error;
> 	}
> 
> See append_trace_kprobe() for details.

so there's goto error jumping to:

	error:
		free_trace_kprobe(tk);

that calls:

	trace_probe_cleanup
	  -> trace_probe_unlink

that should do it, right?

> 
> > +
> > +		return trace_probe_event_call(&tk->tp);
> > +	}
> > +
> >  	init_trace_event_call(tk);
> > 
> >  	ptype = trace_kprobe_is_return(tk) ?
> > @@ -1841,6 +1868,8 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> > 
> >  void destroy_local_trace_kprobe(struct trace_event_call *event_call)
> >  {
> > +	struct trace_probe_event *event;
> > +	struct trace_probe *pos, *tmp;
> >  	struct trace_kprobe *tk;
> > 
> >  	tk = trace_kprobe_primary_from_call(event_call);
> > @@ -1852,9 +1881,15 @@ void destroy_local_trace_kprobe(struct trace_event_call *event_call)
> >  		return;
> >  	}
> > 
> > -	__unregister_trace_kprobe(tk);
> > +	event = tk->tp.event;
> > +	list_for_each_entry_safe(pos, tmp, &event->probes, list) {
> > +		list_del_init(&pos->list);
> > +		__unregister_trace_kprobe(tk);
> > +		__free_trace_kprobe(tk);
> > +	}
> > 
> > -	free_trace_kprobe(tk);
> > +	trace_probe_event_free(event);
> 
> Actually, each probe already allocated the trace_probe events (which are not
> used if it is appended). Thus you have to use trace_probe_unlink(&tk->tp) in
> the above loop.
> 
> 	list_for_each_entry_safe(pos, tmp, &event->probes, list) {
> 		__unregister_trace_kprobe(tk);
> 		trace_probe_unlink(&tk->tp);	/* This will call trace_probe_event_free() internally */
> 		free_trace_kprobe(tk);
> 	}

so calling trace_probe_event_free inside this loop is a problem,
because the loop iterates that trace_probe_event's probes list,
and the last probe removed will trigger trace_probe_event_free,
which will free the list we iterate.. and we go down ;-)
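As a simplified illustration (made-up userspace types, not the actual
trace_probe structures), the problematic ordering looks roughly like this:

	#include <stdlib.h>

	struct event;

	struct probe {
		struct probe *next;
		struct event *owner;
	};

	struct event {
		struct probe *probes;	/* list head lives inside the owner */
		int nr_probes;
	};

	/* analogous to trace_probe_unlink(): drop the probe from its owner
	 * and free the owner once the last probe is gone */
	static void unlink_probe(struct probe *p)
	{
		struct event *ev = p->owner;

		ev->probes = p->next;
		if (--ev->nr_probes == 0)
			free(ev);	/* frees the list head as well */
	}

	static void destroy_all_buggy(struct event *ev)
	{
		while (ev->probes) {	/* last pass reads ev after it was freed */
			struct probe *p = ev->probes;

			unlink_probe(p);	/* may free ev on the last probe */
			free(p);
		}
	}

Deferring the final free of the owner until after the loop, which is what
the trace_probe_event_free(event) call below does, avoids that.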
so that's why I added the new free function '__free_trace_kprobe'
that frees everything the way free_trace_kprobe does, but does not
call trace_probe_unlink

	event = tk->tp.event;
	list_for_each_entry_safe(pos, tmp, &event->probes, list) {
		list_del_init(&pos->list);
		__unregister_trace_kprobe(tk);
		__free_trace_kprobe(tk);
	}

	trace_probe_event_free(event);

and there's trace_probe_event_free(event) to make the final free

thanks,
jirka