Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp5589251ybv; Mon, 17 Feb 2020 23:55:58 -0800 (PST) X-Google-Smtp-Source: APXvYqzkSeVs2eUlnxM1e9q5U8+q9m43IytCyKoqD/vVJYAgK/PapMDy8QQ+nzzifyL9JvfT3kcH X-Received: by 2002:a9d:6a06:: with SMTP id g6mr13755278otn.305.1582012558332; Mon, 17 Feb 2020 23:55:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1582012558; cv=none; d=google.com; s=arc-20160816; b=SZRu2qf/hjtEw9HW2I6F3ivGpIqR1WWo9AXXO/hywgxcH/IaZPh/TNRIK0iuWl4wMp bnRQyb6yRR+tUDxRRfTs7ngC9KAnAKXr+TSbOujyr/9oU6NtpJzxKGf+UTcYt18XvEzL FDK0qXTwQUgruBEpX6fE8JLaipDlptRjgb+EWTeT28aFBFJ8226jxrtbBjDgFYxkSesi MQV0hucVilzuIGpb9mGgx/GyyWt52ORU2/KcIrrenysqp7dFTcV8A/e6RcQV5xKbB8ss uH0u5ZJk/0KuDQ4bw8ucyraRV3eHHcr09AyH9wdolGjQj1WIGfj4/G9ftBQVIr3AKl/s 7mbQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=JmfOkEwmo2M0/6xsO06LN2AzpySvN7KKCnSXY0/3UzM=; b=qIhA3o30dcUJF9XohSUIYkA4nvXP27SE+bOs0r34ZURJovHV0xncMlKAZ39lyVH9eF inDf6iNfhxLJIC2HEldH5RxauyKU7fMxWii1jyD7F/t82rYjQKY5emLhWsoiJGX69JX0 b0z5gmoNVPlTmkcLiaUcE7Gk+T48U3gJixOaArTfbzJObEJxUjG30LJgWREv6DDL3K8i VDO/KedmlsiZLwmwzhPvXrpdQLX1vNaaEkvfSXrf7d8bRb3C8owxyVy7ktmFUmv70hFr BQuZDJxgAKokl/NsD+1QJfv0eiSZMBF2xLLRBQkTgCssQPeCzpFMcduwDOZmH1yVnbNi WRyw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=fsT6nsp9; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f4si1411991oto.169.2020.02.17.23.55.43; Mon, 17 Feb 2020 23:55:58 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=fsT6nsp9; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726199AbgBRHze (ORCPT + 99 others); Tue, 18 Feb 2020 02:55:34 -0500 Received: from mail.kernel.org ([198.145.29.99]:47100 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726138AbgBRHze (ORCPT ); Tue, 18 Feb 2020 02:55:34 -0500 Received: from devnote2 (NE2965lan1.rev.em-net.ne.jp [210.141.244.193]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id C89E72176D; Tue, 18 Feb 2020 07:55:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1582012533; bh=fj5RUzJmMIJH3SsuKj7SzywH9sWCRMRrLHPA3kXBFMU=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=fsT6nsp9Ralp/l5S7MVGxllSKxpXsjtwQbz3OqDgJLKv1Gv6zZKHHQxGpOMUW1xMz r/QWyAgMFgxWjcd7zih/m2eC2AA9hKqVfYxwdnmXAxQoYSPenqzLSRMnqZM/BnKcTj ZCu696tANZ81lsqoXtoS23E2myC5bZPpIqyPtaG4= Date: Tue, 18 Feb 2020 16:55:29 +0900 From: Masami Hiramatsu To: Luigi Rizzo Cc: linux-kernel@vger.kernel.org, naveen.n.rao@linux.ibm.com, anil.s.keshavamurthy@intel.com, davem@davemloft.net, gregkh@linuxfoundation.org, Ingo Molnar , Steven Rostedt Subject: Re: [PATCH v3] kretprobe: percpu support Message-Id: <20200218165529.39e761c4be828285cc060279@kernel.org> In-Reply-To: <20200218005659.91318-1-lrizzo@google.com> References: <20200218005659.91318-1-lrizzo@google.com> X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.32; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Luigi, On Mon, 17 Feb 2020 16:56:59 -0800 Luigi Rizzo wrote: > kretprobe uses a list protected by a single lock to allocate a > kretprobe_instance in pre_handler_kretprobe(). This works poorly with > concurrent calls. Yes, there are several potential performance issue and the recycle instance is one of them. However, I think this spinlock is not so racy, but noisy (especially on many core machine) right? Racy lock is the kretprobe_hash_lock(), I would like to replace it with ftrace's per-task shadow stack. But that will be available only if CONFIG_FUNCTION_GRAPH_TRACER=y (and instance has no own payload). > This patch offers a simplified fix: the percpu_instance flag indicates > that we allocate one instance per CPU, and the allocation is contention > free, but we allow only have one pending entry per CPU (this could be > extended to a small constant number without much trouble). OK, the percpu instance idea is good to me, and I think it should be default option. Unless user specifies the number of instances, it should choose percpu instance by default. Moreover, this makes things a bit complicated, can you add per-cpu instance array? If it is there, we can remove the old recycle rp insn code. > > Signed-off-by: Luigi Rizzo > --- > include/linux/kprobes.h | 10 ++++++++-- > kernel/kprobes.c | 41 +++++++++++++++++++++++++++++++++++++++++ > kernel/test_kprobes.c | 18 +++++++++++++----- > 3 files changed, 62 insertions(+), 7 deletions(-) > > diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h > index 04bdaf01112cb..5c549d6a599b9 100644 > --- a/include/linux/kprobes.h > +++ b/include/linux/kprobes.h > @@ -142,7 +142,8 @@ static inline int kprobe_ftrace(struct kprobe *p) > * can be active concurrently. > * nmissed - tracks the number of times the probed function's return was > * ignored, due to maxactive being too low. > - * > + * percpu_instance - if set, uses one instance per cpu instead of allocating > + * from the list protected by lock. > */ > struct kretprobe { > struct kprobe kp; > @@ -151,8 +152,13 @@ struct kretprobe { > int maxactive; > int nmissed; > size_t data_size; > - struct hlist_head free_instances; > + union { > + struct kretprobe_instance __percpu *pcpu; > + struct hlist_head free_instances; > + }; > raw_spinlock_t lock; > + u32 percpu_instance:1; > + u32 unused:31; Please use a bool for the flag. > }; > > struct kretprobe_instance { > diff --git a/kernel/kprobes.c b/kernel/kprobes.c > index 2625c241ac00f..12ca682083ffc 100644 > --- a/kernel/kprobes.c > +++ b/kernel/kprobes.c > @@ -1184,6 +1184,10 @@ void recycle_rp_inst(struct kretprobe_instance *ri, > hlist_del(&ri->hlist); > INIT_HLIST_NODE(&ri->hlist); > if (likely(rp)) { > + if (rp->percpu_instance) { > + ri->rp = NULL; > + return; > + } > raw_spin_lock(&rp->lock); > hlist_add_head(&ri->hlist, &rp->free_instances); > raw_spin_unlock(&rp->lock); > @@ -1274,6 +1278,11 @@ static inline void free_rp_inst(struct kretprobe *rp) > struct kretprobe_instance *ri; > struct hlist_node *next; > > + if (rp->percpu_instance) { > + free_percpu(rp->pcpu); > + return; > + } > + > hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) { > hlist_del(&ri->hlist); > kfree(ri); > @@ -1874,6 +1883,27 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs) > > /* TODO: consider to only swap the RA after the last pre_handler fired */ > hash = hash_ptr(current, KPROBE_HASH_BITS); Hmm, this hash calculation position is also not optimized. This should be done right before kretprobe_table_lock(). > + if (rp->percpu_instance) { > + unsigned long flags; We already have flags, right? > + > + local_irq_save(flags); > + ri = this_cpu_ptr(rp->pcpu); > + if (!ri || ri->rp) { /* already in use */ > + local_irq_restore(flags); > + rp->nmissed++; > + return 0; > + } > + INIT_HLIST_NODE(&ri->hlist); > + ri->rp = rp; > + ri->task = current; > + local_irq_restore(flags); > + > + if (rp->entry_handler && rp->entry_handler(ri, regs)) { > + ri->rp = NULL; /* failed */ > + return 0; > + } > + goto good; Can't you make a new helper function for finding new free instance? (note: don't forget to make it NOKPROBE) What I would like to see is; ri = get_free_rp_instance(rp); /* If !ri, increment rp->nmissed */ if (ri) { arch_prepare_kretprobe(ri, regs); ... } > + } > raw_spin_lock_irqsave(&rp->lock, flags); > if (!hlist_empty(&rp->free_instances)) { > ri = hlist_entry(rp->free_instances.first, > @@ -1891,6 +1921,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs) > return 0; > } > > +good: > arch_prepare_kretprobe(ri, regs); > > /* XXX(hch): why is there no hlist_move_head? */ > @@ -1950,6 +1981,15 @@ int register_kretprobe(struct kretprobe *rp) > rp->kp.post_handler = NULL; > rp->kp.fault_handler = NULL; > > + if (rp->percpu_instance) { > + rp->pcpu = __alloc_percpu(sizeof(*rp->pcpu) + rp->data_size, > + __alignof__(*rp->pcpu)); > + if (rp->pcpu) > + goto finalize; > + free_rp_inst(rp); > + return -ENOMEM; > + } > + > /* Pre-allocate memory for max kretprobe instances */ > if (rp->maxactive <= 0) { Above new code should be here, and set rp->percpu_instance = true. Thank you, -- Masami Hiramatsu