Subject: Re: [RFC PATCH] arm64: kprobes: Enable OPTPROBE for arm64
From: Robin Murphy
Date: Wed, 30 Jun 2021 11:22:00 +0100
To: "Song Bao Hua (Barry Song)", "liuqi (BA)", Linuxarm, "catalin.marinas@arm.com", "will@kernel.org", "linux-arm-kernel@lists.infradead.org", Jean-Philippe Brucker
Cc: "Zengtao (B)", "linux-kernel@vger.kernel.org"
Message-ID: <527265b8-35c3-eeec-5751-cc2920184d4e@arm.com>
References: <1622803839-27354-1-git-send-email-liuqi115@huawei.com> <2409bcc3-fc5c-1e41-d7be-c81e59042c4d@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021-06-30 08:05, Song Bao Hua (Barry Song) wrote:
>>
>> On 2021/6/4 18:50, Qi Liu wrote:
>>> This patch introduces optprobe for ARM64. In optprobe, probed
>>> instruction is replaced by a branch instruction to detour
>>> buffer. Detour buffer contains trampoline code and a call to
>>> optimized_callback(). optimized_callback() calls opt_pre_handler()
>>> to execute kprobe handler.
>>>
>>> Limitations:
>>> - We only support !CONFIG_RANDOMIZE_MODULE_REGION_FULL case to
>>>   guarantee the offset between probe point and kprobe pre_handler
>>>   is not larger than 128MiB.
>>>
>>> Performance of optprobe on Hip08 platform is tested using kprobe
>>> example module[1] to analyze the latency of a kernel function,
>>> and here is the result:
>
> + Jean-Philippe Brucker as well.
>
> I assume both Jean and Robin expressed interest in having
> an optprobe solution on ARM64 in a previous discussion
> when I tried to add some tracepoints for debugging:
> "[PATCH] iommu/arm-smmu-v3: add tracepoints for cmdq_issue_cmdlist"
>
> https://lore.kernel.org/linux-arm-kernel/20200828083325.GC3825485@myrica/
> https://lore.kernel.org/linux-arm-kernel/9acf1acf-19fb-26db-e908-eb4d4c666bae@arm.com/

FWIW mine was a more general comment that if the possibility exists,
making kprobes faster seems more productive than adding tracepoints to
every bit of code where performance might be of interest to work around
kprobes being slow. I don't know enough about the details to
meaningfully review an implementation, sorry.
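
For context on the 128MiB limitation in the quoted patch description: that figure matches the reach of a single AArch64 `B` instruction, whose signed 26-bit word offset covers -128MiB to +128MiB-4 bytes. Below is a minimal userspace sketch of the range check and the encoding. `branch_in_range()` and `encode_b()` are hypothetical helper names made up for illustration; they are not taken from the patch or from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AArch64 "B <label>": bits [31:26] = 0b000101, bits [25:0] = signed
 * word offset (imm26). The byte offset is imm26 * 4, so one B reaches
 * from -128MiB up to +128MiB - 4 relative to its own address. */
#define B_OPCODE     UINT32_C(0x14000000)
#define B_IMM26_MASK UINT32_C(0x03ffffff)
#define B_RANGE      (INT64_C(128) << 20) /* 128 MiB */

/* Hypothetical helper: can a single B placed at `from` reach `to`? */
static bool branch_in_range(uint64_t from, uint64_t to)
{
    int64_t offset = (int64_t)(to - from);

    return (offset & 3) == 0 && offset >= -B_RANGE && offset < B_RANGE;
}

/* Hypothetical helper: encode "B <to>" for an instruction located at
 * `from`. Returns 0 (which is not a valid B encoding) if out of range. */
static uint32_t encode_b(uint64_t from, uint64_t to)
{
    /* A64 instructions are always 4-byte aligned. */
    assert((from & 3) == 0);

    if (!branch_in_range(from, to))
        return 0;

    /* Take bits [27:2] of the two's-complement byte offset as imm26. */
    uint64_t offset = to - from;

    return B_OPCODE | ((uint32_t)(offset >> 2) & B_IMM26_MASK);
}
```

If the detour buffer can land further than that from the probe point, a single-instruction replacement no longer works, which is presumably why the patch restricts itself to the !CONFIG_RANDOMIZE_MODULE_REGION_FULL case.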
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/kprobes/kretprobe_example.c
>>>
>>> kprobe before optimized:
>>> [280709.846380] do_empty returned 0 and took 1530 ns to execute
>>> [280709.852057] do_empty returned 0 and took 550 ns to execute
>>> [280709.857631] do_empty returned 0 and took 440 ns to execute
>>> [280709.863215] do_empty returned 0 and took 380 ns to execute
>>> [280709.868787] do_empty returned 0 and took 360 ns to execute
>>> [280709.874362] do_empty returned 0 and took 340 ns to execute
>>> [280709.879936] do_empty returned 0 and took 320 ns to execute
>>> [280709.885505] do_empty returned 0 and took 300 ns to execute
>>> [280709.891075] do_empty returned 0 and took 280 ns to execute
>>> [280709.896646] do_empty returned 0 and took 290 ns to execute
>>> [280709.902220] do_empty returned 0 and took 290 ns to execute
>>> [280709.907807] do_empty returned 0 and took 290 ns to execute
>
> I used to see the same phenomenon when I used kprobe to debug
> the arm64 smmu driver. When a kprobe was executed for the first
> time, it was crazily slow. The second time it became much faster,
> though it was still slow and negatively affected
> performance-related debugging.
> Not sure if it was due to hot cache or something. I didn't dig
> into it.

From the shape of the data, my hunch would be that retraining of branch
predictors is probably a factor (but again I don't know enough about the
existing kprobes implementation to back that up).

Robin.
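
As a rough way to put numbers on the warm-up visible in the quoted log, the sketch below estimates a steady-state latency from the tail of the series and finds where the samples first settle near it. The values are copied from the "kprobe before optimized" log above. `tail_mean()` and `settle_index()` are hypothetical helpers, and treating the last three samples as "steady state" with a 10% tolerance band is an arbitrary assumption, not something from the thread.

```c
#include <stddef.h>

/* Latencies in ns, copied from the quoted "kprobe before optimized" log. */
static const unsigned int samples_ns[] = {
    1530, 550, 440, 380, 360, 340, 320, 300, 280, 290, 290, 290,
};
#define N_SAMPLES (sizeof(samples_ns) / sizeof(samples_ns[0]))

/* Mean of the last `tail` samples, used as a crude steady-state estimate.
 * Returns 0.0 if `tail` is zero or larger than the series. */
static double tail_mean(const unsigned int *v, size_t n, size_t tail)
{
    unsigned long sum = 0;

    if (tail == 0 || tail > n)
        return 0.0;
    for (size_t i = n - tail; i < n; i++)
        sum += v[i];
    return (double)sum / (double)tail;
}

/* Index of the first sample within `pct` percent above `steady`,
 * or `n` if no sample gets that close. */
static size_t settle_index(const unsigned int *v, size_t n,
                           double steady, double pct)
{
    double limit = steady * (1.0 + pct / 100.0);

    for (size_t i = 0; i < n; i++)
        if ((double)v[i] <= limit)
            return i;
    return n;
}
```

With these assumptions, `tail_mean(samples_ns, N_SAMPLES, 3)` gives 290 ns, the first call is a little over five times slower than that, and `settle_index(samples_ns, N_SAMPLES, 290.0, 10.0)` returns 7, i.e. the eighth call. A decay spread over several calls, rather than a single slow first hit, is the shape being discussed above; the sketch does not say anything about the cause.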