Received: by 2002:a05:6a10:1287:0:0:0:0 with SMTP id d7csp6816881pxv; Fri, 30 Jul 2021 03:09:30 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxPWlNxpX3awavNPiOTyRS4Tani9re9lzUJpAQWLYnciFkLo3HvIMno6QroYAUwyqatavTE X-Received: by 2002:a05:6402:1d30:: with SMTP id dh16mr2019334edb.224.1627639770170; Fri, 30 Jul 2021 03:09:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627639770; cv=none; d=google.com; s=arc-20160816; b=fKypTDPg8BdLkWMarTQHCJ8VscMgpE5RLBbo1zG2WOQgbyMrdN9Qjid1gcr7XTgwVf Ucm+5Dtwdl1xaXnrDMOYpY0TPI+4Tp+MxVF8AulXZK4OqWGo/bh76wW7sJvf8R2A7SDa d0VxdQPVceN7xPFVdcCyLdMKv5wDTxrYcleN7Z+A7fMZzIL64QanRB6wK9Q3rbWvjVQq iUQhGBc1uZzTbw+lUXzCexXzB75McNyth22iSQ+iw5rdY51ZHwjOXNV+Sp1Vj1iO+sBp YQmHWD2S3upDa9b8NolVeHQnMzCBCRrqg4ZDhPN7JkB3vaqWQQKH9q3UD2OzA8i5IvS5 Wktw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:content-transfer-encoding :content-language:accept-language:references:message-id:date :thread-index:thread-topic:subject:cc:to:from; bh=Ti9UFYDrSeiHn73dLzTzGloD5PZT83UgM2S/WSWnF7I=; b=OPZh+ik5C0gC/XbSa5invcGYUkF6IzAsgzF6TWVFeh6b5g+jEk9OEyAG4D51DAHl6h cxc6LznIScIsOShWiZJ6K8NRtoAqq7jtJQ57WW+Zjb0BCTqz/7CACAmNOlfAAJG+w9uA T/xmd0PypMKtwZrxp9oWM24i/QUITiZaanwMobeVRdraSe3siW8Ht6Xg7eJOGVm5nnLq YrleklJQCVz/eOKiAcP0awm7f2vedTcEma3jkDAQetK2+7EkW5l3fxZbYaK5YJX99Ef9 qTTQz5Rk9e9maC2l3dBmiJHuuvURFYt36PQeG0Bm1bPZ08ZnaDnZEG83jGeyBXaaHsc3 U8AQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=hisilicon.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id co20si1159102edb.50.2021.07.30.03.09.05; Fri, 30 Jul 2021 03:09:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=hisilicon.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238616AbhG3KEc convert rfc822-to-8bit (ORCPT + 99 others); Fri, 30 Jul 2021 06:04:32 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:6645 "EHLO szxga04-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238550AbhG3KEO (ORCPT ); Fri, 30 Jul 2021 06:04:14 -0400 Received: from dggems704-chm.china.huawei.com (unknown [172.30.72.58]) by szxga04-in.huawei.com (SkyGuard) with ESMTP id 4GbjZD3pQtz18NRB; Fri, 30 Jul 2021 18:00:20 +0800 (CST) Received: from dggemi761-chm.china.huawei.com (10.1.198.147) by dggems704-chm.china.huawei.com (10.3.19.181) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2176.2; Fri, 30 Jul 2021 18:04:07 +0800 Received: from dggemi761-chm.china.huawei.com ([10.9.49.202]) by dggemi761-chm.china.huawei.com ([10.9.49.202]) with mapi id 15.01.2176.012; Fri, 30 Jul 2021 18:04:06 +0800 From: "Song Bao Hua (Barry Song)" To: Masami Hiramatsu CC: "liuqi (BA)" , "catalin.marinas@arm.com" , "will@kernel.org" , "naveen.n.rao@linux.ibm.com" , "anil.s.keshavamurthy@intel.com" , "davem@davemloft.net" , "linux-arm-kernel@lists.infradead.org" , "Zengtao (B)" , "robin.murphy@arm.com" , Linuxarm , "linux-kernel@vger.kernel.org" Subject: RE: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64 Thread-Topic: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64 Thread-Index: AQHXfJkyPamnf4yoyEqhBrLk7zE456tMmTOAgAIsGRD//9DNAIABRquggAt7BNA= Date: Fri, 30 Jul 2021 10:04:06 +0000 Message-ID: References: <20210719122417.10355-1-liuqi115@huawei.com> <20210721174153.34c1898dc9eea135eb0b8be8@kernel.org> <332df5b7d7bb4bd096b6521ffefaabe6@hisilicon.com> <20210723000318.5594c86e7c454aed82d9465d@kernel.org> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.126.200.38] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > -----Original Message----- > From: Song Bao Hua (Barry Song) > Sent: Friday, July 23, 2021 2:43 PM > To: 'Masami Hiramatsu' > Cc: liuqi (BA) ; catalin.marinas@arm.com; > will@kernel.org; naveen.n.rao@linux.ibm.com; anil.s.keshavamurthy@intel.com; > davem@davemloft.net; linux-arm-kernel@lists.infradead.org; Zengtao (B) > ; robin.murphy@arm.com; Linuxarm > ; linux-kernel@vger.kernel.org > Subject: RE: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64 > > > > > -----Original Message----- > > From: Masami Hiramatsu [mailto:mhiramat@kernel.org] > > Sent: Friday, July 23, 2021 3:03 AM > > To: Song Bao Hua (Barry Song) > > Cc: liuqi (BA) ; catalin.marinas@arm.com; > > will@kernel.org; naveen.n.rao@linux.ibm.com; > anil.s.keshavamurthy@intel.com; > > davem@davemloft.net; linux-arm-kernel@lists.infradead.org; Zengtao (B) > > ; robin.murphy@arm.com; Linuxarm > > ; linux-kernel@vger.kernel.org > > Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64 > > > > Hi Song, > > > > On Thu, 22 Jul 2021 10:24:54 +0000 > > "Song Bao Hua (Barry Song)" wrote: > > > > > > > > > > > > -----Original Message----- > > > > From: Masami Hiramatsu [mailto:mhiramat@kernel.org] > > > > Sent: Wednesday, July 21, 2021 8:42 PM > > > > To: liuqi (BA) > > > > Cc: catalin.marinas@arm.com; will@kernel.org; > naveen.n.rao@linux.ibm.com; > > > > anil.s.keshavamurthy@intel.com; davem@davemloft.net; > > > > linux-arm-kernel@lists.infradead.org; Song Bao Hua (Barry Song) > > > > ; Zengtao (B) ; > > > > robin.murphy@arm.com; Linuxarm ; > > > > linux-kernel@vger.kernel.org > > > > Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64 > > > > > > > > Hi Qi, > > > > > > > > Thanks for your effort! > > > > > > > > On Mon, 19 Jul 2021 20:24:17 +0800 > > > > Qi Liu wrote: > > > > > > > > > This patch introduce optprobe for ARM64. In optprobe, probed > > > > > instruction is replaced by a branch instruction to detour > > > > > buffer. Detour buffer contains trampoline code and a call to > > > > > optimized_callback(). optimized_callback() calls opt_pre_handler() > > > > > to execute kprobe handler. > > > > > > > > OK so this will replace only one instruction. > > > > > > > > > > > > > > Limitations: > > > > > - We only support !CONFIG_RANDOMIZE_MODULE_REGION_FULL case to > > > > > guarantee the offset between probe point and kprobe pre_handler > > > > > is not larger than 128MiB. > > > > > > > > Hmm, shouldn't we depends on !CONFIG_ARM64_MODULE_PLTS? Or, > > > > allocate an intermediate trampoline area similar to arm optprobe > > > > does. > > > > > > Depending on !CONFIG_ARM64_MODULE_PLTS will totally disable > > > RANDOMIZE_BASE according to arch/arm64/Kconfig: > > > config RANDOMIZE_BASE > > > bool "Randomize the address of the kernel image" > > > select ARM64_MODULE_PLTS if MODULES > > > select RELOCATABLE > > > > Yes, but why it is required for "RANDOMIZE_BASE"? > > Does that imply the module call might need to use PLT in > > some cases? > > > > > > > > Depending on !RANDOMIZE_MODULE_REGION_FULL seems to be still > > > allowing RANDOMIZE_BASE via avoiding long jump according to: > > > arch/arm64/Kconfig: > > > > > > config RANDOMIZE_MODULE_REGION_FULL > > > bool "Randomize the module region over a 4 GB range" > > > depends on RANDOMIZE_BASE > > > default y > > > help > > > Randomizes the location of the module region inside a 4 GB window > > > covering the core kernel. This way, it is less likely for modules > > > to leak information about the location of core kernel data structures > > > but it does imply that function calls between modules and the core > > > kernel will need to be resolved via veneers in the module PLT. > > > > > > When this option is not set, the module region will be randomized over > > > a limited range that contains the [_stext, _etext] interval of the > > > core kernel, so branch relocations are always in range. > > > > Hmm, this dependency looks strange. If it always in range, don't we need > > PLT for modules? > > > > Cataline, would you know why? > > Maybe it's a KASLR's Kconfig issue? > > I actually didn't see any problem after making this change: > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index e07e7de9ac49..6440671b72e0 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1781,7 +1781,6 @@ config RELOCATABLE > > config RANDOMIZE_BASE > bool "Randomize the address of the kernel image" > - select ARM64_MODULE_PLTS if MODULES > select RELOCATABLE > help > Randomizes the virtual address at which the kernel image is > @@ -1801,6 +1800,7 @@ config RANDOMIZE_BASE > config RANDOMIZE_MODULE_REGION_FULL > bool "Randomize the module region over a 4 GB range" > depends on RANDOMIZE_BASE > + select ARM64_MODULE_PLTS if MODULES > default y > help > Randomizes the location of the module region inside a 4 GB window > > and having this config: > # zcat /proc/config.gz | grep RANDOMIZE_BASE > CONFIG_RANDOMIZE_BASE=y > > # zcat /proc/config.gz | grep RANDOMIZE_MODULE_REGION_FULL > # CONFIG_RANDOMIZE_MODULE_REGION_FULL is not set > > # zcat /proc/config.gz | grep ARM64_MODULE_PLTS > # CONFIG_ARM64_MODULE_PLTS is not set > > Modules work all good: > # lsmod > Module Size Used by > btrfs 1355776 0 > blake2b_generic 20480 0 > libcrc32c 16384 1 btrfs > xor 20480 1 btrfs > xor_neon 16384 1 xor > zstd_compress 163840 1 btrfs > raid6_pq 110592 1 btrfs > ctr 16384 0 > md5 16384 0 > ip_tunnel 32768 0 > ipv6 442368 28 > > > I am not quite sure if there is a corner case. If no, > I would think the kconfig might be some improper. The corner case is that even CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, but if CONFIG_ARM64_MODULE_PLTS is enabled, when we can't get memory from the 128MB area in case the area is exhausted, we will fall back in module_alloc() to a 2GB area as long as either of the below two conditions is met: 1. KASAN is not enabled 2. KASAN is enabled and CONFIG_KASAN_VMALLOC is also enabled. void *module_alloc(unsigned long size) { u64 module_alloc_end = module_alloc_base + MODULES_VSIZE; gfp_t gfp_mask = GFP_KERNEL; void *p; /* Silence the initial allocation */ if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS)) gfp_mask |= __GFP_NOWARN; if (IS_ENABLED(CONFIG_KASAN_GENERIC) || IS_ENABLED(CONFIG_KASAN_SW_TAGS)) /* don't exceed the static module region - see below */ module_alloc_end = MODULES_END; p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base, module_alloc_end, gfp_mask, PAGE_KERNEL, 0, NUMA_NO_NODE, __builtin_return_address(0)); if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) && (IS_ENABLED(CONFIG_KASAN_VMALLOC) || (!IS_ENABLED(CONFIG_KASAN_GENERIC) && !IS_ENABLED(CONFIG_KASAN_SW_TAGS)))) /* * KASAN without KASAN_VMALLOC can only deal with module * allocations being served from the reserved module region, * since the remainder of the vmalloc region is already * backed by zero shadow pages, and punching holes into it * is non-trivial. Since the module region is not randomized * when KASAN is enabled without KASAN_VMALLOC, it is even * less likely that the module region gets exhausted, so we * can simply omit this fallback in that case. */ p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base, module_alloc_base + SZ_2G, GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, __builtin_return_address(0)); if (p && (kasan_module_alloc(p, size) < 0)) { vfree(p); return NULL; } return p; } This should be happening quite rarely. But maybe arm64's document needs some minor fixup, otherwise, it is quite confusing. > > > > > > > > and > > > > > > arch/arm64/kernel/kaslr.c: > > > if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) { > > > /* > > > * Randomize the module region over a 2 GB window covering the > > > * kernel. This reduces the risk of modules leaking information > > > * about the address of the kernel itself, but results in > > > * branches between modules and the core kernel that are > > > * resolved via PLTs. (Branches between modules will be > > > * resolved normally.) > > > */ > > > module_range = SZ_2G - (u64)(_end - _stext); > > > module_alloc_base = max((u64)_end + offset - SZ_2G, > > > (u64)MODULES_VADDR); > > > } else { > > > /* > > > * Randomize the module region by setting module_alloc_base to > > > * a PAGE_SIZE multiple in the range [_etext - MODULES_VSIZE, > > > * _stext) . This guarantees that the resulting region still > > > * covers [_stext, _etext], and that all relative branches can > > > * be resolved without veneers. > > > */ > > > module_range = MODULES_VSIZE - (u64)(_etext - _stext); > > > module_alloc_base = (u64)_etext + offset - MODULES_VSIZE; > > > } > > > > > > So depending on ! ARM64_MODULE_PLTS seems to narrow the scenarios > > > while depending on ! RANDOMIZE_MODULE_REGION_FULL permit more > > > machines to use optprobe. > > > > OK, I see that the code ensures the range will be in the MODULE_VSIZE (=128MB). > > > > > > > > I am not quite sure I am 100% right but tests seem to back this. > > > hopefully Catalin and Will can correct me. > > > > > > > > > > > > > > > > > Performance of optprobe on Hip08 platform is test using kprobe > > > > > example module[1] to analyze the latency of a kernel function, > > > > > and here is the result: > > > > > > > > > > [1] > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/sa > > > > mples/kprobes/kretprobe_example.c > > > > > > > > > > kprobe before optimized: > > > > > [280709.846380] do_empty returned 0 and took 1530 ns to execute > > > > > [280709.852057] do_empty returned 0 and took 550 ns to execute > > > > > [280709.857631] do_empty returned 0 and took 440 ns to execute > > > > > [280709.863215] do_empty returned 0 and took 380 ns to execute > > > > > [280709.868787] do_empty returned 0 and took 360 ns to execute > > > > > [280709.874362] do_empty returned 0 and took 340 ns to execute > > > > > [280709.879936] do_empty returned 0 and took 320 ns to execute > > > > > [280709.885505] do_empty returned 0 and took 300 ns to execute > > > > > [280709.891075] do_empty returned 0 and took 280 ns to execute > > > > > [280709.896646] do_empty returned 0 and took 290 ns to execute > > > > > [280709.902220] do_empty returned 0 and took 290 ns to execute > > > > > [280709.907807] do_empty returned 0 and took 290 ns to execute > > > > > > > > > > optprobe: > > > > > [ 2965.964572] do_empty returned 0 and took 90 ns to execute > > > > > [ 2965.969952] do_empty returned 0 and took 80 ns to execute > > > > > [ 2965.975332] do_empty returned 0 and took 70 ns to execute > > > > > [ 2965.980714] do_empty returned 0 and took 60 ns to execute > > > > > [ 2965.986128] do_empty returned 0 and took 80 ns to execute > > > > > [ 2965.991507] do_empty returned 0 and took 70 ns to execute > > > > > [ 2965.996884] do_empty returned 0 and took 70 ns to execute > > > > > [ 2966.002262] do_empty returned 0 and took 80 ns to execute > > > > > [ 2966.007642] do_empty returned 0 and took 70 ns to execute > > > > > [ 2966.013020] do_empty returned 0 and took 70 ns to execute > > > > > [ 2966.018400] do_empty returned 0 and took 70 ns to execute > > > > > [ 2966.023779] do_empty returned 0 and took 70 ns to execute > > > > > [ 2966.029158] do_empty returned 0 and took 70 ns to execute > > > > > > > > Great result! > > > > I have other comments on the code below. > > > > > > > > [...] > > > > > diff --git a/arch/arm64/kernel/probes/kprobes.c > > > > b/arch/arm64/kernel/probes/kprobes.c > > > > > index 6dbcc89f6662..83755ad62abe 100644 > > > > > --- a/arch/arm64/kernel/probes/kprobes.c > > > > > +++ b/arch/arm64/kernel/probes/kprobes.c > > > > > @@ -11,6 +11,7 @@ > > > > > #include > > > > > #include > > > > > #include > > > > > +#include > > > > > #include > > > > > #include > > > > > #include > > > > > @@ -113,9 +114,21 @@ int __kprobes arch_prepare_kprobe(struct kprobe > *p) > > > > > > > > > > void *alloc_insn_page(void) > > > > > { > > > > > - return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, > > VMALLOC_END, > > > > > - GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS, > > > > > - NUMA_NO_NODE, __builtin_return_address(0)); > > > > > + void *page; > > > > > + > > > > > + page = module_alloc(PAGE_SIZE); > > > > > + if (!page) > > > > > + return NULL; > > > > > + > > > > > + set_vm_flush_reset_perms(page); > > > > > + /* > > > > > + * First make the page read-only, and only then make it executable > > to > > > > > + * prevent it from being W+X in between. > > > > > + */ > > > > > + set_memory_ro((unsigned long)page, 1); > > > > > + set_memory_x((unsigned long)page, 1); > > > > > + > > > > > + return page; > > > > > > > > Isn't this a separated change? Or any reason why you have to > > > > change this function? > > > > > > As far as I can tell, this is still related with the 128MB > > > short jump limitation. > > > VMALLOC_START, VMALLOC_END is an fixed virtual address area > > > which isn't necessarily modules will be put. > > > So this patch is moving to module_alloc() which will get > > > memory between module_alloc_base and module_alloc_end. > > > > Ah, I missed that point. Yes, VMALLOC_START and VMALLOC_END > > are not correct range. > > > > > > > > Together with depending on !RANDOMIZE_MODULE_REGION_FULL, > > > this makes all kernel, module and trampoline in short > > > jmp area. > > > > > > As long as we can figure out a way to support long jmp > > > for optprobe, the change in alloc_insn_page() can be > > > dropped. > > > > No, I think above change is rather readable, so it is OK. > > > > > > > > Masami, any reference code from any platform to support long > > > jump for optprobe? For long jmp, we need to put jmp address > > > to a memory and then somehow load the target address > > > to PC. Right now, we are able to replace an instruction > > > only. That is the problem. > > > > Hmm, I had read a paper about 2-stage jump idea 15years ago. That > > paper allocated an intermediate trampoline (like PLT) which did a long > > jump to the real trampoline on SPARC. > > (something like, "push x0; ldr x0, [pc+8]; br x0; " for > > a slot of the intermediate trampoline.) > > > > For the other (simpler) solution example is optprobe in powerpc > > (arch/powerpc/kernel/optprobes_head.S). That reserves a buffer page > > in the text section, and use it. > > > > But I think your current implementation is good enough for the > > first step. If someone needs CONFIG_RANDOMIZE_MODULE_REGION_FULL > > and optprobe, we can revisit this point. > > > > Thank you, Thanks Barry