Received: by 2002:ab2:7a55:0:b0:1f4:4a7d:290d with SMTP id u21csp166689lqp; Thu, 4 Apr 2024 09:36:25 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCUgQBZeE3mJfskTX2idyODyQews6Z3vEiSU5vwMctLIeqJsVkIBa6ZWtYHQTcpEH9J6Du/SI8OM0sY7ScWY5OE1Fwv8dg33UgIgccWEWg== X-Google-Smtp-Source: AGHT+IH2sBoxuLoFjpcirs0fTzzg5WDatVc9ursPX6374tkqspeqLO9MaqKADstcJiUtrJXq0qv7 X-Received: by 2002:a17:90a:4bc6:b0:2a0:9741:c957 with SMTP id u6-20020a17090a4bc600b002a09741c957mr2808109pjl.33.1712248584619; Thu, 04 Apr 2024 09:36:24 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1712248584; cv=pass; d=google.com; s=arc-20160816; b=XnIIfwP/3FiSbaq6tCNQrg1ORewPX7DWioMiBimWOk41LN7mT8qArrw6IJEvHIHtkH duMYsnnr+/qAFZAYyJ1KvSW2O1R/nwrY8DORYEjh6FwCa3eWiEa4YLABNIds0S8heVkF 1vpeV14GrTfeUeX3ubMgDkONsiHYPvDOxVh1cXdYsfzLKeeGIXDCMqSkaFZVPHiYi3Ed XzobEU1NeMiQFueqH8p2Gpi1bmsfvqu1VALPVIB3CAw8ENI7plaJZDU0nwyWOOEWuYRe Ap74L+h8A+p34K9t9SfDXwnPEirkg67wcC8igp1oDkhfkiyLlQHC0nDTUv/LnyR07EuW 93DA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=/oiTPD3i4uvfASWoUzvSocG+WxiwsuUNlr0fphWET54=; fh=0NpOILLL5Sq0q9WB+kqlBqHOYUKTunR47HByP4+A75o=; b=FN+AIWpEQ0fKopqrHjlF7qeua5+5EWd+V1EIM0LMHI7DZMuMKklh8SkISZGegsKeju cDXwpoHueV1uyv9knvbUQDst4OGiORQd7Zph/t3GMsYQxd//xQjm14HbRfzH55WXZfwk nOG48IHfYn0TitoFtKvDDsTUQr4/Z/QmSLUxm98lyUronUBSw+Ufdek4gjIbDjeXyFvB QFL1JUMdIluZxcpeG/cyJkm76vKhv4jHynlVKduWX3olVLSL1qeL4glPRG7mW54v+tgh VobNz7Mpe8l8RV1rHz+UJtH3ZSPsGOA0o0ZkDQR2EhWztc06lQUyQ5jC8EIqQJ8wmp5Z JdlA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="g+9LmoP/"; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-131789-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-131789-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id qe2-20020a17090b4f8200b002a27f4c5d43si2103289pjb.66.2024.04.04.09.36.24 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Apr 2024 09:36:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-131789-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="g+9LmoP/"; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-131789-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-131789-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 9D5B5B23CF2 for ; Thu, 4 Apr 2024 16:06:28 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4684F12AACB; Thu, 4 Apr 2024 16:06:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="g+9LmoP/" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 397AA1BC46; Thu, 4 Apr 2024 16:06:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712246769; cv=none; b=t9AIG/wWdU2k49tF/s4A2vE5iXGTgK8mJ9RNJprZ9ogeIQN6v2JI/AAr19nav6o8AQ8I3hp3BlNv85cVKOS3FiHWqcplhW34MaTf0xs8DNAJX1szjDtSno0Wnit6YQnEYA7wtSad/cC4FsbGGI6GZXN1qirp5J0lOlJrPRueoqY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712246769; c=relaxed/simple; bh=+5n1opxFvTrkkm7rzpBmiqYFvlJrF7IWFtBu70bBigs=; h=Date:From:To:Cc:Subject:Message-Id:In-Reply-To:References: Mime-Version:Content-Type; b=p6fT5hf2KqnF2T/RpHCCuL875l2tRM6PEc1eDJH9wOWbL2o86g32ZunSQu/LFx62qFhVxINayH4ENrpEJtMQEB58mx5QIGOJeas6HFmfZP5NEbO8WrE2M17rVQSn0C1vbUF6cGIWxgAMb46TNqwlrM/bi3BgRTAUFP12Qyti3Xs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=g+9LmoP/; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 79270C43390; Thu, 4 Apr 2024 16:06:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1712246768; bh=+5n1opxFvTrkkm7rzpBmiqYFvlJrF7IWFtBu70bBigs=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=g+9LmoP/qNM5Fayaerz57GtxMTmBUXsjnSbxebbCkQxULEi3MCu4hJBJzVmLZzcVh 7jQ+YkTF0SnkK3uxx0dZGfgkxjvD7VvUIHQlTi3B/79sofEimfUjezt5UQZeOBDcEv a3QRqDuXvPUbMBA9RcjPB4i4jHeB0A49dVYhg10xQzGOsr512khahosuoZi1iUgXF2 Ks0sl41DeIUiGVxyRGEW56kdBnFRb0AvRTsViKdufvUhNiMyfrV+L/vXWfTmPCDW2j yipFk6iA39YdQpVPb8z83KsbJdxVev2nrESqvgGMIA8SsxCGFVtPtPfJIlOefQw+XB MUE8BKBXFw8nQ== Date: Fri, 5 Apr 2024 01:06:03 +0900 From: Masami Hiramatsu (Google) To: Jiri Olsa Cc: Andrii Nakryiko , Masami Hiramatsu , Steven Rostedt , linux-api@vger.kernel.org, Oleg Nesterov , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, Song Liu , Yonghong Song , John Fastabend , Peter Zijlstra , Thomas Gleixner , "Borislav Petkov (AMD)" , x86@kernel.org Subject: Re: [PATCHv2 1/3] uprobe: Add uretprobe syscall to speed up return probe Message-Id: <20240405010603.58bd96cdf56d456103844e65@kernel.org> In-Reply-To: References: <20240402093302.2416467-1-jolsa@kernel.org> <20240402093302.2416467-2-jolsa@kernel.org> <20240403100708.233575a8ac2a5bac2192d180@kernel.org> <20240403230937.c3bd47ee47c102cd89713ee8@kernel.org> <20240404095829.ec5db177f29cd29e849169fa@kernel.org> X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; x86_64-pc-linux-gnu) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Thu, 4 Apr 2024 13:58:43 +0200 Jiri Olsa wrote: > On Wed, Apr 03, 2024 at 07:00:07PM -0700, Andrii Nakryiko wrote: > > SNIP > > > Check rt_sigreturn syscall (manpage at [0], for example). > > > > sigreturn() exists only to allow the implementation of signal > > handlers. It should never be called directly. (Indeed, a simple > > sigreturn() wrapper in the GNU C library simply returns -1, with > > errno set to ENOSYS.) Details of the arguments (if any) passed > > to sigreturn() vary depending on the architecture. (On some > > architectures, such as x86-64, sigreturn() takes no arguments, > > since all of the information that it requires is available in the > > stack frame that was previously created by the kernel on the > > user-space stack.) > > > > This is a very similar use case. Also, check its source code in > > arch/x86/kernel/signal_64.c. It sends SIGSEGV to the calling process > > on any sign of something not being right. It's exactly the same with > > sys_uretprobe. > > > > [0] https://man7.org/linux/man-pages/man2/sigreturn.2.html > > > > > And the number of syscalls are limited resource. > > > > We have almost 500 of them, it didn't seems like adding 1-2 for good > > reasons would be a problem. Can you please point to where the limits > > on syscalls as a resource are described? I'm curious to learn. > > > > > > > > I'm actually not sure how much we need to care of it, but adding a new > > > syscall is worth to be discussed carefully because all of them are > > > user-space compatibility. > > > > Absolutely, it's a good discussion to have. > > > > > > > > > > > > Also, we should run syzkaller on this syscall. And if uretprobe is > > > > > > > > > > > > right, I'll check on syzkaller > > > > > > > > > > > > > set in the user function, what happen if the user function directly > > > > > > > calls this syscall? (maybe it consumes shadow stack?) > > > > > > > > > > > > the process should receive SIGILL if there's no pending uretprobe for > > > > > > the current task, or it will trigger uretprobe if there's one pending > > > > > > > > > > No, that is too aggressive and not safe. Since the syscall is exposed to > > > > > user program, it should return appropriate error code instead of SIGILL. > > > > > > > > > > > > > This is the way it is today with uretprobes even through interrupt. > > > > > > I doubt that the interrupt (exception) and syscall should be handled > > > differently. Especially, this exception is injected by uprobes but > > > syscall will be caused by itself. But syscall can be called from user > > > program (of couse this works as sys_kill(self, SIGILL)). > > > > Yep, I'd keep the behavior the same between uretprobes implemented > > through int3 and sys_uretprobe. > > +1 > > > > > > > > > > E.g., it could happen that user process is using fibers and is > > > > replacing stack pointer without kernel realizing this, which will > > > > trigger some defensive checks in uretprobe handling code and kernel > > > > will send SIGILL because it can't support such cases. This is > > > > happening today already, and it works fine in practice (except for > > > > applications that manually change stack pointer, too bad, you can't > > > > trace them with uretprobes, unfortunately). > > > > > > OK, we at least need to document it. > > > > +1, yep > > > > > > > > > > > > > So I think it's absolutely adequate to have this behavior if the user > > > > process is *intentionally* abusing this API. > > > > > > Of course user expected that it is abusing. So at least we need to > > > add a document that this syscall number is reserved to uprobes and > > > user program must not use it. > > > > > > > Totally agree about documenting this. > > ok there's map page on sigreturn.. do you think we should add man page > for uretprobe or you can think of some other place to document it? I think it is better to have a man-page. Anyway, to discuss and explain this syscall, the man-page is a good format to describe it. > > > > > > > > > > > > > > > > > > > but we could limit the syscall to be executed just from the trampoline, > > > > > > that should prevent all the user space use cases, I'll do that in next > > > > > > version and add more tests for that > > > > > > > > > > Why not limit? :) The uprobe_handle_trampoline() expects it is called > > > > > only from the trampoline, so it is natural to check the caller address. > > > > > (and uprobe should know where is the trampoline) > > > > > > > > > > Since the syscall is always exposed to the user program, it should > > > > > - Do nothing and return an error unless it is properly called. > > > > > - check the prerequisites for operation strictly. > > > > > I concern that new system calls introduce vulnerabilities. > > > > > > > > > > > > > As Oleg and Jiri mentioned, this syscall can't harm kernel or other > > > > processes, only the process that is abusing the API. So any extra > > > > checks that would slow down this approach is an unnecessary overhead > > > > and complication that will never be useful in practice. > > > > > > I think at least it should check the caller address to ensure the > > > address is in the trampoline. > > > But anyway, uprobes itself can break the target process, so no one > > > might care if this system call breaks the process now. > > > > If we already have an expected range of addresses, then I think it's > > fine to do a quick unlikely() check. I'd be more concerned if we need > > to do another lookup or search to just validate this. I'm sure Jiri > > will figure it out. > > Oleg mentioned the trampoline address check could race with another > thread's mremap call, however trap is using that check as well, it > still seems like good idea to do it also in the uretprobe syscall Yeah, and also, can we add a stack pointer check if the trampoline is shared with other probe points? Thank you, > > jirka -- Masami Hiramatsu (Google)