Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4887BC25B4E for ; Sun, 22 Jan 2023 23:46:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230241AbjAVXqV (ORCPT ); Sun, 22 Jan 2023 18:46:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229990AbjAVXqT (ORCPT ); Sun, 22 Jan 2023 18:46:19 -0500 Received: from mail.zytor.com (unknown [IPv6:2607:7c80:54:3::138]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB9FA1207A for ; Sun, 22 Jan 2023 15:46:17 -0800 (PST) Received: from [IPV6:2601:646:8600:40c0:425:cd56:6750:e1bf] ([IPv6:2601:646:8600:40c0:425:cd56:6750:e1bf]) (authenticated bits=0) by mail.zytor.com (8.17.1/8.17.1) with ESMTPSA id 30MNjnNG2107520 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NO); Sun, 22 Jan 2023 15:45:50 -0800 DKIM-Filter: OpenDKIM Filter v2.11.0 mail.zytor.com 30MNjnNG2107520 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zytor.com; s=2023010601; t=1674431154; bh=Hx31l10Tn1gyxYATdZwdRteTD2Eb2dvRep72nyd3u+U=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=AMlC9J+sE/zP9qpAc8xnvSoXP39PXLgLIgukeZl/p4ABmFwFiiOn5Kt0e0qOBKwXc 6g1egH54n/W9+hqQmE0A8Xt0/H2mzzV6ox3zNJVRQlJtTzc1Cwvd+AA5l7Zmj5heNJ 8KtfVz0Cszxf/i29y7FQNiSHaD5+A0wd3+r86xabQAgo/Iw9obEksz/aQI6dCcbd7b VD2a9KAE555srzktwzWhffY0NZbyMAC/wdrxgxjNaEbu+uYqGnegyuh1oHDMpqu0gy XgF12EP33xwte7IWwxRyWNRCx8V4PHtvvg9DvPxinUN18exRo/MSJlA+RxdifCBJbO bD632RgBOc+ag== Message-ID: <25b96960-a07e-a952-5c23-786b55054126@zytor.com> Date: Sun, 22 Jan 2023 15:45:44 -0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0 Subject: Re: the x86 sysret_rip test fails on the Intel FRED architecture Content-Language: en-US To: Dave Hansen , "Li, Xin3" , "tglx@linutronix.de" , "mingo@redhat.com" , "bp@alien8.de" , "peterz@infradead.org" , "dave.hansen@linux.intel.com" Cc: "x86@kernel.org" , "linux-kernel@vger.kernel.org" References: <5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com> <18B5DB6D-AEBD-4A67-A7B3-CE64940819B7@zytor.com> From: "H. Peter Anvin" In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 1/21/23 20:34, Dave Hansen wrote: > On 1/21/23 19:38, Li, Xin3 wrote: >>> However, it doesn't seem to make sense to do so to me. The current behavior is >>> much more of an artifact than desired behavior. >> We kind of have an agreement that %r11 = %flags after returning from the kernel. >> >> And the question is, is it what we want? > > Can the selftest just set r11=rflags before the syscall? The old > syscall entry path will set r11=rflags. The FRED path won't touch it. > Either case will pass an r11==rflags check. Thinking about this some more, the whole point here is to make sure that all syscalls work consistently, so that we don't leak a kernel implementation detail. Either r11 and rcx should both be clobbered (set to %rcx and the return %rip respectively), or none should be. As such, I think we want to do something like: /* Arbitrary values */ const unsigned long r11_sentinel = 0xfeedfacedeadbeef; const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e; /* An arbitrary *valid* RFLAGS value */ const unsigned long rflags_sentinel = 0x200893; enum regs_ok { REGS_ERROR = -1, /* Invalid register contents */ REGS_SAVED = 0, /* Registers properly preserved */ REGS_SYSRET = 1 /* Registers match syscall/sysret */ }; /* * Returns: * 0 = %rcx and %r11 preserved * 1 = %rcx and %r11 set to %rflags and %rip * -1 = %rcx and/or %r11 set to any other values * * Note that check_regs_syscall() set %rbx to the syscall return %rip. */ static enum regs_ok check_regs_result(unsigned long r11, unsigned long rcx, unsigned long rbx) { if (r11 == r11_sentinel && rcx == rcx_sentinel) return REGS_SAVED; else if (r11 == rflags_sentinel && rcx == rbx) return REGS_SYSRET; else return REGS_ERROR; } static enum regs_ok check_regs_syscall(int syscall, unsigned long arg1, unsigned long arg2) { register unsigned long r11 asm("%r11"); unsigned long rcx, rbx, tmp; r11 = r11_sentinel; rcx = rcx_sentinel; asm volatile("push %3; popf; " "lea 1f(%%rip),%2; " "syscall; " "1:" : "+r" (r11), "+c" (rcx), "=b" (rbx) : "g" (rflags_sentinel), "a" (syscall), "D" (arg1), "S" (arg2)); return check_regs_result(r11, rcx, rbx); } enum regs_ok check_regs_getppid(void) { return check_regs_syscall(__NR_getppid, 0, 0); } Test it out with a trivial system call like __NR_getppid which is extremely likely to return with SYSRET on an IDT system; check that it returns a nonnegative value and then save the result. All tests from that point on should return the *same* value, including the signal handler etc. I make the result-checking function separate so that it can be fed data from e.g. the signal stack or ptrace. -hpa