Subject: Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
To: Andy Lutomirski
Cc: Kernel Hardening, Linux API, linux-arm-kernel, Linux FS Devel, linux-integrity, LKML, LSM List, Oleg Nesterov, X86 ML
References: <20200728131050.24443-1-madvenka@linux.microsoft.com>
From: "Madhavan T. Venkataraman"
Message-ID: <8b28f4a5-2d9e-0686-40e5-2ea9e37c5933@linux.microsoft.com>
Date: Tue, 28 Jul 2020 14:01:12 -0500
X-Mailing-List: linux-kernel@vger.kernel.org

I am working on a response to this. I will send it soon.

Thanks.

Madhavan

On 7/28/20 12:31 PM, Andy Lutomirski wrote:
>> On Jul 28, 2020, at 6:11 AM, madvenka@linux.microsoft.com wrote:
>>
>> From: "Madhavan T. Venkataraman"
>>
>> The kernel creates the trampoline mapping without any permissions. When
>> the trampoline is executed by user code, a page fault happens and the
>> kernel gets control. The kernel recognizes that this is a trampoline
>> invocation. It sets up the user registers based on the specified
>> register context, and/or pushes values on the user stack based on the
>> specified stack context, and sets the user PC to the requested target
>> PC. When the kernel returns, execution continues at the target PC.
>> So, the kernel does the work of the trampoline on behalf of the
>> application.
> This is quite clever, but now I’m wondering just how much kernel help
> is really needed. In your series, the trampoline is a non-executable
> page. I can think of at least two alternative approaches, and I'd
> like to know the pros and cons.
>
> 1. Entirely userspace: a return trampoline would be something like:
>
> 1:
>     pushq %rax
>     pushq %rbx
>     pushq %rcx
>     ...
>     pushq %r15
>     movq %rsp, %rdi # pointer to saved regs
>     leaq 1b(%rip), %rsi # pointer to the trampoline itself
>     callq trampoline_handler # see below
>
> You would fill a page with a bunch of these, possibly compacted to get
> more per page, and then you would remap as many copies as needed.
> The 'callq trampoline_handler' part would need to be a bit clever to make
> it continue to work despite this remapping. This will be *much*
> faster than trampfd. How much of your use case would it cover? For
> the inverse, it's not too hard to write a bit of asm to set all
> registers and jump somewhere.
>
> 2. Use existing kernel functionality. Raise a signal, modify the
> state, and return from the signal. This is very flexible and may not
> be all that much slower than trampfd.
>
> 3. Use a syscall. Instead of having the kernel handle page faults,
> have the trampoline code push the syscall nr register, load a special
> new syscall nr into the syscall nr register, and do a syscall. On
> x86_64, this would be:
>
>     pushq %rax
>     movq $__NR_magic_trampoline, %rax
>     syscall
>
> with some adjustment if the stack slot you're clobbering is important.
>
>
> Also, will using trampfd cause issues with various unwinders? I can
> easily imagine unwinders expecting code to be readable, although this
> is slowly going away for other reasons.
>
> All this being said, I think that the kernel should absolutely add a
> sensible interface for JITs to use to materialize their code. This
> would integrate sanely with LSMs and wouldn't require hacks like using
> files, etc. A cleverly designed JIT interface could function without
> serialization IPIs, and even lame architectures like x86 could
> potentially avoid shootdown IPIs if the interface copied code instead
> of playing virtual memory games. At its very simplest, this could be:
>
>     void *jit_create_code(const void *source, size_t len);
>
> and the result would be a new anonymous mapping that contains exactly
> the code requested. There could also be:
>
>     int jittfd_create(...);
>
> that does something similar but creates a memfd. A nicer
> implementation for short JIT sequences would allow appending more code
> to an existing JIT region.
> On x86, an appendable JIT region would start filled with 0xcc, and I
> bet there's a way to materialize new code into a previously
> 0xcc-filled virtual page without any synchronization. One approach
> would be to start with:
>
>     <some code>
>     0xcc
>     0xcc
>     ...
>     0xcc
>
> and to create a whole new page like:
>
>     <some code>
>     <some more code>
>     0xcc
>     ...
>     0xcc
>
> so that the only difference is that some code changed to some more
> code. Then replace the PTE to swap from the old page to the new page,
> and arrange to avoid freeing the old page until we're sure it's gone
> from all TLBs. This may not work if <some more code> spans a page
> boundary. The #BP fixup would zap the TLB and retry. Even just
> directly copying code over some 0xcc bytes almost works, but there's a
> nasty corner case involving instructions that cross I$ fetch
> boundaries. I'm not sure to what extent I$ snooping helps.
>
> --Andy
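
For readers who want something concrete, the simplest jit_create_code()
interface mentioned above can be approximated in userspace today. The
sketch below is only an illustration, not the proposed kernel interface:
jit_create_code_sketch(), its extra map_len parameter, and
jit_sketch_demo() are hypothetical names, and the code relies on exactly
the mmap()/mprotect() virtual memory games that a real kernel interface
might be able to avoid. It assumes Linux or a similar POSIX system that
allows an anonymous mapping to go from writable to executable. The demo
only executes the generated bytes on x86-64; other architectures would
also need instruction cache maintenance, which is omitted here.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical userspace stand-in for the proposed jit_create_code().
 * Returns a read+execute anonymous mapping that starts with a copy of
 * `source` and is padded with 0xCC (int3 on x86), or NULL on failure.
 * The mapping length is stored in *map_len when map_len is not NULL.
 */
static void *jit_create_code_sketch(const void *source, size_t len,
                                    size_t *map_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || source == NULL || len == 0)
        return NULL;

    /* Round the request up to a whole number of pages. */
    size_t psz = (size_t)page;
    size_t size = (len + psz - 1) / psz * psz;

    /* Start writable but not executable. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memset(p, 0xCC, size);  /* unused bytes trap if executed on x86 */
    memcpy(p, source, len); /* copy the caller's code to the front */

    /* Drop write permission before the code can run. */
    if (mprotect(p, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, size);
        return NULL;
    }

    if (map_len != NULL)
        *map_len = size;
    return p;
}

/* Usage example: returns 0 on success, -1 if the mapping failed. */
static int jit_sketch_demo(void)
{
    /* x86-64 machine code for: movl $42, %eax ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    size_t map_len = 0;
    unsigned char *p = jit_create_code_sketch(code, sizeof code, &map_len);

    if (p == NULL)
        return -1;

    /* The mapping holds the code followed by 0xCC padding. */
    assert(memcmp(p, code, sizeof code) == 0);
    assert(p[sizeof code] == 0xCC);

#if defined(__x86_64__)
    /* Only run the bytes on the architecture they were written for. */
    int (*fn)(void) = (int (*)(void))p;
    assert(fn() == 42);
#endif

    return munmap(p, map_len);
}
```

Copying into a fresh mapping and then dropping write permission keeps
the region from ever being writable and executable at the same time,
which is one reason an interface that copies code is attractive.
Appending to an existing region, as discussed above, is not covered by
this sketch.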