Subject: Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
From: "Madhavan T. Venkataraman"
To: Andy Lutomirski
Cc: Kernel Hardening, Linux API, linux-arm-kernel, Linux FS Devel,
    linux-integrity, LKML, LSM List, Oleg Nesterov, X86 ML
Date: Sun, 2 Aug 2020 13:54:35 -0500
Message-ID: <3b916198-3a98-bd19-9a1c-f2d8d44febe8@linux.microsoft.com>
References: <20200728131050.24443-1-madvenka@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

More responses inline.

On 7/28/20 12:31 PM, Andy Lutomirski wrote:
>> On Jul 28, 2020, at 6:11 AM, madvenka@linux.microsoft.com wrote:
>>
>> From: "Madhavan T. Venkataraman"
>>
>
> 2. Use existing kernel functionality.  Raise a signal, modify the
> state, and return from the signal.  This is very flexible and may not
> be all that much slower than trampfd.

Let me understand this. You are saying that the trampoline code
would raise a signal and, in the signal handler, set up the context
so that when the signal handler returns, we end up in the target
function with the context correctly set up. And, this trampoline code
can be generated statically at build time so that there are no
security issues using it.

Have I understood your suggestion correctly?

So, my argument would be that this would always incur the overhead
of a trip to the kernel. I think it is twice the overhead, if I am not
mistaken. With trampfd, we can have the kernel generate the code so
that there is no performance penalty at all.

Signals have many problems. Which signal number should we use for this
purpose? If we use an existing one, that might conflict with what the
application is already handling. Getting a new signal number for this
could meet with resistance from the community.

Also, signals are asynchronous. So, they are vulnerable to race
conditions. To prevent other signals from coming in while handling the
raised signal, we would need to block and unblock signals. This will
cause more overhead.

> 3. Use a syscall.
> Instead of having the kernel handle page faults,
> have the trampoline code push the syscall nr register, load a special
> new syscall nr into the syscall nr register, and do a syscall.  On
> x86_64, this would be:
>
> pushq %rax
> movq __NR_magic_trampoline, %rax
> syscall
>
> with some adjustment if the stack slot you're clobbering is important.

How is this better than the kernel handling an address fault?
The system call still needs to do the same work as the fault handler.
We still need to specify the register and stack contexts beforehand
so that the system call can do its job.

Also, this always incurs a trip to the kernel. With trampfd, the kernel
could generate the code to avoid the performance penalty.

>
> Also, will using trampfd cause issues with various unwinders?  I can
> easily imagine unwinders expecting code to be readable, although this
> is slowly going away for other reasons.

I need to study unwinders a little before I respond to this question.
So, bear with me.

> All this being said, I think that the kernel should absolutely add a
> sensible interface for JITs to use to materialize their code.  This
> would integrate sanely with LSMs and wouldn't require hacks like using
> files, etc.  A cleverly designed JIT interface could function without
> serialization IPIs, and even lame architectures like x86 could
> potentially avoid shootdown IPIs if the interface copied code instead
> of playing virtual memory games.  At its very simplest, this could be:
>
> void *jit_create_code(const void *source, size_t len);
>
> and the result would be a new anonymous mapping that contains exactly
> the code requested.  There could also be:
>
> int jittfd_create(...);
>
> that does something similar but creates a memfd.  A nicer
> implementation for short JIT sequences would allow appending more code
> to an existing JIT region.
> On x86, an appendable JIT region would
> start filled with 0xCC, and I bet there's a way to materialize new
> code into a previously 0xcc-filled virtual page without any
> synchronization.  One approach would be to start with:
>
> <some code>
> 0xcc
> 0xcc
> ...
> 0xcc
>
> and to create a whole new page like:
>
> <some code>
> <some more code>
> 0xcc
> ...
> 0xcc
>
> so that the only difference is that some code changed to some more
> code.  Then replace the PTE to swap from the old page to the new page,
> and arrange to avoid freeing the old page until we're sure it's gone
> from all TLBs.  This may not work if <some more code> spans a page
> boundary.  The #BP fixup would zap the TLB and retry.  Even just
> directly copying code over some 0xcc bytes almost works, but there's a
> nasty corner case involving instructions that fetch I$ fetch
> boundaries.  I'm not sure to what extent I$ snooping helps.

I think the trampfd API can be used to address JIT code as well. I have
not yet worked out the details, but I think the API is sufficient.
E.g.,

    struct trampfd_jit {
        void    *source;
        size_t  len;
    };

    struct trampfd_jit  jit;
    struct trampfd_map  map;
    void    *addr;
    int     fd;

    jit.source = blah;
    jit.len = blah;

    fd = syscall(440, TRAMPFD_JIT, &jit, flags);
    pread(fd, &map, sizeof(map), TRAMPFD_MAP_OFFSET);
    addr = mmap(NULL, map.size, map.prot, map.flags, fd, map.offset);

And addr would be used to invoke the generated JIT code.

Madhavan