Received: by 2002:a25:683:0:0:0:0:0 with SMTP id 125csp655602ybg; Mon, 1 Jun 2020 10:54:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwyfcnReipig8Mw//R9PZEFNbo5BPyRzJAVm9emagfcRkbVhhFIs8GAkX5F6Nixiay8fhPg X-Received: by 2002:a17:906:4a8d:: with SMTP id x13mr9994543eju.26.1591034067648; Mon, 01 Jun 2020 10:54:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1591034067; cv=none; d=google.com; s=arc-20160816; b=hjVBN9tWbKw1KeIuqUhgDeXLjBVETozP7ptrzVoCyXRKUNodRTuaIyAcP9I7AIiUu+ 9pB380um+uI2A7NYqM8qXXu0679+lMiIlEmr6xjhPmLBH4hdbX3NzWCTUDjnAtwW+kU/ Xw2aZomRqXO5Q9HXmC5P5tgnueY1Xbqhedw7sUexoA+l6XmHElR57LV3OxezYQ9ukzhX G2o9x7nh6e1lqq8VRCxz09zt8/d9lmxpJ/9ka4IGnccIAGcs3rVebb9APUJLE1Cwqqx5 V4cETgHyxoRka0SoYHLPl66FdvvudK3+umpMHTQu5ADt74iLdD7/tdxVLANnAK+YnKgF ke3g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:from:cc:to:subject :content-transfer-encoding:mime-version:references:in-reply-to :user-agent:date:dkim-signature:dkim-filter; bh=fr+l0XoKIlsqLxaqesK1j3eC3xk+9waplQgCOIlQuTc=; b=gKoB0rhnyh7cEL9sK8dzWi0vidB763I9jz9LgpKrjXsFfOM4UiPcmVEADVu9Vj1rI1 U4PcXoIpLH6foSkCbqJP+ivkQA8qRNHHYkNSOmrM7EzKxnUO4/mExVZzqn09gZw1b/Ij iWZ1yBzfWn5gC1qneA6DL3g8NBVFRK+0tXqbYlhazmFvCAoybOAL2MmQ7KZN0E6a6slL C8y/q7GyphW3G490t4VcZHv1qszJhQD2trHncNrr77Nz9EWg54lwZh/Kq/jh/LybFjG+ 61dLg9Q6tICXQR8jlWKy9hg8qNnYU9q4zlSSo6YyUCq7BjRst2pXI2VUVk92Sb55TCPg wTmA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@zytor.com header.s=2020052301 header.b=s71ec2kV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=zytor.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ly20si112617ejb.474.2020.06.01.10.54.04; Mon, 01 Jun 2020 10:54:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@zytor.com header.s=2020052301 header.b=s71ec2kV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=zytor.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726176AbgFARtF (ORCPT + 99 others); Mon, 1 Jun 2020 13:49:05 -0400 Received: from terminus.zytor.com ([198.137.202.136]:46603 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726201AbgFARtE (ORCPT ); Mon, 1 Jun 2020 13:49:04 -0400 Received: from [IPv6:2601:646:8600:3281:e829:29be:a995:f934] ([IPv6:2601:646:8600:3281:e829:29be:a995:f934]) (authenticated bits=0) by mail.zytor.com (8.15.2/8.15.2) with ESMTPSA id 051HmhLE847708 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NO); Mon, 1 Jun 2020 10:48:45 -0700 DKIM-Filter: OpenDKIM Filter v2.11.0 mail.zytor.com 051HmhLE847708 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zytor.com; s=2020052301; t=1591033726; bh=fr+l0XoKIlsqLxaqesK1j3eC3xk+9waplQgCOIlQuTc=; h=Date:In-Reply-To:References:Subject:To:CC:From:From; b=s71ec2kVLNZ+dSNa25KG4sOuAKLvazy/amPQEfMieZGkdCGgsNZ/ASkyu402XDTe/ DpTWgvm9yEZXtLUUypmJH03G3MTWQ4EiUumVUwNOVNu/EXcD6yYwF4A24AmYaKHLF8 Zoiyp8svPCw19ad6k088vQGuQQG4+KX6ThKD4GwN1ZfcKuqtlm+kll3EQImHAaUkwd plaUubvQj1a48MphU90phImXi9Bl0j5T/ITaZhE6KeiEK72v6aEK75yHFjGOkFmLxW ssmSLpqoLNLYlk7qvBcTO4FONnUvq+XRcyH1oeoYDtvJ3CFG6+AfpyiuVhMVxCeHZ6 xfnHPWzA2BKEQ== Date: Mon, 01 Jun 2020 10:48:36 -0700 User-Agent: K-9 Mail for Android In-Reply-To: <3691744C-F4BC-49C6-9450-52E31DD14A92@amacapital.net> References: <3691744C-F4BC-49C6-9450-52E31DD14A92@amacapital.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas To: Andy Lutomirski , Billy Laws CC: krisman@collabora.com, gofmanp@gmail.com, keescook@chromium.org, kernel@collabora.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, tglx@linutronix.de, wad@chromium.org From: hpa@zytor.com Message-ID: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On June 1, 2020 6:59:26 AM PDT, Andy Lutomirski wro= te: > > >> On Jun 1, 2020, at 2:23 AM, Billy Laws wrote: >>=20 >> =EF=BB=BF >>>=20 >>> On May 30, 2020, at 5:26 PM, Gabriel Krisman Bertazi > wrote: >>>=20 >>> =EF=BB=BFAndy Lutomirski writes: >>>=20 >>>>>>> On May 29, 2020, at 11:00 PM, Gabriel Krisman Bertazi > wrote: >>>>>>=20 >>>>>> =EF=BB=BFModern Windows applications are executing system call >instructions >>>>>> directly from the application's code without going through the >WinAPI=2E >>>>>> This breaks Wine emulation, because it doesn't have a chance to >>>>>> intercept and emulate these syscalls before they are submitted to >Linux=2E >>>>>>=20 >>>>>> In addition, we cannot simply trap every system call of the >application >>>>>> to userspace using PTRACE_SYSEMU, because performance would >suffer, >>>>>> since our main use case is to run Windows games over Linux=2E=20 >Therefore, >>>>>> we need some in-kernel filtering to decide whether the syscall >was >>>>>> issued by the wine code or by the windows application=2E >>>>=20 >>>> Do you really need in-kernel filtering? What if you could have >>>> efficient userspace filtering instead? That is, set something up >so >>>> that all syscalls, except those from a special address, are >translated >>>> to CALL thunk where the thunk is configured per task=2E Then the >thunk >>>> can do whatever emulation is needed=2E >>>=20 >>> Hi, >>>=20 >>> I suggested something similar to my customer, by using >>> libsyscall-intercept=2E The idea would be overwritting the syscall >>> instruction with a call to the entry point=2E I'm not a specialist on >the >>> specifics of Windows games, (cc'ed Paul Gofman, who can provide more >>> details on that side), but as far as I understand, the reason why >that >>> is not feasible is that the anti-cheat protection in games will >abort >>> execution if the binary region was modified either on-disk or >in-memory=2E >>>=20 >>> Is there some mechanism to do that without modiyfing the >application? >>=20 >> Hi, >>=20 >> I work on an emulator for the Nintendo Switch that uses a similar >technique, >> in our testing it works very well and is much more performant than >even >> PTRACE_SYSEMU=2E >>=20 >> To work around DRM reading the memory contents I think mprotect could >> be used, after patching the syscall a copy of the original code could >be >> kept somewhere in memory and the patched region mapped --X=2E >> With this, any time the DRM attempts to read to the patched region >and >> perform integrity checks it will cause a segfault and a branch to the >> signal handler=2E This handler can then return the contents of the >original, >> unpatched region to satisfy them checks=2E >>=20 >> Are memory contents checked by DRM solutions too often for this to be >> performant? > >A bigger issue is that hardware support for =E2=80=94X is quite spotty=2E= There >is no x86 CPU that can do it cleanly in a bare metal setup, and client >CPUs that can do it at all without hypervisor help may be nonexistent=2E >I don=E2=80=99t know if the ARM situation is much better=2E > >> -- >> Billy Laws >>>=20 >>>> Getting the details and especially the interaction with any seccomp >>>> filters that may be installed right could be tricky, but the >performance >>>> should be decent, at least on non-PTI systems=2E >>>>=20 >>>> (If we go this route, I suspect that the correct interaction with >>>> seccomp is that this type of redirection takes precedence over >seccomp >>>> and seccomp filters are not invoked for redirected syscalls=2E After >all, >>>> a redirected syscall is, functionally, not a syscall at all=2E) >>>>=20 >>>=20 >>>=20 >>> -- >>> Gabriel Krisman Bertazi Running these things in a minimal VM container would allow this kind of fi= ltering/trapping to be done in the VMM, too=2E I don't know how many layers= deep you invoke native Linux libraries, and so if the option would exist t= o use out-of-range system call numbers for the Linux system numbers? --=20 Sent from my Android device with K-9 Mail=2E Please excuse my brevity=2E