Subject: Re: Kernel prctl feature for syscall interception and emulation
From: Paul Gofman
To: David Laight, 'Rich Felker', Gabriel Krisman Bertazi
Cc: "libc-alpha@sourceware.org", Florian Weimer,
 "linux-kernel@vger.kernel.org"
References: <873616v6g9.fsf@collabora.com>
 <20201119151317.GF534@brightrain.aerifal.cx>
 <87h7pltj9p.fsf@collabora.com>
 <20201119162801.GH534@brightrain.aerifal.cx>
 <87eekpmeux.fsf@collabora.com>
 <20201119173938.GJ534@brightrain.aerifal.cx>
Date: Fri, 20 Nov 2020 00:19:13 +0300
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/19/20 23:54, Paul Gofman wrote:
> On 11/19/20 20:57, David Laight wrote:
>>>> The Windows code is not completely loaded at initialization time.
>>>> It also has dynamic libraries loaded later. yes, wine knows the memory
>>>> regions, but there is no guarantee there is a small number of segments
>>>> or that the full picture is known at any given moment.
>>> Yes, I didn't mean it was known statically at init time (although
>>> maybe it can be; see below) just that all the code doing the loading
>>> is under Wine's control (vs having system dynamic linker doing stuff
>>> it can't reliably see, which is the case with host libraries).
>> Since wine must itself make the mmap() system calls that make memory
>> executable can't it arrange for windows code and linux code to be
>> above/below some critical address?
>>
>> IIRC 32bit windows has the user/kernel split at 2G, so all the
>> linux code could be shoe-horned into the top 1GB.
>>
>> A similar boundary could be picked for 64bit code.
>>
>> This would probably require flags to mmap() to map above/below
>> the specified address (is there a flag for the 2G boundary
>> these days - wine used to do very horrid things).
>> It might also need a special elf interpreter to load the
>> wine code itself high.
>>
> Wine does not control the loading of native libraries (which are subject
> to ASLR and thus do not necessarily exactly follow mmap's top down
> order). Wine is also not free to choose where to load the Windows
> libraries. Some of Win libraries are relocatable, some are not. Even
> those relocatable are still often assumed to be loaded at the base
> address specified in PE, with assumption made either by library itself
> or DRM or sandboxing / hotpatching / interception code from around.
>
> Also, it is very common to DRMs to unpack the encrypted code to a newly
> allocated segment (which gives no clue at the moment of allocation
> whether it is going to be executable later), and then make it
> executable.
> There are a lot of tricks about that and such code sometimes
> assumes very specific (and Windows implementation dependent) things, in
> particular, about the memory layout. Windows VirtualAlloc[Ex] gives the
> way to request top down or bottom up allocation order, as well as
> specific allocation address. The latter is not guaranteed to succeed of
> course just like on Linux for obvious reasons, but if specific (high)
> address ranges always have some space available on Windows, then there
> are the apps in the wild which depend of that, as far as our practice goes.
>
> If we were given mmap flag for specifying memory allocation boundary,
> and also a sort of process-wide dlopen() config option for specifying
> that boundary for every host shared library load, the address space
> separation could probably work... until we hit a tricky case when the
> app wants to get a memory specifically high address range. I think we
> can't do that cleanly as both Windows and Linux currently have the same
> 128TB limit for user address space on x64 and we've got no spare space
> to safely put native code without potential interference with Windows code.
>

It may also be worth mentioning that the initial version of Gabriel's
patches introduced the emulation trigger as a flag set on a memory
region through mprotect(), so that we could mark the regions from which
calls should be trapped. That would probably be the easiest solution to
use in Wine (as no memory allocated by Wine itself is supposed to
contain native host syscalls), but the idea was not accepted, mainly
because, as I understand it, such functionality does not belong in VM
management.
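
To make the quoted remark about specific allocation addresses concrete:
on Linux, an address passed to mmap() without MAP_FIXED is only a hint,
and the kernel picks another address if the requested range is already
occupied. The sketch below is purely illustrative and is not Wine code.
It uses Python's ctypes to call libc directly; the helper names
map_anon() and hint_demo() are made up for this example, and it assumes
Linux with off_t matching C long.

```python
import ctypes
import mmap
import os

# Call libc's mmap()/munmap() directly so that an address hint can be passed.
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FAILED = ctypes.c_void_p(-1).value  # (void *)-1 as an unsigned integer
PAGE = mmap.PAGESIZE


def map_anon(hint=None):
    """Map one anonymous read/write page, optionally with an address hint.

    Hypothetical helper for this example. No MAP_FIXED is used, so the
    kernel treats `hint` as a preference, not a requirement.
    """
    addr = libc.mmap(hint, PAGE,
                     mmap.PROT_READ | mmap.PROT_WRITE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                     -1, 0)
    if addr is None or addr == MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return addr


def hint_demo():
    """Ask for an address that is already mapped and see what comes back."""
    first = map_anon()             # kernel chooses the address
    second = map_anon(hint=first)  # hint points at the occupied page
    libc.munmap(first, PAGE)
    libc.munmap(second, PAGE)
    return first, second


if __name__ == "__main__":
    first, second = hint_demo()
    print(f"first mapping : {first:#x}")
    print(f"hinted mapping: {second:#x} (hint honoured: {first == second})")
```

An application that depends on a particular range being free will run
into exactly this fallback if a host library or Wine itself already
occupies the range.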