Subject: Re: Official Linux system wrapper library?
From: Andy Lutomirski
Date: Tue, 13 Nov 2018 12:58:39 -0800
To: Dave Martin
Cc: Daniel Colascione, Florian Weimer, "Michael Kerrisk (man-pages)",
    linux-kernel, Joel Fernandes, Linux API, Willy Tarreau,
    Vlastimil Babka, Carlos O'Donell, "libc-alpha@sourceware.org"
Message-Id: <69B07026-5E8B-47FC-9313-E51E899FAFB0@amacapital.net>
In-Reply-To: <20181113193859.GJ3505@e103592.cambridge.arm.com>
References: <877ehjx447.fsf@oldenburg.str.redhat.com>
    <875zx2vhpd.fsf@oldenburg.str.redhat.com>
    <20181113193859.GJ3505@e103592.cambridge.arm.com>

> On Nov 13, 2018, at 11:39 AM, Dave Martin wrote:
>
> On Mon, Nov 12, 2018 at 05:19:14AM -0800, Daniel Colascione wrote:
>
> [...]
>
>> We can learn something from how Windows does things. On that system,
>> what we think of as "libc" is actually two parts.
>> (More, actually, but I'm simplifying.) At the lowest level, you have
>> the semi-documented ntdll.dll, which contains raw system call
>> wrappers and arcane kernel-userland glue. On top of ntdll live the
>> "real" libc (msvcrt.dll, kernel32.dll, etc.) that provide
>> conventional application-level glue. The tight integration between
>> ntdll.dll and the kernel allows Windows to do very impressive things.
>> (For example, on x86_64, Windows has no 32-bit ABI as far as the
>> kernel is concerned! You can still run 32-bit programs though, and
>> that works via ntdll.dll essentially shimming every system call and
>> switching the processor between long and compatibility mode as
>> needed.) Normally, you'd use the higher-level capabilities, but if
>> you need something in ntdll (e.g., if you're Cygwin) nothing stops
>> your calling into the lower-level system facilities directly. ntdll
>> is tightly bound to the kernel; the higher-level libc, not so.
>>
>> We should adopt a similar approach. Shipping a lower-level
>> "liblinux.so" tightly bound to the kernel would not only let the
>> kernel bypass glibc's "editorial discretion" in exposing new
>> facilities to userspace, but would also allow for tighter user-kernel
>> integration than one can achieve with a simplistic syscall(2)-style
>> escape hatch. (For example, for a long time now, I've wanted to go
>> beyond POSIX and improve the system's signal handling API, and this
>> improvement requires userspace cooperation.) The vdso is probably too
>> small and simplistic to serve in this role; I'd want a real library.
>
> Can you expand on your reasoning here?
>
> Playing devil's advocate:
>
> If the library is just exposing the syscall interface, I don't see
> why it _couldn't_ fit into the vdso (or something vdso-like).
>
> If a separate library, I'd be concerned that it would accumulate
> value-add bloat over time, and the kernel ABI may start to creep since
> most software wouldn't invoke the kernel directly any more. Even if
> it's maintained in the kernel tree, its existence as an apparently
> standalone component may encourage forking, leading to a potential
> compatibility mess.
>
> The vdso approach would mean we can guarantee that the library is
> available and up to date at runtime, and may make it easier to keep
> what's in it down to sane essentials.

Hmm. Putting on my vDSO hat:

The vDSO could provide all kinds of nifty things. Better exception
handling comes to mind. But it has two major limitations that severely
restrict what it can do:

- It can’t allocate memory. We probably want to keep it that way.

- It can’t use TLS. Solving this without genuinely awful ABI issues may
  be extremely hard. We *could* require callers to pass a thread pointer
  in, I suppose.

Also, if we make the vDSO stateful, CRIU is going to have a blast. We
might need to expose explicit save and restore abilities.

As a straw man use case, it would be neat if DSOs (or the loader, maybe)
could register a list of exception fixups per DSO. The kernel could
consult these lists before delivering a signal. ISTM it wouldn’t be so
crazy if the vDSO handled registration, although it could use syscalls
as well. If the vDSO did it, it would need somewhere to put the lists.
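
To make the straw man concrete, here is a rough userspace sketch of what
a per-DSO fixup list and its lookup might look like. Everything in it is
hypothetical: the names fixup_entry, fixup_list, fixup_register and
fixup_lookup, the range-based entry layout, and the use of 0 for "no
fixup" are assumptions made for illustration, not an existing kernel or
vDSO interface. The one design point taken from the discussion above is
that the caller supplies the storage for each list, so the registering
code never has to allocate memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: a fault at an address in [start, end) resumes at landing. */
struct fixup_entry {
    uintptr_t start;
    uintptr_t end;
    uintptr_t landing;
};

/*
 * Hypothetical per-DSO list. The DSO (or the loader) owns this storage,
 * so registration needs no allocation on the registrar's side.
 */
struct fixup_list {
    const struct fixup_entry *entries; /* sorted by start, non-overlapping */
    size_t count;
    struct fixup_list *next;           /* filled in by fixup_register() */
};

/* Head of the registered lists. This sketch is not thread-safe. */
static struct fixup_list *fixup_lists;

/* Link a caller-owned list into the registry. */
void fixup_register(struct fixup_list *list)
{
    list->next = fixup_lists;
    fixup_lists = list;
}

/* Return the landing address covering fault_ip, or 0 if none is registered. */
uintptr_t fixup_lookup(uintptr_t fault_ip)
{
    for (const struct fixup_list *l = fixup_lists; l != NULL; l = l->next) {
        size_t lo = 0, hi = l->count;

        /* Binary search over the sorted, non-overlapping ranges. */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            const struct fixup_entry *e = &l->entries[mid];

            if (fault_ip < e->start)
                hi = mid;
            else if (fault_ip >= e->end)
                lo = mid + 1;
            else
                return e->landing;
        }
    }
    return 0;
}
```

In this sketch a DSO would keep a static, sorted table plus one struct
fixup_list and hand it to fixup_register() at load time. Whatever
delivers the signal would call the equivalent of fixup_lookup() with the
faulting instruction pointer first. Where the list head lives and how it
is synchronized is exactly the "somewhere to put the lists" question,
and the sketch leaves it open.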