From: Florian Weimer
To: "Michael Kerrisk (man-pages)"
Cc: Daniel Colascione, linux-kernel, Joel Fernandes, Linux API, Willy Tarreau, Vlastimil Babka, Carlos O'Donell, libc-alpha@sourceware.org
Subject: Re: Official Linux system wrapper library?
Date: Sun, 11 Nov 2018 12:09:28 +0100
In-Reply-To: (Michael Kerrisk's message of "Sun, 11 Nov 2018 07:55:30 +0100")
Message-ID: <877ehjx447.fsf@oldenburg.str.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* Michael Kerrisk:

> [adding in glibc folk for comment]
>
> On 11/10/18 7:52 PM, Daniel Colascione wrote:
>> Now that glibc is basically not adding any new system call wrappers,
>> how about publishing an "official" system call glue library as part of
>> the kernel distribution, along with the uapi headers? I don't think
>> it's reasonable to expect people to keep using syscall(__NR_XXX) for
>> all new functionality, especially as the system grows increasingly
>> sophisticated capabilities (like the new mount API, and hopefully the
>> new process API) outside the strictures of the POSIX process.
>
> As a quick glance at the glibc NEWS file shows, the above is not
> quite true:
>
> [[
> Version 2.28
> * The renameat2 function has been added...
> * The statx function has been added...
>
> Version 2.27
> * Support for memory protection keys was added. The header now
> declares the functions pkey_alloc, pkey_free, pkey_mprotect...
> * The copy_file_range function was added.
>
> Version 2.26
> * New wrappers for the Linux-specific system calls preadv2 and pwritev2.
>
> Version 2.25
> * The getrandom [function] have been added.
> ]]
>
> I make that 11 system call wrappers added in the last 2 years.

And you missed mlock2 and memfd_create.

In some cases, we used system calls before the kernel had them (because the kernel does not add system calls consistently across architectures).
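For readers unfamiliar with the syscall(__NR_XXX) pattern Daniel mentions, here is a minimal sketch of what an application typically writes when its C library has no wrapper for a system call, using memfd_create as the example. The helper name my_memfd_create is hypothetical (it is not a glibc or kernel interface), and the sketch assumes the installed kernel headers define SYS_memfd_create; otherwise it reports ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke memfd_create directly through syscall(2).
   SYS_memfd_create comes from <sys/syscall.h>; if the kernel headers
   predate the system call, report ENOSYS like a missing wrapper would. */
static int my_memfd_create(const char *name, unsigned int flags)
{
#ifdef SYS_memfd_create
    /* syscall() returns -1 and sets errno on failure, as libc wrappers do. */
    return (int) syscall(SYS_memfd_create, name, flags);
#else
    (void) name;
    (void) flags;
    errno = ENOSYS;
    return -1;
#endif
}
```

The sketch passes the flag bits through unchanged; a real wrapper would also need the MFD_* constants. Such application-local helpers have the same maintenance issue raised in this thread: any fix to them requires rebuilding the program.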
On the other hand, this is only half of the story because distributions do not backport system call wrappers, even those that backport kernel implementations (or just rebase the kernel). This is something that could be fixed eventually, but it is related to another problem: We had a patch for the membarrier system call, but the kernel developers could not tell us what the system call does in terms of the C/C++ memory model, and the kernel developers and our concurrency expert could not agree on documentation. A lot of the new system calls lack clear specifications or are just somewhat misdesigned.

For example, pkey_alloc uses PKEY_DISABLE_WRITE and PKEY_DISABLE_ACCESS flags (where the latter implies disabling both read and write access), not something that matches the PROT_READ and PROT_WRITE flags used by mmap/mprotect. This caused problems when POWER support for pkey_alloc was added, and we are still working on resolving that.

getrandom still causes boot delays because the kernel somehow fails to seed its internal pool before starting PID 1, even on mainstream hardware which has plenty of (true) randomness sources available, leading to indefinite blocking of getrandom. It seems to me that people have largely given up on fixing this in the upstream kernel.

For copy_file_range, we still have debates about whether the system call (and the glibc emulation) should preserve holes or not, and there are plans to lift the cross-device restriction.

For renameat2, we already had a function in gnulib with the same name, but it did not provide the atomic RENAME_NOREPLACE behavior for which renameat2 was introduced.

These problems are relevant to the backporting question. One relatively low-cost way to backport straight wrappers would be to put them as hidden functions into libc_nonshared.a. But with these uncertainties, this would be rather risky because fixing bugs in the wrappers would then require relinking.

Thanks,
Florian
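The pkey_alloc flag mismatch can be made concrete with a small translation helper. This is a sketch only: prot_to_pkey_rights is a hypothetical name, not a glibc or kernel interface, and the fallback constants assume the kernel UAPI values PKEY_DISABLE_ACCESS = 0x1 and PKEY_DISABLE_WRITE = 0x2 for C libraries that do not expose them yet.

```c
#define _GNU_SOURCE
#include <sys/mman.h>

/* Assumed kernel UAPI values, used only if the C library lacks them. */
#ifndef PKEY_DISABLE_ACCESS
#define PKEY_DISABLE_ACCESS 0x1
#endif
#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE 0x2
#endif

/* Hypothetical helper: translate mmap-style PROT_READ/PROT_WRITE bits into
   the access-rights argument of pkey_alloc.  Returns -1 for combinations
   that the two pkey flags cannot express. */
static int prot_to_pkey_rights(int prot)
{
    int r = prot & PROT_READ;
    int w = prot & PROT_WRITE;

    if (r && w)
        return 0;                    /* no restriction */
    if (r)
        return PKEY_DISABLE_WRITE;   /* read-only */
    if (!w)
        return PKEY_DISABLE_ACCESS;  /* neither read nor write */
    return -1;                       /* write-only: no pkey flag for this */
}
```

The write-only case returning -1 illustrates the point: the pkey flags describe what to take away (with PKEY_DISABLE_ACCESS removing reads as well as writes), so they do not map one-to-one onto the PROT_* bits.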
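To avoid the indefinite blocking of getrandom described above, a caller can pass GRND_NONBLOCK and handle EAGAIN itself. A minimal sketch, calling the system call directly so that it does not depend on the glibc 2.25 wrapper; try_random_bytes is a hypothetical helper, and the fallback GRND_NONBLOCK value of 0x0001 is assumed from the kernel UAPI headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001  /* assumed kernel UAPI value */
#endif

/* Hypothetical helper: fill buf with len random bytes without blocking.
   Returns 0 on success, or -1 with errno set; EAGAIN means the kernel
   pool has not been initialized yet (the early-boot situation). */
static int try_random_bytes(void *buf, size_t len)
{
#ifdef SYS_getrandom
    unsigned char *p = buf;
    while (len > 0) {
        long n = syscall(SYS_getrandom, p, len, GRND_NONBLOCK);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* EAGAIN, ENOSYS, ... */
        }
        p += n;             /* short reads are possible: keep going */
        len -= (size_t) n;
    }
    return 0;
#else
    (void) buf;
    (void) len;
    errno = ENOSYS;
    return -1;
#endif
}
```

What the caller does on EAGAIN (wait, fall back, or fail) is a policy decision the sketch leaves open; it works around the blocking but does not fix the seeding problem itself.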