Received: by 2002:a5b:505:0:0:0:0:0 with SMTP id o5csp41409ybp; Thu, 3 Oct 2019 09:55:17 -0700 (PDT) X-Google-Smtp-Source: APXvYqyDm5638AbOMuTB5viomU1Twsq0HvGH38I7ljGou7Kuiuc3WC3UUid/DkKprAeXZgRHf3kS X-Received: by 2002:a17:906:5584:: with SMTP id y4mr8593029ejp.89.1570121717662; Thu, 03 Oct 2019 09:55:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1570121717; cv=none; d=google.com; s=arc-20160816; b=PepVCQSDRuT6ffoUoPoA5NJ4dcsXXxhT/vwG9lOUZ1VUg5R5gJ+UEsYysRu/ian7RX Qa7EJFF4q05bKtw19gqadwFr827z9tQXQc2Je620m4jh5T/2AWJR2FsS+1gCI1svKmRb wcHTYmNQDEnLVo92yBn8ty986VmlHUtnbQulWBJVT6DMKuIUdjvMxpm2CGzso1yoxGiC cR5K/y8suPTosQRK5THgm5QZv9uVaYwQk7eMeuTq77k+DFyPYBcSbplrDY31xBTEI09R /c4bReIvIDaV5OiLTI0q1Y1962mMFVvEozLNWJ7cfdBi5Wipi8OoT5SN2hXHNrjfw0kD WLhQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=45dDwdzFZsQMpgYQpZ0K8kJuosc7V5CSZ9/Vrht0WTU=; b=qddTSvrflvh9a4ZKd/cSNDu9jAOl0S+MZ4NJsesLlrg2fJCC67RqdwRRgZUKkICEtI 3aONejabZPAtuOORruRNOwMlAcRcWp5IAb1HAmTjtNe+vIJwC7g7yVbpXPWFQc1sQZ6m 6bK8U9sZOn7U49KRYvqKrZtcm8v3x9zWlRnFldrEMKTPuYxG5y0M7maZQl52Wekufp9c m66fKWCcKAo8F1lhWA/8X159SWfalJRAogXT5yOr/LCyttib7k1PxhvIAyMO5o69kRQC tZW5rhZ1yXDuPkjOOJlLzzhNttdLyFMtOrnleQmOyjR7WA+NGbkUAq0x31aDmosAb45V fysQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g19si1511507ejj.120.2019.10.03.09.54.52; Thu, 03 Oct 2019 09:55:17 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731802AbfJCQxY (ORCPT + 99 others); Thu, 3 Oct 2019 12:53:24 -0400 Received: from mx2.mailbox.org ([80.241.60.215]:42194 "EHLO mx2.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405232AbfJCQxT (ORCPT ); Thu, 3 Oct 2019 12:53:19 -0400 Received: from smtp2.mailbox.org (smtp1.mailbox.org [IPv6:2001:67c:2050:105:465:1:1:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx2.mailbox.org (Postfix) with ESMTPS id E55FEA33D1; Thu, 3 Oct 2019 18:53:15 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.240]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id 2oKS5ApijqDi; Thu, 3 Oct 2019 18:53:11 +0200 (CEST) Date: Fri, 4 Oct 2019 02:53:01 +1000 From: Aleksa Sarai To: Jonathan Corbet Cc: Christian Brauner , linux-kernel@vger.kernel.org, libc-alpha@sourceware.org, Christian Brauner , Federico Vaga , Alessia Mantegazza , Guillaume Dore , linux-doc@vger.kernel.org, linux-abi@vger.kernel.org, Michael Kerrisk Subject: Re: [PATCH] Documentation: update about adding syscalls Message-ID: <20191003165301.ihrlrgcldgun7dld@yavin.dot.cyphar.com> References: <20191002151437.5367-1-christian.brauner@ubuntu.com> <20191003103931.52683cb6@lwn.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="ecmiy52y44gvxiaw" Content-Disposition: inline In-Reply-To: <20191003103931.52683cb6@lwn.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --ecmiy52y44gvxiaw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2019-10-03, Jonathan Corbet wrote: > [Expanding CC a bit; this is the sort of change I'm reluctant to take > without being sure it reflects what the community thinks.] >=20 > On Wed, 2 Oct 2019 17:14:37 +0200 > Christian Brauner wrote: >=20 > > Add additional information on how to ensure that syscalls with structure > > arguments are extensible and add a section about naming conventions to > > follow when adding revised versions of already existing syscalls. > >=20 > > Co-Developed-by: Aleksa Sarai > > Signed-off-by: Aleksa Sarai > > Signed-off-by: Christian Brauner > > --- > > Documentation/process/adding-syscalls.rst | 82 +++++++++++++++++++---- > > 1 file changed, 70 insertions(+), 12 deletions(-) > >=20 > > diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/= process/adding-syscalls.rst > > index 1c3a840d06b9..93e0221fbb9a 100644 > > --- a/Documentation/process/adding-syscalls.rst > > +++ b/Documentation/process/adding-syscalls.rst > > @@ -79,7 +79,7 @@ flags, and reject the system call (with ``EINVAL``) i= f it does:: > > For more sophisticated system calls that involve a larger number of ar= guments, > > it's preferred to encapsulate the majority of the arguments into a str= ucture > > that is passed in by pointer. Such a structure can cope with future e= xtension > > -by including a size argument in the structure:: > > +by either including a size argument in the structure:: > > =20 > > struct xyzzy_params { > > u32 size; /* userspace sets p->size =3D sizeof(struct xyzzy_pa= rams) */ > > @@ -87,20 +87,56 @@ by including a size argument in the structure:: > > u64 param_2; > > u64 param_3; > > }; > > + int sys_xyzzy(struct xyzzy_params __user *uarg); > > + /* in case of -E2BIG, p->size is set to the in-kernel size and thu= s all > > + extensions after that offset are unsupported. */ >=20 > That comment kind of threw me for a loop - this is the first mention of > E2BIG and readers may not just know what's being talked about. Especially > since the comment suggests *not* actually returning an error. I probably could've worded this better -- this comment describes what userspace sees when they use the API (sched_setattr(2) is an example of this style of API). In the case where the kernel doesn't support a requested extension (usize > ksize, and there are non-zero bytes past ksize) then the kernel returns -E2BIG *but also* sets p->size to ksize so that userspace knows what extensions the kernel supports. Maybe I should've replicated more of the details from the kernel-doc for copy_struct_from_user(). > > -As long as any subsequently added field, say ``param_4``, is designed = so that a > > -zero value gives the previous behaviour, then this allows both directi= ons of > > -version mismatch: > > +or by including a separate argument that specifies the size:: > > =20 > > - - To cope with a later userspace program calling an older kernel, the= kernel > > - code should check that any memory beyond the size of the structure = that it > > - expects is zero (effectively checking that ``param_4 =3D=3D 0``). > > - - To cope with an older userspace program calling a newer kernel, the= kernel > > - code can zero-extend a smaller instance of the structure (effective= ly > > - setting ``param_4 =3D 0``). > > + struct xyzzy_params { > > + u32 param_1; > > + u64 param_2; > > + u64 param_3; > > + }; > > + /* userspace sets @usize =3D sizeof(struct xyzzy_params) */ > > + int sys_xyzzy(struct xyzzy_params __user *uarg, size_t usize); > > + /* in case of -E2BIG, userspace has to attempt smaller @usize valu= es > > + to figure out which extensions are unsupported. */ >=20 > Here too. But what I'm really wondering about now is: you're describing > different behavior for what are essentially two cases of the same thing. > Why should the kernel simply accept the smaller size if the size is > embedded in the structure itself, but return an error and force user space > to retry if it's a separate argument? >=20 > I guess maybe because in the latter case the kernel can't easily return > the size it's actually using? I think that should be explicit if so. As above, the -E2BIG only happens if userspace is trying to use an extension that the kernel doesn't support (usize > ksize, non-zero bytes after ksize). The main difference between the two API styles is whether or not userspace gets told what ksize is explicitly in the case of an -E2BIG. Maybe it would be less confusing to only mention one of ways of doing it, but then we have to pick one (and while the newer syscalls [clone3 and openat2] use a separate argument, there are more syscalls which embed it in the struct). --=20 Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH --ecmiy52y44gvxiaw Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSxZm6dtfE8gxLLfYqdlLljIbnQEgUCXZYnaQAKCRCdlLljIbnQ EhlVAQCqbspnt/NT+1gkI4JVdUZFAWs0Fw/uNiDdbdMPIaf3JAEA/PeY3tJKd9bc O9UhoWmz9VranrMXxxyR6e0LVw7J3wo= =BOSH -----END PGP SIGNATURE----- --ecmiy52y44gvxiaw--