Date: Sun, 15 Dec 2019 23:34:43 +1100
From: Aleksa Sarai
To: Rasmus Villemoes
Cc: Alexander Viro, Jeff Layton, "J. Bruce Fields", Shuah Khan,
	dev@opencontainers.org, containers@lists.linux-foundation.org,
	linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH] openat2: switch to __attribute__((packed)) for open_how
Message-ID: <20191215123443.jmfnrtgbscdwfohc@yavin.dot.cyphar.com>
References: <20191213222351.14071-1-cyphar@cyphar.com>

On 2019-12-14, Rasmus Villemoes wrote:
> On 13/12/2019 23.23, Aleksa Sarai wrote:
> > The design of the original open_how struct layout was such that it
> > ensured that there would be no un-labelled (and thus potentially
> > non-zero) padding to avoid issues with struct expansion, as well as
> > providing a uniform representation on all architectures (to avoid
> > complications with OPEN_HOW_SIZE versioning).
> >
> > However, there were a few other desirable features which were not
> > fulfilled by the previous struct layout:
> >
> >  * Adding new features (other than new flags) should always result in
> >    the struct getting larger. However, by including a padding field, it
> >    was possible for new fields to be added without expanding the
> >    structure. This would somewhat complicate version-number based
> >    checking of feature support.
> >
> >  * A non-zero bit in __padding yielded -EINVAL when it should arguably
> >    have been -E2BIG (because the padding bits are effectively
> >    yet-to-be-used fields). However, the semantics are not entirely clear
> >    because userspace may expect -E2BIG to only signify that the
> >    structure is too big.
> >    It's much simpler to just provide the guarantee
> >    that new fields will always result in a struct size increase, and
> >    -E2BIG indicates you're using a field that's too recent for an older
> >    kernel.
>
> And when the first extension adds another u64 field, that padding has to
> be added back in and checked for being 0, at which point the padding is
> again yet-to-be-used fields.

Maybe I'm missing something, but what is the issue with

  struct open_how {
    u64 flags;
    u64 resolve;
    u16 mode;
    u64 next_extension;
  } __attribute__((packed));

It was my understanding that __aligned_u64 was used to ensure consistent
layouts, not that it was needed for safety against unaligned accesses.

> So what exactly is the problem with returning EINVAL now?

I would argue that -EINVAL was the wrong choice of return code from the
outset (and if we do keep the padding, I will send a patch to switch it
to -E2BIG -- see below).

The purpose of -E2BIG for the newer "extensible" syscalls is to
differentiate between using an unsupported extension field and an
unsupported (or invalid) flag. This will be useful for a few other
extension ideas for these types of syscalls (related to allowing
userspace to more efficiently figure out what flags are supported by the
kernel without having to try each one separately).

> >  * The padding wasted space needlessly, and would very likely not be
> >    used up entirely by future extensions for a long time (because it
> >    couldn't fit a u64).
>
> Who knows, it does fit a u32. And if the struct is to be 8-byte aligned
> (see below), it doesn't actually waste space.

Yeah, though giving it some more thought I think this might be a better
layout to avoid this problem:

  struct open_how {
    __aligned_u64 flags;
    __aligned_u64 resolve;
    __u16 mode;
    __u16 __padding[3]; /* must be zero */
  };

That way, we won't end up with a u16 which we never use (and we won't
have multiple __padding fields in the future).
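To make the layout difference concrete, here is a minimal userspace sketch
comparing the two layouts discussed above. It assumes a GCC or Clang
compiler; the type and struct names (aligned_u64, open_how_padded,
open_how_packed) are stand-ins made up for illustration, not the kernel's
uapi definitions, and the packed variant leaves out the next_extension
example field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's __aligned_u64 (assumption: u64 aligned to 8). */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

/* Explicit-padding layout: every byte of the struct is a named field. */
struct open_how_padded {
	aligned_u64 flags;
	aligned_u64 resolve;
	uint16_t mode;
	uint16_t padding[3]; /* must be zero */
};

/* Packed layout: no tail padding, but the struct's alignment drops to 1. */
struct open_how_packed {
	uint64_t flags;
	uint64_t resolve;
	uint16_t mode;
} __attribute__((packed));

/* 8 + 8 + 2 + 6 == 24, so there is no unnamed padding anywhere. */
static_assert(sizeof(struct open_how_padded) ==
	      2 * sizeof(uint64_t) + 4 * sizeof(uint16_t),
	      "padded layout must not contain implicit padding");
static_assert(sizeof(struct open_how_padded) == 24, "padded size");
static_assert(_Alignof(struct open_how_padded) == 8, "padded alignment");

/* The packed struct is 18 bytes, with mode at the same offset as before. */
static_assert(sizeof(struct open_how_packed) == 18, "packed size");
static_assert(_Alignof(struct open_how_packed) == 1, "packed alignment");
static_assert(offsetof(struct open_how_packed, mode) ==
	      offsetof(struct open_how_padded, mode),
	      "mode sits at offset 16 in both layouts");
```

Compiling the file is the whole check: if any layout assumption were
wrong for the target, the build would fail on the matching static_assert.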
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index d886bdb585e4..0e070c7f568a 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -109,17 +109,16 @@
> >   * O_TMPFILE} are set.
> >   *
> >   * @flags: O_* flags.
> > - * @mode: O_CREAT/O_TMPFILE file mode.
> >   * @resolve: RESOLVE_* flags.
> > + * @mode: O_CREAT/O_TMPFILE file mode.
> >   */
> >  struct open_how {
> > -	__aligned_u64 flags;
> > +	__u64 flags;
> > +	__u64 resolve;
> >  	__u16 mode;
> > -	__u16 __padding[3]; /* must be zeroed */
> > -	__aligned_u64 resolve;
> > -};
> > +} __attribute__((packed));
>
> IIRC, gcc assumes such a struct has alignment 1, which means that it
> will generate horrible code to access it. So if you do this (and I don't
> think it's a good idea), I think you'd also want to include a
> __attribute__((__aligned__(8))) - or perhaps that can be accomplished by
> just keeping flags as an explicitly aligned member. But that will of
> course bump its sizeof() back to 24, at which point it seems better to
> just make the padding explicit.

Yeah, you're quite right -- I was aware that GCC generated "less than
great" code for aligned(1) structures, but wasn't sure whether it would
be seen as being a serious enough issue to NACK the change.

There is an additional problem -- unfortunately, having the struct be
__attribute__((aligned(8))) doesn't solve the Rust representation
problem because Rust can't represent a struct as both being
#[repr(packed)] and #[repr(align(n))]. Obviously the kernel doesn't
really care about Rust language restrictions, but given one of the main
users of how->resolve will be libpathrs, I'd prefer to not make my own
life any harder if possible. ;)

So, given all of the above, I suggest that I instead send something like
this:

diff --git a/fs/open.c b/fs/open.c
index 50a46501bcc9..6c97f52453fe 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -994,7 +994,7 @@ static inline int build_open_flags(const struct open_how *how,
 	if (how->resolve & ~VALID_RESOLVE_FLAGS)
 		return -EINVAL;
 	if (memchr_inv(how->__padding, 0, sizeof(how->__padding)))
-		return -EINVAL;
+		return -E2BIG;
 
 	/* Deal with the mode. */
 	if (WILL_CREATE(flags)) {
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index d886bdb585e4..c307640071c8 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -114,9 +114,9 @@
  */
 struct open_how {
 	__aligned_u64 flags;
+	__aligned_u64 resolve;
 	__u16 mode;
 	__u16 __padding[3]; /* must be zeroed */
-	__aligned_u64 resolve;
 };
 
 #define OPEN_HOW_SIZE_VER0	24 /* sizeof first published struct */
diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
index 0b64fedc008b..88e3614cbb3a 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -61,15 +61,15 @@ void test_openat2_struct(void)
 		{ .name = "normal struct (non-zero padding[0])",
 		  .arg.inner.flags = O_RDONLY,
 		  .arg.inner.__padding = {0xa0, 0x00, 0x00},
-		  .size = sizeof(struct open_how_ext), .err = -EINVAL },
+		  .size = sizeof(struct open_how_ext), .err = -E2BIG },
 		{ .name = "normal struct (non-zero padding[1])",
 		  .arg.inner.flags = O_RDONLY,
 		  .arg.inner.__padding = {0x00, 0x1a, 0x00},
-		  .size = sizeof(struct open_how_ext), .err = -EINVAL },
+		  .size = sizeof(struct open_how_ext), .err = -E2BIG },
 		{ .name = "normal struct (non-zero padding[2])",
 		  .arg.inner.flags = O_RDONLY,
 		  .arg.inner.__padding = {0x00, 0x00, 0xef},
-		  .size = sizeof(struct open_how_ext), .err = -EINVAL },
+		  .size = sizeof(struct open_how_ext), .err = -E2BIG },
 
 		/* TODO: Once expanded, check zero-padding. */

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
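For readers who want to poke at the error-code split from the fs/open.c
hunk above outside the kernel, here is a rough userspace sketch. It is
not the kernel implementation: check_how(), struct open_how_sketch and
SKETCH_VALID_RESOLVE_FLAGS are hypothetical names with a made-up flag
mask, and memcmp() against a zero array stands in for the kernel-only
memchr_inv().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mask of known resolve flags (assumption, not the real value). */
#define SKETCH_VALID_RESOLVE_FLAGS 0x1fULL

/* Userspace mirror of the explicit-padding layout (illustrative only). */
struct open_how_sketch {
	uint64_t flags;
	uint64_t resolve;
	uint16_t mode;
	uint16_t padding[3]; /* must be zero */
};

/*
 * Returns 0 if the struct is acceptable, -EINVAL for an unknown flag and
 * -E2BIG for non-zero padding, i.e. for a field this "kernel" does not
 * know about yet.
 */
int check_how(const struct open_how_sketch *how)
{
	static const uint16_t zero[3];

	if (how->resolve & ~SKETCH_VALID_RESOLVE_FLAGS)
		return -EINVAL;
	if (memcmp(how->padding, zero, sizeof(zero)) != 0)
		return -E2BIG;
	return 0;
}
```

With this split, a caller can tell "you set a flag I don't support" apart
from "you filled in a field that is newer than I am", which is the
distinction argued for earlier in this mail.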