2022-11-28 12:01:48

by Alexander Larsson

[permalink] [raw]
Subject: [PATCH RFC 0/6] Composefs: an opportunistically sharing verified image filesystem

Giuseppe Scrivano and I have recently been working on a new project we
call composefs. This is the first time we propose this publically and
we would like some feedback on it.

At its core, composefs is a way to construct and use read only images
that are used similarly to how you would use e.g. loop-back mounted
squashfs images. On top of this composefs has two new fundamental
features. First it allows sharing of file data (both on disk and in
page cache) between images, and secondly it has dm-verity like
validation on read.

Let me first start with a minimal example of how this can be used,
before going into the details:

Suppose we have this source for an image:

rootfs/
├── dir
│ └── another_a
├── file_a
└── file_b

We can then use this to generate an image file and a set of
content-addressed backing files:

$ mkcomposefs --digest-store=objects rootfs/ rootfs.img
$ ls -l rootfs.img objects/*/*
-rw-------. 1 root root 10 Nov 18 13:20 objects/02/927862b4ab9fb69919187bb78d394e235ce444eeb0a890d37e955827fe4bf4
-rw-------. 1 root root 10 Nov 18 13:20 objects/cc/3da5b14909626fc99443f580e4d8c9b990e85e0a1d18883dc89b23d43e173f
-rw-r--r--. 1 root root 4228 Nov 18 13:20 rootfs.img

The rootfs.img file contains all information about directory and file
metadata plus references to the backing files by name. We can now
mount this and look at the result:

$ mount -t composefs rootfs.img -o basedir=objects,verity_check=2 /mnt
$ ls /mnt/
dir file_a file_b
$ cat /mnt/file_a
content_a

When reading this file the kernel is actually reading the backing
file, in a fashion similar to overlayfs. Since the backing file is
content-addressed, the objects directory can be shared for multiple
images, and any files that happen to have the same content are
shared. I refer to this as opportunistic sharing, as it is different
than the more coarse-grained explicit sharing used by e.g. container
base images.

The next step is the validation. Note how the object files have
fs-verity enabled. In fact, they are named by their fs-verity digest:

$ fsverity digest objects/*/*
sha256:02927862b4ab9fb69919187bb78d394e235ce444eeb0a890d37e955827fe4bf4 objects/02/927862b4ab9fb69919187bb78d394e235ce444eeb0a890d37e955827fe4bf4
sha256:cc3da5b14909626fc99443f580e4d8c9b990e85e0a1d18883dc89b23d43e173f objects/cc/3da5b14909626fc99443f580e4d8c9b990e85e0a1d18883dc89b23d43e173f

The generated filesystem image can contain the expected digest for the
backing files. If you mount the filesystem with the verity_check
option, then open will fail when the backing file digest is incorrect.
And if the open succeeds, any other on-disk file-changes will be
detected by fs-verity:

$ cat objects/cc/3da5b14909626fc99443f580e4d8c9b990e85e0a1d18883dc89b23d43e173f
content_a
$ rm -f objects/cc/3da5b14909626fc99443f580e4d8c9b990e85e0a1d18883dc89b23d43e173f
$ echo modified > objects/cc/3da5b14909626fc99443f580e4d8c9b990e85e0a1d18883dc89b23d43e173f
$ cat /mnt/file_a
WARNING: composefs backing file '3da5b14909626fc99443f580e4d8c9b990e85e0a1d18883dc89b23d43e173f' unexpectedly had no fs-verity digest
cat: /mnt/file_a: Input/output error

This re-uses the existing fs-verity functionality to protect against
changes in file contents, while adding on top of it protection against
changes in filesystem metadata and structure. In other words, it
protects against replacing a fs-verity enabled file or modifying file
permissions, xattrs or other metadata not verified by fs-verity.

To be fully verified we need another step: we use fs-verity on the
image itself. Then we pass the expected digest on the mount command
line (which will be verified at mount time):

$ fsverity enable rootfs.img
$ fsverity digest rootfs.img
sha256:da42003782992856240a3e25264b19601016114775debd80c01620260af86a76 rootfs.img
$ mount -t composefs rootfs.img -o basedir=objects,digest=da42003782992856240a3e25264b19601016114775debd80c01620260af86a76 /mnt

So, given a trusted set of mount options (say unlocked from TPM), we
have a fully verified filesystem tree mounted, with opportunistic
fine-grained sharing of identical files.

So, why do we want this? There are two initial user cases. First of
all we want to use the opportunistic sharing for podman container
layers. The idea is to use a composefs mount as the lower directory in
an overlay mount, with the upper directory being the container work
dir. This will allow automatic file-level disk and page-cache sharing
between any two images, independent of details like the permissions
and timestamps of the files and the origin of the images.

Secondly we are interested in using the verification aspects of
composefs in the ostree project. Ostree already uses a
content-addressed object store, but it is currently referenced to by
hardlink farms. The object store and the trees that reference it are
signed and verified at download time, but there is no runtime
verification. If we replace the hardlink farm with a composefs image
that points into the existing object store we can use the verification
to implement runtime verification.

In fact, the tooling to create composefs images is fully reproducible,
so all we need is to add the fs-verity digest of the composefs image
into the ostree commit metadata. Then the image can be reconstructed
from the ostree commit, generating a composefs image with the same
fs-verity digest.

These are the use cases we're currently interested in, but there seems
to be a wealth of other possible uses. For example, many systems use
loopback mounts for images (like lxc or snap), and these could take
advantage of the opportunistic sharing. We've also talked about using
fuse to implement a local cache for the backing files. I.e. you would
have a second basedir be a fuse filesystem, and on lookup failure in
the first basedir the fuse one triggers a download which is also saved
in the first dir for later lookups. There are many interesting
possibilities here.

The patch series contains documentation on the file format and how to
use the filesystem.

The userspace tools (and a standalone kernel module) is available
here:
https://github.com/containers/composefs

Initial work on ostree integration is here:
https://github.com/ostreedev/ostree/pull/2640

This patchset in git is available here:
https://github.com/alexlarsson/linux/tree/composefs-v1

Alexander Larsson (6):
fsverity: Export fsverity_get_digest
composefs: Add on-disk layout
composefs: Add descriptor parsing code
composefs: Add filesystem implementation
composefs: Add documentation
composefs: Add kconfig and build support

Documentation/filesystems/composefs.rst | 162 ++++
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/composefs/Kconfig | 18 +
fs/composefs/Makefile | 5 +
fs/composefs/cfs-internals.h | 65 ++
fs/composefs/cfs-reader.c | 958 ++++++++++++++++++++++++
fs/composefs/cfs.c | 941 +++++++++++++++++++++++
fs/composefs/cfs.h | 242 ++++++
fs/verity/measure.c | 1 +
10 files changed, 2394 insertions(+)
create mode 100644 Documentation/filesystems/composefs.rst
create mode 100644 fs/composefs/Kconfig
create mode 100644 fs/composefs/Makefile
create mode 100644 fs/composefs/cfs-internals.h
create mode 100644 fs/composefs/cfs-reader.c
create mode 100644 fs/composefs/cfs.c
create mode 100644 fs/composefs/cfs.h

--
2.38.1


2022-11-28 12:01:48

by Alexander Larsson

[permalink] [raw]
Subject: [PATCH 5/6] composefs: Add documentation

This adds documentation about the composefs filesystem and
how to use it.

Signed-off-by: Alexander Larsson <[email protected]>
---
Documentation/filesystems/composefs.rst | 162 ++++++++++++++++++++++++
1 file changed, 162 insertions(+)
create mode 100644 Documentation/filesystems/composefs.rst

diff --git a/Documentation/filesystems/composefs.rst b/Documentation/filesystems/composefs.rst
new file mode 100644
index 000000000000..75fbf14aeb33
--- /dev/null
+++ b/Documentation/filesystems/composefs.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Composefs Filesystem
+====================
+
+Introduction
+============
+
+Composefs is a read-only file system that is backed by regular files
+(rather than a block device). It is designed to help easily share
+content between different directory trees, such as container images in
+a local store or ostree checkouts. In addition it also has support for
+integrity validation of file content and directory metadata, in an
+efficient way (using fs-verity).
+
+The filesystem mount source is a binary blob called the descriptor. It
+contains all the inode and directory entry data for the entire
+filesystem. However, instead of storing the file content each regular
+file inode stores a relative path name, and the filesystem gets the
+file content from the filesystem by looking up that filename in a set
+of base directories.
+
+Given such a descriptor called "image.cfs" and a directory with files
+called "/dir" you can mount it like:
+
+ mount -t composefs image.cfs -o basedir=/dir /mnt
+
+Content sharing
+===============
+
+Suppose you have a single basedir where the files are content
+addressed (i.e. named by content digest), and a set of composefs
+descriptors using this basedir. Any file that happen to be shared
+between two images (same content, so same digest) will now only be
+stored once on the disk.
+
+Such sharing is possible even if the metadata for the file in the
+image differs (common reasons for metadata difference are mtime,
+permissions, xattrs, etc). The sharing is also anonymous in the sense
+that you can't tell the difference on the mounted files from a
+non-shared file (for example by looking at the link count for a
+hardlinked file).
+
+In addition, any shared files that are actively in use will share
+page-cache, because the page cache for the file contents will be
+addressed by the backing file in the basedir, This means (for example)
+that shared libraries between images will only be mmap:ed once across
+all mounts.
+
+Integrity validation
+====================
+
+Composefs uses `fs-verity
+<https://www.kernel.org/doc/Documentation/filesystems/fsverity.rst>`
+for integrity validation, and extends it by making the validation also
+apply to the directory metadata. This happens on two levels,
+validation of the descriptor and validation of the backing files.
+
+For descriptor validation, the idea is that you enable fs-verity on
+the descriptor file which seals it from changes that would affect the
+directory metadata. Additionally you can pass a `digest` mount option,
+which composefs verifies against the descriptor fs-verity
+measure. Such a mount option could be encoded in a trusted source
+(like a signed kernel command line) and be used as a root of trust if
+using composefs for the root filesystem.
+
+For file validation, the descriptor can contain digest for each
+backing file, and you can enable fs-verity on the backing
+files. Composefs will validate the digest before using the backing
+files. This means any (accidental or malicious) modification of the
+basedir will be detected at the time the file is used.
+
+Expected use-cases
+=================
+
+Container Image Storage
+```````````````````````
+
+Typically a container image is stored as a set of "layer"
+directories. merged into one mount by using overlayfs. The lower
+layers are read-only image content and the upper layer is the
+writable state of a running container. Multiple uses of the same
+layer can be shared this way, but it is hard to share individual
+files between unrelated layers.
+
+Using composefs, we can instead use a shared, content-addressed
+store for all the images in the system, and use a composefs image
+for the read-only image content of each image, pointing into the
+shared store. Then for a running container we use an overlayfs
+with the lower dir being the composefs and the upper dir being
+the writable state.
+
+
+Ostree root filesystem validation
+`````````````````````````````````
+
+Ostree uses a content-addressed on-disk store for file content,
+allowing efficient updates and sharing of content. However to actually
+use these as a root filesystem it needs to create a real
+"chroot-style" directory, containing hard links into the store. The
+store itself is validated when created, but once the hard-link
+directory is created, nothing validates the directory structure of
+that.
+
+Instead of a chroot we can we can use composefs. We create a composefs
+image pointing into the object store, enable fs-verity for everything
+and encode the fs-verity digest of the descriptor in the
+kernel-command line. This will allow booting a trusted system where
+all directory metadata and file content is validated lazily at use.
+
+
+Mount options
+=============
+
+`basedir`: A colon separated list of directories to use as a base when resolving relative content paths.
+`verity_check=[0,1,2]`: When to verify backing file fs-verity: 0 == never, 1 == if specified in image, 2 == always and require it in image.
+`digest`: A fs-verity sha256 digest that the descriptor file must match. If set, `verity_check` defaults to 2.
+
+
+Filesystem format
+=================
+
+The format of the descriptor is contains three sections: header,
+inodes and variable data. All data in the file is stored in
+little-endian form.
+
+The header starts at the beginning of the file and contains version,
+magic value, offsets to the variable data and the root inode nr.
+
+The inode section starts at a fixed location right after the
+header. It is a array of inode data, where for each inode there is
+first a variable length chunk and then a fixed size chunk. An inode nr
+is the offset in the inode data to the start of the fixed chunk.
+
+The fixed inode chunk starts with a flag that tells what parts of the
+inode are stored in the file (meaning it is only the maximal size that
+is fixed). After that the various inode attributes are serialized in
+order, such as mode, ownership, xattrs, and payload length. The
+payload length attribute gives the size of the variable chunk.
+
+The inode variable chunk contains different things depending on the
+file type. For regular files it is the backing filename. For symlinks
+it is the symlink target. For directories it is a list of references to
+dentries, stored in chunks of maximum 4k. The dentry chunks themselves
+are stored in the variable data section.
+
+The variable data section is stored after the inode section, and you
+can find it from the offset in the header. It contains dentries and
+Xattrs data. The xattrs are referred to by offset and size in the
+xattr attribute in the inode data. Each xattr data can be used by many
+inodes in the filesystem. The variable data chunks are all smaller than
+a page (4K) and are padded to not span pages.
+
+Tools
+=====
+
+Tools for composefs can be found at https://github.com/containers/composefs
+
+There is a mkcomposefs tool which can be used to create images on the
+CLI, and a library that applications can use to create composefs
+images.
--
2.38.1

2022-11-28 12:22:29

by Alexander Larsson

[permalink] [raw]
Subject: [PATCH 2/6] composefs: Add on-disk layout

This commit adds the on-disk layout header file of composefs.

Signed-off-by: Alexander Larsson <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
---
fs/composefs/cfs.h | 242 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 242 insertions(+)
create mode 100644 fs/composefs/cfs.h

diff --git a/fs/composefs/cfs.h b/fs/composefs/cfs.h
new file mode 100644
index 000000000000..8f001fd28d6b
--- /dev/null
+++ b/fs/composefs/cfs.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * composefs
+ *
+ * Copyright (C) 2021 Giuseppe Scrivano
+ * Copyright (C) 2022 Alexander Larsson
+ *
+ * This file is released under the GPL.
+ */
+
+#ifndef _CFS_H
+#define _CFS_H
+
+#include <asm/byteorder.h>
+#include <crypto/sha2.h>
+#include <linux/fs.h>
+#include <linux/stat.h>
+#include <linux/types.h>
+
+#define CFS_VERSION 1
+
+#define CFS_MAGIC 0xc078629aU
+
+#define CFS_MAX_DIR_CHUNK_SIZE 4096
+#define CFS_MAX_XATTRS_SIZE 4096
+
+static inline u16 cfs_u16_to_file(u16 val)
+{
+ return cpu_to_le16(val);
+}
+
+static inline u32 cfs_u32_to_file(u32 val)
+{
+ return cpu_to_le32(val);
+}
+
+static inline u64 cfs_u64_to_file(u64 val)
+{
+ return cpu_to_le64(val);
+}
+
+static inline u16 cfs_u16_from_file(u16 val)
+{
+ return le16_to_cpu(val);
+}
+
+static inline u32 cfs_u32_from_file(u32 val)
+{
+ return le32_to_cpu(val);
+}
+
+static inline u64 cfs_u64_from_file(u64 val)
+{
+ return le64_to_cpu(val);
+}
+
+static inline int cfs_xdigit_value(char c)
+{
+ if (c >= '0' && c <= '9')
+ return c - '0';
+ if (c >= 'A' && c <= 'F')
+ return c - 'A' + 10;
+ if (c >= 'a' && c <= 'f')
+ return c - 'a' + 10;
+ return -1;
+}
+
+static inline int cfs_digest_from_payload(const char *payload,
+ size_t payload_len,
+ u8 digest_out[SHA256_DIGEST_SIZE])
+{
+ const char *p, *end;
+ u8 last_digit = 0;
+ int digit = 0;
+ size_t n_nibbles = 0;
+
+ end = payload + payload_len;
+ for (p = payload; p != end; p++) {
+ /* Skip subdir structure */
+ if (*p == '/')
+ continue;
+
+ /* Break at (and ignore) extension */
+ if (*p == '.')
+ break;
+
+ if (n_nibbles == SHA256_DIGEST_SIZE * 2)
+ return -1; /* Too long */
+
+ digit = cfs_xdigit_value(*p);
+ if (digit == -1)
+ return -1; /* Not hex digit */
+
+ n_nibbles++;
+ if ((n_nibbles % 2) == 0) {
+ digest_out[n_nibbles / 2 - 1] =
+ (last_digit << 4) | digit;
+ }
+ last_digit = digit;
+ }
+
+ if (n_nibbles != SHA256_DIGEST_SIZE * 2)
+ return -1; /* Too short */
+
+ return 0;
+}
+
+struct cfs_vdata_s {
+ u64 off;
+ u32 len;
+} __packed;
+
+struct cfs_header_s {
+ u8 version;
+ u8 unused1;
+ u16 unused2;
+
+ u32 magic;
+ u64 data_offset;
+ u64 root_inode;
+
+ u64 unused3[2];
+} __packed;
+
+enum cfs_inode_flags {
+ CFS_INODE_FLAGS_NONE = 0,
+ CFS_INODE_FLAGS_PAYLOAD = 1 << 0,
+ CFS_INODE_FLAGS_MODE = 1 << 1,
+ CFS_INODE_FLAGS_NLINK = 1 << 2,
+ CFS_INODE_FLAGS_UIDGID = 1 << 3,
+ CFS_INODE_FLAGS_RDEV = 1 << 4,
+ CFS_INODE_FLAGS_TIMES = 1 << 5,
+ CFS_INODE_FLAGS_TIMES_NSEC = 1 << 6,
+ CFS_INODE_FLAGS_LOW_SIZE = 1 << 7, /* Low 32bit of st_size */
+ CFS_INODE_FLAGS_HIGH_SIZE = 1 << 8, /* High 32bit of st_size */
+ CFS_INODE_FLAGS_XATTRS = 1 << 9,
+ CFS_INODE_FLAGS_DIGEST = 1
+ << 10, /* fs-verity sha256 digest of content */
+ CFS_INODE_FLAGS_DIGEST_FROM_PAYLOAD =
+ 1 << 11, /* Compute digest from payload */
+};
+
+#define CFS_INODE_FLAG_CHECK(_flag, _name) \
+ (((_flag) & (CFS_INODE_FLAGS_##_name)) != 0)
+#define CFS_INODE_FLAG_CHECK_SIZE(_flag, _name, _size) \
+ (CFS_INODE_FLAG_CHECK(_flag, _name) ? (_size) : 0)
+
+#define CFS_INODE_DEFAULT_MODE 0100644
+#define CFS_INODE_DEFAULT_NLINK 1
+#define CFS_INODE_DEFAULT_NLINK_DIR 2
+#define CFS_INODE_DEFAULT_UIDGID 0
+#define CFS_INODE_DEFAULT_RDEV 0
+#define CFS_INODE_DEFAULT_TIMES 0
+
+struct cfs_inode_s {
+ u32 flags;
+
+ /* Optional data: (selected by flags) */
+
+ /* This is the size of the type specific data that comes directly after
+ * the inode in the file. Of this type:
+ *
+ * directory: cfs_dir_s
+ * regular file: the backing filename
+ * symlink: the target link
+ *
+ * Canonically payload_length is 0 for empty dir/file/symlink.
+ */
+ u32 payload_length;
+
+ u32 st_mode; /* File type and mode. */
+ u32 st_nlink; /* Number of hard links, only for regular files. */
+ u32 st_uid; /* User ID of owner. */
+ u32 st_gid; /* Group ID of owner. */
+ u32 st_rdev; /* Device ID (if special file). */
+ u64 st_size; /* Size of file, only used for regular files */
+
+ struct cfs_vdata_s xattrs; /* ref to variable data */
+
+ u8 digest[SHA256_DIGEST_SIZE]; /* fs-verity digest */
+
+ struct timespec64 st_mtim; /* Time of last modification. */
+ struct timespec64 st_ctim; /* Time of last status change. */
+};
+
+static inline u32 cfs_inode_encoded_size(u32 flags)
+{
+ return sizeof(u32) /* flags */ +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, PAYLOAD, sizeof(u32)) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, MODE, sizeof(u32)) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, NLINK, sizeof(u32)) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, UIDGID,
+ sizeof(u32) + sizeof(u32)) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, RDEV, sizeof(u32)) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, TIMES, sizeof(u64) * 2) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, TIMES_NSEC, sizeof(u32) * 2) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, LOW_SIZE, sizeof(u32)) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, HIGH_SIZE, sizeof(u32)) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, XATTRS,
+ sizeof(u64) + sizeof(u32)) +
+ CFS_INODE_FLAG_CHECK_SIZE(flags, DIGEST, SHA256_DIGEST_SIZE);
+}
+
+struct cfs_dentry_s {
+ /* Index of struct cfs_inode_s */
+ u64 inode_index;
+ u8 d_type;
+ u8 name_len;
+ u16 name_offset;
+} __packed;
+
+struct cfs_dir_chunk_s {
+ u16 n_dentries;
+ u16 chunk_size;
+ u64 chunk_offset;
+} __packed;
+
+struct cfs_dir_s {
+ u32 n_chunks;
+ struct cfs_dir_chunk_s chunks[];
+} __packed;
+
+#define cfs_dir_size(_n_chunks) \
+ (sizeof(struct cfs_dir_s) + \
+ (_n_chunks) * sizeof(struct cfs_dir_chunk_s))
+
+/* xattr representation. */
+struct cfs_xattr_element_s {
+ u16 key_length;
+ u16 value_length;
+} __packed;
+
+struct cfs_xattr_header_s {
+ u16 n_attr;
+ struct cfs_xattr_element_s attr[0];
+} __packed;
+
+#define cfs_xattr_header_size(_n_element) \
+ (sizeof(struct cfs_xattr_header_s) + \
+ (_n_element) * sizeof(struct cfs_xattr_element_s))
+
+#endif
--
2.38.1

2023-01-05 16:31:07

by Brian Masney

[permalink] [raw]
Subject: Re: [PATCH 2/6] composefs: Add on-disk layout

On Mon, Nov 28, 2022 at 12:16:23PM +0100, Alexander Larsson wrote:
> This commit adds the on-disk layout header file of composefs.
>
> Signed-off-by: Alexander Larsson <[email protected]>
> Signed-off-by: Giuseppe Scrivano <[email protected]>

Add Co-Developed-By: Giuseppe ... ?

Full disclosure: I'm not a file system developer but I'll attempt to
help with the review of this series.

> ---
> fs/composefs/cfs.h | 242 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 242 insertions(+)
> create mode 100644 fs/composefs/cfs.h
>
> diff --git a/fs/composefs/cfs.h b/fs/composefs/cfs.h
> new file mode 100644
> index 000000000000..8f001fd28d6b
> --- /dev/null
> +++ b/fs/composefs/cfs.h
> @@ -0,0 +1,242 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * composefs
> + *
> + * Copyright (C) 2021 Giuseppe Scrivano
> + * Copyright (C) 2022 Alexander Larsson
> + *
> + * This file is released under the GPL.
> + */
> +
> +#ifndef _CFS_H
> +#define _CFS_H
> +
> +#include <asm/byteorder.h>
> +#include <crypto/sha2.h>
> +#include <linux/fs.h>
> +#include <linux/stat.h>
> +#include <linux/types.h>
> +
> +#define CFS_VERSION 1
> +
> +#define CFS_MAGIC 0xc078629aU
> +
> +#define CFS_MAX_DIR_CHUNK_SIZE 4096
> +#define CFS_MAX_XATTRS_SIZE 4096
> +
> +static inline u16 cfs_u16_to_file(u16 val)
> +{
> + return cpu_to_le16(val);
> +}
> +
> +static inline u32 cfs_u32_to_file(u32 val)
> +{
> + return cpu_to_le32(val);
> +}
> +
> +static inline u64 cfs_u64_to_file(u64 val)
> +{
> + return cpu_to_le64(val);
> +}
> +
> +static inline u16 cfs_u16_from_file(u16 val)
> +{
> + return le16_to_cpu(val);
> +}
> +
> +static inline u32 cfs_u32_from_file(u32 val)
> +{
> + return le32_to_cpu(val);
> +}
> +
> +static inline u64 cfs_u64_from_file(u64 val)
> +{
> + return le64_to_cpu(val);
> +}

I don't see where the cfs_xxx_{to,from}_file() approach is used in other
filesystems. Instead, move the cpu() functions directly into the code.

> +static inline int cfs_xdigit_value(char c)
> +{
> + if (c >= '0' && c <= '9')
> + return c - '0';
> + if (c >= 'A' && c <= 'F')
> + return c - 'A' + 10;
> + if (c >= 'a' && c <= 'f')
> + return c - 'a' + 10;
> + return -1;
> +}

There's some utilities in lib/hexdump.c that you can use. hex_to_bin()
will convert a single character and hex2bin() will convert a string for
you.

> +static inline int cfs_digest_from_payload(const char *payload,
> + size_t payload_len,
> + u8 digest_out[SHA256_DIGEST_SIZE])
> +{
> + const char *p, *end;
> + u8 last_digit = 0;
> + int digit = 0;
> + size_t n_nibbles = 0;

Put in reverse Christmas tree order.

> +
> + end = payload + payload_len;
> + for (p = payload; p != end; p++) {
> + /* Skip subdir structure */
> + if (*p == '/')
> + continue;
> +
> + /* Break at (and ignore) extension */
> + if (*p == '.')
> + break;

A comment would be helpful in this area that shows what the payload is
expected to be.

> +
> + if (n_nibbles == SHA256_DIGEST_SIZE * 2)
> + return -1; /* Too long */

return -EINVAL; ?

> +
> + digit = cfs_xdigit_value(*p);
> + if (digit == -1)
> + return -1; /* Not hex digit */

-EINVAL here as well

> +
> + n_nibbles++;
> + if ((n_nibbles % 2) == 0) {
> + digest_out[n_nibbles / 2 - 1] =
> + (last_digit << 4) | digit;
> + }
> + last_digit = digit;
> + }
> +
> + if (n_nibbles != SHA256_DIGEST_SIZE * 2)
> + return -1; /* Too short */

-EINVAL here as well

> +
> + return 0;
> +}
> +
> +struct cfs_vdata_s {
> + u64 off;
> + u32 len;
> +} __packed;
> +
> +struct cfs_header_s {
> + u8 version;
> + u8 unused1;
> + u16 unused2;
> +
> + u32 magic;

Should the magic number appear first?

> + u64 data_offset;
> + u64 root_inode;
> +
> + u64 unused3[2];
> +} __packed;
> +
> +enum cfs_inode_flags {
> + CFS_INODE_FLAGS_NONE = 0,
> + CFS_INODE_FLAGS_PAYLOAD = 1 << 0,
> + CFS_INODE_FLAGS_MODE = 1 << 1,
> + CFS_INODE_FLAGS_NLINK = 1 << 2,
> + CFS_INODE_FLAGS_UIDGID = 1 << 3,
> + CFS_INODE_FLAGS_RDEV = 1 << 4,
> + CFS_INODE_FLAGS_TIMES = 1 << 5,
> + CFS_INODE_FLAGS_TIMES_NSEC = 1 << 6,
> + CFS_INODE_FLAGS_LOW_SIZE = 1 << 7, /* Low 32bit of st_size */
> + CFS_INODE_FLAGS_HIGH_SIZE = 1 << 8, /* High 32bit of st_size */
> + CFS_INODE_FLAGS_XATTRS = 1 << 9,
> + CFS_INODE_FLAGS_DIGEST = 1
> + << 10, /* fs-verity sha256 digest of content */

Include << 10 on line above

Brian

2023-01-10 16:38:37

by Alexander Larsson

[permalink] [raw]
Subject: Re: [PATCH 2/6] composefs: Add on-disk layout

On Thu, 2023-01-05 at 10:55 -0500, Brian Masney wrote:
> On Mon, Nov 28, 2022 at 12:16:23PM +0100, Alexander Larsson wrote:
> > This commit adds the on-disk layout header file of composefs.
> >
> > Signed-off-by: Alexander Larsson <[email protected]>
> > Signed-off-by: Giuseppe Scrivano <[email protected]>
>
> Add Co-Developed-By: Giuseppe ... ?
>
> Full disclosure: I'm not a file system developer but I'll attempt to
> help with the review of this series.
>

Thanks. I did various changes to the github repo based on your review,
here are the outstanding comments:

>
> > +struct cfs_header_s {
> > +       u8 version;
> > +       u8 unused1;
> > +       u16 unused2;
> > +
> > +       u32 magic;
>
> Should the magic number appear first?

I don't think so, the version number is essentially part of the full
magic string.
>

>
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-=-=
Alexander Larsson Red Hat,
Inc
[email protected] [email protected]
He's a notorious vegetarian filmmaker who knows the secret of the alien
invasion. She's an artistic tomboy nun married to the Mob. They fight
crime!