From: Eric Curtin
Date: Mon, 11 Dec 2023 13:45:58 +0000
Subject: [RFC KERNEL] initoverlayfs - a scalable initial filesystem
To: Linux Kernel Mailing List, linux-unionfs@vger.kernel.org, linux-erofs@lists.ozlabs.org
Cc: Daan De Meyer, Stephen Smoogen, Yariv Rachmani, Daniel Walsh, Douglas Landgraf, Alexander Larsson, Colin Walters, Brian Masney, Eric Chanudet, Pavol Brilla, Lokesh Mandvekar, Petr Šabata, Lennart Poettering, Luca Boccassi, Neal Gompa

Hi All,

We have recently been working on something called initoverlayfs, for which we sent an RFC email to the systemd and dracut mailing lists to gather feedback. This is an exploratory email: we are unsure whether a solution like this belongs in userspace or kernelspace, and we would like to gather feedback from the community.

To describe this briefly, the idea is to use erofs+overlayfs as the initial filesystem rather than an initramfs. The benefit is that we can start userspace significantly faster, because we do not have to unpack, decompress and populate a tmpfs upfront; instead, we can rely on transparent decompression (for example, lz4hc). What we believe is the greater benefit is that we need to worry less about initial filesystem bloat: with transparent decompression, you only pay for decompressing the bytes you actually use.

We implemented the first version of this by creating a small initramfs that contains only storage drivers, udev and a couple of hundred lines of C code: just enough userspace to mount an erofs with a transient overlay. We then build a second initramfs, which has all the contents of a normal everyday initramfs with all the bells and whistles, and convert it into an erofs.
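
The "just enough userspace" step can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the initoverlayfs source: the function names `build_overlay_opts` and `mount_initoverlay` and every path are hypothetical, and the sketch assumes the erofs image is already exposed as a block device (for example a partition or loop device) and that the mount points already exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: build the overlayfs mount options.
 * Returns 0 on success, -1 if a path contains a comma or the
 * result does not fit in buf.
 */
int build_overlay_opts(char *buf, size_t len, const char *lower,
                       const char *upper, const char *work)
{
    assert(buf != NULL && len > 0);

    /* overlayfs splits its options on ','; this sketch rejects such
     * paths instead of escaping them. */
    if (strchr(lower, ',') || strchr(upper, ',') || strchr(work, ','))
        return -1;

    int n = snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s",
                     lower, upper, work);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: mount a read-only erofs lower layer and a
 * tmpfs-backed (transient) upper layer, then combine them at `target`.
 * Assumes erofs_dev is a block device holding the erofs image and that
 * lower_mnt, rw_mnt and target already exist. Needs CAP_SYS_ADMIN.
 * Returns 0 on success, -1 with errno set on failure.
 */
int mount_initoverlay(const char *erofs_dev, const char *lower_mnt,
                      const char *rw_mnt, const char *target)
{
    char upper[256], work[256], opts[1024];

    /* Lower layer: the erofs image, read-only; compressed data is
     * decompressed only when it is read. */
    if (mount(erofs_dev, lower_mnt, "erofs", MS_RDONLY, NULL) != 0)
        return -1;

    /* Writable layer in RAM; its contents are discarded at reboot. */
    if (mount("tmpfs", rw_mnt, "tmpfs", 0, "mode=0755") != 0)
        return -1;

    /* upperdir and workdir must live on the same filesystem. */
    int nu = snprintf(upper, sizeof upper, "%s/upper", rw_mnt);
    int nw = snprintf(work, sizeof work, "%s/work", rw_mnt);
    if (nu < 0 || (size_t)nu >= sizeof upper ||
        nw < 0 || (size_t)nw >= sizeof work) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (mkdir(upper, 0755) != 0 || mkdir(work, 0755) != 0)
        return -1;

    if (build_overlay_opts(opts, sizeof opts, lower_mnt, upper, work) != 0) {
        errno = EINVAL;
        return -1;
    }
    return mount("overlay", target, "overlay", 0, opts);
}
```

A caller would then perform the switch-root step (move the overlay mount onto / and exec init), which is omitted here. Keeping the upper layer on tmpfs is what makes the overlay transient: writes never touch the erofs image and are gone after reboot.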

At boot time, the system then transitions to this erofs+overlayfs from userspace, and everything works as it would with a traditional initramfs.

The current implementation looks like this:

```
From the filesystem perspective (roughly):

fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs

From the process perspective (roughly):

fw -> bootloader -> kernel -> storage-init   -> init ----------------->
```

But we have been asking ourselves whether we should implement this in kernelspace instead, so that it looks more like:

```
From the filesystem perspective (roughly):

fw -> bootloader -> kernel -> initoverlayfs -> rootfs

From the process perspective (roughly):

fw -> bootloader -> kernel -> init ----------------->
```

The kinds of questions we are asking are:

Would it be possible to implement this in kernelspace, so we could mount the initial filesystem data directly as an erofs+overlayfs filesystem without unpacking, decompressing, copying the data to a tmpfs, etc.?

Could we memmap the initramfs buffer and mount it as an erofs?

What other considerations should be taken into account?

Echoing Lennart, we must also "keep in mind from the beginning how authentication of every component of your process shall work", as that is essential to a couple of different Linux distributions today.

We kept this email short because we want people to read it, and to avoid duplicating information available elsewhere. If you'd like to learn more, the effort is described from different perspectives in the systemd/dracut RFC email and in the GitHub README.md, and the discussion on the systemd mailing list is worth reading:

https://marc.info/?l=systemd-devel&m=170214639006704&w=2
https://github.com/containers/initoverlayfs/blob/main/README.md

We have also received informal feedback from the community that it would be nice to optionally support btrfs as an alternative.

Is mise le meas/Regards,
Eric Curtin