Date: Fri, 9 Dec 2022 17:00:34 +0000
From: Daniel Golle
To: Christoph Hellwig
Cc: Richard Weinberger, Matthew Wilcox, Jens Axboe, "Martin K. Petersen",
    Chaitanya Kulkarni, Wolfram Sang, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c

Hi Christoph,

On Tue, Nov 22, 2022 at 04:37:08AM -0800, Christoph Hellwig wrote:
> On Sat, Nov 19, 2022 at 04:03:11PM +0000, Daniel Golle wrote:
> > That works, but has slightly less utility value than the partition
> > parser approach as in this way I cannot easily populate the PARTNAME
> > uevent which can later help userspace to identify a device by the FIT
> > subimage name -- I'd have to either invent a new bus_type or
> > device_type, both seem inappropriate and have unwanted side effects.
> > Or am I missing something and there is a way to use add_uevent_var()
> > for a disk_type device?
>
> You're not exposing a partition here - this is an image format that
> sits in a partition and we should not pretend that is a partition.

It doesn't need to be literally the PARTNAME uevent, just any way to
communicate the names of mapped subimages to userspace.

My understanding by now is that there is no way around introducing a
new device_type and then mitigating the unwanted side effects with
follow-up changes, i.e. making it possible to use that new device_type
when specifying the root= cmdline parameter (currently only disks and
partitions are considered there).

Or give up on the idea that uImage.FIT subimages mapped by the new
driver can be identified by userspace by poking uevent from sysfs and
just rely on convention and ordering.
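To illustrate what "poking uevent from sysfs" means on the userspace side, here is a minimal sketch of looking up one key in the contents of a uevent file, which is a list of KEY=value lines. The helper name uevent_get() and the sample buffer are made up for illustration; which key a new device_type would export for a FIT subimage name is exactly the open question above, so PARTNAME only serves as the example that exists for partitions today.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find KEY in a uevent-style buffer made of
 * "KEY=value\n" lines. On success the value is copied into out
 * (NUL-terminated) and 0 is returned; -1 means the key is absent
 * or the value does not fit into out.
 */
static int uevent_get(const char *buf, const char *key,
                      char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *line = buf;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* The key must be followed directly by '=' so that
		 * "PARTN" does not match a "PARTNAME=..." line. */
		if (len > klen && line[klen] == '=' &&
		    strncmp(line, key, klen) == 0) {
			size_t vlen = len - klen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, line + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

Userspace would read e.g. the uevent attribute of a block device in sysfs into a buffer and call uevent_get(buf, "PARTNAME", name, sizeof(name)) to decide whether this is the device it is looking for.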
>
> > However, I don't see a way to avoid using (or duplicating)
> > devt_from_devname() to select the lower device somehow without having
> > to probe and parse *all* block devices present (which is, from what I
> > understood, what you want to avoid, and I agree that it is more safe to
> > not do that...)
> >
> > Can you or anyone give some advice on how this should be done?
>
> Just set the block driver up from an initramfs, like we do for all
> modern stackable drivers.

Instead of using a kernel cmdline parameter we could also have the
bootloader embed that information as a string in the 'chosen' section of
the device tree blob, right next to the cmdline. However, as there is
no representation of block partitions in device tree, also in that case
the lower device will have to be referenced by a string somehow, i.e.
devt_from_devname() or the like will be needed.

Needing an initramfs, even if it boils down to just one statically
compiled executable, is massive bloat and a complication when building
embedded device firmware, and none of the over 1580 devices currently
supported by OpenWrt need an intermediate initramfs to mount their
on-flash squashfs rootfs (some, however, already use this uImage.FIT
partition parser, and not needing a downstream patch for that would be
nice).

uImage.FIT typically contains the complete firmware used on an embedded
device, i.e. at least a Linux kernel, device tree blob and a filesystem.
The main use of this whole uImage.FIT-parsing-in-Linux approach I'm
trying to get across here is to expose one or more 'filesystem'-type
subimages of such an image as block devices, also so that one of them
can directly be mounted as rootfs by the kernel.

This *replaces* the use of 'ramdisk' type sub-images which need to
remain allocated at runtime, while using a squashfs 'filesystem' type
sub-image allows freeing the filesystem cache if RAM becomes scarce.
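For readers not familiar with the container: a uImage.FIT is stored in the flattened device tree (FDT) binary format, so the first thing any parser can do with the start of the lower device is check the FDT magic and read the total image size from the header. The following is only a sketch of that first step in plain C and not the code of the driver under discussion; fit_total_size() is a made-up name, and a real implementation would use libfdt and go on to walk the images node.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC	0xd00dfeedu	/* stored big-endian at offset 0 */
#define FDT_HDR_SIZE	40u		/* ten 32-bit fields in a v17 header */

/* Read a big-endian 32-bit value from an unaligned buffer. */
static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: return the 'totalsize' header field if buf
 * starts with a plausible FDT header, or 0 otherwise. Only the first
 * 8 bytes are examined, so buf may be just the first sector of the
 * lower device.
 */
static uint32_t fit_total_size(const uint8_t *buf, size_t len)
{
	uint32_t total;

	if (len < 8 || be32(buf) != FDT_MAGIC)
		return 0;

	total = be32(buf + 4);
	if (total < FDT_HDR_SIZE)
		return 0;

	return total;
}
```

A return value of 0 means "not a FIT image, leave the device alone"; anything else tells the caller how many bytes need to be read before the structure can be parsed.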

As both storage and memory are often very limited on small embedded
devices, OpenWrt has always been using a squashfs as rootfs with a
storage-type specific filesystem used as r/w overlay on top. Up to now,
the rootfs is often stored in platform-specific ways, i.e. an additional
partition on block devices, MTD partition on NOR flash or UBI volume on
NAND flash.

Carrying the read-only squashfs filesystem inside the uImage.FIT
structure has the advantage of being agnostic regarding the
storage-type (NOR/mtdblockX, NAND/ubiblockX, MMC/mmcblkXpY) and allows
the bootloader to validate the filesystem hash before starting the
kernel, i.e. ensuring integrity of the firmware as a whole, which
includes the root filesystem.

>
> > Yet another (imho not terrible) problem is removal of the lower device.
> > Many of the supported SBC use a micro SD card to boot, which can be
> > removed by the user while the system is running (which is generally not
> > a good idea, but anyway). For partitions this is handled automatically
> > by blk_drop_partitions() called directly from genhd.c.
> > I'm currently playing with doing something similar using the bus device
> > removal notification, but it doesn't seem to work for all cases, e.g.
> > mmcblk devices do not seem to have the ->bus pointer populated at all
> > (i.e. disk_to_dev(disk)->bus == NULL for mmcblk devices).
>
> I have WIP patches that allow the claimer of a block device get
> resize and removal notification. It's not going to land for 6.2,
> but I hope I have it ready in time for the next merge window.

I'm looking forward to integrating that into the uImage.FIT block driver
I've been working on. In the meantime, should I already post my current
draft so we can start discussing whether that solution could be
acceptable?


Best regards

Daniel