We want to be able to mount virtiofs as rootfs and pass appropriate
kernel command line. Right now there does not seem to be a good way
to do that. If I specify "root=myfs rootfstype=virtiofs", system
panics.
virtio-fs: tag </dev/root> not found
..
..
[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]
Basic problem here is that kernel assumes that device identifier
passed in "root=" is a block device. But there are few execptions
to this rule to take care of the needs of mtd, ubi, NFS and CIFS.
For example, mtd and ubi prefix "mtd:" or "ubi:" respectively.
"root=mtd:<identifier>" or "root=ubi:<identifier>"
NFS and CIFS use "root=/dev/nfs" and CIFS passes "root=/dev/cifs" and
actual root device details come from filesystem specific kernel
command line options.
virtiofs does not seem to fit in any of the above categories. In fact
we have 9pfs which can be used to boot from but it also does not
have a proper syntax to specify rootfs and does not fit into any of
the existing syntax. They both expect a device "tag" to be passed
in a device to be mounted. And filesystem knows how to parse and
use "tag".
So this patch proposes that we add a new prefix "fstag:" which specifies
that identifier which follows is filesystem specific tag and its not
a block device. Just pass this tag to filesystem and filesystem will
figure out how to mount it.
For example, "root=fstag:<tag>".
In case of virtiofs, I can specify "root=fstag:myfs rootfstype=virtiofs"
and it works.
I think this should work for 9p as well. "root=fstag:myfs rootfstype=9p".
Though I have yet to test it.
This kind of syntax should be able to address wide variety of use cases
where root device is not a block device and is simply some kind of
tag/label understood by filesystem.
Signed-off-by: Vivek Goyal <[email protected]>
---
include/linux/root_dev.h | 1 +
init/do_mounts.c | 13 +++++++++++++
2 files changed, 14 insertions(+)
diff --git a/include/linux/root_dev.h b/include/linux/root_dev.h
index 4e78651371ba..3fda7c5d9327 100644
--- a/include/linux/root_dev.h
+++ b/include/linux/root_dev.h
@@ -9,6 +9,7 @@
enum {
Root_NFS = MKDEV(UNNAMED_MAJOR, 255),
Root_CIFS = MKDEV(UNNAMED_MAJOR, 254),
+ Root_FSTAG = MKDEV(UNNAMED_MAJOR, 253),
Root_RAM0 = MKDEV(RAMDISK_MAJOR, 0),
Root_RAM1 = MKDEV(RAMDISK_MAJOR, 1),
Root_FD0 = MKDEV(FLOPPY_MAJOR, 0),
diff --git a/init/do_mounts.c b/init/do_mounts.c
index a78e44ee6adb..4d1df0da1ccc 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -275,6 +275,7 @@ static dev_t devt_from_devnum(const char *name)
* 9) PARTLABEL=<name> with name being the GPT partition label.
* MSDOS partitions do not support labels!
* 10) /dev/cifs represents Root_CIFS (0xfe)
+ * 11) fstag:<tag> represents Root_FSTAG (0xfd)
*
* If name doesn't have fall into the categories above, we return (0,0).
* block_class is used to check if something is a disk name. If the disk
@@ -287,6 +288,8 @@ dev_t name_to_dev_t(const char *name)
return Root_NFS;
if (strcmp(name, "/dev/cifs") == 0)
return Root_CIFS;
+ if (strncmp(name, "fstag:", 6) == 0)
+ return Root_FSTAG;
if (strcmp(name, "/dev/ram") == 0)
return Root_RAM0;
#ifdef CONFIG_BLOCK
@@ -552,6 +555,16 @@ void __init mount_root(void)
return;
}
#endif
+ if (ROOT_DEV == Root_FSTAG && root_fs_names) {
+ /* Skip "fstag:" prefix and point to real tag */
+ root_device_name += 6;
+ if (!do_mount_root(root_device_name, root_fs_names,
+ root_mountflags, root_mount_data))
+ return;
+ panic("VFS: Unable to mount root \"fstag:%s\" via \"%s\"\n",
+ root_device_name, root_fs_names);
+ }
+
#ifdef CONFIG_BLOCK
{
int err = create_dev("/dev/root", ROOT_DEV);
--
2.25.4
... And that's why I told Changbin Du a few times his patches needed
more Ccs :/
FWIW: we just got last month a couple of patches that would allow
initrd-less 9p root mount (using the nfs/cifs method described below
with root=/dev/v9fs)
Vivek Goyal wrote on Tue, Jun 08, 2021 at 11:35:24AM -0400:
> NFS and CIFS use "root=/dev/nfs" and CIFS passes "root=/dev/cifs" and
> actual root device details come from filesystem specific kernel
> command line options.
>
> virtiofs does not seem to fit in any of the above categories. In fact
> we have 9pfs which can be used to boot from but it also does not
> have a proper syntax to specify rootfs and does not fit into any of
> the existing syntax. They both expect a device "tag" to be passed
> in a device to be mounted. And filesystem knows how to parse and
> use "tag".
>
> So this patch proposes that we add a new prefix "fstag:" which specifies
> that identifier which follows is filesystem specific tag and its not
> a block device. Just pass this tag to filesystem and filesystem will
> figure out how to mount it.
...However I agree something more generic would be welcome in my
opinion, so I like this approach.
I'll give it a try for 9p over the weekend and report back.
--
Dominique
On 6/8/21 10:35 AM, Vivek Goyal wrote:
> We want to be able to mount virtiofs as rootfs and pass appropriate
> kernel command line. Right now there does not seem to be a good way
> to do that. If I specify "root=myfs rootfstype=virtiofs", system
> panics.
>
> virtio-fs: tag </dev/root> not found
> ..
> ..
> [ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]
Whatever the best direction forward might be for kernel patches
regarding 'not block device as root', it would ease learning curves if
'the patterns that set config issue X' were the same across root 'not
block device options' whether cephfs, nfs, 9p, virtiofs. All of them
will have to handle the selinux xattr/issue, posix and flock issues,
caching etc. While by definition virtiofs has to exist only in a vm
guest, the others could be baremetal or vm guest roots. (How much 9p's
other-than-guest transports are used I don't know).
FYI (though patching the kernel may be the best option) there is a case
that does not have those kernel panics for virtiofs-root and 9p root
using stock fc34. As 9p, the virtiofs method uses the initrd creation
mechanisms provided by 'dracut' or 'initramfs' to provide the 'sysroot
pivot glue'.
On the fc34 guest a successful 'direct kernel boot' today looks like:
kernel path: /vmsystems/fedora_generic/boot/vmlinuz
initrd path: /vmsystems/fedora_generic/boot/initrd.img
Kernel args: root=virtiofs:myfs rd.shell rd.fstab
The xml to pass through virtio-fs is:
<filesystem type="mount" accessmode="passthrough">
<driver type="virtiofs" queue="1024"/>
<binary xattr="on">
<lock posix="on" flock="on"/>
</binary>
<source dir="/vmsystems/fedora_generic"/>
<target dir="myfs"/>
</filesystem>
The guest fstab is:
myfs / virtiofs defaults 0 0
HTH
Harry Coin
On Tue, Jun 08, 2021 at 11:35:24AM -0400, Vivek Goyal wrote:
> We want to be able to mount virtiofs as rootfs and pass appropriate
> kernel command line. Right now there does not seem to be a good way
> to do that. If I specify "root=myfs rootfstype=virtiofs", system
> panics.
>
> virtio-fs: tag </dev/root> not found
> ..
> ..
> [ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]
>
> Basic problem here is that kernel assumes that device identifier
> passed in "root=" is a block device. But there are few execptions
> to this rule to take care of the needs of mtd, ubi, NFS and CIFS.
>
> For example, mtd and ubi prefix "mtd:" or "ubi:" respectively.
>
> "root=mtd:<identifier>" or "root=ubi:<identifier>"
>
> NFS and CIFS use "root=/dev/nfs" and CIFS passes "root=/dev/cifs" and
> actual root device details come from filesystem specific kernel
> command line options.
>
> virtiofs does not seem to fit in any of the above categories. In fact
> we have 9pfs which can be used to boot from but it also does not
> have a proper syntax to specify rootfs and does not fit into any of
> the existing syntax. They both expect a device "tag" to be passed
> in a device to be mounted. And filesystem knows how to parse and
> use "tag".
>
> So this patch proposes that we add a new prefix "fstag:" which specifies
> that identifier which follows is filesystem specific tag and its not
> a block device. Just pass this tag to filesystem and filesystem will
> figure out how to mount it.
>
> For example, "root=fstag:<tag>".
>
> In case of virtiofs, I can specify "root=fstag:myfs rootfstype=virtiofs"
> and it works.
>
> I think this should work for 9p as well. "root=fstag:myfs rootfstype=9p".
> Though I have yet to test it.
>
> This kind of syntax should be able to address wide variety of use cases
> where root device is not a block device and is simply some kind of
> tag/label understood by filesystem.
"fstag" is kind of virtio-9p/fs specific. The intended effect is really
to specify the file system source (like in mount(2)) without it being
interpreted as a block device.
In a previous discussion David Gilbert suggested detecting file systems
that do not need a block device:
https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/
I never got around to doing it, but can do_mounts.c just look at struct
file_system_type::fs_flags FS_REQUIRES_DEV to detect non-block device
file systems?
That way it would know to just mount with root= as the source instead of
treating it as a block device. No root= prefix would be required and it
would handle NFS, virtiofs, virtio-9p, etc without introducing the
concept of a "tag".
root=myfs rootfstype=virtiofs rootflags=...
I wrote this up quickly after not thinking about the topic for 2 years,
so the idea may not work at all :).
On 6/9/21 4:51 AM, Stefan Hajnoczi wrote:
> On Tue, Jun 08, 2021 at 11:35:24AM -0400, Vivek Goyal wrote:
>> We want to be able to mount virtiofs as rootfs and pass appropriate
>> kernel command line. Right now there does not seem to be a good way
>> to do that. If I specify "root=myfs rootfstype=virtiofs", system
>> panics.
>>
>> virtio-fs: tag </dev/root> not found
>> ..
>> ..
>> [ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]
>>
>> Basic problem here is that kernel assumes that device identifier
>> passed in "root=" is a block device. But there are few execptions
>> to this rule to take care of the needs of mtd, ubi, NFS and CIFS.
>>
>> For example, mtd and ubi prefix "mtd:" or "ubi:" respectively.
>>
>> "root=mtd:<identifier>" or "root=ubi:<identifier>"
>>
>> NFS and CIFS use "root=/dev/nfs" and CIFS passes "root=/dev/cifs" and
>> actual root device details come from filesystem specific kernel
>> command line options.
>>
>> virtiofs does not seem to fit in any of the above categories. In fact
>> we have 9pfs which can be used to boot from but it also does not
>> have a proper syntax to specify rootfs and does not fit into any of
>> the existing syntax. They both expect a device "tag" to be passed
>> in a device to be mounted. And filesystem knows how to parse and
>> use "tag".
>>
>> So this patch proposes that we add a new prefix "fstag:" which specifies
>> that identifier which follows is filesystem specific tag and its not
>> a block device. Just pass this tag to filesystem and filesystem will
>> figure out how to mount it.
>>
>> For example, "root=fstag:<tag>".
>>
>> In case of virtiofs, I can specify "root=fstag:myfs rootfstype=virtiofs"
>> and it works.
>>
>> I think this should work for 9p as well. "root=fstag:myfs rootfstype=9p".
>> Though I have yet to test it.
>>
>> This kind of syntax should be able to address wide variety of use cases
>> where root device is not a block device and is simply some kind of
>> tag/label understood by filesystem.
> "fstag" is kind of virtio-9p/fs specific. The intended effect is really
> to specify the file system source (like in mount(2)) without it being
> interpreted as a block device.
>
> In a previous discussion David Gilbert suggested detecting file systems
> that do not need a block device:
> https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/
>
> I never got around to doing it, but can do_mounts.c just look at struct
> file_system_type::fs_flags FS_REQUIRES_DEV to detect non-block device
> file systems?
>
> That way it would know to just mount with root= as the source instead of
> treating it as a block device. No root= prefix would be required and it
> would handle NFS, virtiofs, virtio-9p, etc without introducing the
> concept of a "tag".
>
> root=myfs rootfstype=virtiofs rootflags=...
>
> I wrote this up quickly after not thinking about the topic for 2 years,
> so the idea may not work at all :).
I plead for the long term goal of syntax harmony between the kernel
command line and the first three fields of /etc/fstab.
Let's do one thing one way, even if it is specified more than one place.
HC
On Wed, Jun 09, 2021 at 10:51:56AM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 08, 2021 at 11:35:24AM -0400, Vivek Goyal wrote:
> > We want to be able to mount virtiofs as rootfs and pass appropriate
> > kernel command line. Right now there does not seem to be a good way
> > to do that. If I specify "root=myfs rootfstype=virtiofs", system
> > panics.
> >
> > virtio-fs: tag </dev/root> not found
> > ..
> > ..
> > [ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]
> >
> > Basic problem here is that kernel assumes that device identifier
> > passed in "root=" is a block device. But there are few execptions
> > to this rule to take care of the needs of mtd, ubi, NFS and CIFS.
> >
> > For example, mtd and ubi prefix "mtd:" or "ubi:" respectively.
> >
> > "root=mtd:<identifier>" or "root=ubi:<identifier>"
> >
> > NFS and CIFS use "root=/dev/nfs" and CIFS passes "root=/dev/cifs" and
> > actual root device details come from filesystem specific kernel
> > command line options.
> >
> > virtiofs does not seem to fit in any of the above categories. In fact
> > we have 9pfs which can be used to boot from but it also does not
> > have a proper syntax to specify rootfs and does not fit into any of
> > the existing syntax. They both expect a device "tag" to be passed
> > in a device to be mounted. And filesystem knows how to parse and
> > use "tag".
> >
> > So this patch proposes that we add a new prefix "fstag:" which specifies
> > that identifier which follows is filesystem specific tag and its not
> > a block device. Just pass this tag to filesystem and filesystem will
> > figure out how to mount it.
> >
> > For example, "root=fstag:<tag>".
> >
> > In case of virtiofs, I can specify "root=fstag:myfs rootfstype=virtiofs"
> > and it works.
> >
> > I think this should work for 9p as well. "root=fstag:myfs rootfstype=9p".
> > Though I have yet to test it.
> >
> > This kind of syntax should be able to address wide variety of use cases
> > where root device is not a block device and is simply some kind of
> > tag/label understood by filesystem.
>
> "fstag" is kind of virtio-9p/fs specific. The intended effect is really
> to specify the file system source (like in mount(2)) without it being
> interpreted as a block device.
[ CC christoph ]
I think mount(2) has little different requirements. It more or less
passes the source to filesystem. But during early boot, we do so
much more with source, that is parse it and determine device major
and minor and create blockdevice and then call into filesystem.
>
> In a previous discussion David Gilbert suggested detecting file systems
> that do not need a block device:
> https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/
>
> I never got around to doing it, but can do_mounts.c just look at struct
> file_system_type::fs_flags FS_REQUIRES_DEV to detect non-block device
> file systems?
I guess we can use FS_REQUIRES_DEV. We probably will need to add a helper
to determine if filesystem passed in "rootfstype=" has FS_REQUIRES_DEV
set or not.
For now, I have written a patch which does not rely on FS_REQUIRES_DEV.
Instead I have created an array of filesystems which do not want
root=<source> to be treated as block device and expect that "source"
will be directly passed to filesytem to be mounted.
Reason I am not parsing FS_REQUIRES_DEV yet is that I am afraid that
this can change behavior and introduce regression. Some filesystem
which does not have FS_REQUIRES_DEV set but still somehow is going
through block device path (or some path which I can't see yet).
So for now I am playing safe and explicitly creating a list of
filesystems which will opt-in into this behavior. But if folks think
that my fears of regression are misplaced and I should parse
FS_REQUIRES_DEV and that way any filesystem which does not have
FS_REQUIRES_DEV set automatically gets opted in, I can do that.
>
> That way it would know to just mount with root= as the source instead of
> treating it as a block device. No root= prefix would be required and it
> would handle NFS, virtiofs, virtio-9p, etc without introducing the
> concept of a "tag".
>
> root=myfs rootfstype=virtiofs rootflags=...
>
> I wrote this up quickly after not thinking about the topic for 2 years,
> so the idea may not work at all :).
Now with this patch "root=myfs, rootfstype=virtiofs, rootflags=..." syntax
works for virtiofs.
Please have a look.
Thanks
Vivek
Subject: [PATCH] init/do_mounts.c: Add a path to boot from non blockdev filesystems
We want to be able to mount virtiofs as rootfs and pass appropriate
kernel command line. Right now there does not seem to be a good way
to do that. If I specify "root=myfs rootfstype=virtiofs", system
panics.
virtio-fs: tag </dev/root> not found
..
..
[ end Kernel panic - not syncing: VFS: Unable to mount root fs on
+unknown-block(0,0) ]
Basic problem here is that kernel assumes that device identifier
passed in "root=" is a block device. But there are few execptions
to this rule to take care of the needs of mtd, ubi, NFS and CIFS.
For example, mtd and ubi prefix "mtd:" or "ubi:" respectively.
"root=mtd:<identifier>" or "root=ubi:<identifier>"
NFS and CIFS use "root=/dev/nfs" and CIFS passes "root=/dev/cifs" and
actual root device details come from filesystem specific kernel
command line options.
virtiofs does not seem to fit in any of the above categories. In fact
we have 9pfs which can be used to boot from but it also does not
have a proper syntax to specify rootfs and does not fit into any of
the existing syntax. They both expect a device "tag" to be passed
in a device to be mounted. And filesystem knows how to parse and
use "tag".
So this patch proposes that we internally create a list of filesystems
which don't expect a block device and whatever "source" has been
passed in "root=<source>" option, should be passed to filesystem and
filesystem should be able to figure out how to use "source" to
mount filesystem.
As of now I have only added "virtiofs" in the list of such filesystems.
To enable it on 9p, it should be a simple change. Just add "9p" to
the nobdev_filesystems[] array.
And any other file system which is fine with these semantics, that
is pass "source" to filesystem directo to mount, should be able
to work with it easily.
Signed-off-by: Vivek Goyal <[email protected]>
---
init/do_mounts.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
Index: redhat-linux/init/do_mounts.c
===================================================================
--- redhat-linux.orig/init/do_mounts.c 2021-06-09 11:12:58.839309965 -0400
+++ redhat-linux/init/do_mounts.c 2021-06-09 11:41:56.642982703 -0400
@@ -31,6 +31,8 @@
int root_mountflags = MS_RDONLY | MS_SILENT;
static char * __initdata root_device_name;
static char __initdata saved_root_name[64];
+static char *__initdata nobdev_filesystems[] = {"virtiofs"};
+static bool __initdata nobdev_root = false;
static int root_wait;
dev_t ROOT_DEV;
@@ -552,6 +554,14 @@ void __init mount_root(void)
return;
}
#endif
+ if (nobdev_root) {
+ if (!do_mount_root(root_device_name, root_fs_names,
+ root_mountflags, root_mount_data))
+ return;
+ panic("VFS: Unable to mount root \"%s\" via \"%s\"\n",
+ root_device_name, root_fs_names);
+ }
+
#ifdef CONFIG_BLOCK
{
int err = create_dev("/dev/root", ROOT_DEV);
@@ -563,6 +573,22 @@ void __init mount_root(void)
#endif
}
+static bool nobdev_rootfs(char *name)
+{
+ int i, len;
+
+ len = sizeof(nobdev_filesystems)/sizeof(nobdev_filesystems[0]);
+
+ for (i = 0; i < len; i++) {
+ int name_len = strlen(nobdev_filesystems[i]) + 1;
+
+ if (!strncmp(nobdev_filesystems[i], name, name_len))
+ return true;
+ }
+
+ return false;
+}
+
/*
* Prepare the namespace - decide what/where to mount, load ramdisks, etc.
*/
@@ -593,6 +619,10 @@ void __init prepare_namespace(void)
goto out;
}
ROOT_DEV = name_to_dev_t(root_device_name);
+ if (ROOT_DEV == 0 && root_fs_names) {
+ if (nobdev_rootfs(root_fs_names))
+ nobdev_root = true;
+ }
if (strncmp(root_device_name, "/dev/", 5) == 0)
root_device_name += 5;
}
On Wed, Jun 09, 2021 at 11:45:43AM -0400, Vivek Goyal wrote:
> On Wed, Jun 09, 2021 at 10:51:56AM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 08, 2021 at 11:35:24AM -0400, Vivek Goyal wrote:
> > > We want to be able to mount virtiofs as rootfs and pass appropriate
> > > kernel command line. Right now there does not seem to be a good way
> > > to do that. If I specify "root=myfs rootfstype=virtiofs", system
> > > panics.
> > >
> > > virtio-fs: tag </dev/root> not found
> > > ..
> > > ..
> > > [ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]
> > >
> > > Basic problem here is that kernel assumes that device identifier
> > > passed in "root=" is a block device. But there are few execptions
> > > to this rule to take care of the needs of mtd, ubi, NFS and CIFS.
> > >
> > > For example, mtd and ubi prefix "mtd:" or "ubi:" respectively.
> > >
> > > "root=mtd:<identifier>" or "root=ubi:<identifier>"
> > >
> > > NFS and CIFS use "root=/dev/nfs" and CIFS passes "root=/dev/cifs" and
> > > actual root device details come from filesystem specific kernel
> > > command line options.
> > >
> > > virtiofs does not seem to fit in any of the above categories. In fact
> > > we have 9pfs which can be used to boot from but it also does not
> > > have a proper syntax to specify rootfs and does not fit into any of
> > > the existing syntax. They both expect a device "tag" to be passed
> > > in a device to be mounted. And filesystem knows how to parse and
> > > use "tag".
> > >
> > > So this patch proposes that we add a new prefix "fstag:" which specifies
> > > that identifier which follows is filesystem specific tag and its not
> > > a block device. Just pass this tag to filesystem and filesystem will
> > > figure out how to mount it.
> > >
> > > For example, "root=fstag:<tag>".
> > >
> > > In case of virtiofs, I can specify "root=fstag:myfs rootfstype=virtiofs"
> > > and it works.
> > >
> > > I think this should work for 9p as well. "root=fstag:myfs rootfstype=9p".
> > > Though I have yet to test it.
> > >
> > > This kind of syntax should be able to address wide variety of use cases
> > > where root device is not a block device and is simply some kind of
> > > tag/label understood by filesystem.
> >
> > "fstag" is kind of virtio-9p/fs specific. The intended effect is really
> > to specify the file system source (like in mount(2)) without it being
> > interpreted as a block device.
>
> [ CC christoph ]
>
> I think mount(2) has little different requirements. It more or less
> passes the source to filesystem. But during early boot, we do so
> much more with source, that is parse it and determine device major
> and minor and create blockdevice and then call into filesystem.
>
> >
> > In a previous discussion David Gilbert suggested detecting file systems
> > that do not need a block device:
> > https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/
> >
> > I never got around to doing it, but can do_mounts.c just look at struct
> > file_system_type::fs_flags FS_REQUIRES_DEV to detect non-block device
> > file systems?
>
> I guess we can use FS_REQUIRES_DEV. We probably will need to add a helper
> to determine if filesystem passed in "rootfstype=" has FS_REQUIRES_DEV
> set or not.
>
> For now, I have written a patch which does not rely on FS_REQUIRES_DEV.
> Instead I have created an array of filesystems which do not want
> root=<source> to be treated as block device and expect that "source"
> will be directly passed to filesytem to be mounted.
>
> Reason I am not parsing FS_REQUIRES_DEV yet is that I am afraid that
> this can change behavior and introduce regression. Some filesystem
> which does not have FS_REQUIRES_DEV set but still somehow is going
> through block device path (or some path which I can't see yet).
>
> So for now I am playing safe and explicitly creating a list of
> filesystems which will opt-in into this behavior. But if folks think
> that my fears of regression are misplaced and I should parse
> FS_REQUIRES_DEV and that way any filesystem which does not have
> FS_REQUIRES_DEV set automatically gets opted in, I can do that.
>
> >
> > That way it would know to just mount with root= as the source instead of
> > treating it as a block device. No root= prefix would be required and it
> > would handle NFS, virtiofs, virtio-9p, etc without introducing the
> > concept of a "tag".
> >
> > root=myfs rootfstype=virtiofs rootflags=...
> >
> > I wrote this up quickly after not thinking about the topic for 2 years,
> > so the idea may not work at all :).
>
> Now with this patch "root=myfs, rootfstype=virtiofs, rootflags=..." syntax
> works for virtiofs.
>
> Please have a look.
Looks good from a user perspective.
> Subject: [PATCH] init/do_mounts.c: Add a path to boot from non blockdev filesystems
>
> We want to be able to mount virtiofs as rootfs and pass appropriate
> kernel command line. Right now there does not seem to be a good way
> to do that. If I specify "root=myfs rootfstype=virtiofs", system
> panics.
>
> virtio-fs: tag </dev/root> not found
> ..
> ..
> [ end Kernel panic - not syncing: VFS: Unable to mount root fs on
> +unknown-block(0,0) ]
>
> Basic problem here is that kernel assumes that device identifier
> passed in "root=" is a block device. But there are few execptions
> to this rule to take care of the needs of mtd, ubi, NFS and CIFS.
>
> For example, mtd and ubi prefix "mtd:" or "ubi:" respectively.
>
> "root=mtd:<identifier>" or "root=ubi:<identifier>"
>
> NFS and CIFS use "root=/dev/nfs" and CIFS passes "root=/dev/cifs" and
> actual root device details come from filesystem specific kernel
> command line options.
>
> virtiofs does not seem to fit in any of the above categories. In fact
> we have 9pfs which can be used to boot from but it also does not
> have a proper syntax to specify rootfs and does not fit into any of
> the existing syntax. They both expect a device "tag" to be passed
> in a device to be mounted. And filesystem knows how to parse and
> use "tag".
>
> So this patch proposes that we internally create a list of filesystems
> which don't expect a block device and whatever "source" has been
> passed in "root=<source>" option, should be passed to filesystem and
> filesystem should be able to figure out how to use "source" to
> mount filesystem.
>
> As of now I have only added "virtiofs" in the list of such filesystems.
> To enable it on 9p, it should be a simple change. Just add "9p" to
> the nobdev_filesystems[] array.
virtio-9p should be simple. I'm not sure how much additional setup the
other 9p transports require. TCP and RDMA seem doable if there are
kernel parameters to configure things before the root file system is
mounted.
Stefan Hajnoczi wrote on Thu, Jun 10, 2021 at 09:16:38AM +0100:
> virtio-9p should be simple. I'm not sure how much additional setup the
> other 9p transports require. TCP and RDMA seem doable if there are
> kernel parameters to configure things before the root file system is
> mounted.
For TCP, we can probably piggyback on what nfs does for this, see the
ip= parameter in Documentation/admin-guide/nfs/nfsroot.rst -- it lives
in net/ipv4/ipconfig.c so should just work out of the box
For RDMA I'm less optimistic, technically if we can setup ipoib with an
ip with the same parameter it should work, but I have nothing to test it
on anyway so that'll wait until someone who cares actually does try...
Either way, we've got to start somewhere, so it's good there's something :)
--
Dominique
Dominique Martinet wrote on Thu, Jun 10, 2021 at 05:33:34PM +0900:
> Stefan Hajnoczi wrote on Thu, Jun 10, 2021 at 09:16:38AM +0100:
> > virtio-9p should be simple. I'm not sure how much additional setup the
> > other 9p transports require. TCP and RDMA seem doable if there are
> > kernel parameters to configure things before the root file system is
> > mounted.
>
> For TCP, we can probably piggyback on what nfs does for this, see the
> ip= parameter in Documentation/admin-guide/nfs/nfsroot.rst -- it lives
> in net/ipv4/ipconfig.c so should just work out of the box
Hm, just tried and it doesn't quite work for some reason -- in this
stack trace:
kthread_should_stop+0x71/0xb0
wait_woken+0x182/0x1c0
__inet_stream_connect+0x48a/0xc00
inet_stream_connect+0x53/0xa0
p9_fd_create_tcp+0x2d6/0x420
p9_client_create+0x7bc/0x11d0
v9fs_session_init+0x1cd/0x1220
v9fs_mount+0x8c/0x870
legacy_get_tree+0xef/0x1d0
vfs_get_tree+0x83/0x240
path_mount+0xda3/0x1800
init_mount+0x98/0xdd
do_mount_root+0xe0/0x255
mount_root+0x47/0xd7
prepare_namespace+0x136/0x165
kernel_init+0xd/0x123
ret_from_fork+0x22/0x30
current->set_child_tid is null, causing a null deref when checking
&to_kthread(current)->flags
It does work with nfsroot though so even if this doesn't look 9p
specific I guess I'll need to debug that eventually, but this can
be done later... I'm guessing they don't use the same connect() function
as 9p's is ipv4-specific (ugh) and that needs fixing eventually anyway.
For reference this is relevant part of kernel command line I used for
tcp:
root=fstag:x.y.z.t rootflags=trans=tcp,aname=rootfs rootfstype=9p ip=dhcp
(and ip=dhcp requires CONFIG_IP_PNP_DHCP=y)
Virtio does work quite well though and that's good enough for me -- I
was going to suggest also documenting increasing the msize (setting
e.g. rootflags=msize=262144) but we really ought to increase the
default, that came up recently and since no patch was sent I kind of
forgot... Will do that now.
@Vivek - I personally don't really care much, but would tend to prefer
your v2 (without fstag:) from a user perspective the later is definitely
better but I don't really like the static nobdev_filesystems array --
I'd bite the bullet and use FS_REQUIRES_DEV (and move this part of the
code just a bit below after the root_wait check just in case it matters,
but at that point if something would mount with /dev/root but not with
the raw root=argument then they probably do require a device!)
It could also be gated by a config option like e.g. CONFIG_ROOT_NFS and
others all are to make sure it doesn't impact anyone who doesn't want to
be impacted - I'm sure some people want to make sure their device
doesn't boot off a weird root if someone manages to change kernel params
so would want a way of disabling the option...
Well, if you keep the array, please add 9p to the list and resend as a
proper patch so I can reply with tested-by/reviewed-by tags on something
more final.
Also, matter-of-factedly, how is this going to be picked up?
Is the plan to send it directly to Linus as part of the next virtiofs
PR? Going through Al Viro?
Thanks,
--
Dominique
On Sun, Jun 13, 2021 at 08:56:11PM +0900, Dominique Martinet wrote:
> Dominique Martinet wrote on Thu, Jun 10, 2021 at 05:33:34PM +0900:
> > Stefan Hajnoczi wrote on Thu, Jun 10, 2021 at 09:16:38AM +0100:
> > > virtio-9p should be simple. I'm not sure how much additional setup the
> > > other 9p transports require. TCP and RDMA seem doable if there are
> > > kernel parameters to configure things before the root file system is
> > > mounted.
> >
> > For TCP, we can probably piggyback on what nfs does for this, see the
> > ip= parameter in Documentation/admin-guide/nfs/nfsroot.rst -- it lives
> > in net/ipv4/ipconfig.c so should just work out of the box
>
> Hm, just tried and it doesn't quite work for some reason -- in this
> stack trace:
> kthread_should_stop+0x71/0xb0
> wait_woken+0x182/0x1c0
> __inet_stream_connect+0x48a/0xc00
> inet_stream_connect+0x53/0xa0
> p9_fd_create_tcp+0x2d6/0x420
> p9_client_create+0x7bc/0x11d0
> v9fs_session_init+0x1cd/0x1220
> v9fs_mount+0x8c/0x870
> legacy_get_tree+0xef/0x1d0
> vfs_get_tree+0x83/0x240
> path_mount+0xda3/0x1800
> init_mount+0x98/0xdd
> do_mount_root+0xe0/0x255
> mount_root+0x47/0xd7
> prepare_namespace+0x136/0x165
> kernel_init+0xd/0x123
> ret_from_fork+0x22/0x30
>
> current->set_child_tid is null, causing a null deref when checking
> &to_kthread(current)->flags
>
> It does work with nfsroot though so even if this doesn't look 9p
> specific I guess I'll need to debug that eventually, but this can
> be done later... I'm guessing they don't use the same connect() function
> as 9p's is ipv4-specific (ugh) and that needs fixing eventually anyway.
>
> For reference this is relevant part of kernel command line I used for
> tcp:
> root=fstag:x.y.z.t rootflags=trans=tcp,aname=rootfs rootfstype=9p ip=dhcp
>
> (and ip=dhcp requires CONFIG_IP_PNP_DHCP=y)
>
>
>
> Virtio does work quite well though and that's good enough for me -- I
> was going to suggest also documenting increasing the msize (setting
> e.g. rootflags=msize=262144) but we really ought to increase the
> default, that came up recently and since no patch was sent I kind of
> forgot... Will do that now.
>
>
>
> @Vivek - I personally don't really care much, but would tend to prefer
> your v2 (without fstag:) from a user perspective the later is definitely
> better but I don't really like the static nobdev_filesystems array --
I am not a big fan of nobdev_filesystems array but I really don't feel
comfortable opening up this code by default to all filesystems having
flag FS_REQUIRES_DEV. Use cases of this code path are not well documented
and something somewhere will be broken and called regression.
I think nobdev_filesystems is sort of a misfit. Even mtd, ubi, cifs
and nfs are nobdev filesystems but they are not covered by this.
May be this array needs to be called say "tag_based_rootfs[]" and
array entries can be gated by config options from filesystems.
#ifdef CONFIG_VIRTIO_FS
"virtiofs",
#endif
#ifdef CONFIG_FOO
"foo",
#endif
Now all the filesystems which want to simply pass a "tag" as source of
mount to filesystem directly during mount can opt-in into this.
virtiofs and 9p are the first two filesystmes to opt-in. And then
flow of mount code becomes much more clean.
Is it mtd or ubi device
mount mtd/ubi root
Is it nfs/cifs device
mount nfs/cifs root
Is it tag based fs
mount tag based fs
else
mount block device based fs.
Calling it nobdev based fs makes it too generic while we are not
covering all the cases.
So my personal perference is that just to carve out a new class
of rootfs (tag based) and let users opt in if they fit into
this class. This kind of approach also reduces risk of any
regression significantly.
I don't know enough about mtd/ubi but I suspect that it might be
possible that they can make use of this option as well.
> I'd bite the bullet and use FS_REQUIRES_DEV (and move this part of the
> code just a bit below after the root_wait check just in case it matters,
Problem with moving this below root_wait check is that if user boots
with root_wait option for virtiofs/9p, it will loop infitely. Reason
being that ROOT_DEV=0 and device will never show up.
I am assuming that for out use cases, device will need to be present
at the time of boot. We can't have a notion of waiting for device to
show up.
> but at that point if something would mount with /dev/root but not with
> the raw root=argument then they probably do require a device!)
>
> It could also be gated by a config option like e.g. CONFIG_ROOT_NFS and
> others all are to make sure it doesn't impact anyone who doesn't want to
> be impacted - I'm sure some people want to make sure their device
> doesn't boot off a weird root if someone manages to change kernel params
> so would want a way of disabling the option...
I guess I could do that. Given more than one filesystem will use this
option (virtiofs and 9p to begin with), so we will have to have a
config option name which is little more generic and not filesystem
specific like CONFIG_ROOT_NFS or CONFIG_ROOT_CIFS.
>
>
> Well, if you keep the array, please add 9p to the list and resend as a
> proper patch so I can reply with tested-by/reviewed-by tags on something
> more final.
Sure, I will add 9p to list.
>
>
> Also, matter-of-factedly, how is this going to be picked up?
> Is the plan to send it directly to Linus as part of the next virtiofs
> PR? Going through Al Viro?
I was hoping that this patch can be routed through Al Viro.
Thanks
Vivek
Vivek Goyal wrote on Mon, Jun 14, 2021 at 10:28:04AM -0400:
> I am not a big fan of nobdev_filesystems array but I really don't feel
> comfortable opening up this code by default to all filesystems having
> flag FS_REQUIRES_DEV. Use cases of this code path are not well documented
> and something somewhere will be broken and called regression.
>
> I think nobdev_filesystems is sort of a misfit. Even mtd, ubi, cifs
> and nfs are nobdev filesystems but they are not covered by this.
I think it's fine being able to have these root mounted both ways, then
eventually removing the old fs-specific options after a period of
deprecation to have a unique and simple interface.
Maybe it's just a bit of a dream big attitude :-)
> > I'd bite the bullet and use FS_REQUIRES_DEV (and move this part of the
> > code just a bit below after the root_wait check just in case it matters,
>
> Problem with moving this below root_wait check is that if user boots
> with root_wait option for virtiofs/9p, it will loop infitely. Reason
> being that ROOT_DEV=0 and device will never show up.
Hm, well, then don't use root_wait?! would be my first reaction.
The reason I suggested to move below would be that there might be
filesystems which handle both a block device and no block device, and
for these we wouldn't want to break root_wait which would become kind of
a switch saying "this really is a block device usecase even if the fs
doesn't require dev" -- that's also the reason I was mostly optimistic
even if we make it generic for all filesystems, there'd be this way out
even if the thing is compiled in.
Ultimately if we go through the explicit whitelist that's not required
anyway, and in that case it's probably better to check before as you've
said.
> I am assuming that for out use cases, device will need to be present
> at the time of boot. We can't have a notion of waiting for device to
> show up.
>
> > but at that point if something would mount with /dev/root but not with
> > the raw root=argument then they probably do require a device!)
> >
> > It could also be gated by a config option like e.g. CONFIG_ROOT_NFS and
> > others all are to make sure it doesn't impact anyone who doesn't want to
> > be impacted - I'm sure some people want to make sure their device
> > doesn't boot off a weird root if someone manages to change kernel params
> > so would want a way of disabling the option...
>
> I guess I could do that. Given more than one filesystem will use this
> option (virtiofs and 9p to begin with), so we will have to have a
> config option name which is little more generic and not filesystem
> specific like CONFIG_ROOT_NFS or CONFIG_ROOT_CIFS.
Well there's the builtin check you added, and there's the ability to
root boot from it that's historically always been separated.
The builtin checks you added actually doesn't matter all that much to
me. I mean, it'll pass this step, but fail as it cannot mount later
anyway, and it was an explicit request to have this filesystem in the
command line: you've changed an error that says "I cannot mount 9p!" to
"I cannot find root-device!" so it's not really a big deal.
What I was advocating for is the whole feature being gated by some
option - my example with an embdedded device having 9p builtin (because
for some reason they have everything builtin) but not wanting to boot on
a tcp 9p rootfs still stands even if we're limiting this to a few
filesystems.
If you're keeping the idea of tags CONFIG_ROOT_TAGS ?
> > Also, matter-of-factedly, how is this going to be picked up?
> > Is the plan to send it directly to Linus as part of the next virtiofs
> > PR? Going through Al Viro?
>
> I was hoping that this patch can be routed through Al Viro.
Sounds good to me as well - I've upgraded him to To: to get his
attention.
(v2 has been sent as "[PATCH v2 0/2] Add support to boot virtiofs and
9pfs as rootfs"; I'll review/retest in the next few days)
--
Dominique
On Tue, Jun 15, 2021 at 08:14:37AM +0900, Dominique Martinet wrote:
> Vivek Goyal wrote on Mon, Jun 14, 2021 at 10:28:04AM -0400:
> > I am not a big fan of nobdev_filesystems array but I really don't feel
> > comfortable opening up this code by default to all filesystems having
> > flag FS_REQUIRES_DEV. Use cases of this code path are not well documented
> > and something somewhere will be broken and called regression.
> >
> > I think nobdev_filesystems is sort of a misfit. Even mtd, ubi, cifs
> > and nfs are nobdev filesystems but they are not covered by this.
>
> I think it's fine being able to have these root mounted both ways, then
> eventually removing the old fs-specific options after a period of
> deprecation to have a unique and simple interface.
Sure. And I think v2 of patches where I propose a whitelist of filesystems
should be able to achieve that. Right now I call that tag based
filesystems where tag is directly passed to filesystem. I think
mtd/ubi/cifs/nfs should be able to use that as well and be added to
that whitelist down the line.
>
> Maybe it's just a bit of a dream big attitude :-)
>
> > > I'd bite the bullet and use FS_REQUIRES_DEV (and move this part of the
> > > code just a bit below after the root_wait check just in case it matters,
> >
> > Problem with moving this below root_wait check is that if user boots
> > with root_wait option for virtiofs/9p, it will loop infitely. Reason
> > being that ROOT_DEV=0 and device will never show up.
>
> Hm, well, then don't use root_wait?! would be my first reaction.
>
> The reason I suggested to move below would be that there might be
> filesystems which handle both a block device and no block device, and
> for these we wouldn't want to break root_wait which would become kind of
> a switch saying "this really is a block device usecase even if the fs
> doesn't require dev" -- that's also the reason I was mostly optimistic
> even if we make it generic for all filesystems, there'd be this way out
> even if the thing is compiled in.
>
>
> Ultimately if we go through the explicit whitelist that's not required
> anyway, and in that case it's probably better to check before as you've
> said.
Yes, current whitelist based approach will not allow to have both
block devices as well as tag/non-block based root devices. Are there
any examples where we current filesystems support such things. And
can filesystem deal with it instead?
If this becomes a requirement, then we will have to go back to my
previous proposal of "root=fstag=<tag>" instead. That way "root=<foo>"
will be interpreted as block device while "root=fstag=<foo>" explicitly
says its some kind of tag (and not a block device).
>
>
> > I am assuming that for out use cases, device will need to be present
> > at the time of boot. We can't have a notion of waiting for device to
> > show up.
> >
> > > but at that point if something would mount with /dev/root but not with
> > > the raw root=argument then they probably do require a device!)
> > >
> > > It could also be gated by a config option like e.g. CONFIG_ROOT_NFS and
> > > others all are to make sure it doesn't impact anyone who doesn't want to
> > > be impacted - I'm sure some people want to make sure their device
> > > doesn't boot off a weird root if someone manages to change kernel params
> > > so would want a way of disabling the option...
> >
> > I guess I could do that. Given more than one filesystem will use this
> > option (virtiofs and 9p to begin with), so we will have to have a
> > config option name which is little more generic and not filesystem
> > specific like CONFIG_ROOT_NFS or CONFIG_ROOT_CIFS.
>
> Well there's the builtin check you added, and there's the ability to
> root boot from it that's historically always been separated.
>
> The builtin checks you added actually doesn't matter all that much to
> me. I mean, it'll pass this step, but fail as it cannot mount later
> anyway, and it was an explicit request to have this filesystem in the
> command line: you've changed an error that says "I cannot mount 9p!" to
> "I cannot find root-device!" so it's not really a big deal.
>
>
> What I was advocating for is the whole feature being gated by some
> option - my example with an embdedded device having 9p builtin (because
> for some reason they have everything builtin) but not wanting to boot on
> a tcp 9p rootfs still stands even if we're limiting this to a few
> filesystems.
>
> If you're keeping the idea of tags CONFIG_ROOT_TAGS ?
I thought about it and CONFIG_ROOT_TAGS made less sense because it will
disable all filesystem roots. So say you don't want to boot from 9p
rootfs but are ok booting from virtiofs rootfs, then disablig whole
feature does not allow that.
We probably need to have per filesystem option. Something like CONFIG_ROOT_NFS
and CONFIG_CIFS_ROOT. So may be we need to add CONFIG_ROOT_VIRTIOFS
and COFIG_ROOT_9P_FS to decide wither to include filesystem in whitelist
or not and that will enable/disable boot from root functionality.
I feel that these kind of patches can go in later. Because a user
can boot from 9p or virtiofs rootfs anyway using mtd prefix hack
or using /dev/root as tag hack and adding these options does not
close those paths. So I thought that adding these config
options should not be a strict requirement for this patch series and
these options can be added separately in respective filesystems. WDYT?
Thanks
Vivek
>
>
> > > Also, matter-of-factedly, how is this going to be picked up?
> > > Is the plan to send it directly to Linus as part of the next virtiofs
> > > PR? Going through Al Viro?
> >
> > I was hoping that this patch can be routed through Al Viro.
>
> Sounds good to me as well - I've upgraded him to To: to get his
> attention.
> (v2 has been sent as "[PATCH v2 0/2] Add support to boot virtiofs and
> 9pfs as rootfs"; I'll review/retest in the next few days)
>
> --
> Dominique
>
Vivek Goyal wrote on Tue, Jun 15, 2021 at 09:50:57AM -0400:
>> Ultimately if we go through the explicit whitelist that's not required
>> anyway, and in that case it's probably better to check before as you've
>> said.
>
> Yes, current whitelist based approach will not allow to have both
> block devices as well as tag/non-block based root devices. Are there
> any examples where we current filesystems support such things. And
> can filesystem deal with it instead?
Hmm I had thought ubi might allow for both through mount options, but
that doesn't seem to be the case.
I guess it is possible to imagine some fuse filesystem allowing both but
I'm not sure fuse is a valid target for rootfs, and looking at the list
of others filesystems which don't have the flag (from a quick grep:
mm/shmem, ipc/mqueue, nfs, sysfs, ramfs, procfs, overlayfs, hostfs,
fuse, ecryptfs, devpts, coda, binderfs, cifs, ceph, afs, 9p, cgroups) I
guess that might not be a problem.
> If this becomes a requirement, then we will have to go back to my
> previous proposal of "root=fstag=<tag>" instead. That way "root=<foo>"
> will be interpreted as block device while "root=fstag=<foo>" explicitly
> says its some kind of tag (and not a block device).
I guess that if it ever becomes an issue we could make rootwait wait for
only driver_probe_done() at that point, but as it is now sounds good to
me for now.
While I'm looking at this code, I feel that the two
if (!strncmp(root_device_name, "mtd", 3) ||
!strncmp(root_device_name, "ubi", 3)) {
will eventually cause some problems once we say arbitrary tags are
allowed if one ever starts with it, but I'm not sure that can be changed
without breaking something else so let's leave that aside for now as
well...
>> What I was advocating for is the whole feature being gated by some
>> option - my example with an embdedded device having 9p builtin (because
>> for some reason they have everything builtin) but not wanting to boot on
>> a tcp 9p rootfs still stands even if we're limiting this to a few
>> filesystems.
>>
>> If you're keeping the idea of tags CONFIG_ROOT_TAGS ?
>
> I thought about it and CONFIG_ROOT_TAGS made less sense because it will
> disable all filesystem roots. So say you don't want to boot from 9p
> rootfs but are ok booting from virtiofs rootfs, then disablig whole
> feature does not allow that.
>
> We probably need to have per filesystem option. Something like CONFIG_ROOT_NFS
> and CONFIG_CIFS_ROOT. So may be we need to add CONFIG_ROOT_VIRTIOFS
> and COFIG_ROOT_9P_FS to decide wither to include filesystem in whitelist
> or not and that will enable/disable boot from root functionality.
Hm, I guess that makes sense.
A global kill switch might have made integration easier if it's disabled
by default like others, but you're right that in practice people would
only want to enable a specific filesystem, right.
> I feel that these kind of patches can go in later. Because a user
> can boot from 9p or virtiofs rootfs anyway using mtd prefix hack
> or using /dev/root as tag hack and adding these options does not
> close those paths. So I thought that adding these config
> options should not be a strict requirement for this patch series and
> these options can be added separately in respective filesystems. WDYT?
I wasn't aware of such possibilities, good to know :-D
Sounds good to me, I'll do proper review and retest the v2 patch over the
weekend.
--
Dominique