2021-02-24 15:22:29

by Rasmus Villemoes

[permalink] [raw]
Subject: [PATCH/RFC 0/2] background initramfs unpacking, and CONFIG_MODPROBE_PATH

These two patches are independent, but better-together.

The second is a rather trivial patch that simply allows the developer
to change "/sbin/modprobe" to something else - e.g. the empty string,
so that all request_module() during early boot return -ENOENT early,
without even spawning a usermode helper.

The first patch allows delegating decompressing the initramfs to a
worker thread, allowing do_initcalls() in main.c to proceed to the
device_ and late_ initcalls without waiting for that decompression
(and populating of rootfs) to finish. Obviously, some of those later
calls may rely on the initramfs being available, so I've added
synchronization points in the firmware loader and usermodehelper paths
- there might be other places that would need this.

There's not much to win if most of the functionality needed during
boot is only available as modules. But systems with a custom-made
.config and initramfs can boot faster, partly due to utilizing more
than one cpu earlier, partly by avoiding known-futile modprobe calls
(which would still trigger synchronization with the initramfs
unpacking, thus eliminating most of the first benefit).

Rasmus Villemoes (2):
init/initramfs.c: allow asynchronous unpacking
modules: add CONFIG_MODPROBE_PATH

.../admin-guide/kernel-parameters.txt | 12 +++++
drivers/base/firmware_loader/main.c | 2 +
include/linux/initrd.h | 7 +++
init/Kconfig | 12 +++++
init/initramfs.c | 51 ++++++++++++++++++-
init/main.c | 1 +
kernel/kmod.c | 2 +-
kernel/umh.c | 2 +
usr/Kconfig | 10 ++++
9 files changed, 97 insertions(+), 2 deletions(-)

--
2.29.2


2021-02-24 15:23:09

by Rasmus Villemoes

[permalink] [raw]
Subject: [PATCH/RFC 2/2] modules: add CONFIG_MODPROBE_PATH

Allow the developer to specifiy the initial value of the
modprobe_path[] string. This can be used to set it to the empty string
initially, thus effectively disabling request_module() during early
boot until userspace writes a new value via the
/proc/sys/kernel/modprobe interface. [1]

When building a custom kernel (often for an embedded target), it's
normal to build everything into the kernel that is needed for booting,
and indeed the initramfs often contains no modules at all, so every
such request_module() done before userspace init has mounted the real
rootfs is a waste of time.

This is particularly useful when combined with the previous patch,
which allowed the initramfs to be unpacked asynchronously - for that
to work, it had to make any usermodehelper call wait for the unpacking
to finish before attempting to invoke the userspace helper. By
eliminating all such (known-to-be-futile) calls of usermodehelper, the
initramfs unpacking and the {device,late}_initcalls can proceed in
parallel for much longer.

[1] __request_module() already has an early -ENOENT return when
modprobe_path is the empty string.

Signed-off-by: Rasmus Villemoes <[email protected]>
---
init/Kconfig | 12 ++++++++++++
kernel/kmod.c | 2 +-
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index ba8bd5256980..e3f61e15445d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2272,6 +2272,18 @@ config MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS

If unsure, say N.

+config MODPROBE_PATH
+ string "Path to modprobe binary"
+ default "/sbin/modprobe"
+ help
+ When kernel code requests a module, it does so by calling
+ the "modprobe" userspace utility. This option allows you to
+ set the path where that binary is found. This can be changed
+ at runtime via the sysctl file
+ /proc/sys/kernel/modprobe. Setting this to the empty string
+ removes the kernel's ability to request modules (but
+ userspace can still load modules explicitly).
+
config TRIM_UNUSED_KSYMS
bool "Trim unused exported kernel symbols"
depends on BROKEN
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 3cd075ce2a1e..b717134ebe17 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -58,7 +58,7 @@ static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
/*
modprobe_path is set via /proc/sys.
*/
-char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
+char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH;

static void free_modprobe_argv(struct subprocess_info *info)
{
--
2.29.2

2021-02-25 03:53:41

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH/RFC 0/2] background initramfs unpacking, and CONFIG_MODPROBE_PATH

On 24/02/2021 15.29, Rasmus Villemoes wrote:
> These two patches are independent, but better-together.

kernel test robot says I'll have to add a 0/2 patch:

linux/initrd.h: grow an include guard

[because defining a macro a second time with the same contents is ok, so
is doing another extern declaration, but a trivial static inline
definition can't be repeated sigh]

2021-03-09 21:18:45

by Rasmus Villemoes

[permalink] [raw]
Subject: [PATCH v2 0/2] background initramfs unpacking, and CONFIG_MODPROBE_PATH

These two patches are independent, but better-together.

The second is a rather trivial patch that simply allows the developer
to change "/sbin/modprobe" to something else - e.g. the empty string,
so that all request_module() during early boot return -ENOENT early,
without even spawning a usermode helper.

The first patch allows delegating decompressing the initramfs to a
worker thread, allowing do_initcalls() in main.c to proceed to the
device_ and late_ initcalls without waiting for that decompression
(and populating of rootfs) to finish. Obviously, some of those later
calls may rely on the initramfs being available, so I've added
synchronization points in the firmware loader and usermodehelper paths
- there might be other places that would need this.

There's not much to win if most of the functionality needed during
boot is only available as modules. But systems with a custom-made
.config and initramfs can boot faster, partly due to utilizing more
than one cpu earlier, partly by avoiding known-futile modprobe calls
(which would still trigger synchronization with the initramfs
unpacking, thus eliminating most of the first benefit).

Routing-wise, I hope akpm can handle both patches. Andrew, Luis?

Changes in v2:

- Rebase on master, piggy-backing on the include guard and #ifdef
CONFIG_BLK_DEV_INITRD added to initrd.h in fade5cad93 and
c72160fe05.

- Use existing async_* API instead of wait_for_completion/complete_all

- Drop debug leftovers from wait_for_initramfs().

- Fix initialization of initramfs_async variable.

Rasmus Villemoes (2):
init/initramfs.c: allow asynchronous unpacking
modules: add CONFIG_MODPROBE_PATH

.../admin-guide/kernel-parameters.txt | 12 ++++++
drivers/base/firmware_loader/main.c | 2 +
include/linux/initrd.h | 2 +
init/Kconfig | 12 ++++++
init/initramfs.c | 41 ++++++++++++++++++-
init/main.c | 1 +
kernel/kmod.c | 2 +-
kernel/umh.c | 2 +
usr/Kconfig | 10 +++++
9 files changed, 82 insertions(+), 2 deletions(-)

--
2.29.2

2021-03-09 21:18:49

by Rasmus Villemoes

[permalink] [raw]
Subject: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

This is primarily motivated by an embedded ppc target, where unpacking
even the rather modest sized initramfs takes 0.6 seconds, which is
long enough that the external watchdog becomes unhappy that it doesn't
get attention soon enough. By doing the initramfs decompression in a
worker thread, we get to do the device_initcalls and hence start
petting the watchdog much sooner.

So add an initramfs_async= kernel parameter, allowing the main init
process to proceed to handling device_initcall()s without waiting for
populate_rootfs() to finish.

Should one of those initcalls need something from the initramfs (say,
a kernel module or a firmware blob), it will simply wait for the
initramfs unpacking to be done before proceeding, which should in
theory make this completely safe to always enable. But if some driver
pokes around in the filesystem directly and not via one of the
official kernel interfaces (i.e. request_firmware*(),
call_usermodehelper*) that theory may not hold - also, I certainly
might have missed a spot when sprinkling wait_for_initramfs().

Normal desktops might benefit as well. On my mostly stock Ubuntu
kernel, my initramfs is a 26M xz-compressed blob, decompressing to
around 126M. That takes almost two seconds:

[ 0.201454] Trying to unpack rootfs image as initramfs...
[ 1.976633] Freeing initrd memory: 29416K

Before this patch, or with initramfs_async=0, these lines occur
consecutively in dmesg. With initramfs_async=1, the timestamps on
these two lines is roughly the same as above, but with 172 lines
inbetween - so more than one cpu has been kept busy doing work that
would otherwise only happen after the populate_rootfs() finished.

Signed-off-by: Rasmus Villemoes <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 12 ++++++
drivers/base/firmware_loader/main.c | 2 +
include/linux/initrd.h | 2 +
init/initramfs.c | 41 ++++++++++++++++++-
init/main.c | 1 +
kernel/umh.c | 2 +
usr/Kconfig | 10 +++++
7 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..fda9f012c42b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1825,6 +1825,18 @@
initcall functions. Useful for debugging built-in
modules and initcalls.

+ initramfs_async= [KNL] Normally, the initramfs image is
+ unpacked synchronously, before most devices
+ are initialized. When the initramfs is huge,
+ or on slow CPUs, this can take a significant
+ amount of time. Setting this option means the
+ unpacking is instead done in a background
+ thread, allowing the main init process to
+ begin calling device_initcall()s while the
+ initramfs is being unpacked.
+ Format: <bool>
+ Default set by CONFIG_INITRAMFS_ASYNC.
+
initrd= [BOOT] Specify the location of the initial ramdisk

initrdmem= [KNL] Specify a physical address and size from which to
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index 78355095e00d..4fdb8219cd08 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -15,6 +15,7 @@
#include <linux/kernel_read_file.h>
#include <linux/module.h>
#include <linux/init.h>
+#include <linux/initrd.h>
#include <linux/timer.h>
#include <linux/vmalloc.h>
#include <linux/interrupt.h>
@@ -504,6 +505,7 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
if (!path)
return -ENOMEM;

+ wait_for_initramfs();
for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
size_t file_size = 0;
size_t *file_size_ptr = NULL;
diff --git a/include/linux/initrd.h b/include/linux/initrd.h
index 85c15717af34..1bbe9af48dc3 100644
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -20,8 +20,10 @@ extern void free_initrd_mem(unsigned long, unsigned long);

#ifdef CONFIG_BLK_DEV_INITRD
extern void __init reserve_initrd_mem(void);
+extern void wait_for_initramfs(void);
#else
static inline void __init reserve_initrd_mem(void) {}
+static inline void wait_for_initramfs(void) {}
#endif

extern phys_addr_t phys_initrd_start;
diff --git a/init/initramfs.c b/init/initramfs.c
index d677e8e717f1..d33bd98481c2 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/init.h>
+#include <linux/async.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -541,6 +542,14 @@ static int __init keepinitrd_setup(char *__unused)
__setup("keepinitrd", keepinitrd_setup);
#endif

+static bool initramfs_async = IS_ENABLED(CONFIG_INITRAMFS_ASYNC);
+static int __init initramfs_async_setup(char *str)
+{
+ strtobool(str, &initramfs_async);
+ return 1;
+}
+__setup("initramfs_async=", initramfs_async_setup);
+
extern char __initramfs_start[];
extern unsigned long __initramfs_size;
#include <linux/initrd.h>
@@ -658,7 +667,7 @@ static void __init populate_initrd_image(char *err)
}
#endif /* CONFIG_BLK_DEV_RAM */

-static int __init populate_rootfs(void)
+static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
{
/* Load the built in initramfs */
char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
@@ -693,6 +702,36 @@ static int __init populate_rootfs(void)
initrd_end = 0;

flush_delayed_fput();
+}
+
+static ASYNC_DOMAIN_EXCLUSIVE(initramfs_domain);
+static async_cookie_t initramfs_cookie;
+
+void wait_for_initramfs(void)
+{
+ if (!initramfs_async)
+ return;
+ if (!initramfs_cookie) {
+ /*
+ * Something before rootfs_initcall wants to access
+ * the filesystem/initramfs. Probably a bug. Make a
+ * note, avoid deadlocking the machine, and let the
+ * caller's access fail as it used to.
+ */
+ pr_warn_once("wait_for_initramfs() called before rootfs_initcalls\n");
+ return;
+ }
+ async_synchronize_cookie_domain(initramfs_cookie + 1, &initramfs_domain);
+}
+EXPORT_SYMBOL_GPL(wait_for_initramfs);
+
+static int __init populate_rootfs(void)
+{
+ if (initramfs_async)
+ initramfs_cookie = async_schedule_domain(do_populate_rootfs, NULL,
+ &initramfs_domain);
+ else
+ do_populate_rootfs(NULL, 0);
return 0;
}
rootfs_initcall(populate_rootfs);
diff --git a/init/main.c b/init/main.c
index 53b278845b88..64253b181a84 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1538,6 +1538,7 @@ static noinline void __init kernel_init_freeable(void)

kunit_run_all_tests();

+ wait_for_initramfs();
console_on_rootfs();

/*
diff --git a/kernel/umh.c b/kernel/umh.c
index 3f646613a9d3..61f6b82c354b 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -27,6 +27,7 @@
#include <linux/ptrace.h>
#include <linux/async.h>
#include <linux/uaccess.h>
+#include <linux/initrd.h>

#include <trace/events/module.h>

@@ -107,6 +108,7 @@ static int call_usermodehelper_exec_async(void *data)

commit_creds(new);

+ wait_for_initramfs();
retval = kernel_execve(sub_info->path,
(const char *const *)sub_info->argv,
(const char *const *)sub_info->envp);
diff --git a/usr/Kconfig b/usr/Kconfig
index 8bbcf699fe3b..0f167c9f7eb9 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -32,6 +32,16 @@ config INITRAMFS_FORCE
and is useful if you cannot or don't want to change the image
your bootloader passes to the kernel.

+config INITRAMFS_ASYNC
+ bool "Unpack initramfs asynchronously"
+ help
+ This option sets the default value of the initramfs_async=
+ command line parameter. If that parameter is set, unpacking
+ of initramfs (both the builtin and one passed from a
+ bootloader) is done asynchronously. See
+ <file:Documentation/admin-guide/kernel-parameters.txt> for
+ details.
+
config INITRAMFS_ROOT_UID
int "User ID to map to 0 (user root)"
depends on INITRAMFS_SOURCE!=""
--
2.29.2

2021-03-09 22:10:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
<[email protected]> wrote:
>
> So add an initramfs_async= kernel parameter, allowing the main init
> process to proceed to handling device_initcall()s without waiting for
> populate_rootfs() to finish.

I like this smaller second version of the patch, but am wondering why
we even need the parameter.

It sounds mostly like a "maybe I didn't think of all cases" thing -
and one that will mean that this code will not see a lot of actual
test coverage..

And because of the lack of test coverage, I'd rather reverse the
meaning, and have the async case on by default (without even the
Kconfig option), and have the kernel command line purely as a "oops,
it's buggy, easy to ask people to test if this is what ails them".

What *can* happen early boot outside of firmware loading and usermodehelpers?

Hmm?

Linus

2021-03-09 22:18:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
<[email protected]> wrote:
>
> So add an initramfs_async= kernel parameter, allowing the main init
> process to proceed to handling device_initcall()s without waiting for
> populate_rootfs() to finish.

Oh, and a completely unrelated second comment about this: some of the
initramfs population code seems to be actively written to be slow.

For example, I'm not sure why that write_buffer() function uses an
array of indirect function pointer actions. Even ignoring the whole
"speculation protections make that really slow" issue that came later,
it seems to always have been actively (a) slower and (b) more complex.

The normal way to do that is with a simple switch() statement, which
makes the compiler able to do a much better job. Not just for the
state selector - maybe it picks that function pointer approach, but
probably these days just direct comparisons - but simply to do things
like inline all those "it's used in one place" cases entirely. In
fact, at that point, a lot of the state machine changes may end up
turning into pure goto's - compilers are sometimes good at that
(because state machines have often been very timing-critical).

Is that likely to be a big part of the costs? No. I assume it's the
decompression and the actual VFS operations. But when the series is
about how this all takes a long time, and I go "that code really looks
actively pessimally written", maybe it would be a good thing to look
into it?

Linus

2021-03-09 22:42:02

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On 09/03/2021 23.07, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
> <[email protected]> wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
>
> I like this smaller second version of the patch, but am wondering why
> we even need the parameter.
>
> It sounds mostly like a "maybe I didn't think of all cases" thing -

That's exactly what it is.

> and one that will mean that this code will not see a lot of actual
> test coverage..

Yeah, that's probably true.

> And because of the lack of test coverage, I'd rather reverse the
> meaning, and have the async case on by default (without even the
> Kconfig option), and have the kernel command line purely as a "oops,
> it's buggy, easy to ask people to test if this is what ails them".

Well, I wasn't bold enough to make it "default y" by myself, but I can
certainly do that and nuke the config option.

> What *can* happen early boot outside of firmware loading and usermodehelpers?

Well, that was what I tried to get people to tell me when I sent the
first version as RFC, and also before that
(https://lore.kernel.org/lkml/[email protected]/).
That you can't think of anything suggests that I have covered the
important cases - which does leave random drivers that poke around the
filesystem on their own, but (a) it would probably be a good thing to
have this flush those out and (b) there's the command line option to
make it boot anyway.

Rasmus

2021-03-09 22:53:30

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On 09/03/2021 23.16, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
> <[email protected]> wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
>
> Oh, and a completely unrelated second comment about this: some of the
> initramfs population code seems to be actively written to be slow.
>
> For example, I'm not sure why that write_buffer() function uses an
> array of indirect function pointer actions. Even ignoring the whole
> "speculation protections make that really slow" issue that came later,
> it seems to always have been actively (a) slower and (b) more complex.
>
[...]
> Is that likely to be a big part of the costs? No. I assume it's the
> decompression and the actual VFS operations.

Yes, I have been doing some simple measurements, simply by decompressing
the blob in userspace and comparing to the time to that used by
populate_rootfs(). For both the 6M lz4-compressed blob on my ppc target
and the 26M xz-compressed blob on my laptop, the result is that the
decompression itself accounts for the vast majority of the time - and
for ppc in particular, I don't think there's any spectre slowdown.

So I haven't dared looking into changing the unpack implementation since
it doesn't seem it could buy that much.

Rasmus

2021-03-11 00:22:24

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On 09/03/2021 23.16, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
> <[email protected]> wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
>
> Oh, and a completely unrelated second comment about this: some of the
> initramfs population code seems to be actively written to be slow.
>
> For example, I'm not sure why that write_buffer() function uses an
> array of indirect function pointer actions. Even ignoring the whole
> "speculation protections make that really slow" issue that came later,
> it seems to always have been actively (a) slower and (b) more complex.
>
> The normal way to do that is with a simple switch() statement, which
> makes the compiler able to do a much better job. Not just for the
> state selector - maybe it picks that function pointer approach, but
> probably these days just direct comparisons - but simply to do things
> like inline all those "it's used in one place" cases entirely. In
> fact, at that point, a lot of the state machine changes may end up
> turning into pure goto's - compilers are sometimes good at that
> (because state machines have often been very timing-critical).

FWIW, I tried doing

--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -401,24 +401,26 @@ static int __init do_symlink(void)
return 0;
}

-static __initdata int (*actions[])(void) = {
- [Start] = do_start,
- [Collect] = do_collect,
- [GotHeader] = do_header,
- [SkipIt] = do_skip,
- [GotName] = do_name,
- [CopyFile] = do_copy,
- [GotSymlink] = do_symlink,
- [Reset] = do_reset,
-};
-
static long __init write_buffer(char *buf, unsigned long len)
{
+ int ret;
+
byte_count = len;
victim = buf;

- while (!actions[state]())
- ;
+ do {
+ switch (state) {
+ case Start: ret = do_start(); break;
+ case Collect: ret = do_collect(); break;
+ case GotHeader: ret = do_header(); break;
+ case SkipIt: ret = do_skip(); break;
+ case GotName: ret = do_name(); break;
+ case CopyFile: ret = do_copy(); break;
+ case GotSymlink: ret = do_symlink(); break;
+ case Reset: ret = do_reset(); break;
+ }
+ } while (!ret);
+
return len - byte_count;
}


and yes, all the do_* functions get inlined into write_buffer with some
net space saving:

add/remove: 0/9 grow/shrink: 1/0 up/down: 1696/-2112 (-416)
Function old new delta
write_buffer 100 1796 +1696
actions 32 - -32
do_start 52 - -52
do_reset 112 - -112
do_skip 128 - -128
do_collect 144 - -144
do_symlink 172 - -172
do_copy 304 - -304
do_header 572 - -572
do_name 596 - -596
Total: Before=5360, After=4944, chg -7.76%

(ppc32 build). But the unpacking still takes the same time. It might be
different on x86.

Rasmus

2021-03-11 01:47:54

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On 11/03/2021 01.17, Rasmus Villemoes wrote:
> On 09/03/2021 23.16, Linus Torvalds wrote:
>> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
>> <[email protected]> wrote:
>>>
>>> So add an initramfs_async= kernel parameter, allowing the main init
>>> process to proceed to handling device_initcall()s without waiting for
>>> populate_rootfs() to finish.
>>
>> Oh, and a completely unrelated second comment about this: some of the
>> initramfs population code seems to be actively written to be slow.
>>
>> For example, I'm not sure why that write_buffer() function uses an
>> array of indirect function pointer actions. Even ignoring the whole
>> "speculation protections make that really slow" issue that came later,
>> it seems to always have been actively (a) slower and (b) more complex.
>>
>> The normal way to do that is with a simple switch() statement, which
>> makes the compiler able to do a much better job. Not just for the
>> state selector - maybe it picks that function pointer approach, but
>> probably these days just direct comparisons - but simply to do things
>> like inline all those "it's used in one place" cases entirely. In
>> fact, at that point, a lot of the state machine changes may end up
>> turning into pure goto's - compilers are sometimes good at that
>> (because state machines have often been very timing-critical).
>
> FWIW, I tried doing
>

Hm, gcc does elide the test of the return value, but jumps back to a
place where it always loads state from its memory location and does the
whole switch(). To get it to jump directly to the code implementing the
various do_* helpers it seems one needs to avoid that global variable
and instead return the next state explicitly. The below boots, but I
still can't see any measurable improvement on ppc.

Rasmus

Subject: [PATCH] init/initramfs.c: change state machine implementation

Instead of having write_buffer() rely on the global variable "state",
have each of the do_* helpers return the next state, or the new token
Stop. Also, instead of an array of function pointers, use a switch
statement.

This means all the do_* helpers end up inlined into write_buffer(),
and all the places which return a compile-time constant next state now
compile to a direct jump to that code.

We still need the global variable state for the initial choice within
write_buffer, and we also need to preserve the last non-Stop state
across calls.
---
init/initramfs.c | 90 ++++++++++++++++++++++++------------------------
1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/init/initramfs.c b/init/initramfs.c
index 1d0fdd05e5e9..ad7e04393acb 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -189,7 +189,8 @@ static __initdata enum state {
GotName,
CopyFile,
GotSymlink,
- Reset
+ Reset,
+ Stop
} state, next_state;

static __initdata char *victim;
@@ -207,17 +208,17 @@ static __initdata char *collected;
static long remains __initdata;
static __initdata char *collect;

-static void __init read_into(char *buf, unsigned size, enum state next)
+static int __init read_into(char *buf, unsigned size, enum state next)
{
if (byte_count >= size) {
collected = victim;
eat(size);
- state = next;
+ return next;
} else {
collect = collected = buf;
remains = size;
next_state = next;
- state = Collect;
+ return Collect;
}
}

@@ -225,8 +226,7 @@ static __initdata char *header_buf, *symlink_buf,
*name_buf;

static int __init do_start(void)
{
- read_into(header_buf, 110, GotHeader);
- return 0;
+ return read_into(header_buf, 110, GotHeader);
}

static int __init do_collect(void)
@@ -238,50 +238,46 @@ static int __init do_collect(void)
eat(n);
collect += n;
if ((remains -= n) != 0)
- return 1;
- state = next_state;
- return 0;
+ return Stop;
+ return next_state;
}

static int __init do_header(void)
{
if (memcmp(collected, "070707", 6)==0) {
error("incorrect cpio method used: use -H newc option");
- return 1;
+ return Stop;
}
if (memcmp(collected, "070701", 6)) {
error("no cpio magic");
- return 1;
+ return Stop;
}
parse_header(collected);
next_header = this_header + N_ALIGN(name_len) + body_len;
next_header = (next_header + 3) & ~3;
- state = SkipIt;
if (name_len <= 0 || name_len > PATH_MAX)
- return 0;
+ return SkipIt;
if (S_ISLNK(mode)) {
if (body_len > PATH_MAX)
- return 0;
+ return SkipIt;
collect = collected = symlink_buf;
remains = N_ALIGN(name_len) + body_len;
next_state = GotSymlink;
- state = Collect;
- return 0;
+ return Collect;
}
if (S_ISREG(mode) || !body_len)
- read_into(name_buf, N_ALIGN(name_len), GotName);
- return 0;
+ return read_into(name_buf, N_ALIGN(name_len), GotName);
+ return SkipIt;
}

static int __init do_skip(void)
{
if (this_header + byte_count < next_header) {
eat(byte_count);
- return 1;
+ return Stop;
} else {
eat(next_header - this_header);
- state = next_state;
- return 0;
+ return next_state;
}
}

@@ -291,7 +287,7 @@ static int __init do_reset(void)
eat(1);
if (byte_count && (this_header & 3))
error("broken padding");
- return 1;
+ return Stop;
}

static void __init clean_path(char *path, umode_t fmode)
@@ -324,11 +320,12 @@ static __initdata loff_t wfile_pos;

static int __init do_name(void)
{
- state = SkipIt;
+ int s = SkipIt;
+
next_state = Reset;
if (strcmp(collected, "TRAILER!!!") == 0) {
free_hash();
- return 0;
+ return s;
}
clean_path(collected, mode);
if (S_ISREG(mode)) {
@@ -339,14 +336,14 @@ static int __init do_name(void)
openflags |= O_TRUNC;
wfile = filp_open(collected, openflags, mode);
if (IS_ERR(wfile))
- return 0;
+ return s;
wfile_pos = 0;

vfs_fchown(wfile, uid, gid);
vfs_fchmod(wfile, mode);
if (body_len)
vfs_truncate(&wfile->f_path, body_len);
- state = CopyFile;
+ s = CopyFile;
}
} else if (S_ISDIR(mode)) {
init_mkdir(collected, mode);
@@ -362,7 +359,7 @@ static int __init do_name(void)
do_utime(collected, mtime);
}
}
- return 0;
+ return s;
}

static int __init do_copy(void)
@@ -378,14 +375,13 @@ static int __init do_copy(void)

fput(wfile);
eat(body_len);
- state = SkipIt;
- return 0;
+ return SkipIt;
} else {
if (xwrite(wfile, victim, byte_count, &wfile_pos) != byte_count)
error("write error");
body_len -= byte_count;
eat(byte_count);
- return 1;
+ return Stop;
}
}

@@ -396,29 +392,33 @@ static int __init do_symlink(void)
init_symlink(collected + N_ALIGN(name_len), collected);
init_chown(collected, uid, gid, AT_SYMLINK_NOFOLLOW);
do_utime(collected, mtime);
- state = SkipIt;
next_state = Reset;
- return 0;
+ return SkipIt;
}

-static __initdata int (*actions[])(void) = {
- [Start] = do_start,
- [Collect] = do_collect,
- [GotHeader] = do_header,
- [SkipIt] = do_skip,
- [GotName] = do_name,
- [CopyFile] = do_copy,
- [GotSymlink] = do_symlink,
- [Reset] = do_reset,
-};
-
static long __init write_buffer(char *buf, unsigned long len)
{
+ int s = state;
+ int save;
+
byte_count = len;
victim = buf;

- while (!actions[state]())
- ;
+ do {
+ save = s;
+ switch (s) {
+ case Start: s = do_start(); break;
+ case Collect: s = do_collect(); break;
+ case GotHeader: s = do_header(); break;
+ case SkipIt: s = do_skip(); break;
+ case GotName: s = do_name(); break;
+ case CopyFile: s = do_copy(); break;
+ case GotSymlink: s = do_symlink(); break;
+ case Reset: s = do_reset(); break;
+ }
+ } while (s != Stop);
+ state = save;
+
return len - byte_count;
}

--
2.29.2

2021-03-11 17:57:07

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On Tue, Mar 09, 2021 at 02:07:36PM -0800, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
> <[email protected]> wrote:
> >
> > So add an initramfs_async= kernel parameter, allowing the main init
> > process to proceed to handling device_initcall()s without waiting for
> > populate_rootfs() to finish.
>
> I like this smaller second version of the patch, but am wondering why
> we even need the parameter.
>
> It sounds mostly like a "maybe I didn't think of all cases" thing -
> and one that will mean that this code will not see a lot of actual
> test coverage..
>
> And because of the lack of test coverage, I'd rather reverse the
> meaning, and have the async case on by default (without even the
> Kconfig option), and have the kernel command line purely as a "oops,
> it's buggy, easy to ask people to test if this is what ails them".

If we're going to set this as default it might be good to document on
init.h that components that need content in initramfs need the wait
call.

> What *can* happen early boot outside of firmware loading and usermodehelpers?

*In practice* the only thing I can think of at this time is races with
other rootfs_initcall() calls, granted ordering among these is done at
linker time, but I can't think of a issue with them:

arch/x86/kernel/pci-dma.c:rootfs_initcall(pci_iommu_init);
drivers/iommu/intel/irq_remapping.c:rootfs_initcall(ir_dev_scope_init);
drivers/mfd/sta2x11-mfd.c:rootfs_initcall(sta2x11_mfd_init);
drivers/thunderbolt/nhi.c:rootfs_initcall(nhi_init);

But Cc'ing the maintainers of these components just in case.

Luis

2021-03-11 18:04:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On Wed, Mar 10, 2021 at 5:45 PM Rasmus Villemoes
<[email protected]> wrote:
>
> Hm, gcc does elide the test of the return value, but jumps back to a
> place where it always loads state from its memory location and does the
> whole switch(). To get it to jump directly to the code implementing the
> various do_* helpers it seems one needs to avoid that global variable
> and instead return the next state explicitly. The below boots, but I
> still can't see any measurable improvement on ppc.

Ok. That's definitely the right way to do efficient statemachines that
the compiler can actually generate ok code for, but if you can't
measure the difference I guess it isn't even worth doing.

Thanks for checking, though.

Linus

2021-03-13 13:19:36

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

On 11/03/2021 19.02, Linus Torvalds wrote:
> On Wed, Mar 10, 2021 at 5:45 PM Rasmus Villemoes
> <[email protected]> wrote:
>>
>> Hm, gcc does elide the test of the return value, but jumps back to a
>> place where it always loads state from its memory location and does the
>> whole switch(). To get it to jump directly to the code implementing the
>> various do_* helpers it seems one needs to avoid that global variable
>> and instead return the next state explicitly. The below boots, but I
>> still can't see any measurable improvement on ppc.
>
> Ok. That's definitely the right way to do efficient statemachines that
> the compiler can actually generate ok code for, but if you can't
> measure the difference I guess it isn't even worth doing.

Just for good measure, I now got around to test on x86 as well, where I
thought the speculation stuff might make a difference. However, the
indirect calls through the actions[] array don't actually hurt due to
__noinitretpoline, and even removing that from the __init definition, I
only see about 1.5% difference with that state machine patch applied.

So it doesn't seem worth pursuing. I'll send v3 of the async patches
shortly.

Rasmus