2023-05-31 13:19:43

by Christoph Hellwig

Subject: fix the name_to_dev_t mess v2

Hi all,

this series tries to sort out the accumulated mess around the
name_to_dev_t function. This function is intended to allow looking up
the dev_t of a block device based on a name string before the root file
system is mounted and thus the normal path-based lookup is available.

Unfortunately, a few years ago it managed to get exported and used in
non-init contexts, leading to something that looks like a path name
also being looked up by a different and potentially dangerous
algorithm.

This series does a fair amount of refactoring and finally ends up with
the renamed and improved name_to_dev_t only being available to the
early init code again.

The series is against Jens' for-6.5/block tree but probably applies
against current mainline just fine as well.

A git tree is also available here:

git://git.infradead.org/users/hch/block.git blk-init-cleanup

Gitweb:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-init-cleanup

Changes since v1:
- really propagate the actual error in dm_get_device
- improve the documentation in kernel-parameters.txt
- spelling fixes

Diffstat:


2023-05-31 13:22:25

by Christoph Hellwig

Subject: [PATCH 11/24] init: factor the root_wait logic in prepare_namespace into a helper

The root_wait logic is a bit obfuscated right now. Expand it and move it
into a helper.

Signed-off-by: Christoph Hellwig <[email protected]>
---
init/do_mounts.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/init/do_mounts.c b/init/do_mounts.c
index be6d14733ba02f..d5c06c1546e82c 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -606,6 +606,26 @@ void __init mount_root(char *root_device_name)
}
}

+/* wait for any asynchronous scanning to complete */
+static void __init wait_for_root(char *root_device_name)
+{
+ if (ROOT_DEV != 0)
+ return;
+
+ pr_info("Waiting for root device %s...\n", root_device_name);
+
+ for (;;) {
+ if (driver_probe_done()) {
+ ROOT_DEV = name_to_dev_t(root_device_name);
+ if (ROOT_DEV)
+ break;
+ }
+ msleep(5);
+ }
+ async_synchronize_full();
+
+}
+
static dev_t __init parse_root_device(char *root_device_name)
{
if (!strncmp(root_device_name, "mtd", 3) ||
@@ -642,16 +662,8 @@ void __init prepare_namespace(void)
if (initrd_load(saved_root_name))
goto out;

- /* wait for any asynchronous scanning to complete */
- if ((ROOT_DEV == 0) && root_wait) {
- printk(KERN_INFO "Waiting for root device %s...\n",
- saved_root_name);
- while (!driver_probe_done() ||
- (ROOT_DEV = name_to_dev_t(saved_root_name)) == 0)
- msleep(5);
- async_synchronize_full();
- }
-
+ if (root_wait)
+ wait_for_root(saved_root_name);
mount_root(saved_root_name);
out:
devtmpfs_mount();
--
2.39.2
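
The control flow of the new wait_for_root() helper can be tried outside
the kernel. What follows is only a userspace sketch under stated
assumptions: struct wait_ops, wait_for_dev() and the demo_* hooks are
hypothetical names that stand in for driver_probe_done(),
name_to_dev_t() and msleep(); none of them exist in the kernel.

```c
#include <stdbool.h>

typedef unsigned int devnum_t;	/* stand-in for the kernel's dev_t */

/*
 * Hypothetical hooks standing in for driver_probe_done(),
 * name_to_dev_t() and msleep().
 */
struct wait_ops {
	bool (*probe_done)(void);
	devnum_t (*lookup)(const char *name);
	void (*sleep_ms)(unsigned int ms);
};

/*
 * Poll until probing has finished and the name resolves to a non-zero
 * device number. Like the helper in the patch, this never gives up.
 */
devnum_t wait_for_dev(const struct wait_ops *ops, const char *name)
{
	for (;;) {
		if (ops->probe_done()) {
			devnum_t dev = ops->lookup(name);

			if (dev)
				return dev;
		}
		ops->sleep_ms(5);
	}
}

/* Demo hooks: probing finishes on the third poll, the second lookup hits. */
int demo_probes, demo_lookups, demo_sleeps;

bool demo_probe_done(void)
{
	return ++demo_probes > 2;
}

devnum_t demo_lookup(const char *name)
{
	(void)name;
	return ++demo_lookups > 1 ? 0x800001u : 0;
}

void demo_sleep_ms(unsigned int ms)
{
	(void)ms;
	demo_sleeps++;
}

const struct wait_ops demo_ops = {
	.probe_done	= demo_probe_done,
	.lookup		= demo_lookup,
	.sleep_ms	= demo_sleep_ms,
};
```

Calling wait_for_dev(&demo_ops, "/dev/sda1") with these demo hooks
returns on the fourth pass through the loop, after three sleeps. The
lookup is only attempted once probing reports done, which is the
ordering the expanded loop in the patch makes explicit.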


2023-05-31 13:22:35

by Christoph Hellwig

Subject: [PATCH 03/24] PM: hibernate: remove the global snapshot_test variable

Passing call-dependent state in global variables is a huge
antipattern. Fix it up.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/hibernate.c | 17 ++++++-----------
kernel/power/power.h | 3 +--
kernel/power/swap.c | 2 +-
3 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 07279506366255..78696aa04f5ca3 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -64,7 +64,6 @@ enum {
static int hibernation_mode = HIBERNATION_SHUTDOWN;

bool freezer_test_done;
-bool snapshot_test;

static const struct platform_hibernation_ops *hibernation_ops;

@@ -684,7 +683,7 @@ static void power_down(void)
cpu_relax();
}

-static int load_image_and_restore(void)
+static int load_image_and_restore(bool snapshot_test)
{
int error;
unsigned int flags;
@@ -721,6 +720,7 @@ static int load_image_and_restore(void)
*/
int hibernate(void)
{
+ bool snapshot_test = false;
unsigned int sleep_flags;
int error;

@@ -748,9 +748,6 @@ int hibernate(void)
if (error)
goto Exit;

- /* protected by system_transition_mutex */
- snapshot_test = false;
-
lock_device_hotplug();
/* Allocate memory management structures */
error = create_basic_memory_bitmaps();
@@ -792,9 +789,9 @@ int hibernate(void)
unlock_device_hotplug();
if (snapshot_test) {
pm_pr_dbg("Checking hibernation image\n");
- error = swsusp_check();
+ error = swsusp_check(snapshot_test);
if (!error)
- error = load_image_and_restore();
+ error = load_image_and_restore(snapshot_test);
}
thaw_processes();

@@ -982,8 +979,6 @@ static int software_resume(void)
*/
mutex_lock_nested(&system_transition_mutex, SINGLE_DEPTH_NESTING);

- snapshot_test = false;
-
if (!swsusp_resume_device) {
error = find_resume_device();
if (error)
@@ -994,7 +989,7 @@ static int software_resume(void)
MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));

pm_pr_dbg("Looking for hibernation image.\n");
- error = swsusp_check();
+ error = swsusp_check(false);
if (error)
goto Unlock;

@@ -1022,7 +1017,7 @@ static int software_resume(void)
goto Close_Finish;
}

- error = load_image_and_restore();
+ error = load_image_and_restore(false);
thaw_processes();
Finish:
pm_notifier_call_chain(PM_POST_RESTORE);
diff --git a/kernel/power/power.h b/kernel/power/power.h
index b83c8d5e188dec..978189fcafd124 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -59,7 +59,6 @@ asmlinkage int swsusp_save(void);

/* kernel/power/hibernate.c */
extern bool freezer_test_done;
-extern bool snapshot_test;

extern int hibernation_snapshot(int platform_mode);
extern int hibernation_restore(int platform_mode);
@@ -174,7 +173,7 @@ extern int swsusp_swap_in_use(void);
#define SF_HW_SIG 8

/* kernel/power/hibernate.c */
-extern int swsusp_check(void);
+int swsusp_check(bool snapshot_test);
extern void swsusp_free(void);
extern int swsusp_read(unsigned int *flags_p);
extern int swsusp_write(unsigned int flags);
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 92e41ed292ada8..efed11568bfc72 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -1514,7 +1514,7 @@ int swsusp_read(unsigned int *flags_p)
* swsusp_check - Check for swsusp signature in the resume device
*/

-int swsusp_check(void)
+int swsusp_check(bool snapshot_test)
{
int error;
void *holder;
--
2.39.2
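
The shape of this change can be shown in isolation. The sketch below
is a userspace illustration only: check_image(), resume_path() and
hibernate_path() are hypothetical stand-ins for swsusp_check(),
software_resume() and hibernate(), and the returned enum exists purely
to make the chosen mode observable.

```c
#include <stdbool.h>

/* Which mode a check ran in; illustrative only. */
enum check_mode { CHECK_RESUME, CHECK_SNAPSHOT_TEST };

/*
 * Hypothetical stand-in for swsusp_check(): the mode arrives as an
 * argument instead of being read from a global.
 */
enum check_mode check_image(bool snapshot_test)
{
	return snapshot_test ? CHECK_SNAPSHOT_TEST : CHECK_RESUME;
}

/* Boot-time resume is never a snapshot test, so the caller says so. */
enum check_mode resume_path(void)
{
	return check_image(false);
}

/* The hibernate path keeps the flag in a local and hands it down. */
enum check_mode hibernate_path(bool test_requested)
{
	bool snapshot_test = test_requested;

	return check_image(snapshot_test);
}
```

With the flag in the signature, each call site states which mode it
wants, and no caller has to remember to reset shared state first.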


2023-05-31 13:22:34

by Christoph Hellwig

Subject: [PATCH 13/24] init: improve the name_to_dev_t interface

name_to_dev_t has a very misleading name that doesn't make clear
it should only be used by the early init code, and also has a bad
calling convention that doesn't allow returning different kinds of
errors. Rename it to early_lookup_bdev to make the use case clear,
and return an errno, where -EINVAL means the string could not be
parsed, and -ENODEV means the string was valid, but there was
no device found for it.

Also stub out the whole call for !CONFIG_BLOCK as all the non-block
root cases are always covered in the caller.

Signed-off-by: Christoph Hellwig <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 4 +-
drivers/md/dm-table.c | 5 +-
drivers/md/md-autodetect.c | 3 +-
drivers/mtd/devices/block2mtd.c | 3 +-
fs/pstore/blk.c | 4 +-
include/linux/blkdev.h | 5 +
include/linux/mount.h | 1 -
init/do_mounts.c | 102 +++++++++---------
kernel/power/hibernate.c | 22 ++--
9 files changed, 74 insertions(+), 75 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 457c342d15977e..a6bc31349cbb76 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5453,8 +5453,8 @@

root= [KNL] Root filesystem
Usually this a a block device specifier of some kind,
- see the name_to_dev_t comment in init/do_mounts.c for
- details.
+ see the early_lookup_bdev comment in init/do_mounts.c
+ for details.
Alternatively this can be "ram" for the legacy initial
ramdisk, "nfs" and "cifs" for root on a network file
system, or "mtd" and "ubi" for mounting from raw flash.
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 1398f1d6e83e7f..05aa16da43b0d5 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -330,8 +330,9 @@ dev_t dm_get_dev_t(const char *path)
{
dev_t dev;

- if (lookup_bdev(path, &dev))
- dev = name_to_dev_t(path);
+ if (lookup_bdev(path, &dev) &&
+ early_lookup_bdev(path, &dev))
+ return 0;
return dev;
}
EXPORT_SYMBOL_GPL(dm_get_dev_t);
diff --git a/drivers/md/md-autodetect.c b/drivers/md/md-autodetect.c
index 91836e6de3260f..6eaa0eab40f962 100644
--- a/drivers/md/md-autodetect.c
+++ b/drivers/md/md-autodetect.c
@@ -147,7 +147,8 @@ static void __init md_setup_drive(struct md_setup_args *args)
if (p)
*p++ = 0;

- dev = name_to_dev_t(devname);
+ if (early_lookup_bdev(devname, &dev))
+ dev = 0;
if (strncmp(devname, "/dev/", 5) == 0)
devname += 5;
snprintf(comp_name, 63, "/dev/%s", devname);
diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 4cd37ec45762b6..4c21e9f13bead5 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -254,8 +254,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
msleep(1000);
wait_for_device_probe();

- devt = name_to_dev_t(devname);
- if (!devt)
+ if (early_lookup_bdev(devname, &devt))
continue;
bdev = blkdev_get_by_dev(devt, mode, dev);
}
diff --git a/fs/pstore/blk.c b/fs/pstore/blk.c
index 4ae0cfcd15f20b..de8cf5d75f34d5 100644
--- a/fs/pstore/blk.c
+++ b/fs/pstore/blk.c
@@ -263,9 +263,9 @@ static __init const char *early_boot_devpath(const char *initial_devname)
* same scheme to find the device that we use for mounting
* the root file system.
*/
- dev_t dev = name_to_dev_t(initial_devname);
+ dev_t dev;

- if (!dev) {
+ if (early_lookup_bdev(initial_devname, &dev)) {
pr_err("failed to resolve '%s'!\n", initial_devname);
return initial_devname;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d89c2da1469872..0bda6cb98d7eb8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1495,6 +1495,7 @@ int sync_blockdev_nowait(struct block_device *bdev);
void sync_bdevs(bool wait);
void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
void printk_all_partitions(void);
+int early_lookup_bdev(const char *pathname, dev_t *dev);
#else
static inline void invalidate_bdev(struct block_device *bdev)
{
@@ -1516,6 +1517,10 @@ static inline void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
static inline void printk_all_partitions(void)
{
}
+static inline int early_lookup_bdev(const char *pathname, dev_t *dev)
+{
+ return -EINVAL;
+}
#endif /* CONFIG_BLOCK */

int fsync_bdev(struct block_device *bdev);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1ea326c368f726..4b81ea90440e45 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -107,7 +107,6 @@ extern struct vfsmount *vfs_submount(const struct dentry *mountpoint,
extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list);
extern void mark_mounts_for_expiry(struct list_head *mounts);

-extern dev_t name_to_dev_t(const char *name);
extern bool path_is_mountpoint(const struct path *path);

extern bool our_mnt(struct vfsmount *mnt);
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 86599faf2bf8a1..f1953aeb321978 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -96,11 +96,10 @@ static int match_dev_by_uuid(struct device *dev, const void *data)
*
* Returns the matching dev_t on success or 0 on failure.
*/
-static dev_t devt_from_partuuid(const char *uuid_str)
+static int devt_from_partuuid(const char *uuid_str, dev_t *devt)
{
struct uuidcmp cmp;
struct device *dev = NULL;
- dev_t devt = 0;
int offset = 0;
char *slash;

@@ -124,21 +123,21 @@ static dev_t devt_from_partuuid(const char *uuid_str)

dev = class_find_device(&block_class, NULL, &cmp, &match_dev_by_uuid);
if (!dev)
- return 0;
+ return -ENODEV;

if (offset) {
/*
* Attempt to find the requested partition by adding an offset
* to the partition number found by UUID.
*/
- devt = part_devt(dev_to_disk(dev),
- dev_to_bdev(dev)->bd_partno + offset);
+ *devt = part_devt(dev_to_disk(dev),
+ dev_to_bdev(dev)->bd_partno + offset);
} else {
- devt = dev->devt;
+ *devt = dev->devt;
}

put_device(dev);
- return devt;
+ return 0;

clear_root_wait:
pr_err("VFS: PARTUUID= is invalid.\n"
@@ -146,7 +145,7 @@ static dev_t devt_from_partuuid(const char *uuid_str)
if (root_wait)
pr_err("Disabling rootwait; root= is invalid.\n");
root_wait = 0;
- return 0;
+ return -EINVAL;
}

/**
@@ -166,38 +165,35 @@ static int match_dev_by_label(struct device *dev, const void *data)
return 1;
}

-static dev_t devt_from_partlabel(const char *label)
+static int devt_from_partlabel(const char *label, dev_t *devt)
{
struct device *dev;
- dev_t devt = 0;

dev = class_find_device(&block_class, NULL, label, &match_dev_by_label);
- if (dev) {
- devt = dev->devt;
- put_device(dev);
- }
-
- return devt;
+ if (!dev)
+ return -ENODEV;
+ *devt = dev->devt;
+ put_device(dev);
+ return 0;
}

-static dev_t devt_from_devname(const char *name)
+static int devt_from_devname(const char *name, dev_t *devt)
{
- dev_t devt = 0;
int part;
char s[32];
char *p;

if (strlen(name) > 31)
- return 0;
+ return -EINVAL;
strcpy(s, name);
for (p = s; *p; p++) {
if (*p == '/')
*p = '!';
}

- devt = blk_lookup_devt(s, 0);
- if (devt)
- return devt;
+ *devt = blk_lookup_devt(s, 0);
+ if (*devt)
+ return 0;

/*
* Try non-existent, but valid partition, which may only exist after
@@ -206,41 +202,42 @@ static dev_t devt_from_devname(const char *name)
while (p > s && isdigit(p[-1]))
p--;
if (p == s || !*p || *p == '0')
- return 0;
+ return -EINVAL;

/* try disk name without <part number> */
part = simple_strtoul(p, NULL, 10);
*p = '\0';
- devt = blk_lookup_devt(s, part);
- if (devt)
- return devt;
+ *devt = blk_lookup_devt(s, part);
+ if (*devt)
+ return 0;

/* try disk name without p<part number> */
if (p < s + 2 || !isdigit(p[-2]) || p[-1] != 'p')
- return 0;
+ return -EINVAL;
p[-1] = '\0';
- return blk_lookup_devt(s, part);
+ *devt = blk_lookup_devt(s, part);
+ if (*devt)
+ return 0;
+ return -EINVAL;
}
-#endif /* CONFIG_BLOCK */

-static dev_t devt_from_devnum(const char *name)
+static int devt_from_devnum(const char *name, dev_t *devt)
{
unsigned maj, min, offset;
- dev_t devt = 0;
char *p, dummy;

if (sscanf(name, "%u:%u%c", &maj, &min, &dummy) == 2 ||
sscanf(name, "%u:%u:%u:%c", &maj, &min, &offset, &dummy) == 3) {
- devt = MKDEV(maj, min);
- if (maj != MAJOR(devt) || min != MINOR(devt))
- return 0;
+ *devt = MKDEV(maj, min);
+ if (maj != MAJOR(*devt) || min != MINOR(*devt))
+ return -EINVAL;
} else {
- devt = new_decode_dev(simple_strtoul(name, &p, 16));
+ *devt = new_decode_dev(simple_strtoul(name, &p, 16));
if (*p)
- return 0;
+ return -EINVAL;
}

- return devt;
+ return 0;
}

/*
@@ -271,19 +268,18 @@ static dev_t devt_from_devnum(const char *name)
* name contains slashes, the device name has them replaced with
* bangs.
*/
-dev_t name_to_dev_t(const char *name)
+int early_lookup_bdev(const char *name, dev_t *devt)
{
-#ifdef CONFIG_BLOCK
if (strncmp(name, "PARTUUID=", 9) == 0)
- return devt_from_partuuid(name + 9);
+ return devt_from_partuuid(name + 9, devt);
if (strncmp(name, "PARTLABEL=", 10) == 0)
- return devt_from_partlabel(name + 10);
+ return devt_from_partlabel(name + 10, devt);
if (strncmp(name, "/dev/", 5) == 0)
- return devt_from_devname(name + 5);
-#endif
- return devt_from_devnum(name);
+ return devt_from_devname(name + 5, devt);
+ return devt_from_devnum(name, devt);
}
-EXPORT_SYMBOL_GPL(name_to_dev_t);
+EXPORT_SYMBOL_GPL(early_lookup_bdev);
+#endif

static int __init root_dev_setup(char *line)
{
@@ -606,20 +602,17 @@ static void __init wait_for_root(char *root_device_name)

pr_info("Waiting for root device %s...\n", root_device_name);

- for (;;) {
- if (driver_probe_done()) {
- ROOT_DEV = name_to_dev_t(root_device_name);
- if (ROOT_DEV)
- break;
- }
+ while (!driver_probe_done() ||
+ early_lookup_bdev(root_device_name, &ROOT_DEV) < 0)
msleep(5);
- }
async_synchronize_full();

}

static dev_t __init parse_root_device(char *root_device_name)
{
+ dev_t dev;
+
if (!strncmp(root_device_name, "mtd", 3) ||
!strncmp(root_device_name, "ubi", 3))
return Root_Generic;
@@ -629,7 +622,10 @@ static dev_t __init parse_root_device(char *root_device_name)
return Root_CIFS;
if (strcmp(root_device_name, "/dev/ram") == 0)
return Root_RAM0;
- return name_to_dev_t(root_device_name);
+
+ if (early_lookup_bdev(root_device_name, &dev))
+ return 0;
+ return dev;
}

/*
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 45e24b02cd50b6..c52dedb9f7c8e8 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -11,6 +11,7 @@

#define pr_fmt(fmt) "PM: hibernation: " fmt

+#include <linux/blkdev.h>
#include <linux/export.h>
#include <linux/suspend.h>
#include <linux/reboot.h>
@@ -921,8 +922,7 @@ static int __init find_resume_device(void)
}

/* Check if the device is there */
- swsusp_resume_device = name_to_dev_t(resume_file);
- if (swsusp_resume_device)
+ if (!early_lookup_bdev(resume_file, &swsusp_resume_device))
return 0;

/*
@@ -931,15 +931,12 @@ static int __init find_resume_device(void)
*/
wait_for_device_probe();
if (resume_wait) {
- while (!(swsusp_resume_device = name_to_dev_t(resume_file)))
+ while (early_lookup_bdev(resume_file, &swsusp_resume_device))
msleep(10);
async_synchronize_full();
}

- swsusp_resume_device = name_to_dev_t(resume_file);
- if (!swsusp_resume_device)
- return -ENODEV;
- return 0;
+ return early_lookup_bdev(resume_file, &swsusp_resume_device);
}

static int software_resume(void)
@@ -1169,7 +1166,8 @@ static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
unsigned int sleep_flags;
int len = n;
char *name;
- dev_t res;
+ dev_t dev;
+ int error;

if (!hibernation_available())
return 0;
@@ -1180,13 +1178,13 @@ static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
if (!name)
return -ENOMEM;

- res = name_to_dev_t(name);
+ error = early_lookup_bdev(name, &dev);
kfree(name);
- if (!res)
- return -EINVAL;
+ if (error)
+ return error;

sleep_flags = lock_system_sleep();
- swsusp_resume_device = res;
+ swsusp_resume_device = dev;
unlock_system_sleep(sleep_flags);

pm_pr_dbg("Configured hibernation resume from disk to %u\n",
--
2.39.2
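
The new calling convention (errno as the return value, device number
through an out-parameter) is easiest to see on the numeric forms
handled by devt_from_devnum. The following is a userspace
approximation, not the kernel code: kdev_t, kmkdev(), kmajor(),
kminor(), decode_hex_dev() and lookup_devnum() are names made up for
this sketch. They mirror the kernel-internal 12-bit major / 20-bit
minor layout and the new_decode_dev() bit shuffling, and libc's
sscanf() and strtoul() are somewhat more permissive than the kernel's
own parsers.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t kdev_t;	/* stand-in for the kernel-internal dev_t */

#define KMINORBITS	20
#define KMINORMASK	((1u << KMINORBITS) - 1)

static inline kdev_t kmkdev(unsigned int major, unsigned int minor)
{
	return ((kdev_t)major << KMINORBITS) | minor;
}

static inline unsigned int kmajor(kdev_t dev)
{
	return dev >> KMINORBITS;
}

static inline unsigned int kminor(kdev_t dev)
{
	return dev & KMINORMASK;
}

/* Same bit layout that new_decode_dev() unpacks for the hex form. */
static kdev_t decode_hex_dev(uint32_t dev)
{
	unsigned int major = (dev & 0xfff00) >> 8;
	unsigned int minor = (dev & 0xff) | ((dev >> 12) & 0xfff00);

	return kmkdev(major, minor);
}

/*
 * Returns 0 and fills *devt on success, or -EINVAL if the string is
 * neither "<major>:<minor>" nor a bare hex device number.
 */
int lookup_devnum(const char *name, kdev_t *devt)
{
	unsigned int maj, min, offset;
	char *end, dummy;

	if (sscanf(name, "%u:%u%c", &maj, &min, &dummy) == 2 ||
	    sscanf(name, "%u:%u:%u:%c", &maj, &min, &offset, &dummy) == 3) {
		*devt = kmkdev(maj, min);
		/* Reject numbers that do not survive the round trip. */
		if (maj != kmajor(*devt) || min != kminor(*devt))
			return -EINVAL;
		return 0;
	}

	*devt = decode_hex_dev((uint32_t)strtoul(name, &end, 16));
	if (*end)
		return -EINVAL;
	return 0;
}
```

Under these assumptions "8:1" and "b302" both parse (the latter as
major 179, minor 2), while "sda" and an out-of-range "5000:1" return
-EINVAL. Callers only need to test the return value for non-zero,
which is the pattern most converted call sites in the patch use.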


2023-05-31 13:22:41

by Christoph Hellwig

Subject: [PATCH 04/24] PM: hibernate: move finding the resume device out of software_resume

software_resume can be called either from an init call in the boot code,
or from sysfs once the system has finished booting, and the two
invocation methods can't race with each other.

For the latter case we have just parsed the suspend device manually,
while the former might not have one. Split software_resume so that the
search only happens for the boot case, which also means the special
lockdep nesting annotation can go away as the system transition mutex
can be taken a little later and doesn't have the sysfs locking nest
inside it.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/hibernate.c | 80 ++++++++++++++++++++--------------------
1 file changed, 39 insertions(+), 41 deletions(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 78696aa04f5ca3..45e24b02cd50b6 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -907,7 +907,7 @@ int hibernate_quiet_exec(int (*func)(void *data), void *data)
}
EXPORT_SYMBOL_GPL(hibernate_quiet_exec);

-static int find_resume_device(void)
+static int __init find_resume_device(void)
{
if (!strlen(resume_file))
return -ENOENT;
@@ -942,53 +942,16 @@ static int find_resume_device(void)
return 0;
}

-/**
- * software_resume - Resume from a saved hibernation image.
- *
- * This routine is called as a late initcall, when all devices have been
- * discovered and initialized already.
- *
- * The image reading code is called to see if there is a hibernation image
- * available for reading. If that is the case, devices are quiesced and the
- * contents of memory is restored from the saved image.
- *
- * If this is successful, control reappears in the restored target kernel in
- * hibernation_snapshot() which returns to hibernate(). Otherwise, the routine
- * attempts to recover gracefully and make the kernel return to the normal mode
- * of operation.
- */
static int software_resume(void)
{
int error;

- /*
- * If the user said "noresume".. bail out early.
- */
- if (noresume || !hibernation_available())
- return 0;
-
- /*
- * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
- * is configured into the kernel. Since the regular hibernate
- * trigger path is via sysfs which takes a buffer mutex before
- * calling hibernate functions (which take system_transition_mutex)
- * this can cause lockdep to complain about a possible ABBA deadlock
- * which cannot happen since we're in the boot code here and
- * sysfs can't be invoked yet. Therefore, we use a subclass
- * here to avoid lockdep complaining.
- */
- mutex_lock_nested(&system_transition_mutex, SINGLE_DEPTH_NESTING);
-
- if (!swsusp_resume_device) {
- error = find_resume_device();
- if (error)
- goto Unlock;
- }
-
pm_pr_dbg("Hibernation image partition %d:%d present\n",
MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));

pm_pr_dbg("Looking for hibernation image.\n");
+
+ mutex_lock(&system_transition_mutex);
error = swsusp_check(false);
if (error)
goto Unlock;
@@ -1035,7 +998,39 @@ static int software_resume(void)
goto Finish;
}

-late_initcall_sync(software_resume);
+/**
+ * software_resume_initcall - Resume from a saved hibernation image.
+ *
+ * This routine is called as a late initcall, when all devices have been
+ * discovered and initialized already.
+ *
+ * The image reading code is called to see if there is a hibernation image
+ * available for reading. If that is the case, devices are quiesced and the
+ * contents of memory is restored from the saved image.
+ *
+ * If this is successful, control reappears in the restored target kernel in
+ * hibernation_snapshot() which returns to hibernate(). Otherwise, the routine
+ * attempts to recover gracefully and make the kernel return to the normal mode
+ * of operation.
+ */
+static int __init software_resume_initcall(void)
+{
+ /*
+ * If the user said "noresume".. bail out early.
+ */
+ if (noresume || !hibernation_available())
+ return 0;
+
+ if (!swsusp_resume_device) {
+ int error = find_resume_device();
+
+ if (error)
+ return error;
+ }
+
+ return software_resume();
+}
+late_initcall_sync(software_resume_initcall);


static const char * const hibernation_modes[] = {
@@ -1176,6 +1171,9 @@ static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
char *name;
dev_t res;

+ if (!hibernation_available())
+ return 0;
+
if (len && buf[len-1] == '\n')
len--;
name = kstrndup(buf, len, GFP_KERNEL);
--
2.39.2
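
The split described in this patch, a boot-only wrapper that finds the
device and a shared core that assumes it is already set, can be
sketched in userspace as follows. All names here (resume_dev,
find_resume_dev(), do_resume(), resume_at_boot(), resume_from_sysfs())
are hypothetical, the lookup is faked with a constant, and the -ENODEV
check in the core exists only to make the sketch observable.

```c
#include <errno.h>

/* Stand-in for swsusp_resume_device; 0 means "not configured". */
static unsigned int resume_dev;

/* Boot-only step: resolve the name given on the command line. */
static int find_resume_dev(const char *name)
{
	if (!name || !*name)
		return -ENOENT;
	resume_dev = 0x800002u;	/* pretend the lookup succeeded */
	return 0;
}

/* Shared core: relies on resume_dev having been set by the caller. */
int do_resume(void)
{
	return resume_dev ? 0 : -ENODEV;
}

/* Boot entry point: search only if nothing is configured yet. */
int resume_at_boot(const char *cmdline_name)
{
	if (!resume_dev) {
		int error = find_resume_dev(cmdline_name);

		if (error)
			return error;
	}
	return do_resume();
}

/* Runtime entry point: the device was parsed by the caller already. */
int resume_from_sysfs(unsigned int parsed_dev)
{
	resume_dev = parsed_dev;
	return do_resume();
}
```

Because the search lives only in the boot wrapper, the shared core has
a single precondition, and any locking it needs can start after the
lookup has finished.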


2023-05-31 13:23:06

by Christoph Hellwig

Subject: [PATCH 18/24] dm: open code dm_get_dev_t in dm_init_init

dm_init_init is called from early boot code, and thus lookup_bdev
will never succeed. Just open code the call to early_lookup_bdev
instead.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
---
drivers/md/dm-init.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
index d369457dbed0ed..2a71bcdba92d14 100644
--- a/drivers/md/dm-init.c
+++ b/drivers/md/dm-init.c
@@ -293,8 +293,10 @@ static int __init dm_init_init(void)

for (i = 0; i < ARRAY_SIZE(waitfor); i++) {
if (waitfor[i]) {
+ dev_t dev;
+
DMINFO("waiting for device %s ...", waitfor[i]);
- while (!dm_get_dev_t(waitfor[i]))
+ while (early_lookup_bdev(waitfor[i], &dev))
fsleep(5000);
}
}
--
2.39.2


2023-05-31 13:23:06

by Christoph Hellwig

Subject: [PATCH 15/24] block: move the code to do early boot lookup of block devices to block/

Create a new block/early-lookup.c to keep the early block device lookup
code instead of having this code sit with the early mount code.

Signed-off-by: Christoph Hellwig <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 4 +-
block/Makefile | 2 +-
block/early-lookup.c | 224 ++++++++++++++++++
init/do_mounts.c | 219 -----------------
4 files changed, 227 insertions(+), 222 deletions(-)
create mode 100644 block/early-lookup.c

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a6bc31349cbb76..911c54829c7c59 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5453,8 +5453,8 @@

root= [KNL] Root filesystem
Usually this a a block device specifier of some kind,
- see the early_lookup_bdev comment in init/do_mounts.c
- for details.
+ see the early_lookup_bdev comment in
+ block/early-lookup.c for details.
Alternatively this can be "ram" for the legacy initial
ramdisk, "nfs" and "cifs" for root on a network file
system, or "mtd" and "ubi" for mounting from raw flash.
diff --git a/block/Makefile b/block/Makefile
index b31b05390749a1..46ada9dc8bbfe2 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,7 @@ obj-y := bdev.o fops.o bio.o elevator.o blk-core.o blk-sysfs.o \
blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
- disk-events.o blk-ia-ranges.o
+ disk-events.o blk-ia-ranges.o early-lookup.o

obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_BLK_DEV_BSG_COMMON) += bsg.o
diff --git a/block/early-lookup.c b/block/early-lookup.c
new file mode 100644
index 00000000000000..9fc30d039508af
--- /dev/null
+++ b/block/early-lookup.c
@@ -0,0 +1,224 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Code for looking up block devices in the early boot code before mounting the
+ * root file system. Unfortunately currently also abused in a few other places.
+ */
+#include <linux/blkdev.h>
+#include <linux/ctype.h>
+
+struct uuidcmp {
+ const char *uuid;
+ int len;
+};
+
+/**
+ * match_dev_by_uuid - callback for finding a partition using its uuid
+ * @dev: device passed in by the caller
+ * @data: opaque pointer to the desired struct uuidcmp to match
+ *
+ * Returns 1 if the device matches, and 0 otherwise.
+ */
+static int match_dev_by_uuid(struct device *dev, const void *data)
+{
+ struct block_device *bdev = dev_to_bdev(dev);
+ const struct uuidcmp *cmp = data;
+
+ if (!bdev->bd_meta_info ||
+ strncasecmp(cmp->uuid, bdev->bd_meta_info->uuid, cmp->len))
+ return 0;
+ return 1;
+}
+
+/**
+ * devt_from_partuuid - looks up the dev_t of a partition by its UUID
+ * @uuid_str: char array containing ascii UUID
+ *
+ * The function will return the first partition which contains a matching
+ * UUID value in its partition_meta_info struct. This does not search
+ * by filesystem UUIDs.
+ *
+ * If @uuid_str is followed by a "/PARTNROFF=%d", then the number will be
+ * extracted and used as an offset from the partition identified by the UUID.
+ *
+ * Returns the matching dev_t on success or 0 on failure.
+ */
+static int devt_from_partuuid(const char *uuid_str, dev_t *devt)
+{
+ struct uuidcmp cmp;
+ struct device *dev = NULL;
+ int offset = 0;
+ char *slash;
+
+ cmp.uuid = uuid_str;
+
+ slash = strchr(uuid_str, '/');
+ /* Check for optional partition number offset attributes. */
+ if (slash) {
+ char c = 0;
+
+ /* Explicitly fail on poor PARTUUID syntax. */
+ if (sscanf(slash + 1, "PARTNROFF=%d%c", &offset, &c) != 1)
+ goto out_invalid;
+ cmp.len = slash - uuid_str;
+ } else {
+ cmp.len = strlen(uuid_str);
+ }
+
+ if (!cmp.len)
+ goto out_invalid;
+
+ dev = class_find_device(&block_class, NULL, &cmp, &match_dev_by_uuid);
+ if (!dev)
+ return -ENODEV;
+
+ if (offset) {
+ /*
+ * Attempt to find the requested partition by adding an offset
+ * to the partition number found by UUID.
+ */
+ *devt = part_devt(dev_to_disk(dev),
+ dev_to_bdev(dev)->bd_partno + offset);
+ } else {
+ *devt = dev->devt;
+ }
+
+ put_device(dev);
+ return 0;
+
+out_invalid:
+ pr_err("VFS: PARTUUID= is invalid.\n"
+ "Expected PARTUUID=<valid-uuid-id>[/PARTNROFF=%%d]\n");
+ return -EINVAL;
+}
+
+/**
+ * match_dev_by_label - callback for finding a partition using its label
+ * @dev: device passed in by the caller
+ * @data: opaque pointer to the label to match
+ *
+ * Returns 1 if the device matches, and 0 otherwise.
+ */
+static int match_dev_by_label(struct device *dev, const void *data)
+{
+ struct block_device *bdev = dev_to_bdev(dev);
+ const char *label = data;
+
+ if (!bdev->bd_meta_info || strcmp(label, bdev->bd_meta_info->volname))
+ return 0;
+ return 1;
+}
+
+static int devt_from_partlabel(const char *label, dev_t *devt)
+{
+ struct device *dev;
+
+ dev = class_find_device(&block_class, NULL, label, &match_dev_by_label);
+ if (!dev)
+ return -ENODEV;
+ *devt = dev->devt;
+ put_device(dev);
+ return 0;
+}
+
+static int devt_from_devname(const char *name, dev_t *devt)
+{
+ int part;
+ char s[32];
+ char *p;
+
+ if (strlen(name) > 31)
+ return -EINVAL;
+ strcpy(s, name);
+ for (p = s; *p; p++) {
+ if (*p == '/')
+ *p = '!';
+ }
+
+ *devt = blk_lookup_devt(s, 0);
+ if (*devt)
+ return 0;
+
+ /*
+ * Try non-existent, but valid partition, which may only exist after
+ * opening the device, like partitioned md devices.
+ */
+ while (p > s && isdigit(p[-1]))
+ p--;
+ if (p == s || !*p || *p == '0')
+ return -EINVAL;
+
+ /* try disk name without <part number> */
+ part = simple_strtoul(p, NULL, 10);
+ *p = '\0';
+ *devt = blk_lookup_devt(s, part);
+ if (*devt)
+ return 0;
+
+ /* try disk name without p<part number> */
+ if (p < s + 2 || !isdigit(p[-2]) || p[-1] != 'p')
+ return -EINVAL;
+ p[-1] = '\0';
+ *devt = blk_lookup_devt(s, part);
+ if (*devt)
+ return 0;
+ return -EINVAL;
+}
+
+static int devt_from_devnum(const char *name, dev_t *devt)
+{
+ unsigned maj, min, offset;
+ char *p, dummy;
+
+ if (sscanf(name, "%u:%u%c", &maj, &min, &dummy) == 2 ||
+ sscanf(name, "%u:%u:%u:%c", &maj, &min, &offset, &dummy) == 3) {
+ *devt = MKDEV(maj, min);
+ if (maj != MAJOR(*devt) || min != MINOR(*devt))
+ return -EINVAL;
+ } else {
+ *devt = new_decode_dev(simple_strtoul(name, &p, 16));
+ if (*p)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * Convert a name into device number. We accept the following variants:
+ *
+ * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
+ * no leading 0x, for example b302.
+ * 3) /dev/<disk_name> represents the device number of disk
+ * 4) /dev/<disk_name><decimal> represents the device number
+ * of partition - device number of disk plus the partition number
+ * 5) /dev/<disk_name>p<decimal> - same as the above, that form is
+ * used when disk name of partitioned disk ends on a digit.
+ * 6) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+ * unique id of a partition if the partition table provides it.
+ * The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+ * partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+ * filled hex representation of the 32-bit "NT disk signature", and PP
+ * is a zero-filled hex representation of the 1-based partition number.
+ * 7) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
+ * a partition with a known unique id.
+ * 8) <major>:<minor> major and minor number of the device separated by
+ * a colon.
+ * 9) PARTLABEL=<name> with name being the GPT partition label.
+ * MSDOS partitions do not support labels!
+ *
+ * If name doesn't have fall into the categories above, we return (0,0).
+ * block_class is used to check if something is a disk name. If the disk
+ * name contains slashes, the device name has them replaced with
+ * bangs.
+ */
+int early_lookup_bdev(const char *name, dev_t *devt)
+{
+ if (strncmp(name, "PARTUUID=", 9) == 0)
+ return devt_from_partuuid(name + 9, devt);
+ if (strncmp(name, "PARTLABEL=", 10) == 0)
+ return devt_from_partlabel(name + 10, devt);
+ if (strncmp(name, "/dev/", 5) == 0)
+ return devt_from_devname(name + 5, devt);
+ return devt_from_devnum(name, devt);
+}
+EXPORT_SYMBOL_GPL(early_lookup_bdev);
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 0b36a5f39ee8e2..780546a6cbfb6f 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -59,225 +59,6 @@ static int __init readwrite(char *str)
__setup("ro", readonly);
__setup("rw", readwrite);

-#ifdef CONFIG_BLOCK
-struct uuidcmp {
- const char *uuid;
- int len;
-};
-
-/**
- * match_dev_by_uuid - callback for finding a partition using its uuid
- * @dev: device passed in by the caller
- * @data: opaque pointer to the desired struct uuidcmp to match
- *
- * Returns 1 if the device matches, and 0 otherwise.
- */
-static int match_dev_by_uuid(struct device *dev, const void *data)
-{
- struct block_device *bdev = dev_to_bdev(dev);
- const struct uuidcmp *cmp = data;
-
- if (!bdev->bd_meta_info ||
- strncasecmp(cmp->uuid, bdev->bd_meta_info->uuid, cmp->len))
- return 0;
- return 1;
-}
-
-/**
- * devt_from_partuuid - looks up the dev_t of a partition by its UUID
- * @uuid_str: char array containing ascii UUID
- *
- * The function will return the first partition which contains a matching
- * UUID value in its partition_meta_info struct. This does not search
- * by filesystem UUIDs.
- *
- * If @uuid_str is followed by a "/PARTNROFF=%d", then the number will be
- * extracted and used as an offset from the partition identified by the UUID.
- *
- * Returns the matching dev_t on success or 0 on failure.
- */
-static int devt_from_partuuid(const char *uuid_str, dev_t *devt)
-{
- struct uuidcmp cmp;
- struct device *dev = NULL;
- int offset = 0;
- char *slash;
-
- cmp.uuid = uuid_str;
-
- slash = strchr(uuid_str, '/');
- /* Check for optional partition number offset attributes. */
- if (slash) {
- char c = 0;
-
- /* Explicitly fail on poor PARTUUID syntax. */
- if (sscanf(slash + 1, "PARTNROFF=%d%c", &offset, &c) != 1)
- goto out_invalid;
- cmp.len = slash - uuid_str;
- } else {
- cmp.len = strlen(uuid_str);
- }
-
- if (!cmp.len)
- goto out_invalid;
-
- dev = class_find_device(&block_class, NULL, &cmp, &match_dev_by_uuid);
- if (!dev)
- return -ENODEV;
-
- if (offset) {
- /*
- * Attempt to find the requested partition by adding an offset
- * to the partition number found by UUID.
- */
- *devt = part_devt(dev_to_disk(dev),
- dev_to_bdev(dev)->bd_partno + offset);
- } else {
- *devt = dev->devt;
- }
-
- put_device(dev);
- return 0;
-
-out_invalid:
- pr_err("VFS: PARTUUID= is invalid.\n"
- "Expected PARTUUID=<valid-uuid-id>[/PARTNROFF=%%d]\n");
- return -EINVAL;
-}
-
-/**
- * match_dev_by_label - callback for finding a partition using its label
- * @dev: device passed in by the caller
- * @data: opaque pointer to the label to match
- *
- * Returns 1 if the device matches, and 0 otherwise.
- */
-static int match_dev_by_label(struct device *dev, const void *data)
-{
- struct block_device *bdev = dev_to_bdev(dev);
- const char *label = data;
-
- if (!bdev->bd_meta_info || strcmp(label, bdev->bd_meta_info->volname))
- return 0;
- return 1;
-}
-
-static int devt_from_partlabel(const char *label, dev_t *devt)
-{
- struct device *dev;
-
- dev = class_find_device(&block_class, NULL, label, &match_dev_by_label);
- if (!dev)
- return -ENODEV;
- *devt = dev->devt;
- put_device(dev);
- return 0;
-}
-
-static int devt_from_devname(const char *name, dev_t *devt)
-{
- int part;
- char s[32];
- char *p;
-
- if (strlen(name) > 31)
- return -EINVAL;
- strcpy(s, name);
- for (p = s; *p; p++) {
- if (*p == '/')
- *p = '!';
- }
-
- *devt = blk_lookup_devt(s, 0);
- if (*devt)
- return 0;
-
- /*
- * Try non-existent, but valid partition, which may only exist after
- * opening the device, like partitioned md devices.
- */
- while (p > s && isdigit(p[-1]))
- p--;
- if (p == s || !*p || *p == '0')
- return -EINVAL;
-
- /* try disk name without <part number> */
- part = simple_strtoul(p, NULL, 10);
- *p = '\0';
- *devt = blk_lookup_devt(s, part);
- if (*devt)
- return 0;
-
- /* try disk name without p<part number> */
- if (p < s + 2 || !isdigit(p[-2]) || p[-1] != 'p')
- return -EINVAL;
- p[-1] = '\0';
- *devt = blk_lookup_devt(s, part);
- if (*devt)
- return 0;
- return -EINVAL;
-}
-
-static int devt_from_devnum(const char *name, dev_t *devt)
-{
- unsigned maj, min, offset;
- char *p, dummy;
-
- if (sscanf(name, "%u:%u%c", &maj, &min, &dummy) == 2 ||
- sscanf(name, "%u:%u:%u:%c", &maj, &min, &offset, &dummy) == 3) {
- *devt = MKDEV(maj, min);
- if (maj != MAJOR(*devt) || min != MINOR(*devt))
- return -EINVAL;
- } else {
- *devt = new_decode_dev(simple_strtoul(name, &p, 16));
- if (*p)
- return -EINVAL;
- }
-
- return 0;
-}
-
-/*
- * Convert a name into device number. We accept the following variants:
- *
- * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
- * no leading 0x, for example b302.
- * 3) /dev/<disk_name> represents the device number of disk
- * 4) /dev/<disk_name><decimal> represents the device number
- * of partition - device number of disk plus the partition number
- * 5) /dev/<disk_name>p<decimal> - same as the above, that form is
- * used when disk name of partitioned disk ends on a digit.
- * 6) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
- * unique id of a partition if the partition table provides it.
- * The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
- * partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
- * filled hex representation of the 32-bit "NT disk signature", and PP
- * is a zero-filled hex representation of the 1-based partition number.
- * 7) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
- * a partition with a known unique id.
- * 8) <major>:<minor> major and minor number of the device separated by
- * a colon.
- * 9) PARTLABEL=<name> with name being the GPT partition label.
- * MSDOS partitions do not support labels!
- *
- * If name doesn't have fall into the categories above, we return (0,0).
- * block_class is used to check if something is a disk name. If the disk
- * name contains slashes, the device name has them replaced with
- * bangs.
- */
-int early_lookup_bdev(const char *name, dev_t *devt)
-{
- if (strncmp(name, "PARTUUID=", 9) == 0)
- return devt_from_partuuid(name + 9, devt);
- if (strncmp(name, "PARTLABEL=", 10) == 0)
- return devt_from_partlabel(name + 10, devt);
- if (strncmp(name, "/dev/", 5) == 0)
- return devt_from_devname(name + 5, devt);
- return devt_from_devnum(name, devt);
-}
-EXPORT_SYMBOL_GPL(early_lookup_bdev);
-#endif
-
static int __init root_dev_setup(char *line)
{
strscpy(saved_root_name, line, sizeof(saved_root_name));
--
2.39.2
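
For readers who want to try the numeric name forms outside the kernel, the following is a minimal userspace sketch of the devt_from_devnum() logic moved by this patch. It is not kernel code: parse_devnum and kdev_t are hypothetical names, strtoul() stands in for simple_strtoul(), and the MKDEV/MAJOR/MINOR/new_decode_dev definitions are re-created here on the assumption of the kernel's 12-bit major, 20-bit minor dev_t layout.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed userspace copies of the kernel's dev_t helpers (20-bit minor). */
#define MINORBITS	20
#define MINORMASK	((1U << MINORBITS) - 1)
#define MAJOR(dev)	((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)	((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi)	(((ma) << MINORBITS) | (mi))

typedef unsigned int kdev_t;	/* hypothetical stand-in for the kernel dev_t */

/* Decode the "new" 32-bit encoding used for hex names such as b302. */
static kdev_t new_decode_dev(unsigned int dev)
{
	unsigned int ma = (dev & 0xfff00) >> 8;
	unsigned int mi = (dev & 0xff) | ((dev >> 12) & 0xfff00);

	return MKDEV(ma, mi);
}

/* Hypothetical userspace mirror of devt_from_devnum(). */
static int parse_devnum(const char *name, kdev_t *devt)
{
	unsigned int maj, min, offset;
	char *p, dummy;

	/*
	 * The trailing %c catches garbage: a clean "maj:min" converts
	 * exactly two items, a clean "maj:min:offset:" exactly three.
	 */
	if (sscanf(name, "%u:%u%c", &maj, &min, &dummy) == 2 ||
	    sscanf(name, "%u:%u:%u:%c", &maj, &min, &offset, &dummy) == 3) {
		*devt = MKDEV(maj, min);
		/* Reject values that do not survive the round trip. */
		if (maj != MAJOR(*devt) || min != MINOR(*devt))
			return -EINVAL;
	} else {
		/* Fall back to the bare hexadecimal form, no leading 0x. */
		*devt = new_decode_dev((unsigned int)strtoul(name, &p, 16));
		if (*p)
			return -EINVAL;
	}

	return 0;
}
```

Under these assumptions "8:1" and "b302" are accepted (the latter decoding to major 179, minor 2), while "8:1x" is rejected because neither sscanf() pattern matches cleanly and the hexadecimal fallback stops at the colon.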


2023-05-31 13:23:17

by Christoph Hellwig

Subject: [PATCH 08/24] init: pass root_device_name explicitly

Instead of declaring root_device_name as a global variable pass it as an
argument to the functions using it.

Signed-off-by: Christoph Hellwig <[email protected]>
---
init/do_mounts.c | 29 ++++++++++++++++-------------
init/do_mounts.h | 14 +++++++-------
init/do_mounts_initrd.c | 11 ++++++-----
3 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/init/do_mounts.c b/init/do_mounts.c
index e708b02d9d6566..1405ee7218bf00 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -28,7 +28,6 @@
#include "do_mounts.h"

int root_mountflags = MS_RDONLY | MS_SILENT;
-static char * __initdata root_device_name;
static char __initdata saved_root_name[64];
static int root_wait;

@@ -391,7 +390,7 @@ static int __init do_mount_root(const char *name, const char *fs,
return ret;
}

-void __init mount_root_generic(char *name, int flags)
+void __init mount_root_generic(char *name, char *pretty_name, int flags)
{
struct page *page = alloc_page(GFP_KERNEL);
char *fs_names = page_address(page);
@@ -425,7 +424,7 @@ void __init mount_root_generic(char *name, int flags)
* and give them a list of the available devices
*/
printk("VFS: Cannot open root device \"%s\" or %s: error %d\n",
- root_device_name, b, err);
+ pretty_name, b, err);
printk("Please append a correct \"root=\" boot option; here are the available partitions:\n");

printk_all_partitions();
@@ -541,7 +540,7 @@ static bool __init fs_is_nodev(char *fstype)
return ret;
}

-static int __init mount_nodev_root(void)
+static int __init mount_nodev_root(char *root_device_name)
{
char *fs_names, *fstype;
int err = -EINVAL;
@@ -569,21 +568,21 @@ static int __init mount_nodev_root(void)
}

#ifdef CONFIG_BLOCK
-static void __init mount_block_root(void)
+static void __init mount_block_root(char *root_device_name)
{
int err = create_dev("/dev/root", ROOT_DEV);

if (err < 0)
pr_emerg("Failed to create /dev/root: %d\n", err);
- mount_root_generic("/dev/root", root_mountflags);
+ mount_root_generic("/dev/root", root_device_name, root_mountflags);
}
#else
-static inline void mount_block_root(void)
+static inline void mount_block_root(char *root_device_name)
{
}
#endif /* CONFIG_BLOCK */

-void __init mount_root(void)
+void __init mount_root(char *root_device_name)
{
switch (ROOT_DEV) {
case Root_NFS:
@@ -593,11 +592,12 @@ void __init mount_root(void)
mount_cifs_root();
break;
case 0:
- if (root_device_name && root_fs_names && mount_nodev_root() == 0)
+ if (root_device_name && root_fs_names &&
+ mount_nodev_root(root_device_name) == 0)
break;
fallthrough;
default:
- mount_block_root();
+ mount_block_root(root_device_name);
break;
}
}
@@ -607,6 +607,8 @@ void __init mount_root(void)
*/
void __init prepare_namespace(void)
{
+ char *root_device_name;
+
if (root_delay) {
printk(KERN_INFO "Waiting %d sec before mounting root device...\n",
root_delay);
@@ -628,7 +630,8 @@ void __init prepare_namespace(void)
root_device_name = saved_root_name;
if (!strncmp(root_device_name, "mtd", 3) ||
!strncmp(root_device_name, "ubi", 3)) {
- mount_root_generic(root_device_name, root_mountflags);
+ mount_root_generic(root_device_name, root_device_name,
+ root_mountflags);
goto out;
}
ROOT_DEV = name_to_dev_t(root_device_name);
@@ -636,7 +639,7 @@ void __init prepare_namespace(void)
root_device_name += 5;
}

- if (initrd_load())
+ if (initrd_load(root_device_name))
goto out;

/* wait for any asynchronous scanning to complete */
@@ -649,7 +652,7 @@ void __init prepare_namespace(void)
async_synchronize_full();
}

- mount_root();
+ mount_root(root_device_name);
out:
devtmpfs_mount();
init_mount(".", "/", NULL, MS_MOVE, NULL);
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 33623025f6951a..15e372b00ce704 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -10,8 +10,8 @@
#include <linux/root_dev.h>
#include <linux/init_syscalls.h>

-void mount_root_generic(char *name, int flags);
-void mount_root(void);
+void mount_root_generic(char *name, char *pretty_name, int flags);
+void mount_root(char *root_device_name);
extern int root_mountflags;

static inline __init int create_dev(char *name, dev_t dev)
@@ -33,11 +33,11 @@ static inline int rd_load_image(char *from) { return 0; }
#endif

#ifdef CONFIG_BLK_DEV_INITRD
-
-bool __init initrd_load(void);
-
+bool __init initrd_load(char *root_device_name);
#else
-
-static inline bool initrd_load(void) { return false; }
+static inline bool initrd_load(char *root_device_name)
+{
+ return false;
+ }

#endif
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 686d1ff3af4bb1..425f4bcf4b77e0 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -83,7 +83,7 @@ static int __init init_linuxrc(struct subprocess_info *info, struct cred *new)
return 0;
}

-static void __init handle_initrd(void)
+static void __init handle_initrd(char *root_device_name)
{
struct subprocess_info *info;
static char *argv[] = { "linuxrc", NULL, };
@@ -95,7 +95,8 @@ static void __init handle_initrd(void)
real_root_dev = new_encode_dev(ROOT_DEV);
create_dev("/dev/root.old", Root_RAM0);
/* mount initrd on rootfs' /root */
- mount_root_generic("/dev/root.old", root_mountflags & ~MS_RDONLY);
+ mount_root_generic("/dev/root.old", root_device_name,
+ root_mountflags & ~MS_RDONLY);
init_mkdir("/old", 0700);
init_chdir("/old");

@@ -117,7 +118,7 @@ static void __init handle_initrd(void)

init_chdir("/");
ROOT_DEV = new_decode_dev(real_root_dev);
- mount_root();
+ mount_root(root_device_name);

printk(KERN_NOTICE "Trying to move old root to /initrd ... ");
error = init_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
@@ -133,7 +134,7 @@ static void __init handle_initrd(void)
}
}

-bool __init initrd_load(void)
+bool __init initrd_load(char *root_device_name)
{
if (mount_initrd) {
create_dev("/dev/ram", Root_RAM0);
@@ -145,7 +146,7 @@ bool __init initrd_load(void)
*/
if (rd_load_image("/initrd.image") && ROOT_DEV != Root_RAM0) {
init_unlink("/initrd.image");
- handle_initrd();
+ handle_initrd(root_device_name);
return true;
}
}
--
2.39.2


2023-06-24 00:41:58

by Guenter Roeck

Subject: Re: [PATCH 08/24] init: pass root_device_name explicitly

Hi,

On Wed, May 31, 2023 at 02:55:19PM +0200, Christoph Hellwig wrote:
> Instead of declaring root_device_name as a global variable pass it as an
> argument to the functions using it.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

This patch results in the following build error when trying to build
xtensa:tinyconfig.

WARNING: modpost: vmlinux: section mismatch in reference: strcpy.isra.0+0x14 (section: .text.unlikely) -> initcall_level_names (section: .init.data)
ERROR: modpost: Section mismatches detected.

Unfortunately, reverting it is not possible due to conflicts,
so I cannot confirm the bisect results.

Bisect log attached.

Guenter

---
# bad: [8d2be868b42c08290509c60515865f4de24ea704] Add linux-next specific files for 20230623
# good: [45a3e24f65e90a047bef86f927ebdc4c710edaa1] Linux 6.4-rc7
git bisect start 'HEAD' 'v6.4-rc7'
# good: [a5838c78db6a3a02e8d221e588c948f792e7f256] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git
git bisect good a5838c78db6a3a02e8d221e588c948f792e7f256
# bad: [cca41cc0b5485a0ec20707316c1a00082c01a2af] Merge branch 'for-next' of git://git.kernel.dk/linux-block.git
git bisect bad cca41cc0b5485a0ec20707316c1a00082c01a2af
# good: [901bdf5ea1a836400ee69aa32b04e9c209271ec7] Merge tag 'amd-drm-next-6.5-2023-06-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect good 901bdf5ea1a836400ee69aa32b04e9c209271ec7
# good: [b4666c320b8113d94b3f4624054562e7add57e4a] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git
git bisect good b4666c320b8113d94b3f4624054562e7add57e4a
# good: [b2c28785b125acb28a681462510297410cbbabd7] ASoC: dt-bindings: microchip,sama7g5-pdmc: Simplify "microchip,mic-pos" constraints
git bisect good b2c28785b125acb28a681462510297410cbbabd7
# bad: [9d217fb0e778d69b2e3988efbc441976c0fb29b5] nvme: reorder fields in 'struct nvme_ctrl'
git bisect bad 9d217fb0e778d69b2e3988efbc441976c0fb29b5
# good: [20d099756b98fa6b5b838448b1ffbce46f4f3283] block: Replace all non-returning strlcpy with strscpy
git bisect good 20d099756b98fa6b5b838448b1ffbce46f4f3283
# bad: [93c8f6f38be67e30adf8d8eb5e7e9ccb89326119] pktcdvd: Drop redundant castings for sector_t
git bisect bad 93c8f6f38be67e30adf8d8eb5e7e9ccb89326119
# bad: [c8643c72bc42781fc169c6498a3902bec447099e] init: pass root_device_name explicitly
git bisect bad c8643c72bc42781fc169c6498a3902bec447099e
# good: [87efb39075be6a288cd7f23858f15bd01c83028a] fs: add a method to shut down the file system
git bisect good 87efb39075be6a288cd7f23858f15bd01c83028a
# good: [aa5f6ed8c21ec1aa5fd688118d8d5cd87c5ffc1d] driver core: return bool from driver_probe_done
git bisect good aa5f6ed8c21ec1aa5fd688118d8d5cd87c5ffc1d
# good: [cc89c63e2fe37d476357c82390dfb12edcd41cdd] PM: hibernate: move finding the resume device out of software_resume
git bisect good cc89c63e2fe37d476357c82390dfb12edcd41cdd
# good: [e3102722ffe77094ba9e7e46380792b3dd8a7abd] init: rename mount_block_root to mount_root_generic
git bisect good e3102722ffe77094ba9e7e46380792b3dd8a7abd
# good: [a6a41d39c2d91ff2543d31b6cc6070f3957e3aea] init: refactor mount_root
git bisect good a6a41d39c2d91ff2543d31b6cc6070f3957e3aea
# first bad commit: [c8643c72bc42781fc169c6498a3902bec447099e] init: pass root_device_name explicitly

2023-06-26 08:10:47

by Christoph Hellwig

Subject: Re: [PATCH 08/24] init: pass root_device_name explicitly

On Fri, Jun 23, 2023 at 05:08:59PM -0700, Guenter Roeck wrote:
> Hi,
>
> On Wed, May 31, 2023 at 02:55:19PM +0200, Christoph Hellwig wrote:
> > Instead of declaring root_device_name as a global variable pass it as an
> > argument to the functions using it.
> >
> > Signed-off-by: Christoph Hellwig <[email protected]>
>
> This patch results in the following build error when trying to build
> xtensa:tinyconfig.
>
> WARNING: modpost: vmlinux: section mismatch in reference: strcpy.isra.0+0x14 (section: .text.unlikely) -> initcall_level_names (section: .init.data)
> ERROR: modpost: Section mismatches detected.

I can reproduce these with gcc 13.1 on xtensa, but the report makes
no sense to me. If I disable CONFIG_CC_OPTIMIZE_FOR_SIZE it now reports
a similar warning for put_page instead of strcpy which seems just as
arcane.


2023-06-26 15:07:01

by Guenter Roeck

Subject: Re: [PATCH 08/24] init: pass root_device_name explicitly

On 6/26/23 00:53, Christoph Hellwig wrote:
> On Fri, Jun 23, 2023 at 05:08:59PM -0700, Guenter Roeck wrote:
>> Hi,
>>
>> On Wed, May 31, 2023 at 02:55:19PM +0200, Christoph Hellwig wrote:
>>> Instead of declaring root_device_name as a global variable pass it as an
>>> argument to the functions using it.
>>>
>>> Signed-off-by: Christoph Hellwig <[email protected]>
>>
>> This patch results in the following build error when trying to build
>> xtensa:tinyconfig.
>>
>> WARNING: modpost: vmlinux: section mismatch in reference: strcpy.isra.0+0x14 (section: .text.unlikely) -> initcall_level_names (section: .init.data)
>> ERROR: modpost: Section mismatches detected.
>
> I can reproduce these with gcc 13.1 on xtensa, but the report makes
> no sense to me. If I disable CONFIG_CC_OPTIMIZE_FOR_SIZE it now reports
> a similar warning for put_page instead of strcpy which seems just as
> arcane.
>

I don't see that (I tried 11.3, 11.4, 12.3, and 13.1), but then I am not sure
if this is worth tracking down. I just force CONFIG_SECTION_MISMATCH_WARN_ONLY=y
for xtensa builds instead.

Guenter


2023-06-27 10:55:55

by Max Filippov

Subject: Re: [PATCH 08/24] init: pass root_device_name explicitly

On Mon, Jun 26, 2023 at 8:10 AM Guenter Roeck <[email protected]> wrote:
>
> On 6/26/23 00:53, Christoph Hellwig wrote:
> > On Fri, Jun 23, 2023 at 05:08:59PM -0700, Guenter Roeck wrote:
> >> Hi,
> >>
> >> On Wed, May 31, 2023 at 02:55:19PM +0200, Christoph Hellwig wrote:
> >>> Instead of declaring root_device_name as a global variable pass it as an
> >>> argument to the functions using it.
> >>>
> >>> Signed-off-by: Christoph Hellwig <[email protected]>
> >>
> >> This patch results in the following build error when trying to build
> >> xtensa:tinyconfig.
> >>
> >> WARNING: modpost: vmlinux: section mismatch in reference: strcpy.isra.0+0x14 (section: .text.unlikely) -> initcall_level_names (section: .init.data)
> >> ERROR: modpost: Section mismatches detected.
> >
> > I can reproduce these with gcc 13.1 on xtensa, but the report makes
> > no sense to me. If I disable CONFIG_CC_OPTIMIZE_FOR_SIZE it now reports
> > a similar warning for put_page instead of strcpy which seems just as
> > arcane.
> >
>
> I don't see that (I tried 11.3, 11.4, 12.3, and 13.1), but then I am not sure
> if this is worth tracking down. I just force CONFIG_SECTION_MISMATCH_WARN_ONLY=y
> for xtensa builds instead.

I believe it's yet another manifestation of the following issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92938

Hunting is still on my todo list, but it's quite low, so I guess
forcing CONFIG_SECTION_MISMATCH_WARN_ONLY=y for xtensa
is the right thing to do for now.

--
Thanks.
-- Max

2023-08-03 10:26:49

by Vlastimil Babka

Subject: Re: [PATCH 04/24] PM: hibernate: move finding the resume device out of software_resume

On 5/31/23 14:55, Christoph Hellwig wrote:
> software_resume can be called either from an init call in the boot code,
> or from sysfs once the system has finished booting, and these two
> invocation methods can't race with each other.
>
> For the latter case we did just parse the suspend device manually, while
> the former might not have one. Split software_resume so that the search
> only happens for the boot case, which also means the special lockdep
> nesting annotation can go away as the system transition mutex can be
> taken a little later and doesn't have the sysfs locking nest inside it.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Acked-by: Rafael J. Wysocki <[email protected]>

This caused a regression for me in 6.5-rc1+, fix below.

----8<----
From 95a310ae6cfae9b3cab61e54a1bce488c3ab93a1 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Wed, 2 Aug 2023 15:46:18 +0200
Subject: [PATCH] PM: hibernate: fix resume_store() return value when
hibernation not available

On a laptop with hibernation set up but not actively used, and with
secure boot and kernel lockdown enabled, 6.5-rc1 gets stuck on boot with
the following repeated messages:

A start job is running for Resume from hibernation using device /dev/system/swap (24s / no limit)
lockdown_is_locked_down: 25311154 callbacks suppressed
Lockdown: systemd-hiberna: hibernation is restricted; see man kernel_lockdown.7
...

Checking the resume code leads to commit cc89c63e2fe3 ("PM: hibernate:
move finding the resume device out of software_resume") which
inadvertently changed the return value from resume_store() to 0 when
!hibernation_available(). This apparently translates to userspace
write() returning 0 as in number of bytes written, and userspace looping
indefinitely in the attempt to write the intended value.

Fix this by returning the full number of bytes that were to be written,
as that's what was done before the commit.

Fixes: cc89c63e2fe3 ("PM: hibernate: move finding the resume device out of software_resume")
Signed-off-by: Vlastimil Babka <[email protected]>
---
kernel/power/hibernate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index e1b4bfa938dd..2b4a946a6ff5 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -1166,7 +1166,7 @@ static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
int error;

if (!hibernation_available())
- return 0;
+ return n;

if (len && buf[len-1] == '\n')
len--;
--
2.41.0
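
The failure mode described in the commit message can be reproduced in miniature. The sketch below is a userspace model only, under the assumption that the writer retries until every byte is accepted. It is not the systemd code and not the kernel's sysfs path. All names (store_fn, store_returns_zero, store_returns_n, write_all_capped) are hypothetical, and the retry cap exists only so that the non-progressing case terminates.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a store handler: returns bytes consumed. */
typedef ssize_t (*store_fn)(const char *buf, size_t n);

/* Models the regressed behaviour: reports that nothing was consumed. */
static ssize_t store_returns_zero(const char *buf, size_t n)
{
	(void)buf;
	(void)n;
	return 0;
}

/* Models the fixed behaviour: reports the whole buffer as consumed. */
static ssize_t store_returns_n(const char *buf, size_t n)
{
	(void)buf;
	return (ssize_t)n;
}

/*
 * Retry until the whole buffer has been accepted.  Returns the number of
 * calls that were needed, or -1 on error or when max_calls is exhausted.
 * A real writer without the cap would spin forever on a zero return.
 */
static int write_all_capped(store_fn store, const char *buf, size_t n,
			    int max_calls)
{
	size_t done = 0;
	int calls = 0;

	while (done < n) {
		ssize_t ret;

		if (calls == max_calls)
			return -1;
		ret = store(buf + done, n - done);
		calls++;
		if (ret < 0)
			return -1;
		done += (size_t)ret;	/* a zero return makes no progress */
	}

	return calls;
}
```

With store_returns_n the loop finishes after a single call. With store_returns_zero it never advances and only stops because of the artificial cap, which matches the endless retry seen at boot.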



2023-08-04 15:02:22

by Greg Kroah-Hartman

Subject: Re: [PATCH 04/24] PM: hibernate: move finding the resume device out of software_resume

On Fri, Aug 04, 2023 at 12:31:01PM +0200, Christoph Hellwig wrote:
> Looks good, thanks!
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>

Acked-by: Greg Kroah-Hartman <[email protected]>



2023-08-05 13:59:24

by Thorsten Leemhuis

Subject: Re: [PATCH 04/24] PM: hibernate: move finding the resume device out of software_resume

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 03.08.23 10:27, Vlastimil Babka wrote:
> On 5/31/23 14:55, Christoph Hellwig wrote:
>> software_resume can be called either from an init call in the boot code,
>> or from sysfs once the system has finished booting, and these two
>> invocation methods can't race with each other.
>>
>> For the latter case we did just parse the suspend device manually, while
>> the former might not have one. Split software_resume so that the search
>> only happens for the boot case, which also means the special lockdep
>> nesting annotation can go away as the system transition mutex can be
>> taken a little later and doesn't have the sysfs locking nest inside it.
>>
>> Signed-off-by: Christoph Hellwig <[email protected]>
>> Acked-by: Rafael J. Wysocki <[email protected]>
>
> This caused a regression for me in 6.5-rc1+, fix below.
>
> ----8<----
> From 95a310ae6cfae9b3cab61e54a1bce488c3ab93a1 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <[email protected]>
> Date: Wed, 2 Aug 2023 15:46:18 +0200
> Subject: [PATCH] PM: hibernate: fix resume_store() return value when
> hibernation not available
>
> On a laptop with hibernation set up but not actively used, and with
> secure boot and kernel lockdown enabled, 6.5-rc1 gets stuck on boot with
> the following repeated messages:
>
> A start job is running for Resume from hibernation using device /dev/system/swap (24s / no limit)
> lockdown_is_locked_down: 25311154 callbacks suppressed
> Lockdown: systemd-hiberna: hibernation is restricted; see man kernel_lockdown.7
> ...
>
> Checking the resume code leads to commit cc89c63e2fe3 ("PM: hibernate:
> move finding the resume device out of software_resume") which
> inadvertently changed the return value from resume_store() to 0 when
> !hibernation_available(). This apparently translates to userspace
> write() returning 0 as in number of bytes written, and userspace looping
> indefinitely in the attempt to write the intended value.
>
> Fix this by returning the full number of bytes that were to be written,
> as that's what was done before the commit.
>
> Fixes: cc89c63e2fe3 ("PM: hibernate: move finding the resume device out of software_resume")
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced cc89c63e2fe3
#regzbot title pm: boot problems when hibernate is configured and kernel
locked down
#regzbot fix: PM: hibernate: fix resume_store() return value when
hibernation not available
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.