I'm happy to announce the first non-RFC version of this patch set.
Over the xmas holidays I found some time to experiment with various userspace
implementations of MTDs and gave the kernel side more fine-tuning.
Rationale:
----------
When working with flash devices a common task is emulating them to run various
tests or inspect dumps from real hardware. To achieve that we have plenty of
emulators in the MTD subsystem: mtdram, block2mtd, nandsim.
Each of them implements an ad-hoc MTD and has various drawbacks.
Over the last years some developers tried to extend them, but these attempts
were often rejected because they just added more ad-hoc features instead of
addressing the overall problems.
MUSE is a novel approach to address the need for advanced MTD emulators.
Advanced means in this context supporting different (vendor specific) image
formats, different ways for fault injection (fuzzing) and recording/replaying
IOs to emulate power cuts.
The core goal of MUSE is having the complexity on the userspace side and
only a small MTD driver in kernelspace.
While playing with different approaches I realized that FUSE offers everything
we need. So MUSE is a little like CUSE except that it does not implement a
bare character device but an MTD.
Notes:
------
- OOB support is currently limited. MUSE has no support for processing
in-band and out-of-band data in the same MTD operation. It is good enough to
make JFFS2 happy. This limitation exists because FUSE has no support for more
than one variable length buffer in a FUSE request.
At least I didn't find a good way to pass more than one buffer to a request.
Maybe FUSE folks can correct me. :-)
- Every MTD read/write operation maps 1:1 to a MUSE_READ/WRITE opcode.
Since FUSE requests are not cheap, the number of bytes read/written in an MTD
operation has a huge impact on performance. Especially when NOR style MTDs
are implemented in userspace, a large writebufsize should be requested to gain
good write performance.
On the other hand, MTD operations with lengths larger than writesize are *not*
split up into multiple MUSE_READ/WRITE requests. This means that userspace
has to split them manually when doing power-cut emulation.
- MUSE is not super fast. On my i5 workstation nandsim is almost twice as fast
as a NAND flash in userspace. But MUSE is still orders of magnitude faster than
any real world MTD out there. So it is good enough for the use cases I have in
mind.
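To illustrate the note about splitting large requests for power-cut
emulation, here is a minimal userspace sketch. It is not part of this series.
The names split_write(), chunk_fn, struct power_cut and cut_after_n() are
made up for this example, and it assumes the server already has the payload
of one MUSE_WRITE request in a plain buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: programs one chunk, returns 0 or a negative error. */
typedef int (*chunk_fn)(uint64_t addr, const uint8_t *buf, size_t len,
			void *ctx);

/*
 * Split one large write into pieces that never cross a writesize boundary.
 * Stops at the first failing chunk and returns the number of bytes that
 * were programmed before the failure.
 */
static size_t split_write(uint64_t addr, const uint8_t *buf, size_t len,
			  size_t writesize, chunk_fn fn, void *ctx)
{
	size_t done = 0;

	assert(writesize > 0);

	while (done < len) {
		/* The first chunk is short if addr is not writesize aligned. */
		size_t in_page = (size_t)((addr + done) % writesize);
		size_t chunk = writesize - in_page;

		if (chunk > len - done)
			chunk = len - done;
		if (fn(addr + done, buf + done, chunk, ctx) < 0)
			break;
		done += chunk;
	}

	return done;
}

/* Example policy: let a fixed number of chunks pass, then "cut power". */
struct power_cut {
	unsigned int chunks_left;
};

static int cut_after_n(uint64_t addr, const uint8_t *buf, size_t len,
		       void *ctx)
{
	struct power_cut *pc = ctx;

	(void)addr;
	(void)buf;
	(void)len;

	if (pc->chunks_left == 0)
		return -5; /* stands in for -EIO */

	pc->chunks_left--;
	return 0;
}
```

With such a helper the server decides per writesize unit whether the data
reaches the backing image, so a simulated power cut can land in the middle of
a large request instead of only between requests.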
Changelog:
----------
Changes since v2 (RFC):
- OOB support
- MUSE_READ/WRITE requests are no longer limited to one min IO MTD unit
- MTD partitions support via mtdparts string
- More code cleanup
- Code rebased to 5.11-rc4
Changes since v1 (RFC):
- Rewrote IO path, fuse_direct_io() is no longer used.
Instead of cheating fuse_direct_io(), use custom ops to implement
reading and writing. That way MUSE no longer needs a dummy file object
nor a fuse file object.
In MTD all IO is synchronous and operates on kernel buffers, this
makes IO processing simple for MUSE.
- Support for bad blocks.
- No more (ab)use of FUSE ops such as FUSE_FSYNC.
- Major code cleanup.
This series can also be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc.git muse_v3
Richard Weinberger (8):
fuse: Export fuse_simple_request
fuse: Export IO helpers
fuse: Make cuse_parse_one a common helper
mtd: Add MTD_MUSE flag
mtd: Allow passing a custom cmdline to cmdline line parser
fuse: Add MUSE specific defines FUSE interface
fuse: Implement MUSE - MTD in userspace
MAINTAINERS: Add entry for MUSE
Documentation/ABI/testing/sysfs-class-mtd | 8 +
MAINTAINERS | 7 +
drivers/mtd/parsers/cmdlinepart.c | 73 +-
fs/fuse/Kconfig | 15 +
fs/fuse/Makefile | 2 +
fs/fuse/cuse.c | 58 +-
fs/fuse/dev.c | 1 +
fs/fuse/file.c | 16 +-
fs/fuse/fuse_i.h | 18 +
fs/fuse/helper.c | 70 ++
fs/fuse/muse.c | 1086 +++++++++++++++++++++
include/linux/mtd/partitions.h | 2 +
include/uapi/linux/fuse.h | 76 ++
include/uapi/mtd/mtd-abi.h | 1 +
14 files changed, 1346 insertions(+), 87 deletions(-)
create mode 100644 fs/fuse/helper.c
create mode 100644 fs/fuse/muse.c
--
2.26.2
This function will be used by MUSE too, let's share it.
Signed-off-by: Richard Weinberger <[email protected]>
---
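For readers unfamiliar with the packed format this helper consumes
("key0=val0\0key1=val1\0"), here is a simplified userspace sketch. It is not
the kernel code: kv_pack() and kv_next() are hypothetical names, kv_next()
skips the whitespace stripping the kernel does with strstrip(), and the key
names used in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append "key=value\0" to a packed buffer of capacity @cap that already
 * holds @used bytes. Returns the new used length, or 0 if it does not fit.
 */
static size_t kv_pack(char *buf, size_t used, size_t cap,
		      const char *key, const char *val)
{
	size_t klen = strlen(key);
	size_t vlen = strlen(val);
	size_t need = klen + 1 + vlen + 1; /* key, '=', value, '\0' */

	assert(used <= cap);
	if (need > cap - used)
		return 0;

	memcpy(buf + used, key, klen);
	buf[used + klen] = '=';
	memcpy(buf + used + klen + 1, val, vlen + 1);

	return used + need;
}

/*
 * Walk the packed buffer one pair at a time. Like the kernel helper, this
 * modifies the buffer: the '=' is overwritten with '\0' so that the key
 * becomes a terminated string.
 * Returns 1 for a pair, 0 at the end, -1 on a malformed buffer.
 */
static int kv_next(char **pp, char *end, char **keyp, char **valp)
{
	char *p = *pp;
	char *eq;

	while (p < end && *p == '\0')
		p++;
	if (p == end)
		return 0;
	if (end[-1] != '\0')
		return -1; /* buffer is not properly terminated */

	*keyp = p;
	*pp = p + strlen(p);

	eq = strchr(p, '=');
	if (eq) {
		*eq = '\0';
		*valp = eq + 1;
	} else {
		*valp = *pp; /* no '=': the value is the empty string */
	}

	return **keyp ? 1 : -1; /* reject zero length keys */
}
```

A server would call kv_pack() once per parameter, e.g. with "NAME" and
"ERASESIZE" as placeholder keys, and hand the resulting buffer over in its
init reply. kv_next() shows how the receiving side can iterate the same
buffer.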
fs/fuse/Kconfig | 4 +++
fs/fuse/Makefile | 1 +
fs/fuse/cuse.c | 58 +--------------------------------------
fs/fuse/fuse_i.h | 2 ++
fs/fuse/helper.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 78 insertions(+), 57 deletions(-)
create mode 100644 fs/fuse/helper.c
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 40ce9a1c12e5..9c8cc1e7b3a5 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -18,9 +18,13 @@ config FUSE_FS
If you want to develop a userspace FS, or if you want to use
a filesystem based on FUSE, answer Y or M.
+config FUSE_HELPER
+ def_bool n
+
config CUSE
tristate "Character device in Userspace support"
depends on FUSE_FS
+ select FUSE_HELPER
help
This FUSE extension allows character devices to be
implemented in userspace.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 8c7021fb2cd4..7a5768cce6be 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o
fuse-$(CONFIG_FUSE_DAX) += dax.o
+fuse-$(CONFIG_FUSE_HELPER) += helper.o
virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 45082269e698..fe8515844064 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -199,62 +199,6 @@ struct cuse_devinfo {
const char *name;
};
-/**
- * cuse_parse_one - parse one key=value pair
- * @pp: i/o parameter for the current position
- * @end: points to one past the end of the packed string
- * @keyp: out parameter for key
- * @valp: out parameter for value
- *
- * *@pp points to packed strings - "key0=val0\0key1=val1\0" which ends
- * at @end - 1. This function parses one pair and set *@keyp to the
- * start of the key and *@valp to the start of the value. Note that
- * the original string is modified such that the key string is
- * terminated with '\0'. *@pp is updated to point to the next string.
- *
- * RETURNS:
- * 1 on successful parse, 0 on EOF, -errno on failure.
- */
-static int cuse_parse_one(char **pp, char *end, char **keyp, char **valp)
-{
- char *p = *pp;
- char *key, *val;
-
- while (p < end && *p == '\0')
- p++;
- if (p == end)
- return 0;
-
- if (end[-1] != '\0') {
- pr_err("info not properly terminated\n");
- return -EINVAL;
- }
-
- key = val = p;
- p += strlen(p);
-
- if (valp) {
- strsep(&val, "=");
- if (!val)
- val = key + strlen(key);
- key = strstrip(key);
- val = strstrip(val);
- } else
- key = strstrip(key);
-
- if (!strlen(key)) {
- pr_err("zero length info key specified\n");
- return -EINVAL;
- }
-
- *pp = p;
- *keyp = key;
- if (valp)
- *valp = val;
-
- return 1;
-}
-
/**
* cuse_parse_dev_info - parse device info
* @p: device info string
@@ -275,7 +219,7 @@ static int cuse_parse_devinfo(char *p, size_t len, struct cuse_devinfo *devinfo)
int rc;
while (true) {
- rc = cuse_parse_one(&p, end, &key, &val);
+ rc = fuse_kv_parse_one(&p, end, &key, &val);
if (rc < 0)
return rc;
if (!rc)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 8c56a3fd2c4e..555856b0d998 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1228,5 +1228,7 @@ void fuse_dax_cancel_work(struct fuse_conn *fc);
/* file.c */
struct page **fuse_pages_alloc(unsigned int npages, gfp_t flags,
struct fuse_page_desc **desc);
+/* helper.c */
+int fuse_kv_parse_one(char **pp, char *end, char **keyp, char **valp);
#endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/helper.c b/fs/fuse/helper.c
new file mode 100644
index 000000000000..0c828daf8e8a
--- /dev/null
+++ b/fs/fuse/helper.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Helper functions used by CUSE and MUSE
+ *
+ * Copyright (C) 2008-2009 SUSE Linux Products GmbH
+ * Copyright (C) 2008-2009 Tejun Heo <[email protected]>
+ *
+ */
+
+#include <linux/string.h>
+#include <linux/module.h>
+
+#include "fuse_i.h"
+
+/**
+ * fuse_kv_parse_one - parse one key=value pair
+ * @pp: i/o parameter for the current position
+ * @end: points to one past the end of the packed string
+ * @keyp: out parameter for key
+ * @valp: out parameter for value
+ *
+ * *@pp points to packed strings - "key0=val0\0key1=val1\0" which ends
+ * at @end - 1. This function parses one pair and set *@keyp to the
+ * start of the key and *@valp to the start of the value. Note that
+ * the original string is modified such that the key string is
+ * terminated with '\0'. *@pp is updated to point to the next string.
+ *
+ * RETURNS:
+ * 1 on successful parse, 0 on EOF, -errno on failure.
+ */
+int fuse_kv_parse_one(char **pp, char *end, char **keyp, char **valp)
+{
+ char *p = *pp;
+ char *key, *val;
+
+ while (p < end && *p == '\0')
+ p++;
+ if (p == end)
+ return 0;
+
+ if (end[-1] != '\0') {
+ pr_err("info not properly terminated\n");
+ return -EINVAL;
+ }
+
+ key = val = p;
+ p += strlen(p);
+
+ if (valp) {
+ strsep(&val, "=");
+ if (!val)
+ val = key + strlen(key);
+ key = strstrip(key);
+ val = strstrip(val);
+ } else
+ key = strstrip(key);
+
+ if (!strlen(key)) {
+ pr_err("zero length info key specified\n");
+ return -EINVAL;
+ }
+
+ *pp = p;
+ *keyp = key;
+ if (valp)
+ *valp = val;
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(fuse_kv_parse_one);
--
2.26.2
MUSE will use this function to issue requests,
so export it.
Signed-off-by: Richard Weinberger <[email protected]>
---
fs/fuse/dev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 588f8d1240aa..8b7209537683 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -522,6 +522,7 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args)
return ret;
}
+EXPORT_SYMBOL_GPL(fuse_simple_request);
static bool fuse_request_queue_background(struct fuse_req *req)
{
--
2.26.2
MUSE will use these functions in its IO path,
so export them.
Signed-off-by: Richard Weinberger <[email protected]>
---
fs/fuse/file.c | 16 +++-------------
fs/fuse/fuse_i.h | 16 ++++++++++++++++
2 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8cccecb55fb8..d41660b7f5bc 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -20,8 +20,8 @@
#include <linux/uio.h>
#include <linux/fs.h>
-static struct page **fuse_pages_alloc(unsigned int npages, gfp_t flags,
- struct fuse_page_desc **desc)
+struct page **fuse_pages_alloc(unsigned int npages, gfp_t flags,
+ struct fuse_page_desc **desc)
{
struct page **pages;
@@ -31,6 +31,7 @@ static struct page **fuse_pages_alloc(unsigned int npages, gfp_t flags,
return pages;
}
+EXPORT_SYMBOL_GPL(fuse_pages_alloc);
static int fuse_send_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
int opcode, struct fuse_open_out *outargp)
@@ -1356,17 +1357,6 @@ static inline void fuse_page_descs_length_init(struct fuse_page_desc *descs,
descs[i].length = PAGE_SIZE - descs[i].offset;
}
-static inline unsigned long fuse_get_user_addr(const struct iov_iter *ii)
-{
- return (unsigned long)ii->iov->iov_base + ii->iov_offset;
-}
-
-static inline size_t fuse_get_frag_size(const struct iov_iter *ii,
- size_t max_size)
-{
- return min(iov_iter_single_seg_count(ii), max_size);
-}
-
static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
size_t *nbytesp, int write,
unsigned int max_pages)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7c4b8cb93f9f..8c56a3fd2c4e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -31,6 +31,7 @@
#include <linux/pid_namespace.h>
#include <linux/refcount.h>
#include <linux/user_namespace.h>
+#include <linux/uio.h>
/** Default max number of pages that can be used in a single read request */
#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
@@ -871,6 +872,17 @@ static inline bool fuse_is_bad(struct inode *inode)
return unlikely(test_bit(FUSE_I_BAD, &get_fuse_inode(inode)->state));
}
+static inline unsigned long fuse_get_user_addr(const struct iov_iter *ii)
+{
+ return (unsigned long)ii->iov->iov_base + ii->iov_offset;
+}
+
+static inline size_t fuse_get_frag_size(const struct iov_iter *ii,
+ size_t max_size)
+{
+ return min(iov_iter_single_seg_count(ii), max_size);
+}
+
/** Device operations */
extern const struct file_operations fuse_dev_operations;
@@ -1213,4 +1225,8 @@ void fuse_dax_inode_cleanup(struct inode *inode);
bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
void fuse_dax_cancel_work(struct fuse_conn *fc);
+/* file.c */
+struct page **fuse_pages_alloc(unsigned int npages, gfp_t flags,
+ struct fuse_page_desc **desc);
+
#endif /* _FS_FUSE_I_H */
--
2.26.2
This flag will be set if an MTD is implemented in userspace
using MUSE.
Signed-off-by: Richard Weinberger <[email protected]>
---
include/uapi/mtd/mtd-abi.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/mtd/mtd-abi.h b/include/uapi/mtd/mtd-abi.h
index 65b9db936557..2ad2217e3a96 100644
--- a/include/uapi/mtd/mtd-abi.h
+++ b/include/uapi/mtd/mtd-abi.h
@@ -105,6 +105,7 @@ struct mtd_write_req {
#define MTD_NO_ERASE 0x1000 /* No erase necessary */
#define MTD_POWERUP_LOCK 0x2000 /* Always locked after reset */
#define MTD_SLC_ON_MLC_EMULATION 0x4000 /* Emulate SLC behavior on MLC NANDs */
+#define MTD_MUSE 0x8000 /* This MTD is implemented in userspace */
/* Some common devices / combinations of capabilities */
#define MTD_CAP_ROM 0
--
2.26.2
The cmdline parser usually uses the mtdparts string from the kernel cmdline.
For special purpose MTDs like MUSE it is useful to pass a custom
mtdparts string to the parser. This allows the MTD simulator in
userspace to pass an mtdparts string directly and have the new MTD
partitioned.
To achieve this, struct mtd_part_parser_data now has a mtdparts pointer
where a custom mtdparts string can be provided; it overrules the kernel
cmdline.
Since the cmdline parser stays in the kernel forever, the memory lifecycle
had to be changed a little such that custom mtdparts strings don't result
in memory leaks.
Signed-off-by: Richard Weinberger <[email protected]>
---
drivers/mtd/parsers/cmdlinepart.c | 73 ++++++++++++++++++++++++-------
include/linux/mtd/partitions.h | 2 +
2 files changed, 58 insertions(+), 17 deletions(-)
diff --git a/drivers/mtd/parsers/cmdlinepart.c b/drivers/mtd/parsers/cmdlinepart.c
index 0ddff1a4b51f..f0fe87267380 100644
--- a/drivers/mtd/parsers/cmdlinepart.c
+++ b/drivers/mtd/parsers/cmdlinepart.c
@@ -64,7 +64,7 @@ struct cmdline_mtd_partition {
};
/* mtdpart_setup() parses into here */
-static struct cmdline_mtd_partition *partitions;
+static struct cmdline_mtd_partition *cmdline_partitions;
/* the command line passed to mtdpart_setup() */
static char *mtdparts;
@@ -138,9 +138,6 @@ static struct mtd_partition * newpart(char *s,
name_len = 13; /* Partition_000 */
}
- /* record name length for memory allocation later */
- extra_mem_size += name_len + 1;
-
/* test for options */
if (strncmp(s, "ro", 2) == 0) {
mask_flags |= MTD_WRITEABLE;
@@ -192,12 +189,17 @@ static struct mtd_partition * newpart(char *s,
parts[this_part].offset = offset;
parts[this_part].mask_flags = mask_flags;
parts[this_part].add_flags = add_flags;
+
+ /*
+ * Will get free()'ed in ->cleanup()
+ */
if (name)
- strlcpy(extra_mem, name, name_len + 1);
+ parts[this_part].name = kmemdup_nul(name, name_len, GFP_KERNEL);
else
- sprintf(extra_mem, "Partition_%03d", this_part);
- parts[this_part].name = extra_mem;
- extra_mem += name_len + 1;
+ parts[this_part].name = kasprintf(GFP_KERNEL, "Partition_%03d", this_part);
+
+ if (!parts[this_part].name)
+ return ERR_PTR(-ENOMEM);
dbg(("partition %d: name <%s>, offset %llx, size %llx, mask flags %x\n",
this_part, parts[this_part].name, parts[this_part].offset,
@@ -217,7 +219,7 @@ static struct mtd_partition * newpart(char *s,
/*
* Parse the command line.
*/
-static int mtdpart_setup_real(char *s)
+static int mtdpart_setup_real(char *s, struct cmdline_mtd_partition **partitions)
{
cmdline_parsed = 1;
@@ -301,8 +303,8 @@ static int mtdpart_setup_real(char *s)
strlcpy(this_mtd->mtd_id, mtd_id, mtd_id_len + 1);
/* link into chain */
- this_mtd->next = partitions;
- partitions = this_mtd;
+ this_mtd->next = *partitions;
+ *partitions = this_mtd;
dbg(("mtdid=<%s> num_parts=<%d>\n",
this_mtd->mtd_id, this_mtd->num_parts));
@@ -335,13 +337,23 @@ static int parse_cmdline_partitions(struct mtd_info *master,
struct mtd_part_parser_data *data)
{
unsigned long long offset;
- int i, err;
+ int i, err, num_parts;
struct cmdline_mtd_partition *part;
const char *mtd_id = master->name;
+ struct cmdline_mtd_partition *parsed_parts = cmdline_partitions;
+ bool free_parts = false;
+
+ if (data && data->mtdparts) {
+ parsed_parts = NULL;
- /* parse command line */
- if (!cmdline_parsed) {
- err = mtdpart_setup_real(cmdline);
+ err = mtdpart_setup_real(data->mtdparts, &parsed_parts);
+ if (err)
+ return err;
+
+ free_parts = true;
+ } else if (!cmdline_parsed) {
+ /* parse command line */
+ err = mtdpart_setup_real(cmdline, &cmdline_partitions);
if (err)
return err;
}
@@ -350,7 +362,7 @@ static int parse_cmdline_partitions(struct mtd_info *master,
* Search for the partition definition matching master->name.
* If master->name is not set, stop at first partition definition.
*/
- for (part = partitions; part; part = part->next) {
+ for (part = parsed_parts; part; part = part->next) {
if ((!mtd_id) || (!strcmp(part->mtd_id, mtd_id)))
break;
}
@@ -384,12 +396,38 @@ static int parse_cmdline_partitions(struct mtd_info *master,
}
}
+ /*
+ * Will get free()'ed in ->cleanup()
+ */
*pparts = kmemdup(part->parts, sizeof(*part->parts) * part->num_parts,
GFP_KERNEL);
if (!*pparts)
return -ENOMEM;
- return part->num_parts;
+ num_parts = part->num_parts;
+
+ if (free_parts == true) {
+ part = parsed_parts;
+ while (part) {
+ struct cmdline_mtd_partition *next = part->next;
+
+ kfree(part->parts);
+ part = next;
+ }
+ }
+
+ return num_parts;
+}
+
+static void cmdline_partitions_cleanup(const struct mtd_partition *pparts,
+ int nr_parts)
+{
+ int i;
+
+ for (i = 0; i < nr_parts; i++)
+ kfree(pparts[i].name);
+
+ kfree(pparts);
}
@@ -410,6 +448,7 @@ __setup("mtdparts=", mtdpart_setup);
static struct mtd_part_parser cmdline_parser = {
.parse_fn = parse_cmdline_partitions,
+ .cleanup = cmdline_partitions_cleanup,
.name = "cmdlinepart",
};
diff --git a/include/linux/mtd/partitions.h b/include/linux/mtd/partitions.h
index b74a539ec581..6c8b3399143d 100644
--- a/include/linux/mtd/partitions.h
+++ b/include/linux/mtd/partitions.h
@@ -65,9 +65,11 @@ struct device_node;
/**
* struct mtd_part_parser_data - used to pass data to MTD partition parsers.
* @origin: for RedBoot, start address of MTD device
+ * @mtdparts: for cmdline parser, use this string instead of mtdparts= from cmdline
*/
struct mtd_part_parser_data {
unsigned long origin;
+ char *mtdparts;
};
--
2.26.2
Raise the FUSE API minor version to 34 and add all
MUSE specific operations and data structures.
MUSE_INIT: Initializes a new connection and installs the MTD
MUSE_ERASE: Erases a block
MUSE_READ: Reads a page with or without OOB
MUSE_WRITE: Writes a page with or without OOB
MUSE_MARKBAD: Marks a block as bad
MUSE_ISBAD: Checks whether a block is bad
MUSE_SYNC: Flushes all cached data
Signed-off-by: Richard Weinberger <[email protected]>
---
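As a rough illustration of how a userspace server might act on some of these
opcodes, here is a minimal sketch of an in-memory flash model. It is not part
of this series. The opcode values are copied from this patch so that the
sketch builds without patched headers. struct sim_flash, sim_handle() and the
SIM_* constants are hypothetical, and the real request framing (reading from
/dev/muse, FUSE headers) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the opcode values added by this patch. */
enum {
	MUSE_ERASE = 8193,
	MUSE_MARKBAD = 8196,
	MUSE_ISBAD = 8197,
};

/* Hypothetical in-memory flash model: 4 erase blocks of 4 KiB each. */
#define SIM_ERASESIZE 4096u
#define SIM_BLOCKS 4u

struct sim_flash {
	uint8_t data[SIM_BLOCKS * SIM_ERASESIZE];
	uint8_t bad[SIM_BLOCKS];
};

/*
 * Handle one request against the model. Returns a negative errno on
 * failure, 0 on success, and 0 or 1 for MUSE_ISBAD.
 */
static int sim_handle(struct sim_flash *f, uint32_t opcode,
		      uint64_t addr, uint64_t len)
{
	uint64_t block = addr / SIM_ERASESIZE;
	uint64_t nblocks, b;

	if (block >= SIM_BLOCKS)
		return -EINVAL;

	switch (opcode) {
	case MUSE_ERASE:
		/* Erases must cover whole blocks inside the device. */
		if (addr % SIM_ERASESIZE || len == 0 || len % SIM_ERASESIZE)
			return -EINVAL;
		nblocks = len / SIM_ERASESIZE;
		if (nblocks > SIM_BLOCKS - block)
			return -EINVAL;
		/* Refuse to erase a range that contains a bad block. */
		for (b = block; b < block + nblocks; b++)
			if (f->bad[b])
				return -EIO;
		memset(f->data + addr, 0xff, len);
		return 0;
	case MUSE_MARKBAD:
		f->bad[block] = 1;
		return 0;
	case MUSE_ISBAD:
		return f->bad[block];
	default:
		return -ENOSYS;
	}
}
```

In a real server the return value would presumably travel back in the error
field of the FUSE reply header, and the MUSE_ISBAD answer in
struct muse_isbad_out.result; the sketch only models the state changes.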
include/uapi/linux/fuse.h | 76 +++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 98ca64d1beb6..1c8fa9e42e73 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -179,6 +179,10 @@
* 7.33
* - add FUSE_HANDLE_KILLPRIV_V2, FUSE_WRITE_KILL_SUIDGID, FATTR_KILL_SUIDGID
* - add FUSE_OPEN_KILL_SUIDGID
+ *
+ * 7.34
+ * - add support for MUSE: MUSE_INIT, MUSE_ERASE, MUSE_READ, MUSE_WRITE,
+ * MUSE_MARKBAD, MUSE_ISBAD and MUSE_SYNC
*/
#ifndef _LINUX_FUSE_H
@@ -503,6 +507,15 @@ enum fuse_opcode {
/* CUSE specific operations */
CUSE_INIT = 4096,
+ /* MUSE specific operations */
+ MUSE_INIT = 8192,
+ MUSE_ERASE = 8193,
+ MUSE_READ = 8194,
+ MUSE_WRITE = 8195,
+ MUSE_MARKBAD = 8196,
+ MUSE_ISBAD = 8197,
+ MUSE_SYNC = 8198,
+
/* Reserved opcodes: helpful to detect structure endian-ness */
CUSE_INIT_BSWAP_RESERVED = 1048576, /* CUSE_INIT << 8 */
FUSE_INIT_BSWAP_RESERVED = 436207616, /* FUSE_INIT << 24 */
@@ -956,4 +969,67 @@ struct fuse_removemapping_one {
#define FUSE_REMOVEMAPPING_MAX_ENTRY \
(PAGE_SIZE / sizeof(struct fuse_removemapping_one))
+#define MUSE_INIT_INFO_MAX 4096
+
+struct muse_init_in {
+ uint32_t fuse_major;
+ uint32_t fuse_minor;
+};
+
+struct muse_init_out {
+ uint32_t fuse_major;
+ uint32_t fuse_minor;
+ uint32_t max_read;
+ uint32_t max_write;
+};
+
+struct muse_erase_in {
+ uint64_t addr;
+ uint64_t len;
+};
+
+#define MUSE_IO_INBAND (1 << 0)
+#define MUSE_IO_OOB_AUTO (1 << 1)
+#define MUSE_IO_OOB_PLACE (1 << 2)
+#define MUSE_IO_RAW (1 << 3)
+
+struct muse_read_in {
+ uint64_t addr;
+ uint64_t len;
+ uint32_t flags;
+ uint32_t padding;
+};
+
+struct muse_read_out {
+ uint64_t len;
+ uint32_t soft_error;
+ uint32_t padding;
+};
+
+struct muse_write_in {
+ uint64_t addr;
+ uint64_t len;
+ uint32_t flags;
+ uint32_t padding;
+};
+
+struct muse_write_out {
+ uint64_t len;
+ uint32_t soft_error;
+ uint32_t padding;
+};
+
+struct muse_markbad_in {
+ uint64_t addr;
+};
+
+struct muse_isbad_in {
+ uint64_t addr;
+};
+
+struct muse_isbad_out {
+ uint32_t result;
+ uint32_t padding;
+};
+
#endif /* _LINUX_FUSE_H */
--
2.26.2
MUSE allows implementing an MTD in userspace.
So far userspace has control over mtd_read, mtd_write, mtd_erase,
mtd_block_isbad, mtd_block_markbad, and mtd_sync.
It can also set the various MTD parameters such as
name, flags, size, writesize and erasesize.
That way advanced simulators for many types of flashes
can be implemented in userspace such that the complexity
is in userspace. Furthermore at some point we can deprecate
ad-hoc in-kernel MTD simulators such as nandsim.
Signed-off-by: Richard Weinberger <[email protected]>
---
Documentation/ABI/testing/sysfs-class-mtd | 8 +
fs/fuse/Kconfig | 11 +
fs/fuse/Makefile | 1 +
fs/fuse/muse.c | 1086 +++++++++++++++++++++
4 files changed, 1106 insertions(+)
create mode 100644 fs/fuse/muse.c
diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd
index 3bc7c0a95c92..1aa8d7855f9c 100644
--- a/Documentation/ABI/testing/sysfs-class-mtd
+++ b/Documentation/ABI/testing/sysfs-class-mtd
@@ -240,3 +240,11 @@ Contact: [email protected]
Description:
Number of bytes available for a client to place data into
the out of band area.
+
+What: /sys/class/mtd/mtdX/muse_pid
+Date: January 2021
+KernelVersion: 5.12
+Contact: [email protected]
+Description:
+ If this MTD is a userspace driven MTD, muse_pid shows the PID
+ of the process behind it at creation time.
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 9c8cc1e7b3a5..2fc63dc18a53 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -56,3 +56,14 @@ config FUSE_DAX
If you want to allow mounting a Virtio Filesystem with the "dax"
option, answer Y.
+
+config MUSE
+ tristate "Memory Technology Device (MTD) in Userspace support"
+ depends on FUSE_FS
+ select FUSE_HELPER
+ select MTD
+ help
+ This FUSE extension allows an MTD to be implemented in userspace.
+
+ If you want to develop or use a userspace MTD based on MUSE,
+ answer Y or M.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 7a5768cce6be..67a7af3fb047 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -6,6 +6,7 @@
obj-$(CONFIG_FUSE_FS) += fuse.o
obj-$(CONFIG_CUSE) += cuse.o
obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
+obj-$(CONFIG_MUSE) += muse.o
fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o
fuse-$(CONFIG_FUSE_DAX) += dax.o
diff --git a/fs/fuse/muse.c b/fs/fuse/muse.c
new file mode 100644
index 000000000000..43f8e400abcd
--- /dev/null
+++ b/fs/fuse/muse.c
@@ -0,0 +1,1086 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * MUSE: MTD in userspace
+ * Copyright (C) 2021 sigma star gmbh
+ * Author: Richard Weinberger <[email protected]>
+ */
+
+#define pr_fmt(fmt) "MUSE: " fmt
+
+#include <linux/fuse.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/partitions.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/sysfs.h>
+#include <linux/workqueue.h>
+
+#include "fuse_i.h"
+
+/*
+ * struct muse_conn - MUSE connection object.
+ *
+ * @fm: FUSE mount object.
+ * @fc: FUSE connection object.
+ * @mtd: MTD object.
+ * @creator: PID of the creating process.
+ * @want_exit: Denotes that userspace is disconnected and the MTD shall be
+ * removed as soon as the last user vanishes.
+ * @mtd_registered: true if this MUSE connection successfully registered an MTD.
+ * @mtd_exit_work: work context for async MTD removal.
+ * @ref_mutex: synchronizes @want_exit and MTD put/get.
+ *
+ * Describes a connection to a userspace server.
+ * Each connection implements a single (master) MTD.
+ *
+ */
+struct muse_conn {
+ struct fuse_mount fm;
+ struct fuse_conn fc;
+ struct mtd_info mtd;
+ pid_t creator;
+ bool want_exit;
+ bool mtd_registered;
+ struct work_struct mtd_exit_work;
+ struct mutex ref_mutex;
+};
+
+/*
+ * struct muse_init_args - MUSE init arguments.
+ *
+ * @ap: FUSE argument pages object.
+ * @in: MUSE init parameters sent to userspace.
+ * @out: MUSE init parameters sent from userspace.
+ * @page: A single page used to pass stringy key-value parameters
+ * from userspace to this module.
+ * @desc: FUSE page description object.
+ *
+ * Describes arguments used by the MUSE_INIT FUSE opcode.
+ *
+ */
+struct muse_init_args {
+ struct fuse_args_pages ap;
+ struct muse_init_in in;
+ struct muse_init_out out;
+ struct page *page;
+ struct fuse_page_desc desc;
+};
+
+/*
+ * struct muse_mtd_create_req - MUSE MTD creation request.
+ *
+ * @name: Name of the (master) MTD, usually something like muse-<pid>.
+ * @type: Type of the MTD, one out of MTD_RAM, MTD_ROM, MTD_NORFLASH,
+ * MTD_NANDFLASH, MTD_DATAFLASH or MTD_MLCNANDFLASH.
+ * @size: Total size of the MTD.
+ * @writesize: writesize of the MTD.
+ * @writebufsize: writebufsize of the MTD, usually equal to @writesize.
+ * @erasesize: erasesize of the MTD.
+ * @oobsize: Total number of out-of-band bytes per page (writesize),
+ * only useful for NAND style MTDs.
+ * @oobavail: Number of available bytes in the out-of-band area.
+ * Only useful for NAND style MTDs.
+ * @subpage_shift: Subpages shift value, either 0, 1 or 2. Only useful for
+ * NAND style MTDs.
+ * @mtdparts: mtdparts string *without* leading MTD name which describes
+ * partitioning of the MTD as understood by
+ * drivers/mtd/parsers/cmdlinepart.c.
+ *
+ * Describes the MTD as desired by userspace.
+ *
+ */
+struct muse_mtd_create_req {
+ const char *name;
+ unsigned int type;
+ uint32_t flags;
+ uint64_t size;
+ uint32_t writesize;
+ uint32_t writebufsize;
+ uint32_t erasesize;
+ uint32_t oobsize;
+ uint32_t oobavail;
+ unsigned int subpage_shift;
+ const char *mtdparts;
+};
+
+/*
+ * struct muse_mtd_init_ctx
+ *
+ * @mtd_init_work: workqueue context object.
+ * @pd: Extra parameters for the MTD partition parser, usually an mtdparts
+ * string.
+ * @mc: MUSE connection this object belongs to.
+ *
+ * Describes the parameter object passed to a workqueue worker to create the
+ * MTD asynchronously.
+ *
+ */
+struct muse_mtd_init_ctx {
+ struct work_struct mtd_init_work;
+ struct mtd_part_parser_data pd;
+ struct muse_conn *mc;
+};
+
+static void muse_fc_release(struct fuse_conn *fc)
+{
+ struct muse_conn *mc = container_of(fc, struct muse_conn, fc);
+
+ WARN_ON_ONCE(mc->mtd.usecount);
+ kfree_rcu(mc, fc.rcu);
+}
+
+static struct muse_conn *get_mc_from_mtd(struct mtd_info *mtd)
+{
+ struct mtd_info *master = mtd_get_master(mtd);
+
+ return master->priv;
+}
+
+static int muse_mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
+{
+ struct muse_conn *mc = get_mc_from_mtd(mtd);
+ struct fuse_mount *fm = &mc->fm;
+ struct muse_erase_in inarg;
+ FUSE_ARGS(args);
+ ssize_t ret;
+
+ inarg.addr = instr->addr;
+ inarg.len = instr->len;
+
+ args.opcode = MUSE_ERASE;
+ args.nodeid = FUSE_ROOT_ID;
+ args.in_numargs = 1;
+ args.in_args[0].size = sizeof(inarg);
+ args.in_args[0].value = &inarg;
+
+ ret = fuse_simple_request(fm, &args);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static int muse_mtd_markbad(struct mtd_info *mtd, loff_t addr)
+{
+ struct muse_conn *mc = get_mc_from_mtd(mtd);
+ struct fuse_mount *fm = &mc->fm;
+ struct muse_markbad_in inarg;
+ FUSE_ARGS(args);
+ ssize_t ret;
+
+ inarg.addr = addr;
+
+ args.opcode = MUSE_MARKBAD;
+ args.nodeid = FUSE_ROOT_ID;
+ args.in_numargs = 1;
+ args.in_args[0].size = sizeof(inarg);
+ args.in_args[0].value = &inarg;
+
+ ret = fuse_simple_request(fm, &args);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static int muse_mtd_isbad(struct mtd_info *mtd, loff_t addr)
+{
+ struct muse_conn *mc = get_mc_from_mtd(mtd);
+ struct fuse_mount *fm = &mc->fm;
+ struct muse_isbad_in inarg;
+ struct muse_isbad_out outarg;
+ FUSE_ARGS(args);
+ ssize_t ret;
+
+ inarg.addr = addr;
+
+ args.opcode = MUSE_ISBAD;
+ args.nodeid = FUSE_ROOT_ID;
+ args.in_numargs = 1;
+ args.in_args[0].size = sizeof(inarg);
+ args.in_args[0].value = &inarg;
+ args.out_numargs = 1;
+ args.out_args[0].size = sizeof(outarg);
+ args.out_args[0].value = &outarg;
+
+ ret = fuse_simple_request(fm, &args);
+ if (ret < 0)
+ return ret;
+
+ return outarg.result;
+}
+
+static void muse_mtd_sync(struct mtd_info *mtd)
+{
+ struct muse_conn *mc = get_mc_from_mtd(mtd);
+ struct fuse_mount *fm = &mc->fm;
+ FUSE_ARGS(args);
+
+ args.opcode = MUSE_SYNC;
+ args.nodeid = FUSE_ROOT_ID;
+ args.in_numargs = 0;
+
+ fuse_simple_request(fm, &args);
+}
+
+static ssize_t muse_send_write(struct fuse_args_pages *ap, struct fuse_mount *fm,
+ loff_t from, size_t count, int flags, int *soft_error)
+{
+ struct fuse_args *args = &ap->args;
+ ssize_t ret;
+
+ struct muse_write_in in;
+ struct muse_write_out out;
+
+ in.addr = from;
+ in.len = count;
+ in.flags = flags;
+ args->opcode = MUSE_WRITE;
+ args->nodeid = FUSE_ROOT_ID;
+ args->in_numargs = 2;
+ args->in_args[0].size = sizeof(in);
+ args->in_args[0].value = &in;
+ /*
+ * args->in_args[1].value was set in set_ap_inout_bufs()
+ */
+ args->in_args[1].size = count;
+ args->out_numargs = 1;
+ args->out_args[0].size = sizeof(out);
+ args->out_args[0].value = &out;
+
+ ret = fuse_simple_request(fm, &ap->args);
+ if (ret < 0)
+ goto out;
+
+ ret = out.len;
+ *soft_error = out.soft_error;
+
+out:
+ return ret;
+}
+
+static ssize_t muse_send_read(struct fuse_args_pages *ap, struct fuse_mount *fm,
+ loff_t from, size_t count, int flags, int *soft_error)
+{
+ struct fuse_args *args = &ap->args;
+ ssize_t ret;
+
+ struct muse_read_in in;
+ struct muse_read_out out;
+
+ in.addr = from;
+ in.len = count;
+ in.flags = flags;
+ args->opcode = MUSE_READ;
+ args->nodeid = FUSE_ROOT_ID;
+ args->in_numargs = 1;
+ args->in_args[0].size = sizeof(in);
+ args->in_args[0].value = &in;
+ args->out_argvar = true;
+ args->out_numargs = 2;
+ args->out_args[0].size = sizeof(out);
+ args->out_args[0].value = &out;
+ /*
+ * args->out_args[1].value was set in set_ap_inout_bufs()
+ */
+ args->out_args[1].size = count;
+
+ ret = fuse_simple_request(fm, &ap->args);
+ if (ret < 0)
+ goto out;
+
+ ret = out.len;
+ *soft_error = out.soft_error;
+
+out:
+ return ret;
+}
+
+/*
+ * set_ap_inout_bufs - Set in/out buffers for fuse args
+ *
+ * @ap: FUSE args pages object
+ * @iter: IOV iter which describes source/destination of the IO operation
+ * @count: Inputs the max amount of data we can process,
+ * outputs the amount of data @iter has left.
+ * @write: If non-zero, this is a write operation, read otherwise.
+ *
+ * This function takes an IOV iter object and sets up the FUSE args pointer.
+ * Since in MTD all buffers are kernel memory we can directly use
+ * fuse_get_user_addr().
+ */
+static void set_ap_inout_bufs(struct fuse_args_pages *ap, struct iov_iter *iter,
+ size_t *count, int write)
+{
+ unsigned long addr;
+ size_t frag_size;
+
+ addr = fuse_get_user_addr(iter);
+ frag_size = fuse_get_frag_size(iter, *count);
+
+ if (write)
+ ap->args.in_args[1].value = (void *)addr;
+ else
+ ap->args.out_args[1].value = (void *)addr;
+
+ iov_iter_advance(iter, frag_size);
+ *count = frag_size;
+}
+
+/*
+ * muse_do_io - MUSE main IO processing function.
+ *
+ * @mc: MUSE connection object.
+ * @ops: MTD read/write operation object.
+ * @pos: Where to start reading/writing on the MTD.
+ * @write: If non-zero, this is a write operation, read otherwise.
+ *
+ * This function is responsible for processing reads and writes to the MTD.
+ * It directly takes @pos and @ops from the MTD subsystem.
+ * All IO is synchronous and buffers provided by @ops have to be kernel memory.
+ * The userspace server can also inject custom errors into the IO path,
+ * mostly -EUCLEAN to signal fixed bit-flips or -EBADMSG for uncorrectable
+ * bit-flips.
+ *
+ */
+static int muse_do_io(struct muse_conn *mc, struct mtd_oob_ops *ops,
+ loff_t pos, int write)
+{
+ struct fuse_mount *fm = &mc->fm;
+ struct fuse_conn *fc = &mc->fc;
+ size_t fc_max_io = write ? fc->max_write : fc->max_read;
+ struct fuse_args_pages ap;
+ int oob = !!ops->ooblen;
+ unsigned int max_pages;
+ struct iov_iter iter;
+ struct kvec iov;
+ size_t count;
+ size_t retlen = 0;
+ int bitflips = 0;
+ int eccerrors = 0;
+ int retcode = 0;
+ int io_mode = 0;
+ ssize_t ret = 0;
+
+ /*
+ * We don't support accessing in- and out-of-band data in the same op.
+ * AFAICT FUSE does not support attaching two variable sized buffers to
+ * a request.
+ */
+ if ((ops->len && ops->ooblen) || (ops->datbuf && ops->oobbuf)) {
+ ret = -ENOTSUPP;
+ goto out;
+ }
+
+ if (!oob) {
+ iov.iov_base = ops->datbuf;
+ iov.iov_len = ops->len;
+ iov_iter_kvec(&iter, write ? WRITE : READ, &iov, 1, ops->len);
+
+ /*
+ * When ops->ooblen is not set, we don't care about
+ * MTD_OPS_PLACE_OOB vs. MTD_OPS_AUTO_OOB.
+ */
+ io_mode |= MUSE_IO_INBAND;
+ if (ops->mode == MTD_OPS_RAW)
+ io_mode |= MUSE_IO_RAW;
+ } else {
+ iov.iov_base = ops->oobbuf;
+ iov.iov_len = ops->ooblen;
+ iov_iter_kvec(&iter, write ? WRITE : READ, &iov, 1, ops->ooblen);
+
+ /*
+ * When accessing OOB we just move the address by ooboffs.
+ * This works because oobsize is smaller than writesize.
+ */
+ pos += ops->ooboffs;
+
+ if (ops->mode == MTD_OPS_PLACE_OOB) {
+ io_mode |= MUSE_IO_OOB_PLACE;
+ } else if (ops->mode == MTD_OPS_AUTO_OOB) {
+ io_mode |= MUSE_IO_OOB_AUTO;
+ } else if (ops->mode == MTD_OPS_RAW) {
+ io_mode |= MUSE_IO_OOB_PLACE | MUSE_IO_RAW;
+ } else {
+ ret = -ENOTSUPP;
+ goto out;
+ }
+ }
+
+ /*
+ * A full page needs to fit into a single FUSE request.
+ */
+ if (fc_max_io < mc->mtd.writebufsize) {
+ ret = -ENOBUFS;
+ goto out;
+ }
+
+ count = iov_iter_count(&iter);
+
+ max_pages = iov_iter_npages(&iter, fc->max_pages);
+ memset(&ap, 0, sizeof(ap));
+
+ ap.pages = fuse_pages_alloc(max_pages, GFP_KERNEL, &ap.descs);
+ if (!ap.pages) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ while (count) {
+ size_t nbytes = min_t(size_t, count, fc_max_io);
+ int soft_error = 0;
+
+ set_ap_inout_bufs(&ap, &iter, &nbytes, write);
+
+ if (write)
+ ret = muse_send_write(&ap, fm, pos, nbytes, io_mode, &soft_error);
+ else
+ ret = muse_send_read(&ap, fm, pos, nbytes, io_mode, &soft_error);
+
+ kfree(ap.pages);
+ ap.pages = NULL;
+
+ if (ret < 0) {
+ iov_iter_revert(&iter, nbytes);
+ break;
+ }
+
+ if (soft_error) {
+ /*
+ * Userspace wants to inject an error code.
+ */
+
+ if (write) {
+ /*
+ * For writes, take it as-is.
+ */
+ ret = soft_error;
+ break;
+ }
+
+ /*
+ * -EUCLEAN and -EBADMSG are special for reads in MTD:
+ * a device is expected to return all requested data
+ * even if there are (un)correctable errors.
+ * The upper layer, such as UBI, has to deal with them.
+ */
+ if (soft_error == -EUCLEAN) {
+ bitflips++;
+ } else if (soft_error == -EBADMSG) {
+ eccerrors++;
+ } else {
+ ret = soft_error;
+ break;
+ }
+ }
+
+ /*
+ * No short reads are allowed in MTD.
+ */
+ if (ret != nbytes) {
+ iov_iter_revert(&iter, nbytes - ret);
+ ret = -EIO;
+ break;
+ }
+
+ count -= ret;
+ retlen += ret;
+ pos += ret;
+
+ if (count) {
+ max_pages = iov_iter_npages(&iter, fc->max_pages);
+ memset(&ap, 0, sizeof(ap));
+ ap.pages = fuse_pages_alloc(max_pages, GFP_KERNEL, &ap.descs);
+ if (!ap.pages)
+ break;
+ }
+ }
+
+ kfree(ap.pages);
+
+ if (bitflips)
+ retcode = -EUCLEAN;
+ if (eccerrors)
+ retcode = -EBADMSG;
+
+out:
+ /*
+ * If ret is set, it must be a fatal error which overrides
+ * -EUCLEAN and -EBADMSG.
+ */
+ if (ret < 0)
+ retcode = ret;
+
+ if (oob)
+ ops->oobretlen = retlen;
+ else
+ ops->retlen = retlen;
+
+ return retcode;
+}
+
+static int muse_mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
+{
+ struct muse_conn *mc = get_mc_from_mtd(mtd);
+
+ return muse_do_io(mc, ops, from, 0);
+}
+
+static int muse_mtd_write_oob(struct mtd_info *mtd, loff_t to, struct mtd_oob_ops *ops)
+{
+ struct muse_conn *mc = get_mc_from_mtd(mtd);
+
+ return muse_do_io(mc, ops, to, 1);
+}
+
+static int muse_mtd_get_device(struct mtd_info *mtd)
+{
+ struct muse_conn *mc = get_mc_from_mtd(mtd);
+ int ret = 0;
+
+ mutex_lock(&mc->ref_mutex);
+
+ /*
+ * Refuse a new reference if userspace is no longer connected.
+ */
+ if (mc->want_exit) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ fuse_conn_get(&mc->fc);
+
+out:
+ mutex_unlock(&mc->ref_mutex);
+ return ret;
+}
+
+static ssize_t muse_pid_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct mtd_info *mtd = dev_get_drvdata(dev);
+ struct muse_conn *mc = container_of(mtd_get_master(mtd), struct muse_conn, mtd);
+
+ return sysfs_emit(buf, "%d\n", mc->creator);
+}
+
+static DEVICE_ATTR_RO(muse_pid);
+
+static int install_sysfs_attrs(struct mtd_info *mtd)
+{
+ bool part_master = IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER);
+ struct mtd_info *child;
+ int ret = 0;
+
+ /*
+ * Create the sysfs file only for visible MTDs: on the master device only
+ * if CONFIG_MTD_PARTITIONED_MASTER is enabled or it is unpartitioned.
+ */
+ if (part_master || list_empty(&mtd->partitions)) {
+ ret = sysfs_create_file(&mtd->dev.kobj, &dev_attr_muse_pid.attr);
+ if (ret || !part_master)
+ goto out;
+ }
+
+ /*
+ * ... and to all partitions, if there are any.
+ */
+ list_for_each_entry(child, &mtd->partitions, part.node) {
+ ret = sysfs_create_file(&child->dev.kobj, &dev_attr_muse_pid.attr);
+ if (ret)
+ break;
+ }
+
+out:
+ return ret;
+}
+
+static void remove_sysfs_attrs(struct mtd_info *mtd)
+{
+ bool part_master = IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER);
+ struct mtd_info *child;
+
+ /*
+ * Same logic as in install_sysfs_attrs().
+ */
+ if (part_master || list_empty(&mtd->partitions)) {
+ sysfs_remove_file(&mtd->dev.kobj, &dev_attr_muse_pid.attr);
+ if (!part_master)
+ return;
+ }
+
+ list_for_each_entry(child, &mtd->partitions, part.node) {
+ sysfs_remove_file(&child->dev.kobj, &dev_attr_muse_pid.attr);
+ }
+}
+
+static void muse_exit_mtd_work(struct work_struct *work)
+{
+ struct muse_conn *mc = container_of(work, struct muse_conn, mtd_exit_work);
+
+ if (mc->mtd_registered) {
+ remove_sysfs_attrs(&mc->mtd);
+ mtd_device_unregister(&mc->mtd);
+ kfree(mc->mtd.name);
+ }
+ fuse_conn_put(&mc->fc);
+}
+
+/*
+ * MTD deregistration has to happen asynchronously.
+ * It will grab mtd_table_mutex, but depending on the context
+ * we already hold it or hold mc->ref_mutex.
+ * The locking order is mtd_table_mutex > mc->ref_mutex.
+ */
+static void muse_remove_mtd_async(struct muse_conn *mc)
+{
+ INIT_WORK(&mc->mtd_exit_work, muse_exit_mtd_work);
+ schedule_work(&mc->mtd_exit_work);
+}
+
+static void muse_mtd_put_device(struct mtd_info *mtd)
+{
+ struct muse_conn *mc = get_mc_from_mtd(mtd);
+
+ mutex_lock(&mc->ref_mutex);
+
+ if (mc->want_exit && mc->mtd.usecount == 0) {
+ /*
+ * This was the last reference on the MTD, remove it now.
+ */
+ muse_remove_mtd_async(mc);
+ } else {
+ /*
+ * The MTD has users or userspace is still connected,
+ * keep the MTD and just decrement the FUSE connection
+ * reference counter.
+ */
+ fuse_conn_put(&mc->fc);
+ }
+ mutex_unlock(&mc->ref_mutex);
+}
+
+static int muse_verify_mtdreq(struct muse_mtd_create_req *req)
+{
+ int ret = -EINVAL;
+ uint64_t tmp;
+
+ if (!req->name)
+ goto out;
+
+ if (!req->size || !req->writesize || !req->erasesize)
+ goto out;
+
+ tmp = req->size;
+ if (do_div(tmp, req->writesize))
+ goto out;
+
+ tmp = req->size;
+ if (do_div(tmp, req->erasesize))
+ goto out;
+
+ if (req->oobsize < req->oobavail)
+ goto out;
+
+ if (req->oobsize >= req->writesize)
+ goto out;
+
+ if (req->flags & ~(MTD_WRITEABLE | MTD_BIT_WRITEABLE | MTD_NO_ERASE))
+ goto out;
+
+ if (req->subpage_shift > 2)
+ goto out;
+
+ switch (req->type) {
+ case MTD_RAM:
+ case MTD_ROM:
+ case MTD_NORFLASH:
+ case MTD_NANDFLASH:
+ case MTD_DATAFLASH:
+ case MTD_MLCNANDFLASH:
+ break;
+ default:
+ goto out;
+ }
+
+ ret = 0;
+
+out:
+ return ret;
+}
+
+static int muse_parse_mtdreq(char *p, size_t len, struct mtd_info *mtd,
+ struct mtd_part_parser_data *pd)
+{
+ struct muse_mtd_create_req req = {0};
+ char *end = p + len;
+ char *key, *val;
+ int ret;
+
+ for (;;) {
+ ret = fuse_kv_parse_one(&p, end, &key, &val);
+ if (ret < 0)
+ goto out;
+ if (!ret)
+ break;
+
+ if (strcmp(key, "NAME") == 0) {
+ req.name = val;
+ } else if (strcmp(key, "TYPE") == 0) {
+ unsigned int type;
+
+ ret = kstrtouint(val, 10, &type);
+ if (ret)
+ goto out;
+
+ req.type = type;
+ } else if (strcmp(key, "FLAGS") == 0) {
+ ret = kstrtou32(val, 10, &req.flags);
+ if (ret)
+ goto out;
+ } else if (strcmp(key, "SIZE") == 0) {
+ ret = kstrtou64(val, 10, &req.size);
+ if (ret)
+ goto out;
+ } else if (strcmp(key, "WRITESIZE") == 0) {
+ ret = kstrtou32(val, 10, &req.writesize);
+ if (ret)
+ goto out;
+ } else if (strcmp(key, "WRITEBUFSIZE") == 0) {
+ ret = kstrtou32(val, 10, &req.writebufsize);
+ if (ret)
+ goto out;
+ } else if (strcmp(key, "OOBSIZE") == 0) {
+ ret = kstrtou32(val, 10, &req.oobsize);
+ if (ret)
+ goto out;
+ } else if (strcmp(key, "OOBAVAIL") == 0) {
+ ret = kstrtou32(val, 10, &req.oobavail);
+ if (ret)
+ goto out;
+ } else if (strcmp(key, "ERASESIZE") == 0) {
+ ret = kstrtou32(val, 10, &req.erasesize);
+ if (ret)
+ goto out;
+ } else if (strcmp(key, "SUBPAGESHIFT") == 0) {
+ ret = kstrtouint(val, 10, &req.subpage_shift);
+ if (ret)
+ goto out;
+ } else if (strcmp(key, "PARTSCMDLINE") == 0) {
+ req.mtdparts = val;
+ } else {
+ pr_warn("Ignoring unknown MTD param \"%s\"\n", key);
+ }
+ }
+
+ if (req.name && req.mtdparts && strlen(req.mtdparts) > 0) {
+ pd->mtdparts = kasprintf(GFP_KERNEL, "%s:%s", req.name, req.mtdparts);
+ if (!pd->mtdparts) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+
+ ret = muse_verify_mtdreq(&req);
+ if (ret)
+ goto out;
+
+ mtd->name = kstrdup(req.name, GFP_KERNEL);
+ if (!mtd->name) {
+ kfree(pd->mtdparts);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ mtd->size = req.size;
+ mtd->erasesize = req.erasesize;
+ mtd->writesize = req.writesize;
+
+ if (req.writebufsize)
+ mtd->writebufsize = req.writebufsize;
+ else
+ mtd->writebufsize = mtd->writesize;
+
+ mtd->oobsize = req.oobsize;
+ mtd->oobavail = req.oobavail;
+ mtd->subpage_sft = req.subpage_shift;
+
+ mtd->type = req.type;
+ mtd->flags = MTD_MUSE | req.flags;
+
+ ret = 0;
+out:
+ return ret;
+}
+
+static void muse_init_mtd_work(struct work_struct *work)
+{
+ struct muse_mtd_init_ctx *ctx = container_of(work, struct muse_mtd_init_ctx, mtd_init_work);
+ static const char * const part_probe_types[] = { "cmdlinepart", NULL };
+ struct muse_conn *mc = ctx->mc;
+
+ if (mtd_device_parse_register(&mc->mtd, part_probe_types, &ctx->pd, NULL, 0) != 0)
+ goto abort;
+
+ if (install_sysfs_attrs(&mc->mtd))
+ goto abort;
+
+ goto free_mtdparts;
+
+abort:
+ fuse_abort_conn(&mc->fc);
+
+free_mtdparts:
+ mc->mtd_registered = true;
+ kfree(ctx->pd.mtdparts);
+ kfree(ctx);
+}
+
+static void muse_process_init_reply(struct fuse_mount *fm,
+ struct fuse_args *args, int error)
+{
+ struct fuse_conn *fc = fm->fc;
+ struct muse_init_args *mia = container_of(args, struct muse_init_args, ap.args);
+ struct muse_conn *mc = container_of(fc, struct muse_conn, fc);
+ struct fuse_args_pages *ap = &mia->ap;
+ struct muse_init_out *arg = &mia->out;
+ struct page *page = ap->pages[0];
+ struct mtd_info *mtd = &mc->mtd;
+ struct muse_mtd_init_ctx *init_ctx = NULL;
+ int ret;
+
+ init_ctx = kzalloc(sizeof(*init_ctx), GFP_KERNEL);
+ if (!init_ctx)
+ goto abort;
+
+ init_ctx->mc = mc;
+
+ if (error || arg->fuse_major != FUSE_KERNEL_VERSION || arg->fuse_minor < 34)
+ goto free_ctx;
+
+ fc->minor = arg->fuse_minor;
+ fc->max_read = max_t(unsigned int, arg->max_read, 4096);
+ fc->max_write = max_t(unsigned int, arg->max_write, 4096);
+
+ ret = muse_parse_mtdreq(page_address(page), ap->args.out_args[1].size,
+ mtd, &init_ctx->pd);
+ if (ret)
+ goto free_ctx;
+
+ mtd->_erase = muse_mtd_erase;
+ mtd->_sync = muse_mtd_sync;
+ mtd->_read_oob = muse_mtd_read_oob;
+ mtd->_write_oob = muse_mtd_write_oob;
+ mtd->_get_device = muse_mtd_get_device;
+ mtd->_put_device = muse_mtd_put_device;
+
+ /*
+ * Bad blocks only make sense on NAND devices.
+ * As soon as _block_isbad is set, upper layers such as
+ * UBI expect a working _block_isbad, so userspace
+ * has to implement MUSE_ISBAD.
+ */
+ if (mtd_type_is_nand(mtd)) {
+ mtd->_block_isbad = muse_mtd_isbad;
+ mtd->_block_markbad = muse_mtd_markbad;
+ }
+
+ mtd->priv = mc;
+ mtd->owner = THIS_MODULE;
+
+ /*
+ * We want one READ/WRITE op per MTD io. So the MTD pagesize needs
+ * to fit into max_write/max_read
+ */
+ if (fc->max_write < mtd->writebufsize || fc->max_read < mtd->writebufsize)
+ goto free_name;
+
+ mc->creator = task_tgid_vnr(current);
+
+ kfree(mia);
+ __free_page(page);
+
+ INIT_WORK(&init_ctx->mtd_init_work, muse_init_mtd_work);
+
+ /*
+ * MTD can access the device while probing it,
+ * e.g. scanning for bad blocks or custom partition parsers.
+ * So we need to do the final step in a different process
+ * context. Otherwise we will lock up here if the userspace
+ * side of this MUSE MTD is single threaded.
+ */
+ schedule_work(&init_ctx->mtd_init_work);
+ return;
+
+free_name:
+ kfree(mtd->name);
+free_ctx:
+ kfree(init_ctx);
+abort:
+ kfree(mia);
+ __free_page(page);
+ fuse_abort_conn(fc);
+}
+
+static int muse_send_init(struct muse_conn *mc)
+{
+ struct fuse_mount *fm = &mc->fm;
+ struct fuse_args_pages *ap;
+ struct muse_init_args *mia;
+ struct page *page;
+ int ret = -ENOMEM;
+
+ BUILD_BUG_ON(MUSE_INIT_INFO_MAX > PAGE_SIZE);
+
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ goto err;
+
+ mia = kzalloc(sizeof(*mia), GFP_KERNEL);
+ if (!mia)
+ goto err_page;
+
+ ap = &mia->ap;
+ mia->in.fuse_major = FUSE_KERNEL_VERSION;
+ mia->in.fuse_minor = FUSE_KERNEL_MINOR_VERSION;
+ ap->args.opcode = MUSE_INIT;
+ ap->args.in_numargs = 1;
+ ap->args.in_args[0].size = sizeof(mia->in);
+ ap->args.in_args[0].value = &mia->in;
+ ap->args.out_numargs = 2;
+ ap->args.out_args[0].size = sizeof(mia->out);
+ ap->args.out_args[0].value = &mia->out;
+ ap->args.out_args[1].size = MUSE_INIT_INFO_MAX;
+ ap->args.out_argvar = true;
+ ap->args.out_pages = true;
+ ap->num_pages = 1;
+ ap->pages = &mia->page;
+ ap->descs = &mia->desc;
+ mia->page = page;
+ mia->desc.length = ap->args.out_args[1].size;
+ ap->args.end = muse_process_init_reply;
+
+ ret = fuse_simple_background(fm, &ap->args, GFP_KERNEL);
+ if (ret)
+ goto err_ia;
+
+ return 0;
+
+err_ia:
+ kfree(mia);
+err_page:
+ __free_page(page);
+err:
+ return ret;
+}
+
+static int muse_ctrl_open(struct inode *inode, struct file *file)
+{
+ struct muse_conn *mc;
+ struct fuse_dev *fud;
+ int ret;
+
+ /*
+ * Paranoia check.
+ */
+ if (!capable(CAP_SYS_ADMIN)) {
+ ret = -EPERM;
+ goto err;
+ }
+
+ mc = kzalloc(sizeof(*mc), GFP_KERNEL);
+ if (!mc) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ mutex_init(&mc->ref_mutex);
+
+ fuse_conn_init(&mc->fc, &mc->fm, get_user_ns(&init_user_ns),
+ &fuse_dev_fiq_ops, NULL);
+
+ fud = fuse_dev_alloc_install(&mc->fc);
+ if (!fud) {
+ ret = -ENOMEM;
+ goto err_free;
+ }
+
+ mc->fc.release = muse_fc_release;
+ mc->fc.initialized = 1;
+
+ ret = muse_send_init(mc);
+ if (ret)
+ goto err_dev;
+
+ file->private_data = fud;
+
+ return 0;
+
+err_dev:
+ fuse_dev_free(fud);
+ fuse_conn_put(&mc->fc);
+err_free:
+ kfree(mc);
+err:
+ return ret;
+}
+
+static int muse_ctrl_release(struct inode *inode, struct file *file)
+{
+ struct fuse_dev *fud = file->private_data;
+ struct muse_conn *mc = container_of(fud->fc, struct muse_conn, fc);
+
+ mutex_lock(&mc->ref_mutex);
+ /*
+ * Make sure that nobody can gain a new reference on our MTD.
+ */
+ mc->want_exit = true;
+
+ /*
+ * If the MTD has no users, remove it right now, keep it otherwise
+ * until the last user is gone. During this phase all operations will
+ * fail with -ENOTCONN.
+ */
+ if (mc->mtd.usecount == 0)
+ muse_remove_mtd_async(mc);
+ else
+ fuse_conn_put(&mc->fc);
+ mutex_unlock(&mc->ref_mutex);
+
+ return fuse_dev_release(inode, file);
+}
+
+static struct file_operations muse_ctrl_fops;
+
+static struct miscdevice muse_ctrl_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "muse",
+ .fops = &muse_ctrl_fops,
+};
+
+static int __init muse_init(void)
+{
+ /*
+ * Inherit from fuse_dev_operations and override open() plus release().
+ */
+ muse_ctrl_fops = fuse_dev_operations;
+ muse_ctrl_fops.owner = THIS_MODULE;
+ muse_ctrl_fops.open = muse_ctrl_open;
+ muse_ctrl_fops.release = muse_ctrl_release;
+
+ return misc_register(&muse_ctrl_dev);
+}
+
+static void __exit muse_exit(void)
+{
+ misc_deregister(&muse_ctrl_dev);
+}
+
+module_init(muse_init);
+module_exit(muse_exit);
+
+MODULE_AUTHOR("Richard Weinberger <[email protected]>");
+MODULE_DESCRIPTION("MTD in userspace");
+MODULE_LICENSE("GPL");
--
2.26.2
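An aside on the patch above: the way muse_do_io() folds the per-request
results of a read into a single MTD return code can be modeled in plain
userspace C. This is only a sketch under assumptions: struct chunk_result and
fold_read_results() are hypothetical names that do not appear in the patch,
only the read path is covered, and injected soft errors are assumed to be
negative errno values.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one MUSE_READ reply; not a structure from the patch. */
struct chunk_result {
	size_t want;     /* bytes requested for this chunk */
	long len;        /* bytes returned, or a negative errno if the request failed */
	int soft_error;  /* 0, or a negative errno injected by the userspace server */
};

/*
 * Fold the per-chunk results of one MTD read into a single return code.
 * *retlen receives the number of bytes of the chunks that completed.
 */
static int fold_read_results(const struct chunk_result *res, size_t n,
			     size_t *retlen)
{
	int bitflips = 0, eccerrors = 0, fatal = 0, retcode = 0;
	size_t i;

	*retlen = 0;

	for (i = 0; i < n; i++) {
		/* The request itself failed: stop immediately. */
		if (res[i].len < 0) {
			fatal = (int)res[i].len;
			break;
		}

		/* -EUCLEAN and -EBADMSG are only recorded, the read goes on. */
		if (res[i].soft_error == -EUCLEAN) {
			bitflips++;
		} else if (res[i].soft_error == -EBADMSG) {
			eccerrors++;
		} else if (res[i].soft_error) {
			fatal = res[i].soft_error;
			break;
		}

		/* Short reads are not allowed in MTD. */
		if ((size_t)res[i].len != res[i].want) {
			fatal = -EIO;
			break;
		}

		*retlen += (size_t)res[i].len;
	}

	/* Uncorrectable errors win over corrected bit-flips ... */
	if (bitflips)
		retcode = -EUCLEAN;
	if (eccerrors)
		retcode = -EBADMSG;
	/* ... and a fatal error overrides both. */
	if (fatal < 0)
		retcode = fatal;

	return retcode;
}
```

With two full chunks where the first reports -EUCLEAN and the second
-EBADMSG, the sketch returns -EBADMSG and still accounts for all bytes, which
is what upper layers such as UBI rely on.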
Since MUSE lives in fs/fuse/, make sure that linux-mtd@ is CC'ed
on patches so that MTD-related aspects of changes can be reviewed.
Signed-off-by: Richard Weinberger <[email protected]>
---
MAINTAINERS | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index f79ec98bbb29..dabd9fd2e5e4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12151,6 +12151,13 @@ L: [email protected]
S: Maintained
F: drivers/usb/musb/
+MUSE: MTD IN USERSPACE DRIVER
+M: Richard Weinberger <[email protected]>
+L: [email protected]
+L: [email protected]
+S: Maintained
+F: fs/fuse/muse.c
+
MXL301RF MEDIA DRIVER
M: Akihiro Tsukada <[email protected]>
L: [email protected]
--
2.26.2
Hi Richard,
Richard Weinberger <[email protected]> wrote on Mon, 25 Jan 2021 00:19:59
+0100:
> I'm happy to announce the first non-RFC version of this patch set.
> Over the xmas holidays I found some time to experiment with various userspace
> implementations of MTDs and gave the kernel side more fine-tuning.
>
> Rationale:
> ----------
>
> When working with flash devices a common task is emulating them to run various
> tests or inspect dumps from real hardware. To achieve that we have plenty of
> emulators in the MTD subsystem: mtdram, block2mtd, nandsim.
>
> Each of them implements an ad-hoc MTD and have various drawbacks.
> Over the last years some developers tried to extend them but these attempts
> often got rejected because they added just more adhoc feature instead of
> addressing overall problems.
>
> MUSE is a novel approach to address the need of advanced MTD emulators.
> Advanced means in this context supporting different (vendor specific) image
> formats, different ways for fault injection (fuzzing) and recoding/replaying
> IOs to emulate power cuts.
>
> The core goal of MUSE is having the complexity on the userspace side and
> only a small MTD driver in kernelspace.
> While playing with different approaches I realized that FUSE offers everything
> we need. So MUSE is a little like CUSE except that it does not implement a
> bare character device but an MTD.
I can't tell if your MUSE implementation is right but it looks fine
on the MTD side.
This is following the right path, I look forward to merging it soon!
Thanks for your contribution,
Miquèl
*friendly FUSE maintainer ping* :-)
On Mon, Jan 25, 2021 at 12:24 AM Richard Weinberger <[email protected]> wrote:
> [...]
--
Thanks,
//richard
On Mon, Feb 1, 2021 at 2:14 PM Richard Weinberger
<[email protected]> wrote:
>
> *friendly FUSE maintainer ping* :-)
Seems like MTD folks are happy, so I'll review and merge when I get the time.
Thanks,
Miklos
On Mon, Jan 25, 2021 at 12:21 AM Richard Weinberger <[email protected]> wrote:
>
> I'm happy to announce the first non-RFC version of this patch set.
> Over the xmas holidays I found some time to experiment with various userspace
> implementations of MTDs and gave the kernel side more fine-tuning.
>
> Rationale:
> ----------
>
> When working with flash devices a common task is emulating them to run various
> tests or inspect dumps from real hardware. To achieve that we have plenty of
> emulators in the MTD subsystem: mtdram, block2mtd, nandsim.
>
> Each of them implements an ad-hoc MTD and have various drawbacks.
> Over the last years some developers tried to extend them but these attempts
> often got rejected because they added just more adhoc feature instead of
> addressing overall problems.
>
> MUSE is a novel approach to address the need of advanced MTD emulators.
> Advanced means in this context supporting different (vendor specific) image
> formats, different ways for fault injection (fuzzing) and recoding/replaying
> IOs to emulate power cuts.
>
> The core goal of MUSE is having the complexity on the userspace side and
> only a small MTD driver in kernelspace.
> While playing with different approaches I realized that FUSE offers everything
> we need. So MUSE is a little like CUSE except that it does not implement a
> bare character device but an MTD.
Looks fine.
I do wonder if MUSE should go to drivers/mtd/ instead. The long-term
goal would be to move CUSE to drivers/char and move the transport part of
fuse into net/fuse, leaving only the actual filesystems (fuse and
virtiofs) under fs/.
But for now just moving the minimal interface needed for MUSE into a
separate header (<net/fuse.h>) would work, I guess.
Do you think that would make sense?
>
> Notes:
> ------
>
> - OOB support is currently limited. Currently MUSE has no support for processing
> in- and out-band in the same MTD operation. It is good enough to make JFFS2
> happy. This limitation is because FUSE has no support more than one variable
> length buffer in a FUSE request.
> At least I didn’t find a good way to pass more than one buffer to a request.
> Maybe FUSE folks can correct me. :-)
If you look at fuse_do_ioctl() it does variable length input and
output at the same time. I guess you need something similar to that.
Thanks,
Miklos
Miklos,
----- Original Message -----
>> The core goal of MUSE is having the complexity on the userspace side and
>> only a small MTD driver in kernelspace.
>> While playing with different approaches I realized that FUSE offers everything
>> we need. So MUSE is a little like CUSE except that it does not implement a
>> bare character device but an MTD.
>
> Looks fine.
I'm glad to hear that!
> I do wonder if MUSE should go to drivers/mtd/ instead. Long term
> goal would be move CUSE to drivers/char and move the transport part of
> fuse into net/fuse leaving only the actual filesystems (fuse and
> virtiofs) under fs/.
>
> But for now just moving the minimal interface needed for MUSE into a
> separate header (<net/fuse.h>) would work, I guess.
>
> Do you think that would make sense?
Yes, I'm all for having MUSE in drivers/mtd/.
I placed MUSE initially in fs/fuse/ because CUSE was already there and muse.c includes
fuse_i.h. So I tried to be as minimally invasive as possible.
>>
>> Notes:
>> ------
>>
>> - OOB support is currently limited. Currently MUSE has no support for processing
>> in- and out-band in the same MTD operation. It is good enough to make JFFS2
>> happy. This limitation is because FUSE has no support more than one variable
>> length buffer in a FUSE request.
>> At least I didn’t find a good way to pass more than one buffer to a request.
>> Maybe FUSE folks can correct me. :-)
>
> If you look at fuse_do_ioctl() it does variable length input and
> output at the same time. I guess you need something similar to that.
I'll dig into this!
Thanks,
//richard
Miklos,
----- Original Message -----
> If you look at fuse_do_ioctl() it does variable length input and
> output at the same time. I guess you need something similar to that.
I'm not sure whether I understand correctly.
In MUSE one use case would be attaching two distinct (variable length) buffers to a
single FUSE request, in both directions.
If I read fuse_do_ioctl() correctly, it always attaches a single buffer per request
but issues multiple requests.
In MUSE we could go the same path and issue up to two requests.
One for in-band and optionally a second one for the out-of-band data.
Hmmm?
Thanks,
//richard
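The two-request idea from this mail could look roughly like the following
userspace sketch. Everything in it is made up for illustration: struct
mtd_op_model, struct io_req and plan_requests() are hypothetical, and the
actual driver would operate on struct mtd_oob_ops and FUSE args instead.

```c
#include <stddef.h>

/* Hypothetical request kinds; the patch itself uses MUSE_IO_* flags. */
enum req_kind { REQ_INBAND, REQ_OOB };

/* Simplified stand-in for the data/OOB parts of an MTD operation. */
struct mtd_op_model {
	void *datbuf;
	size_t len;
	void *oobbuf;
	size_t ooblen;
};

/* One request carrying exactly one variable-length buffer. */
struct io_req {
	enum req_kind kind;
	void *buf;
	size_t len;
};

/*
 * Plan up to two requests for one operation: the in-band data first,
 * then, optionally, the out-of-band data. Returns the request count.
 */
static size_t plan_requests(const struct mtd_op_model *op,
			    struct io_req reqs[2])
{
	size_t n = 0;

	if (op->datbuf && op->len) {
		reqs[n].kind = REQ_INBAND;
		reqs[n].buf = op->datbuf;
		reqs[n].len = op->len;
		n++;
	}

	if (op->oobbuf && op->ooblen) {
		reqs[n].kind = REQ_OOB;
		reqs[n].buf = op->oobbuf;
		reqs[n].len = op->ooblen;
		n++;
	}

	return n;
}
```

What the sketch leaves open is atomicity: if the second request fails, the
first one has already been processed by the userspace server, so the caller
would have to decide how to report such a partial operation.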
Hi guys,
a bit OT probably: is there any chance for you to also implement mmap()
for CUSE? That would be much appreciated.
Thanks
On 09/02/21 15:35, Richard Weinberger wrote:
> [...]
Miklos,
----- Original Message -----
>> I do wonder if MUSE should go to drivers/mtd/ instead. Long term
>> goal would be move CUSE to drivers/char and move the transport part of
>> fuse into net/fuse leaving only the actual filesystems (fuse and
>> virtiofs) under fs/.
>>
>> But for now just moving the minimal interface needed for MUSE into a
>> separate header (<net/fuse.h>) would work, I guess.
>>
>> Do you think that would make sense?
>
> Yes, I'm all for having MUSE in drivers/mtd/.
>
> I placed MUSE initially in fs/fuse/ because CUSE was already there and muse.c
> includes fuse_i.h. So I tried to be as minimally invasive as possible.
I did a quick patch series which moves CUSE into drivers/char/
https://git.kernel.org/pub/scm/linux/kernel/git/rw/misc.git/log/?h=fs_fuse_split
Is this more or less what you had in mind?
If so, I'd submit these patches, rebase MUSE on them and do a v4 soon.
Thanks,
//richard
On Tue, Feb 9, 2021 at 9:06 PM Richard Weinberger <[email protected]> wrote:
>
> Miklos,
>
> ----- Original Message -----
> >> I do wonder if MUSE should go to drivers/mtd/ instead. Long term
> >> goal would be move CUSE to drivers/char and move the transport part of
> >> fuse into net/fuse leaving only the actual filesystems (fuse and
> >> virtiofs) under fs/.
> >>
> >> But for now just moving the minimal interface needed for MUSE into a
> >> separate header (<net/fuse.h>) would work, I guess.
> >>
> >> Do you think that would make sense?
> >
> > Yes, I'm all for having MUSE in drivers/mtd/.
> >
> > I placed MUSE initially in fs/fuse/ because CUSE was already there and muse.c
> > includes fuse_i.h. So I tried to be as minimally invasive as possible.
>
> I did a quick patch series which moves CUSE into drivers/char/
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rw/misc.git/log/?h=fs_fuse_split
>
> Is this more or less what you had in mind?
Just moving the whole internal header file is not nice. I did a
mechanical public/private separation of the interface based on what
CUSE uses. Incremental patch attached.
But this is just a start. From the big structures still left in
<net/fuse.h> CUSE only uses the following fields:
fc: .minor, max_read, max_write, rcu, release, initialized, num_waiting
fm: .fc
ff: .fm
fud: .fc
Dealing with the last 3 is trivial: create an alloc function for the
fm, and create accessor functions for the accessed fields.
Dealing with fc properly is probably a bit more involved, but does not
seem to be too complex at first glance.
Do you want to take a stab at cleaning this up further?
Thanks,
Miklos
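
The public/private split described above can be illustrated with a small
userspace sketch. It is not the actual FUSE code: every name here
(demo_conn, demo_mount, demo_mount_alloc, ...) is hypothetical and only
shows the general pattern of hiding a structure behind an alloc function
and accessors, so that users such as CUSE or MUSE would not need the full
internal header.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what a public header could expose (hypothetical names) ---- */
struct demo_conn;   /* stands in for "fc"; layout hidden from users */
struct demo_mount;  /* stands in for "fm"; layout hidden from users */

struct demo_mount *demo_mount_alloc(struct demo_conn *fc);
struct demo_conn *demo_mount_conn(const struct demo_mount *fm);
void demo_mount_free(struct demo_mount *fm);

/* ---- what would stay private to the core ---- */
struct demo_conn {
	unsigned int max_read;
	unsigned int max_write;
};

struct demo_mount {
	struct demo_conn *fc;
	/* further private fields can be added without touching users */
};

/* Allocate a mount object bound to the given connection. */
struct demo_mount *demo_mount_alloc(struct demo_conn *fc)
{
	struct demo_mount *fm = calloc(1, sizeof(*fm));

	if (fm)
		fm->fc = fc;
	return fm;
}

/* Accessor replacing direct "fm->fc" dereferences in users. */
struct demo_conn *demo_mount_conn(const struct demo_mount *fm)
{
	return fm->fc;
}

void demo_mount_free(struct demo_mount *fm)
{
	free(fm);
}
```

With this shape, a user only ever calls demo_mount_alloc() and
demo_mount_conn(); the structure layout can change without touching it.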
On Tue, Feb 9, 2021 at 10:39 PM Richard Weinberger <[email protected]> wrote:
>
> Miklos,
>
> ----- Original Message -----
> > If you look at fuse_do_ioctl() it does variable length input and
> > output at the same time. I guess you need something similar to that.
>
> I'm not sure whether I understand correctly.
>
> In MUSE one use case would be attaching two distinct (variable length) buffers to a
> single FUSE request, in both directions.
> If I read fuse_do_ioctl() correctly, it always attaches a single buffer per request
> but does multiple requests.
Right.
> In MUSE we could go the same path and issue up to two requests.
> One for in-band and optionally a second one for the out-of-band data.
> Hmmm?
Do in-band and OOB data need to be handled together? If so, then
two requests is not a good option.
Thanks,
Miklos
On Wed, Feb 10, 2021 at 11:22 AM Miklos Szeredi <[email protected]> wrote:
> > In MUSE one use case would be attaching two distinct (variable length) buffers to a
> > single FUSE request, in both directions.
> > If I read fuse_do_ioctl() correctly, it always attaches a single buffer per request
> > but does multiple requests.
>
> Right.
>
> > In MUSE we could go the same path and issue up to two requests.
> > One for in-band and optionally a second one for the out-of-band data.
> > Hmmm?
>
> Do in-band and OOB data need to be handled together? If so, then
> two requests is not a good option.
They can be handled separately. All I need is to figure out how to abstract this nicely
in libfuse.
--
Thanks,
//richard
On Wed, Feb 10, 2021 at 11:18 AM Miklos Szeredi <[email protected]> wrote:
> > Is this more or less what you had in mind?
>
> Just moving the whole internal header file is not nice. I did a
> mechanical public/private separation of the interface based on what
> CUSE uses. Incremental patch attached.
>
> But this is just a start. From the big structures still left in
> <net/fuse.h> CUSE only uses the following fields:
>
> fc: .minor, max_read, max_write, rcu, release, initialized, num_waiting
> fm: .fc
> ff: .fm
> fud: .fc
>
> Dealing with the last 3 is trivial: create an alloc function for the
> fm, and create accessor functions for the accessed fields.
Ah, ok. So the goal is that <net/fuse.h> provides the bare minimum such that
CUSE and MUSE can reside outside of fs/fuse?
> Dealing with fc properly is probably a bit more involved, but does not
> seem to be too complex at first glance.
>
> Do you want to take a stab at cleaning this up further?
Yes. I guess for MUSE the interface needs a few small adaptations as well.
But I won't be able to do this for the 5.12 merge window.
--
Thanks,
//richard
Hi Miklos,
Miklos Szeredi <[email protected]> wrote on Wed, 10 Feb 2021 11:16:45
+0100:
> On Tue, Feb 9, 2021 at 10:39 PM Richard Weinberger <[email protected]> wrote:
> >
> > Miklos,
> >
> > ----- Original Message -----
> > > If you look at fuse_do_ioctl() it does variable length input and
> > > output at the same time. I guess you need something similar to that.
> >
> > I'm not sure whether I understand correctly.
> >
> > In MUSE one use case would be attaching two distinct (variable length) buffers to a
> > single FUSE request, in both directions.
> > If I read fuse_do_ioctl() correctly, it always attaches a single buffer per request
> > but does multiple requests.
>
> Right.
>
> > In MUSE we could go the same path and issue up to two requests.
> > One for in-band and optionally a second one for the out-of-band data.
> > Hmmm?
>
> Do in-band and OOB data need to be handled together?
Short answer: yes.
> If so, then two requests is not a good option.
More detailed answer:
There is a type of MTD device (NAND devices) which is composed, for
each page, of X in-band bytes plus Y out-of-band metadata bytes.
Accessing either the in-band data, or the out-of-band data, or both at
the same time are all valid use cases.
* Read operation details:
From a hardware point of view, the out-of-band data is (almost)
always retrieved when the in-band data is read because it contains
metadata used to correct possible bitflips. In this case, if both
areas are requested, it is highly inefficient to do two requests,
which is why the MTD core allows doing both at the same time.
* Write operation details:
Even worse, in the write case, you *must* write both at the same
time. It is physically impossible to do one after the other (still
with actual hardware, of course).
That is why it is preferable for MUSE to be able to access both in
a single request.
Thanks,
Miquèl
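
To make the page layout concrete, here is a minimal userspace sketch of a
page with X in-band and Y out-of-band bytes, served by one operation that
can carry data, OOB, or both. Everything in it is an assumption made for
illustration: the sizes are example values, and the demo_* names are
hypothetical (the field names loosely follow the kernel's struct
mtd_oob_ops, but this is not that structure and not MUSE code).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 2048 /* X: in-band bytes per page (example value) */
#define DEMO_OOB_SIZE    64 /* Y: out-of-band bytes per page (example value) */

struct demo_nand_page {
	uint8_t data[DEMO_PAGE_SIZE];
	uint8_t oob[DEMO_OOB_SIZE];
};

/* One request describing both areas; a NULL buffer means "not requested". */
struct demo_page_op {
	uint8_t *datbuf;
	size_t len;
	uint8_t *oobbuf;
	size_t ooblen;
};

/* Check the whole request up front so it is applied completely or not at all. */
static int demo_op_valid(const struct demo_page_op *op)
{
	if (!op->datbuf && !op->oobbuf)
		return 0;
	if (op->datbuf && op->len > DEMO_PAGE_SIZE)
		return 0;
	if (op->oobbuf && op->ooblen > DEMO_OOB_SIZE)
		return 0;
	return 1;
}

/* Read data, OOB, or both from a single access to the page. */
int demo_read_page(const struct demo_nand_page *page,
		   const struct demo_page_op *op)
{
	if (!demo_op_valid(op))
		return -1;
	if (op->datbuf)
		memcpy(op->datbuf, page->data, op->len);
	if (op->oobbuf)
		memcpy(op->oobbuf, page->oob, op->ooblen);
	return 0;
}

/* Program data and OOB together, as one operation on the page. */
int demo_write_page(struct demo_nand_page *page,
		    const struct demo_page_op *op)
{
	if (!demo_op_valid(op))
		return -1;
	if (op->datbuf)
		memcpy(page->data, op->datbuf, op->len);
	if (op->oobbuf)
		memcpy(page->oob, op->oobbuf, op->ooblen);
	return 0;
}
```

Splitting demo_write_page() into two independent calls would lose the
"both at the same time" property that real NAND programming requires.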
On Wed, Feb 10, 2021 at 11:12 AM Miklos Szeredi <[email protected]> wrote:
> But this is just a start. From the big structures still left in
> <net/fuse.h> CUSE only uses the following fields:
>
> fc: .minor, max_read, max_write, rcu, release, initialized, num_waiting
> fm: .fc
> ff: .fm
> fud: .fc
>
> Dealing with the last 3 is trivial: create an alloc function for the
> fm, and create accessor functions for the accessed fields.
>
> Dealing with fc properly is probably a bit more involved, but does not
> seem to be too complex at first glance.
>
> Do you want to take a stab at cleaning this up further?
On second thought, I'll finish this off, since I know the internal API better.
Thanks,
Miklos
Miquel,
----- Original Message -----
>> Do in-band and OOB data need to be handled together?
>
> Short answer: yes.
>
>> If so, then two requests is not a good option.
>
> More detailed answer:
>
> There is a type of MTD device (NAND devices) which is composed, for
> each page, of X in-band bytes plus Y out-of-band metadata bytes.
>
> Accessing either the in-band data, or the out-of-band data, or both at
> the same time are all valid use cases.
>
> * Read operation details:
> From a hardware point of view, the out-of-band data is (almost)
> always retrieved when the in-band data is read because it contains
> metadata used to correct possible bitflips. In this case, if both
> areas are requested, it is highly inefficient to do two requests,
> which is why the MTD core allows doing both at the same time.
> * Write operation details:
> Even worse, in the write case, you *must* write both at the same
> time. It is physically impossible to do one after the other (still
> with actual hardware, of course).
>
> That is why it is preferable for MUSE to be able to access both in
> a single request.
By single request we meant FUSE op-codes. The NAND simulator in Userspace
will see just one call. My plan is to abstract it in libfuse.
Thanks,
//richard
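
One way to picture the plan above (two transport requests, one call seen by
the simulator) is the following sketch. It is purely illustrative and is
not libfuse API: the demo_* names, the callback signature and the
"oob_follows" flag are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulator callback: sees data and OOB in one call. */
typedef int (*demo_write_cb)(const uint8_t *data, size_t len,
			     const uint8_t *oob, size_t ooblen, void *priv);

struct demo_coalescer {
	demo_write_cb cb;
	void *priv;
	const uint8_t *data; /* in-band part held back until OOB arrives */
	size_t len;
	int pending;
};

/*
 * First transport request: in-band data. If no OOB part follows, the
 * callback is invoked right away; otherwise the data is only remembered.
 * The caller must keep the buffer alive until the OOB part is submitted.
 */
int demo_submit_data(struct demo_coalescer *c, const uint8_t *data,
		     size_t len, int oob_follows)
{
	if (c->pending)
		return -1; /* previous operation was never completed */
	if (!oob_follows)
		return c->cb(data, len, NULL, 0, c->priv);
	c->data = data;
	c->len = len;
	c->pending = 1;
	return 0;
}

/* Second transport request: OOB. Delivers everything in a single call. */
int demo_submit_oob(struct demo_coalescer *c, const uint8_t *oob,
		    size_t ooblen)
{
	const uint8_t *data = c->pending ? c->data : NULL;
	size_t len = c->pending ? c->len : 0;

	c->pending = 0;
	c->data = NULL;
	c->len = 0;
	return c->cb(data, len, oob, ooblen, c->priv);
}

/* Example callback that only records what it was handed. */
struct demo_stats {
	int calls;
	size_t last_len;
	size_t last_ooblen;
};

int demo_record_cb(const uint8_t *data, size_t len,
		   const uint8_t *oob, size_t ooblen, void *priv)
{
	struct demo_stats *s = priv;

	(void)data;
	(void)oob;
	s->calls++;
	s->last_len = len;
	s->last_ooblen = ooblen;
	return 0;
}
```

In this sketch the simulator callback runs exactly once per MTD operation,
regardless of whether the transport needed one or two requests.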
Hi Richard,
Richard Weinberger <[email protected]> wrote on Wed, 10 Feb 2021 12:23:53
+0100 (CET):
> Miquel,
>
> ----- Original Message -----
> >> Do in-band and OOB data need to be handled together?
> >
> > Short answer: yes.
> >
> >> If so, then two requests is not a good option.
> >
> > More detailed answer:
> >
> > There is a type of MTD device (NAND devices) which is composed, for
> > each page, of X in-band bytes plus Y out-of-band metadata bytes.
> >
> > Accessing either the in-band data, or the out-of-band data, or both at
> > the same time are all valid use cases.
> >
> > * Read operation details:
> > From a hardware point of view, the out-of-band data is (almost)
> > always retrieved when the in-band data is read because it contains
> > metadata used to correct possible bitflips. In this case, if both
> > areas are requested, it is highly inefficient to do two requests,
> > which is why the MTD core allows doing both at the same time.
> > * Write operation details:
> > Even worse, in the write case, you *must* write both at the same
> > time. It is physically impossible to do one after the other (still
> > with actual hardware, of course).
> >
> > That is why it is preferable for MUSE to be able to access both in
> > a single request.
>
> By single request we meant FUSE op-codes. The NAND simulator in Userspace
> will see just one call. My plan is to abstract it in libfuse.
If libfuse abstracts it, as long as MTD only sees a single request I'm
fine :)
Thanks,
Miquèl
----- Original Message -----
>> By single request we meant FUSE op-codes. The NAND simulator in Userspace
>> will see just one call. My plan is to abstract it in libfuse.
>
> If libfuse abstracts it, as long as MTD only sees a single request I'm
> fine :)
:-)
I'll prototype that in the next few weeks. Let's see whether my plans are
doable or not.
Thanks,
//richard
On Wed, Feb 10, 2021 at 12:16 PM Miklos Szeredi <[email protected]> wrote:
>
> On Wed, Feb 10, 2021 at 11:12 AM Miklos Szeredi <[email protected]> wrote:
>
> > But this is just a start. From the big structures still left in
> > <net/fuse.h> CUSE only uses the following fields:
> >
> > fc: .minor, max_read, max_write, rcu, release, initialized, num_waiting
> > fm: .fc
> > ff: .fm
> > fud: .fc
> >
> > Dealing with the last 3 is trivial: create an alloc function for the
> > fm, and create accessor functions for the accessed fields.
> >
> > Dealing with fc properly is probably a bit more involved, but does not
> > seem to be too complex at first glance.
> >
> > Do you want to take a stab at cleaning this up further?
>
> On second thought, I'll finish this off, since I know the internal API better.
>
Pushed to
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#fs_fuse_split
There's still room for improvement, but I guess this can wait until after
MUSE integration.
Thanks,
Miklos
On Thu, Feb 11, 2021 at 7:09 PM Miklos Szeredi <[email protected]> wrote:
>
> Pushed to
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#fs_fuse_split
>
> There's still room for improvement, but I guess this can wait until after
> MUSE integration.
Hi Richard,
Have you had a chance to look at this?
Thanks,
Miklos