2020-03-25 08:56:48

by WeiXiong Liao

[permalink] [raw]
Subject: [PATCH v3 00/11] pstore: mtd: support crash log to block and mtd device

Why do we need to log to block (mtd) device?
1. Most embedded intelligent equipment have no persistent ram, which
increases costs. We perfer to cheaper solutions, like block devices.
2. Do not any equipment have battery, which means that it lost all data
on general ram if power failure. Pstore has little to do for these
equipments.

Why do we need mtdpstore instead of mtdoops?
1. repetitive jobs between pstore and mtdoops
Both of pstore and mtdoops do the same jobs that store panic/oops log.
2. do what a driver should do
To me, a driver should provide methods instead of policies. What MTD
should do is to provide read/write/erase operations, geting rid of codes
about chunk management, kmsg dumper and configuration.
3. enhanced feature
Not only store log, but also show it as files.
Not only log, but also trigger time and trigger count.
Not only panic/oops log, but also log recorder for pmsg, console and
ftrace in the future.

Before upstream submission, pstore/blk is tested on arch ARM and x84_64,
block device and mtd device, built as modules and in kernel. Here are the
details:

https://github.com/gmpy/articles/blob/master/pstore/Test-Pstore-Block.md

[PATCH v3]:
1. patch 1~10: a lot of improvements according to Kees Cook <[email protected]>
including rename module, typo, reorder, rewrite document, bugs
and so on.
2. patch 11: rename funtions of pstore/blk and update document.
[PATCH v2]:
1. fix syntax error in documents. Thank Randy Dunlap <[email protected]>
2. replace pr_* with dev_* for mtdpstore.
Thank Vignesh Raghavendra <[email protected]>
3. improve mtdpstore. Thank Miquel Raynal <[email protected]>
[PATCH v1]:
1. fix errors and warnings reported by kbuild test robot.

WeiXiong Liao (11):
pstore/zone: a common layer to manage storage as zones
pstore/blk: new support logger for block devices
pstore/blk: respect for driver to pick pstore front-ends
pstore/blk: pstore/zone: support pmsg recorder
pstore/blk: blkoops: support console recorder
pstore/blk: blkoops: support ftrace recorder
Documentation: create document for pstore/blk
pstore/zone: skip broken zone for MTD device
pstore/blk: a way to get user configure about pstore front-ends.
pstore/zone: pstore/blk: support non-block devices
mtd: new support oops logger based on pstore/blk

Documentation/admin-guide/pstore-block.rst | 237 +++++
MAINTAINERS | 1 +
drivers/mtd/Kconfig | 10 +
drivers/mtd/Makefile | 1 +
drivers/mtd/mtdpstore.c | 564 +++++++++++
fs/pstore/Kconfig | 107 ++
fs/pstore/Makefile | 4 +
fs/pstore/platform.c | 3 +-
fs/pstore/pstore_blk.c | 480 +++++++++
fs/pstore/pstore_zone.c | 1511 ++++++++++++++++++++++++++++
include/linux/pstore.h | 1 +
include/linux/pstore_blk.h | 77 ++
include/linux/pstore_zone.h | 60 ++
13 files changed, 3055 insertions(+), 1 deletion(-)
create mode 100644 Documentation/admin-guide/pstore-block.rst
create mode 100644 drivers/mtd/mtdpstore.c
create mode 100644 fs/pstore/pstore_blk.c
create mode 100644 fs/pstore/pstore_zone.c
create mode 100644 include/linux/pstore_blk.h
create mode 100644 include/linux/pstore_zone.h

--
1.9.1


2020-03-25 08:57:01

by WeiXiong Liao

[permalink] [raw]
Subject: [PATCH v3 05/11] pstore/blk: blkoops: support console recorder

Support recorder for console. To enable console recorder, just make
console_size be greater than 0 and a multiple of 4096.

Signed-off-by: WeiXiong Liao <[email protected]>
---
fs/pstore/Kconfig | 12 ++++++++
fs/pstore/pstore_blk.c | 12 +++++++-
fs/pstore/pstore_zone.c | 70 +++++++++++++++++++++++++++++++++++++++++----
include/linux/pstore_zone.h | 4 ++-
4 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 8cead860dcfc..bf90de48ad3c 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -225,6 +225,18 @@ config PSTORE_BLK_PMSG_SIZE
NOTE that, both Kconfig and module parameters can configure
pstore/blk, but module parameters have priority over Kconfig.

+config PSTORE_BLK_CONSOLE_SIZE
+ int "Size in Kbytes of console to store"
+ depends on PSTORE_BLK
+ depends on PSTORE_CONSOLE
+ default 64
+ help
+ This just sets size of console (console_size) for pstore/blk. The
+ size is in KB and must be a multiple of 4.
+
+ NOTE that, both Kconfig and module parameters can configure
+ pstore/blk, but module parameters have priority over Kconfig.
+
config PSTORE_BLK_DUMP_OOPS
bool "dump oops"
depends on PSTORE_BLK
diff --git a/fs/pstore/pstore_blk.c b/fs/pstore/pstore_blk.c
index 85cd9f2335be..2b513acaa18f 100644
--- a/fs/pstore/pstore_blk.c
+++ b/fs/pstore/pstore_blk.c
@@ -26,6 +26,14 @@
module_param(pmsg_size, long, 0400);
MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");

+#if IS_ENABLED(CONFIG_PSTORE_CONSOLE)
+static long console_size = CONFIG_PSTORE_BLK_CONSOLE_SIZE;
+#else
+static long console_size = -1;
+#endif
+module_param(console_size, long, 0400);
+MODULE_PARM_DESC(console_size, "console size in kbytes");
+
static int dump_oops = CONFIG_PSTORE_BLK_DUMP_OOPS;
module_param(dump_oops, int, 0400);
MODULE_PARM_DESC(total_size, "whether dump oops");
@@ -81,7 +89,8 @@
* whole disk).
* On success, the number of bytes should be returned, others
* means error.
- * @write: The same as @read.
+ * @write: The same as @read, but the following error number:
+ * -EBUSY means try to write again later.
* @panic_write:The write operation only used for panic case. It's optional
* if you do not care panic log. The parameters and return value
* are the same as @read.
@@ -131,6 +140,7 @@ static int psblk_register_do(struct psblk_device *dev)

verify_size(oops_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
+ verify_size(console_size, 4096, dev->flags & PSTORE_FLAGS_CONSOLE);
#undef verify_size
dump_oops = dump_oops <= 0 ? 0 : 1;

diff --git a/fs/pstore/pstore_zone.c b/fs/pstore/pstore_zone.c
index 444bce7f9ac3..e1e84505b046 100644
--- a/fs/pstore/pstore_zone.c
+++ b/fs/pstore/pstore_zone.c
@@ -87,10 +87,12 @@ struct psz_zone {
*
* @opszs: oops/panic storage zones
* @ppsz: pmsg storage zone
+ * @cpsz: console storage zone
* @oops_max_cnt: max count of @opszs
* @oops_read_cnt: counter to read oops zone
* @oops_write_cnt: counter to write
* @pmsg_read_cnt: counter to read pmsg zone
+ * @console_read_cnt: counter to read console zone
* @oops_counter: counter to oops
* @panic_counter: counter to panic
* @recovered: whether finish recovering data from storage
@@ -102,10 +104,12 @@ struct psz_zone {
struct psz_context {
struct psz_zone **opszs;
struct psz_zone *ppsz;
+ struct psz_zone *cpsz;
unsigned int oops_max_cnt;
unsigned int oops_read_cnt;
unsigned int oops_write_cnt;
unsigned int pmsg_read_cnt;
+ unsigned int console_read_cnt;
/*
* the counter should be recovered when recover.
* It records the oops/panic times after burning rather than booting.
@@ -125,6 +129,9 @@ struct psz_context {
};
static struct psz_context psz_cxt;

+static void psz_flush_all_dirty_zones(struct work_struct *);
+static DECLARE_WORK(psz_cleaner, psz_flush_all_dirty_zones);
+
/**
* enum psz_flush_mode - flush mode for psz_zone_write()
*
@@ -235,6 +242,9 @@ static int psz_zone_write(struct psz_zone *zone,
return 0;
dirty:
atomic_set(&zone->dirty, true);
+ /* flush dirty zones nicely */
+ if (wcnt == -EBUSY && !is_on_panic())
+ schedule_work(&psz_cleaner);
return -EBUSY;
}

@@ -291,6 +301,15 @@ static int psz_move_zone(struct psz_zone *old, struct psz_zone *new)
return 0;
}

+static void psz_flush_all_dirty_zones(struct work_struct *work)
+{
+ struct psz_context *cxt = &psz_cxt;
+
+ psz_flush_dirty_zone(cxt->ppsz);
+ psz_flush_dirty_zone(cxt->cpsz);
+ psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
+}
+
static int psz_recover_oops_data(struct psz_context *cxt)
{
struct psz_info *info = cxt->psz_info;
@@ -546,6 +565,10 @@ static inline int psz_recovery(struct psz_context *cxt)
if (ret)
goto recover_fail;

+ ret = psz_recover_zone(cxt, cxt->cpsz);
+ if (ret)
+ goto recover_fail;
+
pr_debug("recover end!\n");
atomic_set(&cxt->recovered, 1);
return 0;
@@ -561,6 +584,7 @@ static int psz_pstore_open(struct pstore_info *psi)

cxt->oops_read_cnt = 0;
cxt->pmsg_read_cnt = 0;
+ cxt->console_read_cnt = 0;
return 0;
}

@@ -625,8 +649,9 @@ static int psz_pstore_erase(struct pstore_record *record)
return psz_oops_erase(cxt, cxt->opszs[record->id], record);
case PSTORE_TYPE_PMSG:
return psz_record_erase(cxt, cxt->ppsz);
- default:
- return -EINVAL;
+ case PSTORE_TYPE_CONSOLE:
+ return psz_record_erase(cxt, cxt->cpsz);
+ default: return -EINVAL;
}
}

@@ -767,9 +792,18 @@ static int notrace psz_pstore_write(struct pstore_record *record)
record->reason == KMSG_DUMP_PANIC)
atomic_set(&cxt->on_panic, 1);

+ /*
+ * if on panic, do not write except panic records
+ * Fix case that panic_write prints log which wakes up console recorder.
+ */
+ if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
+ return -EBUSY;
+
switch (record->type) {
case PSTORE_TYPE_DMESG:
return psz_oops_write(cxt, record);
+ case PSTORE_TYPE_CONSOLE:
+ return psz_record_write(cxt->cpsz, record);
case PSTORE_TYPE_PMSG:
return psz_record_write(cxt->ppsz, record);
default:
@@ -794,6 +828,13 @@ static struct psz_zone *psz_read_next_zone(struct psz_context *cxt)
return zone;
}

+ if (cxt->console_read_cnt == 0) {
+ cxt->console_read_cnt++;
+ zone = cxt->cpsz;
+ if (psz_old_ok(zone))
+ return zone;
+ }
+
return NULL;
}

@@ -903,6 +944,8 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
readop = psz_oops_read;
record->id = cxt->oops_read_cnt - 1;
break;
+ case PSTORE_TYPE_CONSOLE:
+ fallthrough;
case PSTORE_TYPE_PMSG:
readop = psz_record_read;
break;
@@ -1050,6 +1093,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
if (cxt->ppsz)
psz_free_zone(&cxt->ppsz);
+ if (cxt->cpsz)
+ psz_free_zone(&cxt->cpsz);
}

static int psz_alloc_zones(struct psz_context *cxt)
@@ -1066,6 +1111,14 @@ static int psz_alloc_zones(struct psz_context *cxt)
goto free_out;
}

+ off_size += info->console_size;
+ cxt->cpsz = psz_init_zone(PSTORE_TYPE_CONSOLE, &off,
+ info->console_size);
+ if (IS_ERR(cxt->cpsz)) {
+ err = PTR_ERR(cxt->cpsz);
+ goto free_out;
+ }
+
cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
info->total_size - off_size,
info->oops_size, &cxt->oops_max_cnt);
@@ -1100,7 +1153,7 @@ int psz_register(struct psz_info *info)
return -EINVAL;
}

- if (!info->oops_size && !info->pmsg_size) {
+ if (!info->oops_size && !info->pmsg_size && !info->console_size) {
pr_warn("at least one of the records be non-zero\n");
return -EINVAL;
}
@@ -1128,6 +1181,7 @@ int psz_register(struct psz_info *info)
check_size(total_size, 4096);
check_size(oops_size, SECTOR_SIZE);
check_size(pmsg_size, SECTOR_SIZE);
+ check_size(console_size, SECTOR_SIZE);

#undef check_size

@@ -1160,6 +1214,7 @@ int psz_register(struct psz_info *info)
pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
pr_debug("\toops size : %ld Bytes\n", info->oops_size);
pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
+ pr_debug("\tconsole size : %ld Bytes\n", info->console_size);

err = psz_alloc_zones(cxt);
if (err) {
@@ -1181,11 +1236,14 @@ int psz_register(struct psz_info *info)
cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
if (info->pmsg_size)
cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
+ if (info->console_size)
+ cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;

- pr_info("Registered %s as pszone backend for%s%s%s\n", info->name,
+ pr_info("Registered %s as pszone backend for%s%s%s%s\n", info->name,
cxt->opszs && cxt->psz_info->dump_oops ? " Oops" : "",
cxt->opszs && cxt->psz_info->panic_write ? " Panic" : "",
- cxt->ppsz ? " Pmsg" : "");
+ cxt->ppsz ? " Pmsg" : "",
+ cxt->cpsz ? " Console" : "");

err = pstore_register(&cxt->pstore);
if (err) {
@@ -1219,6 +1277,8 @@ void psz_unregister(struct psz_info *info)
{
struct psz_context *cxt = &psz_cxt;

+ flush_work(&psz_cleaner);
+
pstore_unregister(&cxt->pstore);
kfree(cxt->pstore.buf);
cxt->pstore.bufsize = 0;
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index 85e159d8f935..8a1838633010 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -17,12 +17,13 @@
* @oops_size: The size of oops/panic zone. Zero means disabled, otherwise,
* it must be multiple of SECTOR_SIZE(512 Bytes).
* @pmsg_size: The size of pmsg zone which is the same as @oops_size.
+ * @console_size:The size of console zone which is the same as @oops_size.
* @dump_oops: Whether to dump oops log.
* @read: The general read operation. Both of the function parameters
* @size and @offset are relative value to storage.
* On success, the number of bytes should be returned, others
* means error.
- * @write: The same as @read.
+ * @write: The same as @read, but -EBUSY means try to write again later.
* @panic_write:The write operation only used for panic case. It's optional
* if you do not care panic log. The parameters and return value
* are the same as @read.
@@ -34,6 +35,7 @@ struct psz_info {
unsigned long total_size;
unsigned long oops_size;
unsigned long pmsg_size;
+ unsigned long console_size;
int dump_oops;
psz_read_op read;
psz_write_op write;
--
1.9.1

2020-03-25 08:57:10

by WeiXiong Liao

[permalink] [raw]
Subject: [PATCH v3 08/11] pstore/zone: skip broken zone for MTD device

It's one of a series of patches to adapt to MTD device.

It's different with block device, the MTD device will make new broken
blocks (or bad block) while running. It's necessary for pstore/blk to
skip the broken block.

The MTD driver should return -ENOMSG if meets a bad block, which tells
pstore/zone to skip and try next one.

Signed-off-by: WeiXiong Liao <[email protected]>
---
fs/pstore/pstore_blk.c | 10 +++++--
fs/pstore/pstore_zone.c | 65 ++++++++++++++++++++++++++++++++++++---------
include/linux/pstore_blk.h | 3 ++-
include/linux/pstore_zone.h | 12 ++++++---
4 files changed, 71 insertions(+), 19 deletions(-)

diff --git a/fs/pstore/pstore_blk.c b/fs/pstore/pstore_blk.c
index 560e7b3f0945..d6949146135b 100644
--- a/fs/pstore/pstore_blk.c
+++ b/fs/pstore/pstore_blk.c
@@ -99,9 +99,12 @@
* means error.
* @write: The same as @read, but the following error number:
* -EBUSY means try to write again later.
+ * -ENOMSG means to try next zone.
* @panic_write:The write operation only used for panic case. It's optional
- * if you do not care panic log. The parameters and return value
- * are the same as @read.
+ * if you do not care panic log. The parameters are relative
+ * value to storage.
+ * On success, the number of bytes should be returned, others
+ * excluding -ENOMSG mean error. -ENOMSG means to try next zone.
*/
struct psblk_device {
unsigned long total_size;
@@ -314,6 +317,9 @@ static ssize_t psblk_blk_panic_write(const char *buf, size_t size,
/* size and off must align to SECTOR_SIZE for block device */
ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
size >> SECTOR_SHIFT);
+ /* try next zone */
+ if (ret == -ENOMSG)
+ return ret;
return ret ? -EIO : size;
}

diff --git a/fs/pstore/pstore_zone.c b/fs/pstore/pstore_zone.c
index 1da4fc760d31..84ff53c1aaae 100644
--- a/fs/pstore/pstore_zone.c
+++ b/fs/pstore/pstore_zone.c
@@ -247,6 +247,9 @@ static int psz_zone_write(struct psz_zone *zone,

return 0;
dirty:
+ /* no need to mark dirty if going to try next zone */
+ if (wcnt == -ENOMSG)
+ return -ENOMSG;
atomic_set(&zone->dirty, true);
/* flush dirty zones nicely */
if (wcnt == -EBUSY && !is_on_panic())
@@ -382,7 +385,11 @@ static int psz_recover_oops_meta(struct psz_context *cxt)
return -EINVAL;

rcnt = info->read((char *)buf, len, zone->off);
- if (rcnt != len) {
+ if (rcnt == -ENOMSG) {
+ pr_debug("%s with id %lu may be broken, skip\n",
+ zone->name, i);
+ continue;
+ } else if (rcnt != len) {
pr_err("read %s with id %lu failed\n", zone->name, i);
return (int)rcnt < 0 ? (int)rcnt : -EIO;
}
@@ -717,24 +724,58 @@ static void psz_write_kmsg_hdr(struct psz_zone *zone,
hdr->counter = 0;
}

+/*
+ * In case zone is broken, which may occur to MTD device, we try each zones,
+ * start at cxt->oops_write_cnt.
+ */
static inline int notrace psz_oops_write_record(struct psz_context *cxt,
struct pstore_record *record)
{
+ int ret = -EBUSY;
size_t size, hlen;
struct psz_zone *zone;
- unsigned int zonenum;
+ unsigned int i;

- zonenum = cxt->oops_write_cnt;
- zone = cxt->opszs[zonenum];
- if (unlikely(!zone))
- return -ENOSPC;
- cxt->oops_write_cnt = (zonenum + 1) % cxt->oops_max_cnt;
+ for (i = 0; i < cxt->oops_max_cnt; i++) {
+ unsigned int zonenum, len;
+
+ zonenum = (cxt->oops_write_cnt + i) % cxt->oops_max_cnt;
+ zone = cxt->opszs[zonenum];
+ if (unlikely(!zone))
+ return -ENOSPC;

- pr_debug("write %s to zone id %d\n", zone->name, zonenum);
- psz_write_kmsg_hdr(zone, record);
- hlen = sizeof(struct psz_oops_header);
- size = min_t(size_t, record->size, zone->buffer_size - hlen);
- return psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+ /* avoid destorying old data, allocate a new one */
+ len = zone->buffer_size + sizeof(*zone->buffer);
+ zone->oldbuf = zone->buffer;
+ zone->buffer = kzalloc(len, GFP_KERNEL);
+ if (!zone->buffer) {
+ zone->buffer = zone->oldbuf;
+ return -ENOMEM;
+ }
+ zone->buffer->sig = zone->oldbuf->sig;
+
+ pr_debug("write %s to zone id %d\n", zone->name, zonenum);
+ psz_write_kmsg_hdr(zone, record);
+ hlen = sizeof(struct psz_oops_header);
+ size = min_t(size_t, record->size, zone->buffer_size - hlen);
+ ret = psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen);
+ if (likely(!ret || ret != -ENOMSG)) {
+ cxt->oops_write_cnt = zonenum + 1;
+ cxt->oops_write_cnt %= cxt->oops_max_cnt;
+ /* no need to try next zone, free last zone buffer */
+ kfree(zone->oldbuf);
+ zone->oldbuf = NULL;
+ return ret;
+ }
+
+ pr_debug("zone %u may be broken, try next dmesg zone\n",
+ zonenum);
+ kfree(zone->buffer);
+ zone->buffer = zone->oldbuf;
+ zone->oldbuf = NULL;
+ }
+
+ return -EBUSY;
}

static int notrace psz_oops_write(struct psz_context *cxt,
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index d8f609e60288..828b0763d477 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -14,7 +14,8 @@
* @start_sect: start sector to block device
* @sects: sectors count on buf
*
- * Return: On success, zero should be returned. Others mean error.
+ * Return: On success, zero should be returned. Others excluding -ENOMSG
+ * mean error. -ENOMSG means to try next zone.
*
* Panic write to block device must be aligned to SECTOR_SIZE.
*/
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index a138e8b7dc20..2bfa79bff97b 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -23,11 +23,15 @@
* @read: The general read operation. Both of the function parameters
* @size and @offset are relative value to storage.
* On success, the number of bytes should be returned, others
- * means error.
- * @write: The same as @read, but -EBUSY means try to write again later.
+ * mean error.
+ * @write: The same as @read, but the following error number:
+ * -EBUSY means try to write again later.
+ * -ENOMSG means to try next zone.
* @panic_write:The write operation only used for panic case. It's optional
- * if you do not care panic log. The parameters and return value
- * are the same as @read.
+ * if you do not care panic log. The parameters are relative
+ * value to storage.
+ * On success, the number of bytes should be returned, others
+ * excluding -ENOMSG mean error. -ENOMSG means to try next zone.
*/
struct psz_info {
struct module *owner;
--
1.9.1

2020-03-25 08:57:18

by WeiXiong Liao

[permalink] [raw]
Subject: [PATCH v3 09/11] pstore/blk: a way to get user configure about pstore front-ends.

It's one of a series of patches to adapt to MTD device.

It's different to block driver, MTD driver should check zone size
and get identifier to MTD device. To make it, psblk_usr_info() is
created.

Signed-off-by: WeiXiong Liao <[email protected]>
---
fs/pstore/pstore_blk.c | 37 ++++++++++++++++++++++++++++++-------
include/linux/pstore_blk.h | 10 ++++++++++
2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/fs/pstore/pstore_blk.c b/fs/pstore/pstore_blk.c
index d6949146135b..061609b66d8a 100644
--- a/fs/pstore/pstore_blk.c
+++ b/fs/pstore/pstore_blk.c
@@ -84,6 +84,17 @@
sector_t start_sect;
} g_bdev_info;

+#define check_size(name, alignsize) ({ \
+ long _##name_ = (name); \
+ _##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024); \
+ if (_##name_ & ((alignsize) - 1)) { \
+ pr_info(#name " must align to %d\n", \
+ (alignsize)); \
+ _##name_ = ALIGN(name, (alignsize)); \
+ } \
+ _##name_; \
+})
+
/**
* struct psblk_device - back-end pstore/blk driver structure.
*
@@ -138,13 +149,11 @@ static int psblk_register_do(struct psblk_device *dev)
if (!dev->flags)
dev->flags = UINT_MAX;
#define verify_size(name, alignsize, enable) { \
- long _##name_ = (enable) ? (name) : 0; \
- _##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024); \
- if (_##name_ & ((alignsize) - 1)) { \
- pr_info(#name " must align to %d\n", \
- (alignsize)); \
- _##name_ = ALIGN(name, (alignsize)); \
- } \
+ long _##name_; \
+ if (enable) \
+ _##name_ = check_size(name, alignsize); \
+ else \
+ _##name_ = 0; \
name = _##name_ / 1024; \
psz_info->name = _##name_; \
}
@@ -464,6 +473,20 @@ int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
}
EXPORT_SYMBOL_GPL(psblk_blkdev_info);

+/* get information of pstore/blk */
+int psblk_usr_info(struct psblk_info *info)
+{
+ strncpy(info->device, blkdev, 80);
+ info->dump_oops = dump_oops <= 0 ? 0 : 1;
+ info->oops_size = check_size(oops_size, 4096);
+ info->pmsg_size = check_size(pmsg_size, 4096);
+ info->ftrace_size = check_size(ftrace_size, 4096);
+ info->console_size = check_size(console_size, 4096);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(psblk_usr_info);
+
MODULE_LICENSE("GPL");
MODULE_AUTHOR("WeiXiong Liao <[email protected]>");
MODULE_DESCRIPTION("Block device Oops/Panic logger");
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
index 828b0763d477..d2ea1733b51a 100644
--- a/include/linux/pstore_blk.h
+++ b/include/linux/pstore_blk.h
@@ -27,4 +27,14 @@ int psblk_register_blkdev(unsigned int major, unsigned int flags,
void psblk_unregister_blkdev(unsigned int major);
int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);

+struct psblk_info {
+ int dump_oops;
+ char device[80];
+ unsigned long oops_size;
+ unsigned long pmsg_size;
+ unsigned long console_size;
+ unsigned long ftrace_size;
+};
+int psblk_usr_info(struct psblk_info *info);
+
#endif
--
1.9.1

2020-03-25 08:57:35

by WeiXiong Liao

[permalink] [raw]
Subject: [PATCH v3 07/11] Documentation: create document for pstore/blk

Pstore/blk is a new back-end of pstore to dump oops/pmsg log to
block device.

The document, at Documentation/admin-guide/pstore-block.rst, tells us
how to use pstore/blk.

Signed-off-by: WeiXiong Liao <[email protected]>
---
Documentation/admin-guide/pstore-block.rst | 223 +++++++++++++++++++++++++++++
MAINTAINERS | 1 +
fs/pstore/Kconfig | 2 +
3 files changed, 226 insertions(+)
create mode 100644 Documentation/admin-guide/pstore-block.rst

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
new file mode 100644
index 000000000000..a96415eaaf24
--- /dev/null
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -0,0 +1,223 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Pstore block oops/panic logger
+==============================
+
+Introduction
+------------
+
+Pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
+block device before the system crashes. You can get these log files by
+mounting pstore filesystem like::
+
+ mount -t pstore pstore /sys/fs/pstore
+
+
+Pstore block concepts
+---------------------
+
+Pstore/blk provides efficient configuration method for pstore/blk, which
+divides all configurations into two parts, configurations for user and
+configurations for driver.
+
+Configurations for user determine how pstore/blk works, such as pmsg_size,
+oops_size and so on. All of them support both Kconfig and module parameters,
+but module parameters have priority over Kconfig.
+
+Configurations for driver are all about block device, such as total_size
+of block device and read/write operations.
+
+Configurations for user
+-----------------------
+
+All of these configurations support both Kconfig and module parameters, but
+module parameters have priority over Kconfig.
+
+Here is an example for module parameters::
+
+ pstore_blk.blkdev=179:7 pstore_blk.oops_size=64
+
+The detail of each configurations may be of interest to you.
+
+blkdev
+~~~~~~
+
+The block device to use. Most of the time, it is a partition of block device.
+It's required for pstore/blk.
+
+It accepts the following variants:
+
+1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
+ leading 0x, for example b302.
+#. /dev/<disk_name> represents the device number of disk
+#. /dev/<disk_name><decimal> represents the device number of partition - device
+ number of disk plus the partition number
+#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
+ name of partitioned disk ends with a digit.
+#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
+ a partition if the partition table provides it. The UUID may be either an
+ EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
+ where SSSSSSSS is a zero-filled hex representation of the 32-bit
+ "NT disk signature", and PP is a zero-filled hex representation of the
+ 1-based partition number.
+#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
+ partition with a known unique id.
+#. <major>:<minor> major and minor number of the device separated by a colon.
+
+oops_size
+~~~~~~~~~
+
+The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care oops/panic log.
+
+There are multiple chunks for oops/panic front-end depending on the remaining
+space except other pstore front-ends.
+
+Pstore/blk will log to oops/panic chunks one by one, and always overwrite the
+oldest chunk if there is no more free chunk.
+
+pmsg_size
+~~~~~~~~~
+
+The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care pmsg log.
+
+Unlike oops/panic front-end, there is only one chunk for pmsg front-end.
+
+Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
+appended to the chunk. On reboot the contents are available in
+*/sys/fs/pstore/pmsg-pstore-blk-0*.
+
+console_size
+~~~~~~~~~~~~
+
+The chunk size in KB for console front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care console log.
+
+Similar to pmsg front-end, there is only one chunk for console front-end.
+
+All log of console will be appended to the chunk. On reboot the contents are
+available in */sys/fs/pstore/console-pstore-blk-0*.
+
+ftrace_size
+~~~~~~~~~~~
+
+The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care console log.
+
+Similar to oops front-end, there are multiple chunks for ftrace front-end
+depending on the count of cpu processors. Each chunk size is equal to
+ftrace_size / processors_count.
+
+All log of ftrace will be appended to the chunk. On reboot the contents are
+combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*.
+
+Persistent function tracing might be useful for debugging software or hardware
+related hangs. Here is an example of usage::
+
+ # mount -t pstore pstore /sys/fs/pstore
+ # mount -t debugfs debugfs /sys/kernel/debug/
+ # echo 1 > /sys/kernel/debug/pstore/record_ftrace
+ # reboot -f
+ [...]
+ # mount -t pstore pstore /sys/fs/pstore
+ # tail /sys/fs/pstore/ftrace-pstore-blk-0
+ CPU:0 ts:5914676 c0063828 c0063b94 call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
+ CPU:0 ts:5914678 c039ecdc c006385c cpuidle_enter_state <- call_cpuidle+0x44/0x48
+ CPU:0 ts:5914680 c039e9a0 c039ecf0 cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314
+ CPU:0 ts:5914681 c0063870 c039ea30 sched_idle_set_state <- cpuidle_enter_state+0x44/0x314
+ CPU:1 ts:5916720 c0160f59 c015ee04 kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204
+ CPU:1 ts:5916721 c05ca625 c015ee0c __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204
+ CPU:1 ts:5916723 c05c813d c05ca630 yield_to <- __mutex_lock_slowpath+0x314/0x358
+ CPU:1 ts:5916724 c05ca2d1 c05ca638 __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358
+
+dump_oops
+~~~~~~~~~
+
+Dumping both oopses and panics can be done by setting 1 (not zero) in the
+``dump_oops`` member while setting 0 in that variable dumps only the panics.
+
+Configurations for driver
+-------------------------
+
+Only a block device driver cares about these configurations. A block device
+driver uses ``psblk_register_blkdev`` to register to pstore/blk.
+
+.. kernel-doc:: fs/pstore/pstore_blk.c
+ :identifiers: psblk_register_blkdev
+
+Compression and header
+----------------------
+
+Block device is large enough for uncompressed oops data. Actually we do not
+recommend data compression because pstore/blk will insert some information into
+the first line of oops/panic data. For example::
+
+ Panic: Total 16 times
+
+It means that it's OOPS|Panic for the 16th time since the first booting.
+Sometimes the number of occurrences of oops|panic since the first booting is
+important to judge whether the system is stable.
+
+The following line is inserted by pstore filesystem. For example::
+
+ Oops#2 Part1
+
+It means that it's OOPS for the 2nd time on the last boot.
+
+Reading the data
+----------------
+
+The dump data can be read from the pstore filesystem. The format for these
+files is ``dmesg-pstore-blk-[N]`` for oops/panic front-end,
+``pmsg-pstore-blk-0`` for pmsg front-end and so on. The timestamp of the
+dump file records the trigger time. To delete a stored record from block
+device, simply unlink the respective pstore file.
+
+Attentions in panic read/write APIs
+-----------------------------------
+
+If on panic, the kernel is not going to run for much longer, the tasks will not
+be scheduled and most kernel resources will be out of service. It
+looks like a single-threaded program running on a single-core computer.
+
+The following points require special attention for panic read/write APIs:
+
+1. Can **NOT** allocate any memory.
+ If you need memory, just allocate while the block driver is initializing
+ rather than waiting until the panic.
+#. Must be polled, **NOT** interrupt driven.
+ No task schedule any more. The block driver should delay to ensure the write
+ succeeds, but NOT sleep.
+#. Can **NOT** take any lock.
+ There is no other task, nor any shared resource; you are safe to break all
+ locks.
+#. Just use CPU to transfer.
+ Do not use DMA to transfer unless you are sure that DMA will not keep lock.
+#. Control registers directly.
+ Please control registers directly rather than use Linux kernel resources.
+ Do I/O map while initializing rather than wait until a panic occurs.
+#. Reset your block device and controller if necessary.
+ If you are not sure of the state of your block device and controller when
+ a panic occurs, you are safe to stop and reset them.
+
+Pstore/blk supports psblk_blkdev_info(), which is defined in
+*linux/pstore_blk.h*, to get information of using block device, such as the
+device number, sector count and start sector of the whole disk.
+
+pstore block internals
+----------------------
+
+For developer reference, here are all the important structures and APIs:
+
+.. kernel-doc:: fs/pstore/pstore_zone.c
+ :internal:
+
+.. kernel-doc:: include/linux/pstore_zone.h
+ :internal:
+
+.. kernel-doc:: fs/pstore/pstore_blk.c
+ :export:
+
+.. kernel-doc:: include/linux/pstore_blk.h
+ :internal:
diff --git a/MAINTAINERS b/MAINTAINERS
index cc0a4a8ae06a..f553323b68c9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13380,6 +13380,7 @@ F: include/linux/pstore*
F: drivers/firmware/efi/efi-pstore.c
F: drivers/acpi/apei/erst.c
F: Documentation/admin-guide/ramoops.rst
+F: Documentation/admin-guide/pstore-block.rst
F: Documentation/devicetree/bindings/reserved-memory/ramoops.txt
K: \b(pstore|ramoops)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 55ce726be795..699f1925948f 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -171,6 +171,8 @@ config PSTORE_BLK
This enables panic and oops message to be logged to a block dev
where it can be read back at some later point.

+ For more information, see Documentation/admin-guide/pstore-block.rst.
+
If unsure, say N.

config PSTORE_BLK_BLKDEV
--
1.9.1

2020-03-25 08:57:48

by WeiXiong Liao

[permalink] [raw]
Subject: [PATCH v3 06/11] pstore/blk: blkoops: support ftrace recorder

Support recorder for ftrace. To enable ftrace recorder, just make
ftrace_size be greater than 0 and a multiple of 4096.

Signed-off-by: WeiXiong Liao <[email protected]>
---
fs/pstore/Kconfig | 12 ++++
fs/pstore/pstore_blk.c | 9 +++
fs/pstore/pstore_zone.c | 172 +++++++++++++++++++++++++++++++++++++++++++-
include/linux/pstore_zone.h | 2 +
4 files changed, 193 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index bf90de48ad3c..55ce726be795 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -237,6 +237,18 @@ config PSTORE_BLK_CONSOLE_SIZE
NOTE that, both Kconfig and module parameters can configure
pstore/blk, but module parameters have priority over Kconfig.

+config PSTORE_BLK_FTRACE_SIZE
+ int "Size in Kbytes of ftarce to store"
+ depends on PSTORE_BLK
+ depends on PSTORE_FTRACE
+ default 64
+ help
+ This just sets size of ftrace (ftrace_size) for pstore/blk. The
+ size is in KB and must be a multiple of 4.
+
+ NOTE that, both Kconfig and module parameters can configure
+ pstore/blk, but module parameters have priority over Kconfig.
+
config PSTORE_BLK_DUMP_OOPS
bool "dump oops"
depends on PSTORE_BLK
diff --git a/fs/pstore/pstore_blk.c b/fs/pstore/pstore_blk.c
index 2b513acaa18f..560e7b3f0945 100644
--- a/fs/pstore/pstore_blk.c
+++ b/fs/pstore/pstore_blk.c
@@ -34,6 +34,14 @@
module_param(console_size, long, 0400);
MODULE_PARM_DESC(console_size, "console size in kbytes");

+#if IS_ENABLED(CONFIG_PSTORE_FTRACE)
+static long ftrace_size = CONFIG_PSTORE_BLK_FTRACE_SIZE;
+#else
+static long ftrace_size = -1;
+#endif
+module_param(ftrace_size, long, 0400);
+MODULE_PARM_DESC(ftrace_size, "ftrace size in kbytes");
+
static int dump_oops = CONFIG_PSTORE_BLK_DUMP_OOPS;
module_param(dump_oops, int, 0400);
MODULE_PARM_DESC(total_size, "whether dump oops");
@@ -141,6 +149,7 @@ static int psblk_register_do(struct psblk_device *dev)
verify_size(oops_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
verify_size(console_size, 4096, dev->flags & PSTORE_FLAGS_CONSOLE);
+ verify_size(ftrace_size, 4096, dev->flags & PSTORE_FLAGS_FTRACE);
#undef verify_size
dump_oops = dump_oops <= 0 ? 0 : 1;

diff --git a/fs/pstore/pstore_zone.c b/fs/pstore/pstore_zone.c
index e1e84505b046..1da4fc760d31 100644
--- a/fs/pstore/pstore_zone.c
+++ b/fs/pstore/pstore_zone.c
@@ -88,11 +88,14 @@ struct psz_zone {
* @opszs: oops/panic storage zones
* @ppsz: pmsg storage zone
* @cpsz: console storage zone
+ * @fpszs: ftrace storage zones
* @oops_max_cnt: max count of @opszs
* @oops_read_cnt: counter to read oops zone
* @oops_write_cnt: counter to write
* @pmsg_read_cnt: counter to read pmsg zone
* @console_read_cnt: counter to read console zone
+ * @ftrace_max_cnt: max count of @fpszs
+ * @ftrace_read_cnt: counter to read ftrace zone
* @oops_counter: counter to oops
* @panic_counter: counter to panic
* @recovered: whether finish recovering data from storage
@@ -105,11 +108,14 @@ struct psz_context {
struct psz_zone **opszs;
struct psz_zone *ppsz;
struct psz_zone *cpsz;
+ struct psz_zone **fpszs;
unsigned int oops_max_cnt;
unsigned int oops_read_cnt;
unsigned int oops_write_cnt;
unsigned int pmsg_read_cnt;
unsigned int console_read_cnt;
+ unsigned int ftrace_max_cnt;
+ unsigned int ftrace_read_cnt;
/*
* the counter should be recovered when recover.
* It records the oops/panic times after burning rather than booting.
@@ -308,6 +314,7 @@ static void psz_flush_all_dirty_zones(struct work_struct *work)
psz_flush_dirty_zone(cxt->ppsz);
psz_flush_dirty_zone(cxt->cpsz);
psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
+ psz_flush_dirty_zones(cxt->fpszs, cxt->ftrace_max_cnt);
}

static int psz_recover_oops_data(struct psz_context *cxt)
@@ -542,6 +549,31 @@ static int psz_recover_zone(struct psz_context *cxt, struct psz_zone *zone)
return ret;
}

+static int psz_recover_zones(struct psz_context *cxt,
+ struct psz_zone **zones, unsigned int cnt)
+{
+ int ret;
+ unsigned int i;
+ struct psz_zone *zone;
+
+ if (!zones)
+ return 0;
+
+ for (i = 0; i < cnt; i++) {
+ zone = zones[i];
+ if (unlikely(!zone))
+ continue;
+ ret = psz_recover_zone(cxt, zone);
+ if (ret)
+ goto recover_fail;
+ }
+
+ return 0;
+recover_fail:
+ pr_debug("recover %s[%u] failed\n", zone->name, i);
+ return ret;
+}
+
/**
* psz_recovery() - recover data from storage
* @cxt: the context of pstore/zone
@@ -569,6 +601,10 @@ static inline int psz_recovery(struct psz_context *cxt)
if (ret)
goto recover_fail;

+ ret = psz_recover_zones(cxt, cxt->fpszs, cxt->ftrace_max_cnt);
+ if (ret)
+ goto recover_fail;
+
pr_debug("recover end!\n");
atomic_set(&cxt->recovered, 1);
return 0;
@@ -585,6 +621,7 @@ static int psz_pstore_open(struct pstore_info *psi)
cxt->oops_read_cnt = 0;
cxt->pmsg_read_cnt = 0;
cxt->console_read_cnt = 0;
+ cxt->ftrace_read_cnt = 0;
return 0;
}

@@ -651,6 +688,10 @@ static int psz_pstore_erase(struct pstore_record *record)
return psz_record_erase(cxt, cxt->ppsz);
case PSTORE_TYPE_CONSOLE:
return psz_record_erase(cxt, cxt->cpsz);
+ case PSTORE_TYPE_FTRACE:
+ if (record->id >= cxt->ftrace_max_cnt)
+ return -EINVAL;
+ return psz_record_erase(cxt, cxt->fpszs[record->id]);
default: return -EINVAL;
}
}
@@ -806,6 +847,13 @@ static int notrace psz_pstore_write(struct pstore_record *record)
return psz_record_write(cxt->cpsz, record);
case PSTORE_TYPE_PMSG:
return psz_record_write(cxt->ppsz, record);
+ case PSTORE_TYPE_FTRACE: {
+ int zonenum = smp_processor_id();
+
+ if (!cxt->fpszs)
+ return -ENOSPC;
+ return psz_record_write(cxt->fpszs[zonenum], record);
+ }
default:
return -EINVAL;
}
@@ -821,6 +869,14 @@ static struct psz_zone *psz_read_next_zone(struct psz_context *cxt)
return zone;
}

+ if (cxt->ftrace_read_cnt < cxt->ftrace_max_cnt)
+ /*
+ * No need psz_old_ok(). Let psz_ftrace_read() do so for
+ * combination. psz_ftrace_read() should traverse over
+ * all zones in case of some zone without data.
+ */
+ return cxt->fpszs[cxt->ftrace_read_cnt++];
+
if (cxt->pmsg_read_cnt == 0) {
cxt->pmsg_read_cnt++;
zone = cxt->ppsz;
@@ -894,6 +950,98 @@ static ssize_t psz_oops_read(struct psz_zone *zone,
return size + hlen;
}

+static int psz_ftrace_combine(char *src1_buf, size_t src1_size,
+ char *src2_buf, size_t src2_size,
+ char **dest_buf, size_t *dest_size)
+{
+ size_t src1_off, src2_off, total;
+ size_t src1_idx = 0, src2_idx = 0, merged_idx = 0;
+ void *merged_buf;
+ struct pstore_ftrace_record *mrec, *s1rec, *s2rec;
+ size_t record_size = sizeof(struct pstore_ftrace_record);
+
+ src1_off = src1_size % record_size;
+ src1_size -= src1_off;
+
+ src2_off = src2_size % record_size;
+ src2_size -= src2_off;
+
+ total = src1_size + src2_size;
+ merged_buf = kmalloc(total, GFP_KERNEL);
+ if (!merged_buf)
+ return -ENOMEM;
+
+ s1rec = (struct pstore_ftrace_record *)(src1_buf + src1_off);
+ s2rec = (struct pstore_ftrace_record *)(src2_buf + src2_off);
+ mrec = (struct pstore_ftrace_record *)(merged_buf);
+
+ while (src1_size > 0 && src2_size > 0) {
+ u64 s1_ts, s2_ts;
+
+ s1_ts = pstore_ftrace_read_timestamp(&s1rec[src1_idx]);
+ s2_ts = pstore_ftrace_read_timestamp(&s2rec[src2_idx]);
+ if (s1_ts < s2_ts) {
+ mrec[merged_idx++] = s1rec[src1_idx++];
+ src1_size -= record_size;
+ } else {
+ mrec[merged_idx++] = s2rec[src2_idx++];
+ src2_size -= record_size;
+ }
+ }
+
+ while (src1_size > 0) {
+ mrec[merged_idx++] = s1rec[src1_idx++];
+ src1_size -= record_size;
+ }
+
+ while (src2_size > 0) {
+ mrec[merged_idx++] = s2rec[src2_idx++];
+ src2_size -= record_size;
+ }
+
+ *dest_buf = merged_buf;
+ *dest_size = total;
+ return 0;
+}
+
+/* try to combine all ftrace zones */
+static ssize_t psz_ftrace_read(struct psz_zone *zone,
+ struct pstore_record *record)
+{
+ struct psz_context *cxt = record->psi->data;
+ struct psz_buffer *buf;
+ char *dest;
+ size_t dest_size;
+ int ret;
+
+ if (!zone || !record)
+ return -ENOSPC;
+
+ if (!psz_old_ok(zone))
+ goto out;
+
+ buf = (struct psz_buffer *)zone->oldbuf;
+ if (!buf)
+ return -ENOMSG;
+
+ ret = psz_ftrace_combine(record->buf, record->size,
+ (char *)buf->data, atomic_read(&buf->datalen),
+ &dest, &dest_size);
+ if (unlikely(ret))
+ return ret;
+
+ kfree(record->buf);
+ record->buf = dest;
+ record->size = dest_size;
+
+out:
+ if (cxt->ftrace_read_cnt < cxt->ftrace_max_cnt)
+ /* then, read next ftrace zone */
+ return -ENOMSG;
+ record->id = 0;
+ return record->size ? record->size : -ENOMSG;
+}
+
static ssize_t psz_record_read(struct psz_zone *zone,
struct pstore_record *record)
{
@@ -944,6 +1092,9 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
readop = psz_oops_read;
record->id = cxt->oops_read_cnt - 1;
break;
+ case PSTORE_TYPE_FTRACE:
+ readop = psz_ftrace_read;
+ break;
case PSTORE_TYPE_CONSOLE:
fallthrough;
case PSTORE_TYPE_PMSG:
@@ -1095,6 +1246,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
psz_free_zone(&cxt->ppsz);
if (cxt->cpsz)
psz_free_zone(&cxt->cpsz);
+ if (cxt->fpszs)
+ psz_free_zones(&cxt->fpszs, &cxt->ftrace_max_cnt);
}

static int psz_alloc_zones(struct psz_context *cxt)
@@ -1119,6 +1272,16 @@ static int psz_alloc_zones(struct psz_context *cxt)
goto free_out;
}

+ off_size += info->ftrace_size;
+ cxt->fpszs = psz_init_zones(PSTORE_TYPE_FTRACE, &off,
+ info->ftrace_size,
+ info->ftrace_size / nr_cpu_ids,
+ &cxt->ftrace_max_cnt);
+ if (IS_ERR(cxt->fpszs)) {
+ err = PTR_ERR(cxt->fpszs);
+ goto free_out;
+ }
+
cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
info->total_size - off_size,
info->oops_size, &cxt->oops_max_cnt);
@@ -1182,6 +1345,7 @@ int psz_register(struct psz_info *info)
check_size(oops_size, SECTOR_SIZE);
check_size(pmsg_size, SECTOR_SIZE);
check_size(console_size, SECTOR_SIZE);
+ check_size(ftrace_size, SECTOR_SIZE);

#undef check_size

@@ -1215,6 +1379,7 @@ int psz_register(struct psz_info *info)
pr_debug("\toops size : %ld Bytes\n", info->oops_size);
pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
+ pr_debug("\tftrace size : %ld Bytes\n", info->ftrace_size);

err = psz_alloc_zones(cxt);
if (err) {
@@ -1238,12 +1403,15 @@ int psz_register(struct psz_info *info)
cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
if (info->console_size)
cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
+ if (info->ftrace_size)
+ cxt->pstore.flags |= PSTORE_FLAGS_FTRACE;

- pr_info("Registered %s as pszone backend for%s%s%s%s\n", info->name,
+ pr_info("Registered %s as pszone backend for%s%s%s%s%s\n", info->name,
cxt->opszs && cxt->psz_info->dump_oops ? " Oops" : "",
cxt->opszs && cxt->psz_info->panic_write ? " Panic" : "",
cxt->ppsz ? " Pmsg" : "",
- cxt->cpsz ? " Console" : "");
+ cxt->cpsz ? " Console" : "",
+ cxt->fpszs ? " Ftrace" : "");

err = pstore_register(&cxt->pstore);
if (err) {
diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
index 8a1838633010..a138e8b7dc20 100644
--- a/include/linux/pstore_zone.h
+++ b/include/linux/pstore_zone.h
@@ -18,6 +18,7 @@
* it must be multiple of SECTOR_SIZE(512 Bytes).
* @pmsg_size: The size of pmsg zone which is the same as @oops_size.
* @console_size:The size of console zone which is the same as @oops_size.
+ * @ftrace_size:The size of ftrace zone which is the same as @oops_size.
* @dump_oops: Whether to dump oops log.
* @read: The general read operation. Both of the function parameters
* @size and @offset are relative value to storage.
@@ -36,6 +37,7 @@ struct psz_info {
unsigned long oops_size;
unsigned long pmsg_size;
unsigned long console_size;
+ unsigned long ftrace_size;
int dump_oops;
psz_read_op read;
psz_write_op write;
--
1.9.1

2020-03-25 08:58:10

by WeiXiong Liao

[permalink] [raw]
Subject: [PATCH v3 11/11] mtd: new support oops logger based on pstore/blk

It's the last one of a series of patches to adapt to MTD device.

The mtdpstore is similar to mtdoops but more powerful. It bases on
pstore/blk, aims to store panic and oops logs to a flash partition,
where it can be read back as files after mounting pstore filesystem.

The pstore/blk is designed for block device at the very beginning,
but now, compatible to non-block device like MTD device.

To make mtdpstore work, "blkdev" of pstore/blk should be set as MTD
device name or MTD device number. Please refer to document at:
Documentation/admin-guide/pstore-block.rst

Why do we need mtdpstore?
1. repetitive jobs between pstore and mtdoops
Both of pstore and mtdoops do the same jobs that store panic/oops log.
They have much similar logic that register to kmsg dumper and store
log to several chunks one by one.
2. do what a driver should do
To me, a driver should provide methods instead of policies. What MTD
should do is to provide read/write/erase operations, geting rid of codes
about chunk management, kmsg dumper and configuration.
3. enhanced feature
Not only store log, but also show it as files.
Not only log, but also trigger time and trigger count.
Not only panic/oops log, but also log recorder for pmsg, console and
ftrace in the future.

Signed-off-by: WeiXiong Liao <[email protected]>
---
Documentation/admin-guide/pstore-block.rst | 9 +-
drivers/mtd/Kconfig | 10 +
drivers/mtd/Makefile | 1 +
drivers/mtd/mtdpstore.c | 564 +++++++++++++++++++++++++++++
4 files changed, 582 insertions(+), 2 deletions(-)
create mode 100644 drivers/mtd/mtdpstore.c

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
index a31155b36e62..464efb0db932 100644
--- a/Documentation/admin-guide/pstore-block.rst
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -43,9 +43,9 @@ blkdev
~~~~~~

The block device to use. Most of the time, it is a partition of block device.
-It's required for pstore/blk.
+It's required for pstore/blk. It is also used for MTD device.

-It accepts the following variants:
+It accepts the following variants for block device:

1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
leading 0x, for example b302.
@@ -64,6 +64,11 @@ It accepts the following variants:
partition with a known unique id.
#. <major>:<minor> major and minor number of the device separated by a colon.

+It accepts the following variants for MTD device:
+
+1. <device name> MTD device name. "pstore" is recommended.
+#. <device number> MTD device number.
+
oops_size
~~~~~~~~~

diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
index 42d401ea60ee..6ddab796216d 100644
--- a/drivers/mtd/Kconfig
+++ b/drivers/mtd/Kconfig
@@ -170,6 +170,16 @@ config MTD_OOPS
buffer in a flash partition where it can be read back at some
later point.

+config MTD_PSTORE
+ tristate "Log panic/oops to an MTD buffer based on pstore"
+ depends on PSTORE_BLK
+ help
+ This enables panic and oops messages to be logged to a circular
+ buffer in a flash partition where it can be read back as files after
+ mounting pstore filesystem.
+
+ If unsure, say N.
+
config MTD_SWAP
tristate "Swap on MTD device support"
depends on MTD && SWAP
diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index 56cc60ccc477..593d0593a038 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_RFD_FTL) += rfd_ftl.o
obj-$(CONFIG_SSFDC) += ssfdc.o
obj-$(CONFIG_SM_FTL) += sm_ftl.o
obj-$(CONFIG_MTD_OOPS) += mtdoops.o
+obj-$(CONFIG_MTD_PSTORE) += mtdpstore.o
obj-$(CONFIG_MTD_SWAP) += mtdswap.o

nftl-objs := nftlcore.o nftlmount.o
diff --git a/drivers/mtd/mtdpstore.c b/drivers/mtd/mtdpstore.c
new file mode 100644
index 000000000000..e6fdaed6d1d3
--- /dev/null
+++ b/drivers/mtd/mtdpstore.c
@@ -0,0 +1,564 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define dev_fmt(fmt) "mtdoops-pstore: " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pstore_blk.h>
+#include <linux/mtd/mtd.h>
+#include <linux/bitops.h>
+
+static struct mtdpstore_context {
+ int index;
+ struct psblk_info info;
+ struct psblk_device dev;
+ struct mtd_info *mtd;
+ unsigned long *rmmap; /* removed bit map */
+ unsigned long *usedmap; /* used bit map */
+ /*
+ * used for panic write
+ * As there are no block_isbad for panic case, we should keep this
+ * status before panic to ensure panic_write not failed.
+ */
+ unsigned long *badmap; /* bad block bit map */
+} oops_cxt;
+
+static int mtdpstore_block_isbad(struct mtdpstore_context *cxt, loff_t off)
+{
+ int ret;
+ struct mtd_info *mtd = cxt->mtd;
+ u64 blknum = div_u64(off, mtd->erasesize);
+
+ if (test_bit(blknum, cxt->badmap))
+ return true;
+ ret = mtd_block_isbad(mtd, off);
+ if (ret < 0) {
+ dev_err(&mtd->dev, "mtd_block_isbad failed, aborting\n");
+ return ret;
+ } else if (ret > 0) {
+ set_bit(blknum, cxt->badmap);
+ return true;
+ }
+ return false;
+}
+
+static inline int mtdpstore_panic_block_isbad(struct mtdpstore_context *cxt,
+ loff_t off)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ u64 blknum = div_u64(off, mtd->erasesize);
+
+ return test_bit(blknum, cxt->badmap);
+}
+
+static inline void mtdpstore_mark_used(struct mtdpstore_context *cxt,
+ loff_t off)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ u64 zonenum = div_u64(off, cxt->info.oops_size);
+
+ dev_dbg(&mtd->dev, "mark zone %llu used\n", zonenum);
+ set_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_mark_unused(struct mtdpstore_context *cxt,
+ loff_t off)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ u64 zonenum = div_u64(off, cxt->info.oops_size);
+
+ dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
+ clear_bit(zonenum, cxt->usedmap);
+}
+
+static inline void mtdpstore_block_mark_unused(struct mtdpstore_context *cxt,
+ loff_t off)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ u64 zonenum = div_u64(off, cxt->info.oops_size);
+ u32 zonecnt = cxt->mtd->erasesize / cxt->info.oops_size;
+
+ while (zonecnt > 0) {
+ dev_dbg(&mtd->dev, "mark zone %llu unused\n", zonenum);
+ clear_bit(zonenum, cxt->usedmap);
+ zonenum++;
+ zonecnt--;
+ }
+}
+
+static inline int mtdpstore_is_used(struct mtdpstore_context *cxt, loff_t off)
+{
+ u64 zonenum = div_u64(off, cxt->info.oops_size);
+ u64 blknum = div_u64(off, cxt->mtd->erasesize);
+
+ if (test_bit(blknum, cxt->badmap))
+ return true;
+ return test_bit(zonenum, cxt->usedmap);
+}
+
+static int mtdpstore_block_is_used(struct mtdpstore_context *cxt,
+ loff_t off)
+{
+ u64 zonenum = div_u64(off, cxt->info.oops_size);
+ u32 zonecnt = cxt->mtd->erasesize / cxt->info.oops_size;
+
+ while (zonecnt > 0) {
+ if (test_bit(zonenum, cxt->usedmap))
+ return true;
+ zonenum++;
+ zonecnt--;
+ }
+ return false;
+}
+
+static int mtdpstore_is_empty(struct mtdpstore_context *cxt, char *buf,
+ size_t size)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ size_t sz;
+ int i;
+
+ sz = min_t(uint32_t, size, mtd->writesize / 4);
+ for (i = 0; i < sz; i++) {
+ if (buf[i] != (char)0xFF)
+ return false;
+ }
+ return true;
+}
+
+static void mtdpstore_mark_removed(struct mtdpstore_context *cxt, loff_t off)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ u64 zonenum = div_u64(off, cxt->info.oops_size);
+
+ dev_dbg(&mtd->dev, "mark zone %llu removed\n", zonenum);
+ set_bit(zonenum, cxt->rmmap);
+}
+
+static void mtdpstore_block_clear_removed(struct mtdpstore_context *cxt,
+ loff_t off)
+{
+ u64 zonenum = div_u64(off, cxt->info.oops_size);
+ u32 zonecnt = cxt->mtd->erasesize / cxt->info.oops_size;
+
+ while (zonecnt > 0) {
+ clear_bit(zonenum, cxt->rmmap);
+ zonenum++;
+ zonecnt--;
+ }
+}
+
+static int mtdpstore_block_is_removed(struct mtdpstore_context *cxt,
+ loff_t off)
+{
+ u64 zonenum = div_u64(off, cxt->info.oops_size);
+ u32 zonecnt = cxt->mtd->erasesize / cxt->info.oops_size;
+
+ while (zonecnt > 0) {
+ if (test_bit(zonenum, cxt->rmmap))
+ return true;
+ zonenum++;
+ zonecnt--;
+ }
+ return false;
+}
+
+static int mtdpstore_erase_do(struct mtdpstore_context *cxt, loff_t off)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ struct erase_info erase;
+ int ret;
+
+ dev_dbg(&mtd->dev, "try to erase off 0x%llx\n", off);
+ erase.len = cxt->mtd->erasesize;
+ erase.addr = off;
+ ret = mtd_erase(cxt->mtd, &erase);
+ if (!ret)
+ mtdpstore_block_clear_removed(cxt, off);
+ else
+ dev_err(&mtd->dev, "erase of region [0x%llx, 0x%llx] on \"%s\" failed\n",
+ (unsigned long long)erase.addr,
+ (unsigned long long)erase.len, cxt->info.device);
+ return ret;
+}
+
+/*
+ * called while removing file
+ *
+ * Avoiding over erasing, do erase block only when the whole block is unused.
+ * If the block contains valid log, do erase lazily on flush_removed() when
+ * unregister.
+ */
+static ssize_t mtdpstore_erase(size_t size, loff_t off)
+{
+ struct mtdpstore_context *cxt = &oops_cxt;
+
+ if (mtdpstore_block_isbad(cxt, off))
+ return -EIO;
+
+ mtdpstore_mark_unused(cxt, off);
+
+ /* If the block still has valid data, mtdpstore do erase lazily */
+ if (likely(mtdpstore_block_is_used(cxt, off))) {
+ mtdpstore_mark_removed(cxt, off);
+ return 0;
+ }
+
+ /* all zones are unused, erase it */
+ off = ALIGN_DOWN(off, cxt->mtd->erasesize);
+ return mtdpstore_erase_do(cxt, off);
+}
+
+/*
+ * What is security for mtdpstore?
+ * As there is no erase for panic case, we should ensure at least one zone
+ * is writable. Otherwise, panic write will fail.
+ * If zone is used, write operation will return -ENOMSG, which means that
+ * pstore/blk will try one by one until gets an empty zone. So, it is not
+ * needed to ensure the next zone is empty, but at least one.
+ */
+static int mtdpstore_security(struct mtdpstore_context *cxt, loff_t off)
+{
+ int ret = 0, i;
+ struct mtd_info *mtd = cxt->mtd;
+ u32 zonenum = (u32)div_u64(off, cxt->info.oops_size);
+ u32 zonecnt = (u32)div_u64(cxt->mtd->size, cxt->info.oops_size);
+ u32 blkcnt = (u32)div_u64(cxt->mtd->size, cxt->mtd->erasesize);
+ u32 erasesize = cxt->mtd->erasesize;
+
+ for (i = 0; i < zonecnt; i++) {
+ u32 num = (zonenum + i) % zonecnt;
+
+ /* found empty zone */
+ if (!test_bit(num, cxt->usedmap))
+ return 0;
+ }
+
+ /* If there is no any empty zone, we have no way but to do erase */
+ off = ALIGN_DOWN(off, erasesize);
+ while (blkcnt--) {
+ div64_u64_rem(off + erasesize, cxt->mtd->size, (u64 *)&off);
+
+ if (mtdpstore_block_isbad(cxt, off))
+ continue;
+
+ ret = mtdpstore_erase_do(cxt, off);
+ if (!ret) {
+ mtdpstore_block_mark_unused(cxt, off);
+ break;
+ }
+ }
+
+ if (ret)
+ dev_err(&mtd->dev, "all blocks bad!\n");
+ dev_dbg(&mtd->dev, "end security\n");
+ return ret;
+}
+
+static ssize_t mtdpstore_write(const char *buf, size_t size, loff_t off)
+{
+ struct mtdpstore_context *cxt = &oops_cxt;
+ struct mtd_info *mtd = cxt->mtd;
+ size_t retlen;
+ int ret;
+
+ if (mtdpstore_block_isbad(cxt, off))
+ return -ENOMSG;
+
+ /* zone is used, please try next one */
+ if (mtdpstore_is_used(cxt, off))
+ return -ENOMSG;
+
+ dev_dbg(&mtd->dev, "try to write off 0x%llx size %zu\n", off, size);
+ ret = mtd_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+ if (ret < 0 || retlen != size) {
+ dev_err(&mtd->dev, "write failure at %lld (%zu of %zu written), err %d\n",
+ off, retlen, size, ret);
+ return -EIO;
+ }
+ mtdpstore_mark_used(cxt, off);
+
+ mtdpstore_security(cxt, off);
+ return retlen;
+}
+
+static inline bool mtdpstore_is_io_error(int ret)
+{
+ return ret < 0 && !mtd_is_bitflip(ret) && !mtd_is_eccerr(ret);
+}
+
+/*
+ * All zones will be read as pstore/blk will read zone one by one when do
+ * recover.
+ */
+static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
+{
+ struct mtdpstore_context *cxt = &oops_cxt;
+ struct mtd_info *mtd = cxt->mtd;
+ size_t retlen, done;
+ int ret;
+
+ if (mtdpstore_block_isbad(cxt, off))
+ return -ENOMSG;
+
+ dev_dbg(&mtd->dev, "try to read off 0x%llx size %zu\n", off, size);
+ for (done = 0, retlen = 0; done < size; done += retlen) {
+ retlen = 0;
+
+ ret = mtd_read(cxt->mtd, off + done, size - done, &retlen,
+ (u_char *)buf + done);
+ if (mtdpstore_is_io_error(ret)) {
+ dev_err(&mtd->dev, "read failure at %lld (%zu of %zu read), err %d\n",
+ off + done, retlen, size - done, ret);
+ /* the zone may be broken, try next one */
+ return -ENOMSG;
+ }
+
+ /*
+ * ECC error. The impact on log data is so small. Maybe we can
+ * still read it and try to understand. So mtdpstore just hands
+ * over what it gets and user can judge whether the data is
+ * valid or not.
+ */
+ if (mtd_is_eccerr(ret)) {
+ dev_err(&mtd->dev, "ecc error at %lld (%zu of %zu read), err %d\n",
+ off + done, retlen, size - done, ret);
+ /* driver may not set retlen when ecc error */
+ retlen = retlen == 0 ? size - done : retlen;
+ }
+ }
+
+ if (mtdpstore_is_empty(cxt, buf, size))
+ mtdpstore_mark_unused(cxt, off);
+ else
+ mtdpstore_mark_used(cxt, off);
+
+ mtdpstore_security(cxt, off);
+ return retlen;
+}
+
+static ssize_t mtdpstore_panic_write(const char *buf, size_t size, loff_t off)
+{
+ struct mtdpstore_context *cxt = &oops_cxt;
+ struct mtd_info *mtd = cxt->mtd;
+ size_t retlen;
+ int ret;
+
+ if (mtdpstore_panic_block_isbad(cxt, off))
+ return -ENOMSG;
+
+ /* zone is used, please try next one */
+ if (mtdpstore_is_used(cxt, off))
+ return -ENOMSG;
+
+ ret = mtd_panic_write(cxt->mtd, off, size, &retlen, (u_char *)buf);
+ if (ret < 0 || size != retlen) {
+ dev_err(&mtd->dev, "panic write failure at %lld (%zu of %zu read), err %d\n",
+ off, retlen, size, ret);
+ return -EIO;
+ }
+ mtdpstore_mark_used(cxt, off);
+
+ return retlen;
+}
+
+static void mtdpstore_notify_add(struct mtd_info *mtd)
+{
+ int ret;
+ struct mtdpstore_context *cxt = &oops_cxt;
+ struct psblk_info *info = &cxt->info;
+ unsigned long longcnt;
+
+ if (!strcmp(mtd->name, info->device))
+ cxt->index = mtd->index;
+
+ if (mtd->index != cxt->index || cxt->index < 0)
+ return;
+
+ dev_dbg(&mtd->dev, "found matching MTD device %s\n", mtd->name);
+
+ if (mtd->size < info->oops_size * 2) {
+ dev_err(&mtd->dev, "MTD partition %d not big enough\n",
+ mtd->index);
+ return;
+ }
+ /*
+ * oops_size must be aligned to 4096 Bytes, which is limited by
+ * psblk. The default value of oops_size is 64KB. If oops_size
+ * is larger than erasesize, some errors will occur since mtdpsotre
+ * is designed on it.
+ */
+ if (mtd->erasesize < info->oops_size) {
+ dev_err(&mtd->dev, "eraseblock size of MTD partition %d too small\n",
+ mtd->index);
+ return;
+ }
+ if (unlikely(info->oops_size % mtd->writesize)) {
+ dev_err(&mtd->dev, "record size %lu KB must align to write size %d KB\n",
+ info->oops_size / 1024,
+ mtd->writesize / 1024);
+ return;
+ }
+
+ longcnt = BITS_TO_LONGS(div_u64(mtd->size, info->oops_size));
+ cxt->rmmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+ cxt->usedmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+ longcnt = BITS_TO_LONGS(div_u64(mtd->size, mtd->erasesize));
+ cxt->badmap = kcalloc(longcnt, sizeof(long), GFP_KERNEL);
+
+ cxt->dev.total_size = mtd->size;
+ /* just support dmesg right now */
+ cxt->dev.flags = PSTORE_FLAGS_DMESG;
+ cxt->dev.read = mtdpstore_read;
+ cxt->dev.write = mtdpstore_write;
+ cxt->dev.erase = mtdpstore_erase;
+ cxt->dev.panic_write = mtdpstore_panic_write;
+
+ ret = psblk_register_device(&cxt->dev);
+ if (ret) {
+ dev_err(&mtd->dev, "mtd%d register to psblk failed\n",
+ mtd->index);
+ return;
+ }
+ cxt->mtd = mtd;
+ dev_info(&mtd->dev, "Attached to MTD device %d\n", mtd->index);
+}
+
+static int mtdpstore_flush_removed_do(struct mtdpstore_context *cxt,
+ loff_t off, size_t size)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ u_char *buf;
+ int ret;
+ size_t retlen;
+ struct erase_info erase;
+
+ buf = kmalloc(mtd->erasesize, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ /* 1st. read to cache */
+ ret = mtd_read(mtd, off, mtd->erasesize, &retlen, buf);
+ if (mtdpstore_is_io_error(ret))
+ goto free;
+
+ /* 2nd. erase block */
+ erase.len = mtd->erasesize;
+ erase.addr = off;
+ ret = mtd_erase(mtd, &erase);
+ if (ret)
+ goto free;
+
+ /* 3rd. write back */
+ while (size) {
+ unsigned int zonesize = cxt->info.oops_size;
+
+ /* there is valid data on block, write back */
+ if (mtdpstore_is_used(cxt, off)) {
+ ret = mtd_write(mtd, off, zonesize, &retlen, buf);
+ if (ret)
+ dev_err(&mtd->dev, "write failure at %lld (%zu of %u written), err %d\n",
+ off, retlen, zonesize, ret);
+ }
+
+ off += zonesize;
+ size -= min_t(unsigned int, zonesize, size);
+ }
+
+free:
+ kfree(buf);
+ return ret;
+}
+
+/*
+ * What does mtdpstore_flush_removed() do?
+ * When user remove any log file on pstore filesystem, mtdpstore should do
+ * something to ensure log file removed. If the whole block is no longer used,
+ * it's nice to erase the block. However if the block still contains valid log,
+ * what mtdpstore can do is to erase and write the valid log back.
+ */
+static int mtdpstore_flush_removed(struct mtdpstore_context *cxt)
+{
+ struct mtd_info *mtd = cxt->mtd;
+ int ret;
+ loff_t off;
+ u32 blkcnt = (u32)div_u64(mtd->size, mtd->erasesize);
+
+ for (off = 0; blkcnt > 0; blkcnt--, off += mtd->erasesize) {
+ ret = mtdpstore_block_isbad(cxt, off);
+ if (ret)
+ continue;
+
+ ret = mtdpstore_block_is_removed(cxt, off);
+ if (!ret)
+ continue;
+
+ ret = mtdpstore_flush_removed_do(cxt, off, mtd->erasesize);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+static void mtdpstore_notify_remove(struct mtd_info *mtd)
+{
+ struct mtdpstore_context *cxt = &oops_cxt;
+
+ if (mtd->index != cxt->index || cxt->index < 0)
+ return;
+
+ mtdpstore_flush_removed(cxt);
+
+ psblk_unregister_device(&cxt->dev);
+ kfree(cxt->badmap);
+ kfree(cxt->usedmap);
+ kfree(cxt->rmmap);
+ cxt->mtd = NULL;
+ cxt->index = -1;
+}
+
+static struct mtd_notifier mtdpstore_notifier = {
+ .add = mtdpstore_notify_add,
+ .remove = mtdpstore_notify_remove,
+};
+
+static int __init mtdpstore_init(void)
+{
+ int ret;
+ struct mtdpstore_context *cxt = &oops_cxt;
+ struct mtd_info *mtd = cxt->mtd;
+ struct psblk_info *info = &cxt->info;
+
+ ret = psblk_usr_info(info);
+ if (unlikely(ret))
+ return ret;
+
+ if (strlen(info->device) == 0) {
+ dev_err(&mtd->dev, "mtd device must be supplied\n");
+ return -EINVAL;
+ }
+ if (!info->oops_size) {
+ dev_err(&mtd->dev, "no recorder enabled\n");
+ return -EINVAL;
+ }
+
+ /* Setup the MTD device to use */
+ ret = kstrtoint((char *)info->device, 0, &cxt->index);
+ if (ret)
+ cxt->index = -1;
+
+ register_mtd_user(&mtdpstore_notifier);
+ return 0;
+}
+module_init(mtdpstore_init);
+
+static void __exit mtdpstore_exit(void)
+{
+ unregister_mtd_user(&mtdpstore_notifier);
+}
+module_exit(mtdpstore_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <[email protected]>");
+MODULE_DESCRIPTION("MTD Oops/Panic console logger/driver");
--
1.9.1

2020-03-25 08:59:06

by WeiXiong Liao

[permalink] [raw]
Subject: [PATCH v3 02/11] pstore/blk: new support logger for block devices

pstore/blk is similar to pstore/ram, but dump log to block device
rather than persistent ram.

Why do we need pstore/blk?
1. Most embedded intelligent equipment have no persistent ram, which
increases costs. We perfer to cheaper solutions, like block devices.
2. Do not any equipment have battery, which means that it lost all data
on general ram if power failure. Pstore has little to do for these
equipments.

Pstore/blk provides efficient configuration method. It divides all
configurations into 2 parts, configurations for user and
configurations for driver.

Configurations for user detemine how pstore/blk work, such as
dump_oops and oops_size. They can be set by Kconfig and module
parameter, but module parameter has priority over Kconfig.

Configurations for driver are all about block device, such as
total_size of device and read/write operations. They should be provided
by device drivers, calling psblk_register_blkdev().

If block device do not support panic write, @panic_write can be NULL.

Signed-off-by: WeiXiong Liao <[email protected]>
---
fs/pstore/Kconfig | 62 +++++++
fs/pstore/Makefile | 2 +
fs/pstore/pstore_blk.c | 425 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/pstore_blk.h | 27 +++
4 files changed, 516 insertions(+)
create mode 100644 fs/pstore/pstore_blk.c
create mode 100644 include/linux/pstore_blk.h

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 5ad4ab68ac4f..590af61019c2 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -160,3 +160,65 @@ config PSTORE_ZONE
help
The common layer for pstore/blk (and pstore/ram in the future)
to manager storage as zones.
+
+config PSTORE_BLK
+ tristate "Log panic/oops to a block device"
+ depends on PSTORE
+ depends on BLOCK
+ select PSTORE_ZONE
+ default n
+ help
+ This enables panic and oops message to be logged to a block dev
+ where it can be read back at some later point.
+
+ If unsure, say N.
+
+config PSTORE_BLK_BLKDEV
+ string "block device identifier"
+ depends on PSTORE_BLK
+ default ""
+ help
+ Which block device should be used for pstore/blk.
+
+ It accept the following variants:
+ 1) <hex_major><hex_minor> device number in hexadecimal represents
+ itself no leading 0x, for example b302.
+ 2) /dev/<disk_name> represents the device number of disk
+ 3) /dev/<disk_name><decimal> represents the device number
+ of partition - device number of disk plus the partition number
+ 4) /dev/<disk_name>p<decimal> - same as the above, this form is
+ used when disk name of partitioned disk ends with a digit.
+ 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+ unique id of a partition if the partition table provides it.
+ The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+ partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+ filled hex representation of the 32-bit "NT disk signature", and PP
+ is a zero-filled hex representation of the 1-based partition number.
+ 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation
+ to a partition with a known unique id.
+ 7) <major>:<minor> major and minor number of the device separated by
+ a colon.
+
+ NOTE that, both Kconfig and module parameters can configure
+ pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_OOPS_SIZE
+ int "Size in Kbytes of oops/panic log to store"
+ depends on PSTORE_BLK
+ default 64
+ help
+ This just sets size of oops/panic log (oops_size) for pstore/blk.
+ The size is in KB and must be a multiple of 4.
+
+ NOTE that, both Kconfig and module parameters can configure
+ pstore/blk, but module parameters have priority over Kconfig.
+
+config PSTORE_BLK_DUMP_OOPS
+ bool "dump oops"
+ depends on PSTORE_BLK
+ default y
+ help
+ Whether pstore/blk dumps oops or not.
+
+ NOTE that, both Kconfig and module parameters can configure
+ pstore/blk, but module parameters have priority over Kconfig.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 94f3631c80ce..0e1c5faf7a0b 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -14,3 +14,5 @@ ramoops-objs += ram.o ram_core.o
obj-$(CONFIG_PSTORE_RAM) += ramoops.o

obj-$(CONFIG_PSTORE_ZONE) += pstore_zone.o
+
+obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
diff --git a/fs/pstore/pstore_blk.c b/fs/pstore/pstore_blk.c
new file mode 100644
index 000000000000..2fbdd4563e5c
--- /dev/null
+++ b/fs/pstore/pstore_blk.c
@@ -0,0 +1,425 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define MODNAME "pstore-blk"
+#define pr_fmt(fmt) MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/string.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/platform_device.h>
+#include <linux/pstore_blk.h>
+#include <linux/mount.h>
+#include <linux/uio.h>
+
+static long oops_size = CONFIG_PSTORE_BLK_OOPS_SIZE;
+module_param(oops_size, long, 0400);
+MODULE_PARM_DESC(oops_size, "oops size in kbytes");
+
+static int dump_oops = CONFIG_PSTORE_BLK_DUMP_OOPS;
+module_param(dump_oops, int, 0400);
+MODULE_PARM_DESC(total_size, "whether dump oops");
+
+/*
+ * blkdev - The block device to use.
+ *
+ * Most of the time, it is a partition of block device.
+ *
+ * blkdev accepts the following variants:
+ * 1) <hex_major><hex_minor> device number in hexadecimal represents itself
+ * no leading 0x, for example b302.
+ * 2) /dev/<disk_name> represents the device number of disk
+ * 3) /dev/<disk_name><decimal> represents the device number
+ * of partition - device number of disk plus the partition number
+ * 4) /dev/<disk_name>p<decimal> - same as the above, that form is
+ * used when disk name of partitioned disk ends on a digit.
+ * 5) PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the
+ * unique id of a partition if the partition table provides it.
+ * The UUID may be either an EFI/GPT UUID, or refer to an MSDOS
+ * partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-
+ * filled hex representation of the 32-bit "NT disk signature", and PP
+ * is a zero-filled hex representation of the 1-based partition number.
+ * 6) PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to
+ * a partition with a known unique id.
+ * 7) <major>:<minor> major and minor number of the device separated by
+ * a colon.
+ */
+static char blkdev[80] = CONFIG_PSTORE_BLK_BLKDEV;
+module_param_string(blkdev, blkdev, 80, 0400);
+MODULE_PARM_DESC(blkdev, "the block device for general read/write");
+
+static DEFINE_MUTEX(psz_lock);
+static struct block_device *psblk_bdev;
+static struct psz_info *psz_info;
+static psblk_panic_write_op blkdev_panic_write;
+static struct bdev_info {
+ dev_t devt;
+ sector_t nr_sects;
+ sector_t start_sect;
+} g_bdev_info;
+
+/**
+ * struct psblk_device - back-end pstore/blk driver structure.
+ *
+ * @total_size: The total size in bytes pstore/blk can use. It must be greater
+ * than 4096 and be multiple of 4096.
+ * @read: The general read operation. Both of the function parameters
+ * @size and @offset are relative value to bock device (not the
+ * whole disk).
+ * On success, the number of bytes should be returned, others
+ * means error.
+ * @write: The same as @read.
+ * @panic_write:The write operation only used for panic case. It's optional
+ * if you do not care panic log. The parameters and return value
+ * are the same as @read.
+ */
+struct psblk_device {
+ unsigned long total_size;
+ psz_read_op read;
+ psz_write_op write;
+ psz_write_op panic_write;
+};
+
+static int psblk_register_do(struct psblk_device *dev)
+{
+ int ret;
+
+ if (!dev || !dev->total_size || !dev->read || !dev->write)
+ return -EINVAL;
+
+ mutex_lock(&psz_lock);
+
+ /* someone already registered before */
+ if (psz_info) {
+ mutex_unlock(&psz_lock);
+ return -EBUSY;
+ }
+ psz_info = kzalloc(sizeof(struct psz_info), GFP_KERNEL);
+ if (!psz_info) {
+ mutex_unlock(&psz_lock);
+ return -ENOMEM;
+ }
+
+#define verify_size(name, alignsize) { \
+ long _##name_ = (name); \
+ _##name_ = _##name_ <= 0 ? 0 : (_##name_ * 1024); \
+ if (_##name_ & ((alignsize) - 1)) { \
+ pr_info(#name " must align to %d\n", \
+ (alignsize)); \
+ _##name_ = ALIGN(name, (alignsize)); \
+ } \
+ name = _##name_ / 1024; \
+ psz_info->name = _##name_; \
+ }
+
+ verify_size(oops_size, 4096);
+#undef verify_size
+ dump_oops = dump_oops <= 0 ? 0 : 1;
+
+ psz_info->total_size = dev->total_size;
+ psz_info->dump_oops = dump_oops;
+ psz_info->read = dev->read;
+ psz_info->write = dev->write;
+ psz_info->panic_write = dev->panic_write;
+ psz_info->name = MODNAME;
+ psz_info->owner = THIS_MODULE;
+
+ ret = psz_register(psz_info);
+ if (ret) {
+ kfree(psz_info);
+ psz_info = NULL;
+ }
+ mutex_unlock(&psz_lock);
+ return ret;
+}
+
+static void psblk_unregister_do(struct psblk_device *dev)
+{
+ mutex_lock(&psz_lock);
+ if (psz_info && psz_info->read == dev->read) {
+ psz_unregister(psz_info);
+ kfree(psz_info);
+ psz_info = NULL;
+ }
+ mutex_unlock(&psz_lock);
+}
+
+/**
+ * psblk_get_bdev() - open block device
+ * @holder: exclusive holder identifier
+ *
+ * Return: pointer to block device on success and others on error.
+ *
+ * On success, the returned block_device has reference count of one.
+ */
+static struct block_device *psblk_get_bdev(void *holder)
+{
+ struct block_device *bdev = ERR_PTR(-ENODEV);
+ fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+ if (!blkdev[0])
+ return ERR_PTR(-ENODEV);
+
+ mutex_lock(&psz_lock);
+ if (psz_info)
+ goto out;
+ if (holder)
+ mode |= FMODE_EXCL;
+ bdev = blkdev_get_by_path(blkdev, mode, holder);
+ if (IS_ERR(bdev)) {
+ dev_t devt;
+
+ devt = name_to_dev_t(blkdev);
+ if (devt == 0) {
+ bdev = ERR_PTR(-ENODEV);
+ goto out;
+ }
+ bdev = blkdev_get_by_dev(devt, mode, holder);
+ }
+out:
+ mutex_unlock(&psz_lock);
+ return bdev;
+}
+
+static void psblk_put_bdev(struct block_device *bdev, void *holder)
+{
+ fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+ if (!bdev)
+ return;
+
+ mutex_lock(&psz_lock);
+ if (holder)
+ mode |= FMODE_EXCL;
+ blkdev_put(bdev, mode);
+ mutex_unlock(&psz_lock);
+}
+
+static ssize_t psblk_generic_blk_read(char *buf, size_t bytes, loff_t pos)
+{
+ struct block_device *bdev = psblk_bdev;
+ struct file file;
+ struct kiocb kiocb;
+ struct iov_iter iter;
+ struct kvec iov = {.iov_base = buf, .iov_len = bytes};
+
+ if (!bdev)
+ return -ENODEV;
+
+ memset(&file, 0, sizeof(struct file));
+ file.f_mapping = bdev->bd_inode->i_mapping;
+ file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+ file.f_inode = bdev->bd_inode;
+ file_ra_state_init(&file.f_ra, file.f_mapping);
+
+ init_sync_kiocb(&kiocb, &file);
+ kiocb.ki_pos = pos;
+ iov_iter_kvec(&iter, READ, &iov, 1, bytes);
+
+ return generic_file_read_iter(&kiocb, &iter);
+}
+
+static ssize_t psblk_generic_blk_write(const char *buf, size_t bytes,
+ loff_t pos)
+{
+ struct block_device *bdev = psblk_bdev;
+ struct iov_iter iter;
+ struct kiocb kiocb;
+ struct file file;
+ ssize_t ret;
+ struct kvec iov = {.iov_base = (void *)buf, .iov_len = bytes};
+
+ if (!bdev)
+ return -ENODEV;
+
+ /* Console/Ftrace recorder may handle buffer until flush dirty zones */
+ if (in_interrupt() || irqs_disabled())
+ return -EBUSY;
+
+ memset(&file, 0, sizeof(struct file));
+ file.f_mapping = bdev->bd_inode->i_mapping;
+ file.f_flags = O_DSYNC | __O_SYNC | O_NOATIME;
+ file.f_inode = bdev->bd_inode;
+
+ init_sync_kiocb(&kiocb, &file);
+ kiocb.ki_pos = pos;
+ iov_iter_kvec(&iter, WRITE, &iov, 1, bytes);
+
+ inode_lock(bdev->bd_inode);
+ ret = generic_write_checks(&kiocb, &iter);
+ if (ret > 0)
+ ret = generic_perform_write(&file, &iter, pos);
+ inode_unlock(bdev->bd_inode);
+
+ if (likely(ret > 0)) {
+ const struct file_operations f_op = {.fsync = blkdev_fsync};
+
+ file.f_op = &f_op;
+ kiocb.ki_pos += ret;
+ ret = generic_write_sync(&kiocb, ret);
+ }
+ return ret;
+}
+
+static inline unsigned long psblk_bdev_size(struct block_device *bdev)
+{
+ return (unsigned long)part_nr_sects_read(bdev->bd_part) << SECTOR_SHIFT;
+}
+
+static ssize_t psblk_blk_panic_write(const char *buf, size_t size,
+ loff_t off)
+{
+ int ret;
+
+ if (!blkdev_panic_write)
+ return -EOPNOTSUPP;
+
+ /* size and off must align to SECTOR_SIZE for block device */
+ ret = blkdev_panic_write(buf, off >> SECTOR_SHIFT,
+ size >> SECTOR_SHIFT);
+ return ret ? -EIO : size;
+}
+
+static struct bdev_info *psblk_get_bdev_info(void)
+{
+ struct bdev_info *info = &g_bdev_info;
+ struct block_device *bdev;
+
+ if (info->devt)
+ return info;
+
+ bdev = psblk_get_bdev(NULL);
+ if (IS_ERR(bdev))
+ return ERR_CAST(bdev);
+
+ info->devt = bdev->bd_dev;
+ info->nr_sects = part_nr_sects_read(bdev->bd_part);
+ info->start_sect = get_start_sect(bdev);
+
+ if (!psblk_bdev_size(bdev)) {
+ pr_err("no enough space to '%s'\n", blkdev);
+ info = ERR_PTR(-ENOSPC);
+ }
+
+ psblk_put_bdev(bdev, NULL);
+ return info;
+}
+
+/**
+ * psblk_register_blkdev() - register block device to pstore/blk
+ *
+ * @major: the major device number of registering device
+ * @panic_write: the interface for panic case.
+ *
+ * Only the matching major to @blkdev can register.
+ *
+ * If block device do not support panic write, @panic_write can be NULL.
+ *
+ * Return:
+ * * 0 - OK
+ * * Others - something error.
+ */
+int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
+{
+ struct block_device *bdev;
+ struct psblk_device dev = {0};
+ struct bdev_info *binfo;
+ int ret = -ENODEV;
+ void *holder = blkdev;
+
+ binfo = psblk_get_bdev_info();
+ if (IS_ERR(binfo))
+ return PTR_ERR(binfo);
+
+ /* only allow driver matching the @blkdev */
+ if (!binfo->devt || MAJOR(binfo->devt) != major) {
+ pr_debug("invalid major %u (expect %u)\n",
+ major, MAJOR(binfo->devt));
+ return -ENODEV;
+ }
+
+ /* hold bdev exclusively */
+ bdev = psblk_get_bdev(holder);
+ if (IS_ERR(bdev)) {
+ pr_err("failed to open '%s'!\n", blkdev);
+ return PTR_ERR(bdev);
+ }
+
+ /* psblk_bdev must be assigned before register to pstore/blk */
+ psblk_bdev = bdev;
+ blkdev_panic_write = panic_write;
+
+ dev.total_size = psblk_bdev_size(bdev);
+ dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
+ dev.read = psblk_generic_blk_read;
+ dev.write = psblk_generic_blk_write;
+
+ ret = psblk_register_do(&dev);
+ if (ret)
+ goto err_put_bdev;
+
+ pr_info("using '%s'\n", blkdev);
+ return 0;
+
+err_put_bdev:
+ psblk_bdev = NULL;
+ blkdev_panic_write = NULL;
+ psblk_put_bdev(bdev, holder);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(psblk_register_blkdev);
+
+/**
+ * psblk_unregister_blkdev() - unregister block device from pstore/blk
+ *
+ * @major: the major device number of device
+ */
+void psblk_unregister_blkdev(unsigned int major)
+{
+ struct psblk_device dev = {.read = psblk_generic_blk_read};
+ void *holder = blkdev;
+
+ if (psblk_bdev && MAJOR(psblk_bdev->bd_dev) == major) {
+ psblk_unregister_do(&dev);
+ psblk_put_bdev(psblk_bdev, holder);
+ blkdev_panic_write = NULL;
+ psblk_bdev = NULL;
+ }
+}
+EXPORT_SYMBOL_GPL(psblk_unregister_blkdev);
+
+/**
+ * psblk_blkdev_info() - get information of @blkdev
+ *
+ * @devt: the block device num of @blkdev
+ * @nr_sects: the sector count of @blkdev
+ * @start_sect: the start sector of @blkdev
+ *
+ * Block driver needs the follow information for @panic_write.
+ *
+ * Return: 0 on success, others on failure.
+ */
+int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect)
+{
+ struct bdev_info *binfo;
+
+ binfo = psblk_get_bdev_info();
+ if (IS_ERR(binfo))
+ return PTR_ERR(binfo);
+
+ if (devt)
+ *devt = binfo->devt;
+ if (nr_sects)
+ *nr_sects = binfo->nr_sects;
+ if (start_sect)
+ *start_sect = binfo->start_sect;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(psblk_blkdev_info);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("WeiXiong Liao <[email protected]>");
+MODULE_DESCRIPTION("Block device Oops/Panic logger");
diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h
new file mode 100644
index 000000000000..5ff465e3953e
--- /dev/null
+++ b/include/linux/pstore_blk.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSTORE_BLK_H_
+#define __PSTORE_BLK_H_
+
+#include <linux/types.h>
+#include <linux/pstore_zone.h>
+
+/**
+ * typedef psblk_panic_write_op - panic write operation to block device
+ *
+ * @buf: the data to write
+ * @start_sect: start sector to block device
+ * @sects: sectors count on buf
+ *
+ * Return: On success, zero should be returned. Others mean error.
+ *
+ * Panic write to block device must be aligned to SECTOR_SIZE.
+ */
+typedef int (*psblk_panic_write_op)(const char *buf, sector_t start_sect,
+ sector_t sects);
+
+int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write);
+void psblk_unregister_blkdev(unsigned int major);
+int psblk_blkdev_info(dev_t *devt, sector_t *nr_sects, sector_t *start_sect);
+
+#endif
--
1.9.1

2020-04-15 06:49:56

by WeiXiong Liao

[permalink] [raw]
Subject: Re: [PATCH v3 05/11] pstore/blk: blkoops: support console recorder

Hi Kees Cook,

A bad subject on patch 5 and patch 6 that "blkoops" should be "pstore/zone".

I will fix it in the next version.

On 2020/3/25 PM 4:55, WeiXiong Liao wrote:
> Support recorder for console. To enable console recorder, just make
> console_size be greater than 0 and a multiple of 4096.
>
> Signed-off-by: WeiXiong Liao <[email protected]>
> ---
> fs/pstore/Kconfig | 12 ++++++++
> fs/pstore/pstore_blk.c | 12 +++++++-
> fs/pstore/pstore_zone.c | 70 +++++++++++++++++++++++++++++++++++++++++----
> include/linux/pstore_zone.h | 4 ++-
> 4 files changed, 91 insertions(+), 7 deletions(-)
>
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 8cead860dcfc..bf90de48ad3c 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -225,6 +225,18 @@ config PSTORE_BLK_PMSG_SIZE
> NOTE that, both Kconfig and module parameters can configure
> pstore/blk, but module parameters have priority over Kconfig.
>
> +config PSTORE_BLK_CONSOLE_SIZE
> + int "Size in Kbytes of console to store"
> + depends on PSTORE_BLK
> + depends on PSTORE_CONSOLE
> + default 64
> + help
> + This just sets size of console (console_size) for pstore/blk. The
> + size is in KB and must be a multiple of 4.
> +
> + NOTE that, both Kconfig and module parameters can configure
> + pstore/blk, but module parameters have priority over Kconfig.
> +
> config PSTORE_BLK_DUMP_OOPS
> bool "dump oops"
> depends on PSTORE_BLK
> diff --git a/fs/pstore/pstore_blk.c b/fs/pstore/pstore_blk.c
> index 85cd9f2335be..2b513acaa18f 100644
> --- a/fs/pstore/pstore_blk.c
> +++ b/fs/pstore/pstore_blk.c
> @@ -26,6 +26,14 @@
> module_param(pmsg_size, long, 0400);
> MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
>
> +#if IS_ENABLED(CONFIG_PSTORE_CONSOLE)
> +static long console_size = CONFIG_PSTORE_BLK_CONSOLE_SIZE;
> +#else
> +static long console_size = -1;
> +#endif
> +module_param(console_size, long, 0400);
> +MODULE_PARM_DESC(console_size, "console size in kbytes");
> +
> static int dump_oops = CONFIG_PSTORE_BLK_DUMP_OOPS;
> module_param(dump_oops, int, 0400);
> MODULE_PARM_DESC(total_size, "whether dump oops");
> @@ -81,7 +89,8 @@
> * whole disk).
> * On success, the number of bytes should be returned, others
> * means error.
> - * @write: The same as @read.
> + * @write: The same as @read, but the following error number:
> + * -EBUSY means try to write again later.
> * @panic_write:The write operation only used for panic case. It's optional
> * if you do not care panic log. The parameters and return value
> * are the same as @read.
> @@ -131,6 +140,7 @@ static int psblk_register_do(struct psblk_device *dev)
>
> verify_size(oops_size, 4096, dev->flags & PSTORE_FLAGS_DMESG);
> verify_size(pmsg_size, 4096, dev->flags & PSTORE_FLAGS_PMSG);
> + verify_size(console_size, 4096, dev->flags & PSTORE_FLAGS_CONSOLE);
> #undef verify_size
> dump_oops = dump_oops <= 0 ? 0 : 1;
>
> diff --git a/fs/pstore/pstore_zone.c b/fs/pstore/pstore_zone.c
> index 444bce7f9ac3..e1e84505b046 100644
> --- a/fs/pstore/pstore_zone.c
> +++ b/fs/pstore/pstore_zone.c
> @@ -87,10 +87,12 @@ struct psz_zone {
> *
> * @opszs: oops/panic storage zones
> * @ppsz: pmsg storage zone
> + * @cpsz: console storage zone
> * @oops_max_cnt: max count of @opszs
> * @oops_read_cnt: counter to read oops zone
> * @oops_write_cnt: counter to write
> * @pmsg_read_cnt: counter to read pmsg zone
> + * @console_read_cnt: counter to read console zone
> * @oops_counter: counter to oops
> * @panic_counter: counter to panic
> * @recovered: whether finish recovering data from storage
> @@ -102,10 +104,12 @@ struct psz_zone {
> struct psz_context {
> struct psz_zone **opszs;
> struct psz_zone *ppsz;
> + struct psz_zone *cpsz;
> unsigned int oops_max_cnt;
> unsigned int oops_read_cnt;
> unsigned int oops_write_cnt;
> unsigned int pmsg_read_cnt;
> + unsigned int console_read_cnt;
> /*
> * the counter should be recovered when recover.
> * It records the oops/panic times after burning rather than booting.
> @@ -125,6 +129,9 @@ struct psz_context {
> };
> static struct psz_context psz_cxt;
>
> +static void psz_flush_all_dirty_zones(struct work_struct *> +static DECLARE_WORK(psz_cleaner, psz_flush_all_dirty_zones);

I think it's better to use delayed work.

static DECLARE_DELAYED_WORK(psz_cleaner, psz_flush_all_dirty_zones);

> +
> /**
> * enum psz_flush_mode - flush mode for psz_zone_write()
> *
> @@ -235,6 +242,9 @@ static int psz_zone_write(struct psz_zone *zone,
> return 0;
> dirty:
> atomic_set(&zone->dirty, true);
> + /* flush dirty zones nicely */
> + if (wcnt == -EBUSY && !is_on_panic())
> + schedule_work(&psz_cleaner);

Change to:

schedule_delayed_work(&psz_cleaner, msecs_to_jiffies(500));

delay for 500ms to merge more log of console and reduce calling times.

> return -EBUSY;
> }
>
> @@ -291,6 +301,15 @@ static int psz_move_zone(struct psz_zone *old, struct psz_zone *new)
> return 0;
> }
>
> +static void psz_flush_all_dirty_zones(struct work_struct *work)
> +{
> + struct psz_context *cxt = &psz_cxt;
> +
> + psz_flush_dirty_zone(cxt->ppsz);
> + psz_flush_dirty_zone(cxt->cpsz);
> + psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
> +}

If flush dirty failed, I think it should try again later.

int ret = 0;

ret |= psz_flush_dirty_zone(cxt->ppsz);
ret |= psz_flush_dirty_zone(cxt->cpsz);
ret |= psz_flush_dirty_zones(cxt->opszs, cxt->oops_max_cnt);
if (ret)
schedule_delayed_work(&psz_cleaner, msecs_to_jiffies(1000));

I will fix it in the next version.

> +
> static int psz_recover_oops_data(struct psz_context *cxt)
> {
> struct psz_info *info = cxt->psz_info;
> @@ -546,6 +565,10 @@ static inline int psz_recovery(struct psz_context *cxt)
> if (ret)
> goto recover_fail;
>
> + ret = psz_recover_zone(cxt, cxt->cpsz);
> + if (ret)
> + goto recover_fail;
> +
> pr_debug("recover end!\n");
> atomic_set(&cxt->recovered, 1);
> return 0;
> @@ -561,6 +584,7 @@ static int psz_pstore_open(struct pstore_info *psi)
>
> cxt->oops_read_cnt = 0;
> cxt->pmsg_read_cnt = 0;
> + cxt->console_read_cnt = 0;
> return 0;
> }
>
> @@ -625,8 +649,9 @@ static int psz_pstore_erase(struct pstore_record *record)
> return psz_oops_erase(cxt, cxt->opszs[record->id], record);
> case PSTORE_TYPE_PMSG:
> return psz_record_erase(cxt, cxt->ppsz);
> - default:
> - return -EINVAL;
> + case PSTORE_TYPE_CONSOLE:
> + return psz_record_erase(cxt, cxt->cpsz);
> + default: return -EINVAL;
> }
> }
>
> @@ -767,9 +792,18 @@ static int notrace psz_pstore_write(struct pstore_record *record)
> record->reason == KMSG_DUMP_PANIC)
> atomic_set(&cxt->on_panic, 1);
>
> + /*
> + * if on panic, do not write except panic records
> + * Fix case that panic_write prints log which wakes up console recorder.
> + */
> + if (is_on_panic() && record->type != PSTORE_TYPE_DMESG)
> + return -EBUSY;
> +
> switch (record->type) {
> case PSTORE_TYPE_DMESG:
> return psz_oops_write(cxt, record);
> + case PSTORE_TYPE_CONSOLE:
> + return psz_record_write(cxt->cpsz, record);
> case PSTORE_TYPE_PMSG:
> return psz_record_write(cxt->ppsz, record);
> default:
> @@ -794,6 +828,13 @@ static struct psz_zone *psz_read_next_zone(struct psz_context *cxt)
> return zone;
> }
>
> + if (cxt->console_read_cnt == 0) {
> + cxt->console_read_cnt++;
> + zone = cxt->cpsz;
> + if (psz_old_ok(zone))
> + return zone;
> + }
> +
> return NULL;
> }
>
> @@ -903,6 +944,8 @@ static ssize_t psz_pstore_read(struct pstore_record *record)
> readop = psz_oops_read;
> record->id = cxt->oops_read_cnt - 1;
> break;
> + case PSTORE_TYPE_CONSOLE:
> + fallthrough;
> case PSTORE_TYPE_PMSG:
> readop = psz_record_read;
> break;
> @@ -1050,6 +1093,8 @@ static void psz_free_all_zones(struct psz_context *cxt)
> psz_free_zones(&cxt->opszs, &cxt->oops_max_cnt);
> if (cxt->ppsz)
> psz_free_zone(&cxt->ppsz);
> + if (cxt->cpsz)
> + psz_free_zone(&cxt->cpsz);
> }
>
> static int psz_alloc_zones(struct psz_context *cxt)
> @@ -1066,6 +1111,14 @@ static int psz_alloc_zones(struct psz_context *cxt)
> goto free_out;
> }
>
> + off_size += info->console_size;
> + cxt->cpsz = psz_init_zone(PSTORE_TYPE_CONSOLE, &off,
> + info->console_size);
> + if (IS_ERR(cxt->cpsz)) {
> + err = PTR_ERR(cxt->cpsz);
> + goto free_out;
> + }
> +
> cxt->opszs = psz_init_zones(PSTORE_TYPE_DMESG, &off,
> info->total_size - off_size,
> info->oops_size, &cxt->oops_max_cnt);
> @@ -1100,7 +1153,7 @@ int psz_register(struct psz_info *info)
> return -EINVAL;
> }
>
> - if (!info->oops_size && !info->pmsg_size) {
> + if (!info->oops_size && !info->pmsg_size && !info->console_size) {
> pr_warn("at least one of the records be non-zero\n");
> return -EINVAL;
> }
> @@ -1128,6 +1181,7 @@ int psz_register(struct psz_info *info)
> check_size(total_size, 4096);
> check_size(oops_size, SECTOR_SIZE);
> check_size(pmsg_size, SECTOR_SIZE);
> + check_size(console_size, SECTOR_SIZE);
>
> #undef check_size
>
> @@ -1160,6 +1214,7 @@ int psz_register(struct psz_info *info)
> pr_debug("\ttotal size : %ld Bytes\n", info->total_size);
> pr_debug("\toops size : %ld Bytes\n", info->oops_size);
> pr_debug("\tpmsg size : %ld Bytes\n", info->pmsg_size);
> + pr_debug("\tconsole size : %ld Bytes\n", info->console_size);
>
> err = psz_alloc_zones(cxt);
> if (err) {
> @@ -1181,11 +1236,14 @@ int psz_register(struct psz_info *info)
> cxt->pstore.flags |= PSTORE_FLAGS_DMESG;
> if (info->pmsg_size)
> cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
> + if (info->console_size)
> + cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE;
>
> - pr_info("Registered %s as pszone backend for%s%s%s\n", info->name,
> + pr_info("Registered %s as pszone backend for%s%s%s%s\n", info->name,
> cxt->opszs && cxt->psz_info->dump_oops ? " Oops" : "",
> cxt->opszs && cxt->psz_info->panic_write ? " Panic" : "",
> - cxt->ppsz ? " Pmsg" : "");
> + cxt->ppsz ? " Pmsg" : "",
> + cxt->cpsz ? " Console" : "");
>
> err = pstore_register(&cxt->pstore);
> if (err) {
> @@ -1219,6 +1277,8 @@ void psz_unregister(struct psz_info *info)
> {
> struct psz_context *cxt = &psz_cxt;
>
> + flush_work(&psz_cleaner);

I think it should try to flush dirty zones before unregister.

psz_flush_all_dirty_zones(NULL);
flush_delayed_work(&psz_cleaner);

> +
> pstore_unregister(&cxt->pstore);
> kfree(cxt->pstore.buf);
> cxt->pstore.bufsize = 0;
> diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h
> index 85e159d8f935..8a1838633010 100644
> --- a/include/linux/pstore_zone.h
> +++ b/include/linux/pstore_zone.h
> @@ -17,12 +17,13 @@
> * @oops_size: The size of oops/panic zone. Zero means disabled, otherwise,
> * it must be multiple of SECTOR_SIZE(512 Bytes).
> * @pmsg_size: The size of pmsg zone which is the same as @oops_size.
> + * @console_size:The size of console zone which is the same as @oops_size.
> * @dump_oops: Whether to dump oops log.
> * @read: The general read operation. Both of the function parameters
> * @size and @offset are relative value to storage.
> * On success, the number of bytes should be returned, others
> * means error.
> - * @write: The same as @read.
> + * @write: The same as @read, but -EBUSY means try to write again later.
> * @panic_write:The write operation only used for panic case. It's optional
> * if you do not care panic log. The parameters and return value
> * are the same as @read.
> @@ -34,6 +35,7 @@ struct psz_info {
> unsigned long total_size;
> unsigned long oops_size;
> unsigned long pmsg_size;
> + unsigned long console_size;
> int dump_oops;
> psz_read_op read;
> psz_write_op write;
>

--
WeiXiong Liao

2020-05-07 19:48:14

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 00/11] pstore: mtd: support crash log to block and mtd device

On Wed, Mar 25, 2020 at 04:54:55PM +0800, WeiXiong Liao wrote:
> [PATCH v3]:
> 1. patch 1~10: a lot of improvements according to Kees Cook <[email protected]>
> including rename module, typo, reorder, rewrite document, bugs
> and so on.
> 2. patch 11: rename funtions of pstore/blk and update document.

This is looking good. I'm going to update it for the recent max_reason
series, and make some tweaks here and there and send out a v4 tomorrow
to see what you think. Thanks for your patience!

--
Kees Cook

2020-05-08 06:09:04

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 02/11] pstore/blk: new support logger for block devices

On Wed, Mar 25, 2020 at 04:54:57PM +0800, WeiXiong Liao wrote:
> pstore/blk is similar to pstore/ram, but dump log to block device
> rather than persistent ram.
> [...]
> +int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
> +{
> + struct block_device *bdev;
> + struct psblk_device dev = {0};
> + struct bdev_info *binfo;
> + int ret = -ENODEV;
> + void *holder = blkdev;
> +
> + binfo = psblk_get_bdev_info();
> + if (IS_ERR(binfo))
> + return PTR_ERR(binfo);
> +
> + /* only allow driver matching the @blkdev */
> + if (!binfo->devt || MAJOR(binfo->devt) != major) {
> + pr_debug("invalid major %u (expect %u)\n",
> + major, MAJOR(binfo->devt));
> + return -ENODEV;
> + }
> +
> + /* hold bdev exclusively */
> + bdev = psblk_get_bdev(holder);
> + if (IS_ERR(bdev)) {
> + pr_err("failed to open '%s'!\n", blkdev);
> + return PTR_ERR(bdev);
> + }
> +
> + /* psblk_bdev must be assigned before register to pstore/blk */
> + psblk_bdev = bdev;
> + blkdev_panic_write = panic_write;
> +
> + dev.total_size = psblk_bdev_size(bdev);
> + dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
> + dev.read = psblk_generic_blk_read;
> + dev.write = psblk_generic_blk_write;
> +
> + ret = psblk_register_do(&dev);
> + if (ret)
> + goto err_put_bdev;
> +
> + pr_info("using '%s'\n", blkdev);
> + return 0;
> +
> +err_put_bdev:
> + psblk_bdev = NULL;
> + blkdev_panic_write = NULL;
> + psblk_put_bdev(bdev, holder);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(psblk_register_blkdev);

I've gotten this series refactored on top of current pstore, and I've
been making various bikeshed changes to names, etc, and as I went to go
start testing, I realized that nothing actually uses
psblk_register_blkdev().

It seems like it should be possible to just start using this on any
block device of the user's choosing. I assume the idea is to allow for
drivers to register panic_write handlers, but even without that, it'd be
nice to just be able to test this with something like /dev/loop0.

What's your thinking on how this would happen? It seems like if
pstore/blk uses pstore/zone, and mtdpstore uses pstore/blk, there should
be a blkoops that uses pstore/blk too? I guess I need to learn a bit
more about how block device probing works so pstore/blk can notice
devices as they're brought online, etc.

--
Kees Cook

2020-05-08 13:16:02

by WeiXiong Liao

[permalink] [raw]
Subject: Re: [PATCH v3 02/11] pstore/blk: new support logger for block devices

hi Kees Cook,

On 2020/5/8 PM 2:07, Kees Cook wrote:
> On Wed, Mar 25, 2020 at 04:54:57PM +0800, WeiXiong Liao wrote:
>> pstore/blk is similar to pstore/ram, but dump log to block device
>> rather than persistent ram.
>> [...]
>> +int psblk_register_blkdev(unsigned int major, psblk_panic_write_op panic_write)
>> +{
>> + struct block_device *bdev;
>> + struct psblk_device dev = {0};
>> + struct bdev_info *binfo;
>> + int ret = -ENODEV;
>> + void *holder = blkdev;
>> +
>> + binfo = psblk_get_bdev_info();
>> + if (IS_ERR(binfo))
>> + return PTR_ERR(binfo);
>> +
>> + /* only allow driver matching the @blkdev */
>> + if (!binfo->devt || MAJOR(binfo->devt) != major) {
>> + pr_debug("invalid major %u (expect %u)\n",
>> + major, MAJOR(binfo->devt));
>> + return -ENODEV;
>> + }
>> +
>> + /* hold bdev exclusively */
>> + bdev = psblk_get_bdev(holder);
>> + if (IS_ERR(bdev)) {
>> + pr_err("failed to open '%s'!\n", blkdev);
>> + return PTR_ERR(bdev);
>> + }
>> +
>> + /* psblk_bdev must be assigned before register to pstore/blk */
>> + psblk_bdev = bdev;
>> + blkdev_panic_write = panic_write;
>> +
>> + dev.total_size = psblk_bdev_size(bdev);
>> + dev.panic_write = panic_write ? psblk_blk_panic_write : NULL;
>> + dev.read = psblk_generic_blk_read;
>> + dev.write = psblk_generic_blk_write;
>> +
>> + ret = psblk_register_do(&dev);
>> + if (ret)
>> + goto err_put_bdev;
>> +
>> + pr_info("using '%s'\n", blkdev);
>> + return 0;
>> +
>> +err_put_bdev:
>> + psblk_bdev = NULL;
>> + blkdev_panic_write = NULL;
>> + psblk_put_bdev(bdev, holder);
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(psblk_register_blkdev);
>
> I've gotten this series refactored on top of current pstore, and I've
> been making various bikeshed changes to names, etc, and as I went to go
> start testing, I realized that nothing actually uses
> psblk_register_blkdev().
>
> It seems like it should be possible to just start using this on any
> block device of the user's choosing. I assume the idea is to allow for
> drivers to register panic_write handlers, but even without that, it'd be
> nice to just be able to test this with something like /dev/loop0.
>

Yes. psblk_register_blkdev() is there for block drivers to register
panic_write()
handlers. The panic_wrire() is used only when panic occurs. Not only the
panic
log, but also all data on dirty zones. I implement the panic_write() of mmc
and nand on the platform of Allwinner, but I think it is not ready to
submit to
community.

All other front-ends but dmesg for panic are available since pstore/blk
provides
the general write/read through IO stack. That's why /dev/loop0 seemed to
works well.

> What's your thinking on how this would happen? It seems like if
> pstore/blk uses pstore/zone, and mtdpstore uses pstore/blk, there should
> be a blkoops that uses pstore/blk too? I guess I need to learn a bit> more about how block device probing works so pstore/blk can notice
> devices as they're brought online, etc.
>

pstore/blk provides all user options and register function for device
drivers.
The mtdpstore is the implementation case using pstore/blk. How about
'mmc_pstore' for mmc and *_pstore for others block device?

I guess I need to learn more about how pstore/blk can notice devices too.
I think pstore/blk can be better if block device can do like mtd device that
not only notifies but also provides generic panic_write().

--
WeiXiong Liao