2023-11-24 22:21:33

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 00/12] Add Qualcomm Minidump driver related support

Hi,

Just for information,
We have recently presented at LPC2023
Abstract and PDF here:
https://lpc.events/event/17/contributions/1468/

Video:
07:09:00 hrs at
https://www.youtube.com/watch?v=68EBBgEltXA

This series should be applied on top of below series of some SCM patches sent here.
https://lore.kernel.org/lkml/[email protected]/

Patch 1-2 enable support to enable multiple download mode
Patch 3 deals in detail documentation on minidump.
Patch 4-6 refactors minidump existing layout and separate it from remoteproc files.
Patch 8 is the Qualcomm apss minidump driver.
Patch 10-12 Enable support to reserve dynamic ramoops and the support to
register ramoops region with minidump.

Changes in v6:
- Accumalated the feedback received on v5 and rebase v5 versions in v6.
- Removed the exported function as there is no current users of them.
- Applied [Pavan.K] suggestion on caller/callee placement of dynamic ramoops reserve memory.
- Addressed [krzysztof] comment on sizeof() and to have qcom_apss_md_table_exit().
- Addressed [Bagas.S] comment on minidump doc.
- Tried to implement [Kees] suggestion in slight different way with callback registration
with ramoops instead of pstore core.

Change in rebase v5: https://lore.kernel.org/lkml/[email protected]/
- Rebased it on latest tag available on linux-next
- Added missed Poovendhan sign-off on 15/17 and tested-by tag from
Kathiravan. Thanks to him for testing and reminding me of missing sign-off.

Changes in v5: https://lore.kernel.org/lkml/[email protected]/
- On suggestion from Pavan.k, to have single function call for minidump collection
from remoteproc driver, separated the logic to have separate minidump file called
qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to
qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion
during region unregister in this series, will pursue it in next series.

- To simplify the minidump driver, removed the complication for frontend and different
backend from Greg suggestion, will pursue this once main driver gets mainlined.

- Move the dynamic ramoops region allocation from Device tree approach to command line
approch with the introduction command line parsing and memblock reservation during
early boot up; Not added documentation about it yet, will add if it gets positive
response.

- Exporting linux banner from kernel to make minidump build also as module, however,
minidump is a debug module and should be kernel built to get most debug information
from kernel.

- Tried to address comments given on dload patch series.

Changes in v4: https://lore.kernel.org/lkml/[email protected]/
- Redesigned the driver and divided the driver into front end and backend (smem) so
that any new backend can be attached easily to avoid code duplication.
- Patch reordering as per the driver and subsystem to easier review of the code.
- Removed minidump specific code from remoteproc to minidump smem based driver.
- Enabled the all the driver as modules.
- Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]
- Address comments made qcom_pstore_minidump driver and given its Device tree
same set of properties as ramoops. [Luca/Kees]
- Added patch for MAINTAINER file.
- Include defconfig change as one patch as per [Krzysztof] suggestion.
- Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
- Addressed comments made on dload mode patch v6 version
https://lore.kernel.org/lkml/[email protected]/

Changes in v3: https://lore.kernel.org/lkml/[email protected]/
- Addressed most of the comments by Srini on v2 and refactored the minidump driver.
- Added platform device support
- Unregister region support.
- Added update region for clients.
- Added pending region support.
- Modified the documentation guide accordingly.
- Added qcom_pstore_ramdump client driver which happen to add ramoops platform
device and also registers ramoops region with minidump.
- Added download mode patch series with this minidump series.
https://lore.kernel.org/lkml/[email protected]/

Changes in v2: https://lore.kernel.org/lkml/[email protected]/
- Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
- Addressed comments made by [srinivas.kandagatla]
- Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
region in minidump.
- Fixed issue reported by kernel test robot.

Changes in v1: https://lore.kernel.org/lkml/[email protected]/

Minidump is a best effort mechanism to collect useful and predefined data
for first level of debugging on end user devices running on Qualcomm SoCs.
It is built on the premise that System on Chip (SoC) or subsystem part of
SoC crashes, due to a range of hardware and software bugs.

Qualcomm devices in engineering mode provides a mechanism for generating
full system ramdumps for post mortem debugging. But in some cases it's
however not feasible to capture the entire content of RAM. The minidump
mechanism provides the means for selecting which snippets should be
included in the ramdump.

The core of SMEM based minidump feature is part of Qualcomm's boot
firmware code. It initializes shared memory (SMEM), which is a part of
DDR and allocates a small section of SMEM to minidump table i.e also
called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...)
has their own table of segments to be included in the minidump and all
get their reference from G-ToC. Each segment/region has some details
like name, physical address and it's size etc. and it could be anywhere
scattered in the DDR.

Existing upstream Qualcomm remoteproc driver[1] already supports SMEM
based minidump feature for remoteproc instances like ADSP, MODEM, ...
where predefined selective segments of subsystem region can be dumped
as part of coredump collection which generates smaller size artifacts
compared to complete coredump of subsystem on crash.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142

In addition to managing and querying the APSS minidump description,
the Linux driver maintains a ELF header in a segment. This segment
gets updated with section/program header whenever a new entry gets
registered.

Testing of these patches has been done on sm8450 target after enabling config like
CONFIG_PSTORE_RAM/CONFIG_PSTORE_CONSOLE and once the device boots up.

echo mini > /sys/module/qcom_scm/parameters/download_mode

Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and
and put the device in download mode) on command prompt.

Default storage type is set to via USB, so minidump would be downloaded with the
help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has
backed minidump boot firmware support.

This will make the device go to download mode and collect the minidump on to the
attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm
package manager kit).

After that we will see a bunch of predefined registered region as binary blobs files
starts with md_* downloaded on the x86 machine on given location in PCAT tool from
the target device, more about this can be found in qualcomm minidump guide patch.

Mukesh Ojha (12):
firmware: qcom_scm: Refactor code to support multiple dload mode
firmware/qcom: qcom_scm: Add multiple download mode support
docs: qcom: Add qualcomm minidump guide
soc: qcom: Add qcom_rproc_minidump module
remoteproc: qcom_q6v5_pas: Use qcom_rproc_minidump()
remoteproc: qcom: Remove minidump related data from qcom_common.c
init: export linux_banner data variable
soc: qcom: Add Qualcomm APSS minidump kernel driver
MAINTAINERS: Add entry for minidump related files
pstore/ram: Add dynamic ramoops region support through commandline
pstore/ram: Add ramoops ready notifier support
soc: qcom: register ramoops region with APSS minidump

Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/qcom_minidump.rst | 272 +++++++++++
Documentation/admin-guide/ramoops.rst | 7 +
MAINTAINERS | 10 +
drivers/firmware/qcom/Kconfig | 11 -
drivers/firmware/qcom/qcom_scm.c | 62 ++-
drivers/remoteproc/Kconfig | 1 +
drivers/remoteproc/qcom_common.c | 160 -------
drivers/remoteproc/qcom_q6v5_pas.c | 3 +-
drivers/soc/qcom/Kconfig | 23 +
drivers/soc/qcom/Makefile | 2 +
drivers/soc/qcom/qcom_minidump.c | 668 ++++++++++++++++++++++++++++
drivers/soc/qcom/qcom_minidump_internal.h | 74 +++
drivers/soc/qcom/qcom_rproc_minidump.c | 111 +++++
drivers/soc/qcom/smem.c | 20 +
fs/pstore/Kconfig | 15 +
fs/pstore/ram.c | 139 +++++-
include/linux/init.h | 3 +
include/linux/pstore_ram.h | 11 +
include/linux/soc/qcom/smem.h | 2 +
include/soc/qcom/qcom_minidump.h | 41 ++
init/main.c | 2 +
init/version-timestamp.c | 3 +
23 files changed, 1456 insertions(+), 185 deletions(-)
create mode 100644 Documentation/admin-guide/qcom_minidump.rst
create mode 100644 drivers/soc/qcom/qcom_minidump.c
create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
create mode 100644 drivers/soc/qcom/qcom_rproc_minidump.c
create mode 100644 include/soc/qcom/qcom_minidump.h

--
2.7.4


2023-11-24 22:21:37

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 01/12] firmware: qcom_scm: Refactor code to support multiple dload mode

Currently on Qualcomm SoC, download_mode is enabled if
CONFIG_QCOM_SCM_DOWNLOAD_MODE_DEFAULT is selected.

Refactor the code such that it supports multiple download
modes and drop CONFIG_QCOM_SCM_DOWNLOAD_MODE_DEFAULT config
instead, give interface to set the download mode from
module parameter.

Signed-off-by: Mukesh Ojha <[email protected]>
---
drivers/firmware/qcom/Kconfig | 11 --------
drivers/firmware/qcom/qcom_scm.c | 58 +++++++++++++++++++++++++++++++++-------
2 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/qcom/Kconfig b/drivers/firmware/qcom/Kconfig
index 3f05d9854ddf..3bbbaef145ad 100644
--- a/drivers/firmware/qcom/Kconfig
+++ b/drivers/firmware/qcom/Kconfig
@@ -9,17 +9,6 @@ menu "Qualcomm firmware drivers"
config QCOM_SCM
tristate

-config QCOM_SCM_DOWNLOAD_MODE_DEFAULT
- bool "Qualcomm download mode enabled by default"
- depends on QCOM_SCM
- help
- A device with "download mode" enabled will upon an unexpected
- warm-restart enter a special debug mode that allows the user to
- "download" memory content over USB for offline postmortem analysis.
- The feature can be enabled/disabled on the kernel command line.
-
- Say Y here to enable "download mode" by default.
-
config QCOM_QSEECOM
bool "Qualcomm QSEECOM interface driver"
depends on QCOM_SCM=y
diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index 87bcd5c02f2b..c5878c38f378 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/interconnect.h>
#include <linux/interrupt.h>
+#include <linux/kstrtox.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/of_address.h>
@@ -27,8 +28,7 @@

#include "qcom_scm.h"

-static bool download_mode = IS_ENABLED(CONFIG_QCOM_SCM_DOWNLOAD_MODE_DEFAULT);
-module_param(download_mode, bool, 0);
+static u32 download_mode;

struct qcom_scm {
struct device *dev;
@@ -132,6 +132,11 @@ static const char * const qcom_scm_convention_names[] = {
[SMC_CONVENTION_LEGACY] = "smc legacy",
};

+static const char * const download_mode_name[] = {
+ [QCOM_DLOAD_NODUMP] = "off",
+ [QCOM_DLOAD_FULLDUMP] = "full",
+};
+
static struct qcom_scm *__scm;

static int qcom_scm_clk_enable(void)
@@ -529,19 +534,18 @@ static int __qcom_scm_set_dload_mode(struct device *dev, bool enable)
return qcom_scm_call_atomic(__scm->dev, &desc, NULL);
}

-static void qcom_scm_set_download_mode(bool enable)
+static void qcom_scm_set_download_mode(u32 download_mode)
{
- u32 val = enable ? QCOM_DLOAD_FULLDUMP : QCOM_DLOAD_NODUMP;
int ret = 0;

if (__scm->dload_mode_addr) {
ret = qcom_scm_io_rmw(__scm->dload_mode_addr,
QCOM_DLOAD_MASK,
- FIELD_PREP(QCOM_DLOAD_MASK, val));
+ FIELD_PREP(QCOM_DLOAD_MASK, download_mode));
} else if (__qcom_scm_is_call_available(__scm->dev,
QCOM_SCM_SVC_BOOT,
QCOM_SCM_BOOT_SET_DLOAD_MODE)) {
- ret = __qcom_scm_set_dload_mode(__scm->dev, enable);
+ ret = __qcom_scm_set_dload_mode(__scm->dev, !!download_mode);
} else {
dev_err(__scm->dev,
"No available mechanism for setting download mode\n");
@@ -1843,6 +1847,42 @@ static irqreturn_t qcom_scm_irq_handler(int irq, void *data)
return IRQ_HANDLED;
}

+static int get_download_mode(char *buffer, const struct kernel_param *kp)
+{
+ if (download_mode >= ARRAY_SIZE(download_mode_name))
+ return sysfs_emit(buffer, "unknown mode\n");
+
+ return sysfs_emit(buffer, "%s\n", download_mode_name[download_mode]);
+}
+
+static int set_download_mode(const char *val, const struct kernel_param *kp)
+{
+ u32 old = download_mode;
+ int ret;
+
+ ret = sysfs_match_string(download_mode_name, val);
+ if (ret < 0) {
+ download_mode = old;
+ pr_err("qcom_scm: unknown download mode: %s\n", val);
+ return -EINVAL;
+ }
+
+ download_mode = ret;
+ if (__scm)
+ qcom_scm_set_download_mode(download_mode);
+
+ return 0;
+}
+
+static const struct kernel_param_ops download_mode_param_ops = {
+ .get = get_download_mode,
+ .set = set_download_mode,
+};
+
+module_param_cb(download_mode, &download_mode_param_ops, NULL, 0644);
+MODULE_PARM_DESC(download_mode,
+ "download mode: off/full are acceptable values");
+
static int qcom_scm_probe(struct platform_device *pdev)
{
struct qcom_scm *scm;
@@ -1907,12 +1947,12 @@ static int qcom_scm_probe(struct platform_device *pdev)
__get_convention();

/*
- * If requested enable "download mode", from this point on warmboot
+ * If "download mode" is requested, from this point on warmboot
* will cause the boot stages to enter download mode, unless
* disabled below by a clean shutdown/reboot.
*/
if (download_mode)
- qcom_scm_set_download_mode(true);
+ qcom_scm_set_download_mode(download_mode);


/*
@@ -1940,7 +1980,7 @@ static int qcom_scm_probe(struct platform_device *pdev)
static void qcom_scm_shutdown(struct platform_device *pdev)
{
/* Clean shutdown, disable download mode to allow normal restart */
- qcom_scm_set_download_mode(false);
+ qcom_scm_set_download_mode(QCOM_DLOAD_NODUMP);
}

static const struct of_device_id qcom_scm_dt_match[] = {
--
2.7.4

2023-11-24 22:21:42

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 02/12] firmware/qcom: qcom_scm: Add multiple download mode support

Currently, scm driver only supports full dump when download
mode is selected. Add support to enable minidump as well as
enable it along with fulldump.

Signed-off-by: Mukesh Ojha <[email protected]>
---
drivers/firmware/qcom/qcom_scm.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index c5878c38f378..70bb59992fb9 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -123,6 +123,8 @@ static const u8 qcom_scm_cpu_warm_bits[QCOM_SCM_BOOT_MAX_CPUS] = {
enum qcom_dload_mode {
QCOM_DLOAD_NODUMP = 0,
QCOM_DLOAD_FULLDUMP = 1,
+ QCOM_DLOAD_MINIDUMP = 2,
+ QCOM_DLOAD_BOTHDUMP = 3,
};

static const char * const qcom_scm_convention_names[] = {
@@ -135,6 +137,8 @@ static const char * const qcom_scm_convention_names[] = {
static const char * const download_mode_name[] = {
[QCOM_DLOAD_NODUMP] = "off",
[QCOM_DLOAD_FULLDUMP] = "full",
+ [QCOM_DLOAD_MINIDUMP] = "mini",
+ [QCOM_DLOAD_BOTHDUMP] = "full,mini",
};

static struct qcom_scm *__scm;
@@ -1881,7 +1885,7 @@ static const struct kernel_param_ops download_mode_param_ops = {

module_param_cb(download_mode, &download_mode_param_ops, NULL, 0644);
MODULE_PARM_DESC(download_mode,
- "download mode: off/full are acceptable values");
+ "download mode: off/full/mini/full,mini are acceptable values");

static int qcom_scm_probe(struct platform_device *pdev)
{
--
2.7.4

2023-11-24 22:21:51

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 04/12] soc: qcom: Add qcom_rproc_minidump module

Add qcom_rproc_minidump module in a preparation to remove
minidump specific code from driver/remoteproc/qcom_common.c
and provide needed exported API, this as well helps to
abstract minidump specific data layout from qualcomm's
remoteproc driver.

It is just a copying of qcom_minidump() functionality from
driver/remoteproc/qcom_common.c into a separate file under
qcom_rproc_minidump().

Signed-off-by: Mukesh Ojha <[email protected]>
---
drivers/soc/qcom/Kconfig | 10 +++
drivers/soc/qcom/Makefile | 1 +
drivers/soc/qcom/qcom_minidump_internal.h | 64 +++++++++++++++++
drivers/soc/qcom/qcom_rproc_minidump.c | 111 ++++++++++++++++++++++++++++++
include/soc/qcom/qcom_minidump.h | 23 +++++++
5 files changed, 209 insertions(+)
create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
create mode 100644 drivers/soc/qcom/qcom_rproc_minidump.c
create mode 100644 include/soc/qcom/qcom_minidump.h

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index b3634e10f6f5..e507a317c74f 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -254,4 +254,14 @@ config QCOM_INLINE_CRYPTO_ENGINE
tristate
select QCOM_SCM

+config QCOM_RPROC_MINIDUMP
+ tristate "QCOM Remoteproc Minidump Support"
+ depends on ARCH_QCOM || COMPILE_TEST
+ depends on QCOM_SMEM
+ help
+ Enablement of core minidump feature is controlled from boot firmware
+ side, so if it is enabled from firmware, this config allow linux to
+ query predefined minidump segments associated with the remote processor
+ and check its validity and end up collecting the dump on remote processor
+ crash during its recovery.
endmenu
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index bbca2e1e55bb..838528e7e30a 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o
obj-$(CONFIG_QCOM_ICC_BWMON) += icc-bwmon.o
qcom_ice-objs += ice.o
obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE) += qcom_ice.o
+obj-$(CONFIG_QCOM_RPROC_MINIDUMP) += qcom_rproc_minidump.o
diff --git a/drivers/soc/qcom/qcom_minidump_internal.h b/drivers/soc/qcom/qcom_minidump_internal.h
new file mode 100644
index 000000000000..71709235b196
--- /dev/null
+++ b/drivers/soc/qcom/qcom_minidump_internal.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _QCOM_MINIDUMP_INTERNAL_H_
+#define _QCOM_MINIDUMP_INTERNAL_H_
+
+#define MAX_NUM_OF_SS 10
+#define MAX_REGION_NAME_LENGTH 16
+#define SBL_MINIDUMP_SMEM_ID 602
+#define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0)
+#define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
+#define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0)
+
+/**
+ * struct minidump_region - Minidump region
+ * @name : Name of the region to be dumped
+ * @seq_num: : Use to differentiate regions with same name.
+ * @valid : This entry to be dumped (if set to 1)
+ * @address : Physical address of region to be dumped
+ * @size : Size of the region
+ */
+struct minidump_region {
+ char name[MAX_REGION_NAME_LENGTH];
+ __le32 seq_num;
+ __le32 valid;
+ __le64 address;
+ __le64 size;
+};
+
+/**
+ * struct minidump_subsystem - Subsystem's SMEM Table of content
+ * @status : Subsystem toc init status
+ * @enabled : if set to 1, this region would be copied during coredump
+ * @encryption_status: Encryption status for this subsystem
+ * @encryption_required : Decides to encrypt the subsystem regions or not
+ * @region_count : Number of regions added in this subsystem toc
+ * @regions_baseptr : regions base pointer of the subsystem
+ */
+struct minidump_subsystem {
+ __le32 status;
+ __le32 enabled;
+ __le32 encryption_status;
+ __le32 encryption_required;
+ __le32 region_count;
+ __le64 regions_baseptr;
+};
+
+/**
+ * struct minidump_global_toc - Global Table of Content
+ * @status : Global Minidump init status
+ * @md_revision : Minidump revision
+ * @enabled : Minidump enable status
+ * @subsystems : Array of subsystems toc
+ */
+struct minidump_global_toc {
+ __le32 status;
+ __le32 md_revision;
+ __le32 enabled;
+ struct minidump_subsystem subsystems[MAX_NUM_OF_SS];
+};
+
+#endif /* _QCOM_MINIDUMP_INTERNAL_H_ */
diff --git a/drivers/soc/qcom/qcom_rproc_minidump.c b/drivers/soc/qcom/qcom_rproc_minidump.c
new file mode 100644
index 000000000000..9bc84cc2536f
--- /dev/null
+++ b/drivers/soc/qcom/qcom_rproc_minidump.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/remoteproc.h>
+#include <linux/soc/qcom/smem.h>
+#include <linux/string.h>
+#include <soc/qcom/qcom_minidump.h>
+
+#include "qcom_minidump_internal.h"
+
+static void qcom_minidump_cleanup(struct rproc *rproc)
+{
+ struct rproc_dump_segment *entry, *tmp;
+
+ list_for_each_entry_safe(entry, tmp, &rproc->dump_segments, node) {
+ list_del(&entry->node);
+ kfree(entry->priv);
+ kfree(entry);
+ }
+}
+
+static int qcom_add_minidump_segments(struct rproc *rproc, struct minidump_subsystem *subsystem,
+ void (*rproc_dumpfn_t)(struct rproc *rproc, struct rproc_dump_segment *segment,
+ void *dest, size_t offset, size_t size))
+{
+ struct minidump_region __iomem *ptr;
+ struct minidump_region region;
+ int seg_cnt, i;
+ dma_addr_t da;
+ size_t size;
+ char *name;
+
+ if (WARN_ON(!list_empty(&rproc->dump_segments))) {
+ dev_err(&rproc->dev, "dump segment list already populated\n");
+ return -EUCLEAN;
+ }
+
+ seg_cnt = le32_to_cpu(subsystem->region_count);
+ ptr = ioremap((unsigned long)le64_to_cpu(subsystem->regions_baseptr),
+ seg_cnt * sizeof(struct minidump_region));
+ if (!ptr)
+ return -EFAULT;
+
+ for (i = 0; i < seg_cnt; i++) {
+ memcpy_fromio(&region, ptr + i, sizeof(region));
+ if (le32_to_cpu(region.valid) == MINIDUMP_REGION_VALID) {
+ name = kstrndup(region.name, MAX_REGION_NAME_LENGTH - 1, GFP_KERNEL);
+ if (!name) {
+ iounmap(ptr);
+ return -ENOMEM;
+ }
+ da = le64_to_cpu(region.address);
+ size = le64_to_cpu(region.size);
+ rproc_coredump_add_custom_segment(rproc, da, size, rproc_dumpfn_t, name);
+ }
+ }
+
+ iounmap(ptr);
+ return 0;
+}
+
+void qcom_rproc_minidump(struct rproc *rproc, unsigned int minidump_id,
+ void (*rproc_dumpfn_t)(struct rproc *rproc,
+ struct rproc_dump_segment *segment, void *dest, size_t offset,
+ size_t size))
+{
+ int ret;
+ struct minidump_subsystem *subsystem;
+ struct minidump_global_toc *toc;
+
+ /* Get Global minidump ToC*/
+ toc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, NULL);
+
+ /* check if global table pointer exists and init is set */
+ if (IS_ERR(toc) || !toc->status) {
+ dev_err(&rproc->dev, "Minidump TOC not found in SMEM\n");
+ return;
+ }
+
+ /* Get subsystem table of contents using the minidump id */
+ subsystem = &toc->subsystems[minidump_id];
+
+ /**
+ * Collect minidump if SS ToC is valid and segment table
+ * is initialized in memory and encryption status is set.
+ */
+ if (subsystem->regions_baseptr == 0 ||
+ le32_to_cpu(subsystem->status) != 1 ||
+ le32_to_cpu(subsystem->enabled) != MINIDUMP_SS_ENABLED ||
+ le32_to_cpu(subsystem->encryption_status) != MINIDUMP_SS_ENCR_DONE) {
+ dev_err(&rproc->dev, "Minidump not ready, skipping\n");
+ return;
+ }
+
+ ret = qcom_add_minidump_segments(rproc, subsystem, rproc_dumpfn_t);
+ if (ret) {
+ dev_err(&rproc->dev, "Failed with error: %d while adding minidump entries\n", ret);
+ goto clean_minidump;
+ }
+ rproc_coredump_using_sections(rproc);
+clean_minidump:
+ qcom_minidump_cleanup(rproc);
+}
+EXPORT_SYMBOL_GPL(qcom_rproc_minidump);
+
+MODULE_DESCRIPTION("Qualcomm remoteproc minidump(smem) helper module");
+MODULE_LICENSE("GPL");
diff --git a/include/soc/qcom/qcom_minidump.h b/include/soc/qcom/qcom_minidump.h
new file mode 100644
index 000000000000..cd87caef919d
--- /dev/null
+++ b/include/soc/qcom/qcom_minidump.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _QCOM_MINIDUMP_H_
+#define _QCOM_MINIDUMP_H_
+
+struct rproc;
+struct rproc_dump_segment;
+
+#if IS_ENABLED(CONFIG_QCOM_RPROC_MINIDUMP)
+void qcom_rproc_minidump(struct rproc *rproc, unsigned int minidump_id,
+ void (*rproc_dumpfn_t)(struct rproc *rproc,
+ struct rproc_dump_segment *segment, void *dest, size_t offset,
+ size_t size));
+#else
+static inline void qcom_rproc_minidump(struct rproc *rproc, unsigned int minidump_id,
+ void (*rproc_dumpfn_t)(struct rproc *rproc,
+ struct rproc_dump_segment *segment, void *dest, size_t offset,
+ size_t size)) { }
+#endif /* CONFIG_QCOM_RPROC_MINIDUMP */
+#endif /* _QCOM_MINIDUMP_H_ */
--
2.7.4

2023-11-24 22:21:59

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 05/12] remoteproc: qcom_q6v5_pas: Use qcom_rproc_minidump()

Now, as all the minidump specific data structure is moved to
minidump specific files and implementation wise qcom_rproc_minidump()
and qcom_minidump() exactly same and the name qcom_rproc_minidump
make more sense as it happen to collect the minidump for the
remoteproc processors. So, let's use qcom_rproc_minidump() and
we will be removing qcom_minidump() and minidump related stuff
from driver/remoteproc/qcom_common.c .

Signed-off-by: Mukesh Ojha <[email protected]>
---
drivers/remoteproc/Kconfig | 1 +
drivers/remoteproc/qcom_q6v5_pas.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa85..cea960749e2c 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -166,6 +166,7 @@ config QCOM_PIL_INFO

config QCOM_RPROC_COMMON
tristate
+ select QCOM_RPROC_MINIDUMP

config QCOM_Q6V5_COMMON
tristate
diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index 913a5d2068e8..a39fa75a7162 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -25,6 +25,7 @@
#include <linux/soc/qcom/mdt_loader.h>
#include <linux/soc/qcom/smem.h>
#include <linux/soc/qcom/smem_state.h>
+#include <soc/qcom/qcom_minidump.h>

#include "qcom_common.h"
#include "qcom_pil_info.h"
@@ -131,7 +132,7 @@ static void adsp_minidump(struct rproc *rproc)
if (rproc->dump_conf == RPROC_COREDUMP_DISABLED)
return;

- qcom_minidump(rproc, adsp->minidump_id, adsp_segment_dump);
+ qcom_rproc_minidump(rproc, adsp->minidump_id, adsp_segment_dump);
}

static int adsp_pds_enable(struct qcom_adsp *adsp, struct device **pds,
--
2.7.4

2023-11-24 22:22:18

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 07/12] init: export linux_banner data variable

Some debug loadable module like minidump is interested in knowing
the kernel version against which it is being build. Let's export
linux_banner.

Signed-off-by: Mukesh Ojha <[email protected]>
---
include/linux/init.h | 3 +++
init/version-timestamp.c | 3 +++
2 files changed, 6 insertions(+)

diff --git a/include/linux/init.h b/include/linux/init.h
index 01b52c9c7526..01a28b367841 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -145,6 +145,9 @@ extern char *saved_command_line;
extern unsigned int saved_command_line_len;
extern unsigned int reset_devices;

+/* Defined in init/version-timestamp.c */
+extern const char linux_banner[];
+
/* used by init/main.c */
void setup_arch(char **);
void prepare_namespace(void);
diff --git a/init/version-timestamp.c b/init/version-timestamp.c
index 043cbf80a766..a48f2c19e5d7 100644
--- a/init/version-timestamp.c
+++ b/init/version-timestamp.c
@@ -6,6 +6,7 @@
#include <linux/refcount.h>
#include <linux/uts.h>
#include <linux/utsname.h>
+#include <linux/init.h>

struct uts_namespace init_uts_ns = {
.ns.count = REFCOUNT_INIT(2),
@@ -28,3 +29,5 @@ struct uts_namespace init_uts_ns = {
const char linux_banner[] =
"Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
+
+EXPORT_SYMBOL_GPL(linux_banner);
--
2.7.4

2023-11-24 22:22:18

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 03/12] docs: qcom: Add qualcomm minidump guide

Add the qualcomm minidump guide for the users which tries to cover
the dependency, API use and the way to test and collect minidump
on Qualcomm supported SoCs.

Signed-off-by: Mukesh Ojha <[email protected]>
---
Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/qcom_minidump.rst | 272 ++++++++++++++++++++++++++++
2 files changed, 273 insertions(+)
create mode 100644 Documentation/admin-guide/qcom_minidump.rst

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 43ea35613dfc..251d070486c2 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -120,6 +120,7 @@ configure specific aspects of kernel behavior to your liking.
perf-security
pm/index
pnp
+ qcom_minidump
rapidio
ras
rtc
diff --git a/Documentation/admin-guide/qcom_minidump.rst b/Documentation/admin-guide/qcom_minidump.rst
new file mode 100644
index 000000000000..b492f2b79639
--- /dev/null
+++ b/Documentation/admin-guide/qcom_minidump.rst
@@ -0,0 +1,272 @@
+Qualcomm minidump feature
+=========================
+
+Introduction
+------------
+
+Minidump is a best effort mechanism to collect useful and predefined
+data for first level of debugging on end user devices running on
+Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
+or subsystem part of SoC crashes, due to a range of hardware and
+software bugs. Hence, the ability to collect accurate data is only
+a best-effort. The data collected could be invalid or corrupted, data
+collection itself could fail, and so on.
+
+Qualcomm devices in engineering mode provides a mechanism for generating
+full system RAM dumps for post-mortem debugging. But in some cases it's
+however not feasible to capture the entire content of RAM. The minidump
+mechanism provides the means for selected region should be included in
+the ramdump.
+
+
+::
+
+ +-----------------------------------------------+
+ | DDR +-------------+ |
+ | | SS0-ToC| |
+ | +----------------+ +----------------+ | |
+ | |Shared memory | | SS1-ToC| | |
+ | |(SMEM) | | | | |
+ | | | +-->|--------+ | | |
+ | |G-ToC | | | SS-ToC \ | | |
+ | |+-------------+ | | | +-----------+ | | |
+ | ||-------------| | | | |-----------| | | |
+ | || SS0-ToC | | | +-|<|SS1 region1| | | |
+ | ||-------------| | | | | |-----------| | | |
+ | || SS1-ToC |-|>+ | | |SS1 region2| | | |
+ | ||-------------| | | | |-----------| | | |
+ | || SS2-ToC | | | | | ... | | | |
+ | ||-------------| | | | |-----------| | | |
+ | || ... | | |-|<|SS1 regionN| | | |
+ | ||-------------| | | | |-----------| | | |
+ | || SSn-ToC | | | | +-----------+ | | |
+ | |+-------------+ | | | | | |
+ | | | | |----------------| | |
+ | | | +>| regionN | | |
+ | | | | |----------------| | |
+ | +----------------+ | | | | |
+ | | |----------------| | |
+ | +>| region1 | | |
+ | |----------------| | |
+ | | | | |
+ | |----------------|-+ |
+ | | region5 | |
+ | |----------------| |
+ | | | |
+ | Region information +----------------+ |
+ | +---------------+ |
+ | |region name | |
+ | |---------------| |
+ | |region address | |
+ | |---------------| |
+ | |region size | |
+ | +---------------+ |
+ +-----------------------------------------------+
+ G-ToC: Global table of contents
+ SS-ToC: Subsystem table of contents
+ SS0-SSn: Subsystem numbered from 0 to n
+
+It depends on SoC where the underlying firmware is keeping the
+minidump global table taking care of subsystem ToC part for
+minidump like for above diagram, it is for shared memory sitting
+in DDR and it is shared among various master however it is possible
+that this could be implemented via memory mapped regions but the
+general idea should remain same. Here, various subsystem could be
+DSP's like ADSP/CDSP/MODEM etc, along with Application processor
+(APSS) where Linux runs. DSP minidump gets collected when DSP's goes
+for recovery followed by a crash. The minidump part of code for
+that resides in ``qcom_rproc_minidump.c``.
+
+
+SMEM as backend
+----------------
+
+In this document, SMEM will be used as the backend implementation
+of minidump.
+
+The core of minidump feature is part of Qualcomm's boot firmware code.
+It initializes shared memory (SMEM), which is a part of DDR and
+allocates a small section of it to minidump table, i.e. also called
+global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
+its own table of segments to be included in the minidump, all
+references from a descriptor in SMEM (G-ToC). Each segment/region has
+some details like name, physical address and its size etc. and it
+could be anywhere scattered in the DDR.
+
+Qualcomm APSS Minidump kernel driver concept
+--------------------------------------------
+
+Qualcomm APSS minidump kernel driver adds the capability to add Linux
+region to be dumped as part of RAM dump collection. At the moment,
+shared memory driver creates platform device for minidump driver and
+give a means to APSS minidump to initialize itself on probe.
+
+This driver provides ``qcom_minidump_region_register`` and
+``qcom_minidump_region_unregister`` API's to register and unregister
+APSS minidump region. It also supports registration for the clients
+who came before minidump driver was initialized. It maintains pending
+list of clients who came before minidump and once minidump is initialized
+it registers them in one go.
+
+To simplify post-mortem debugging, driver creates and maintain an ELF
+header as first region that gets updated each time a new region gets
+registered.
+
+The solution supports extracting the RAM dump/minidump produced either
+over USB or stored to an attached storage device.
+
+Dependency of minidump kernel driver
+------------------------------------
+
+It is to note that whole of minidump depends on Qualcomm boot firmware
+whether it supports minidump or not. So, if the minidump SMEM ID is
+present in shared memory, it indicates that minidump is supported from
+boot firmware and it is possible to dump Linux (APSS) region as part
+of minidump collection.
+
+How a kernel client driver can register region with minidump
+------------------------------------------------------------
+
+Client driver can use ``qcom_minidump_region_register`` API's to register
+and ``qcom_minidump_region_unregister`` to unregister their region from
+minidump driver.
+
+Client needs to fill their region by filling ``qcom_minidump_region``
+structure object which consists of the region name, region's virtual
+and physical address and its size.
+
+Below, is one sample client driver snippet which tries to allocate a
+region from kernel heap of certain size and it writes a certain known
+pattern (that can help in verification after collection that we got
+the exact pattern, what we wrote) and registers it with minidump.
+
+ .. code-block:: c
+
+ #include <soc/qcom/qcom_minidump.h>
+ [...]
+
+
+ [... inside a function ...]
+ struct qcom_minidump_region region;
+
+ [...]
+
+ client_mem_region = kzalloc(region_size, GFP_KERNEL);
+ if (!client_mem_region)
+ return -ENOMEM;
+
+ [... Just write a pattern ...]
+ memset(client_mem_region, 0xAB, region_size);
+
+ [... Fill up the region object ...]
+ strlcpy(region.name, "REGION_A", sizeof(region.name));
+ region.virt_addr = client_mem_region;
+ region.phys_addr = virt_to_phys(client_mem_region);
+ region.size = region_size;
+
+ ret = qcom_minidump_region_register(&region);
+ if (ret < 0) {
+ pr_err("failed to add region in minidump: err: %d\n", ret);
+ return ret;
+ }
+
+ [...]
+
+
+Test
+----
+
+Existing Qualcomm devices already supports entire RAM dump (also called
+full dump) by writing appropriate value to Qualcomm's top control and
+status register (tcsr) in ``driver/firmware/qcom_scm.c`` .
+
+SCM device Tree bindings required to support download mode
+For example (sm8450) ::
+
+ / {
+
+ [...]
+
+ firmware {
+ scm: scm {
+ compatible = "qcom,scm-sm8450", "qcom,scm";
+ [... tcsr register ... ]
+ qcom,dload-mode = <&tcsr 0x13000>;
+
+ [...]
+ };
+ };
+
+ [...]
+
+ soc: soc@0 {
+
+ [...]
+
+ tcsr: syscon@1fc0000 {
+ compatible = "qcom,sm8450-tcsr", "syscon";
+ reg = <0x0 0x1fc0000 0x0 0x30000>;
+ };
+
+ [...]
+ };
+ [...]
+
+ };
+
+User of minidump can pass ``qcom_scm.download_mode="mini"`` to kernel
+commandline to set the current download mode to minidump.
+Similarly, ``"full"`` is passed to set the download mode to full dump
+where entire RAM dump will be collected while setting it ``"full,mini"``
+will collect minidump along with fulldump.
+
+Writing to sysfs node can also be used to set the mode to minidump::
+
+ echo "mini" > /sys/module/qcom_scm/parameter/download_mode
+
+Once the download mode is set, any kind of crash will make the device collect
+respective dump as per set download mode.
+
+Dump collection
+---------------
+::
+
+ +-----------+
+ | |
+ | | +------+
+ | | | |
+ | | +--+---+ Product(Qualcomm SoC)
+ +-----------+ |
+ |+++++++++++|<------------+
+ |+++++++++++| usb cable
+ +-----------+
+ x86_64 PC
+
+The solution supports a product running with Qualcomm SoC (where minidump)
+is supported from the firmware) connected to x86_64 host PC running PCAT
+tool. It supports downloading the minidump produced from product to the
+host PC over USB or to save the minidump to the product attached storage
+device(UFS/eMMC/SD Card) into minidump dedicated partition.
+
+By default, dumps are downloaded via USB to the attached x86_64 PC running
+PCAT (Qualcomm tool) software. Upon download, we will see a set of binary
+blobs starting with name ``md_*`` in PCAT configured directory in x86_64
+machine, so for above example from the client it will be ``md_REGION_A.BIN``.
+This binary blob depends on region content to determine whether it needs
+external parser support to get the content of the region, so for simple
+plain ASCII text we don't need any parsing and the content can be seen
+just opening the binary file.
+
+To collect the dump to attached storage type, one needs to write appropriate
+value to IMEM register, in that case dumps are collected in rawdump
+partition on the product device itself.
+
+One needs to read the entire rawdump partition and pull out content to
+save it onto the attached x86_64 machine over USB. Later, this rawdump
+can be passed to another tool (``dexter.exe`` [Qualcomm tool]) which
+converts this into the similar binary blobs which we have got it when
+download type was set to USB, i.e. a set of registered regions as blobs
+and their name starts with ``md_*``.
+
+Replacing the ``dexter.exe`` with some open source tool can be added as future
+scope of this document.
--
2.7.4

2023-11-24 22:22:41

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 06/12] remoteproc: qcom: Remove minidump related data from qcom_common.c

As minidump specific data structure and functions move under
config QCOM_RPROC_MINIDUMP, so remove minidump specific data
from driver/remoteproc/qcom_common.c .

Signed-off-by: Mukesh Ojha <[email protected]>
---
drivers/remoteproc/qcom_common.c | 160 ---------------------------------------
1 file changed, 160 deletions(-)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 03e5f5d533eb..085fd73fa23a 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -17,7 +17,6 @@
#include <linux/rpmsg/qcom_smd.h>
#include <linux/slab.h>
#include <linux/soc/qcom/mdt_loader.h>
-#include <linux/soc/qcom/smem.h>

#include "remoteproc_internal.h"
#include "qcom_common.h"
@@ -26,61 +25,6 @@
#define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev)
#define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev)

-#define MAX_NUM_OF_SS 10
-#define MAX_REGION_NAME_LENGTH 16
-#define SBL_MINIDUMP_SMEM_ID 602
-#define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0)
-#define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
-#define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0)
-
-/**
- * struct minidump_region - Minidump region
- * @name : Name of the region to be dumped
- * @seq_num: : Use to differentiate regions with same name.
- * @valid : This entry to be dumped (if set to 1)
- * @address : Physical address of region to be dumped
- * @size : Size of the region
- */
-struct minidump_region {
- char name[MAX_REGION_NAME_LENGTH];
- __le32 seq_num;
- __le32 valid;
- __le64 address;
- __le64 size;
-};
-
-/**
- * struct minidump_subsystem - Subsystem's SMEM Table of content
- * @status : Subsystem toc init status
- * @enabled : if set to 1, this region would be copied during coredump
- * @encryption_status: Encryption status for this subsystem
- * @encryption_required : Decides to encrypt the subsystem regions or not
- * @region_count : Number of regions added in this subsystem toc
- * @regions_baseptr : regions base pointer of the subsystem
- */
-struct minidump_subsystem {
- __le32 status;
- __le32 enabled;
- __le32 encryption_status;
- __le32 encryption_required;
- __le32 region_count;
- __le64 regions_baseptr;
-};
-
-/**
- * struct minidump_global_toc - Global Table of Content
- * @status : Global Minidump init status
- * @md_revision : Minidump revision
- * @enabled : Minidump enable status
- * @subsystems : Array of subsystems toc
- */
-struct minidump_global_toc {
- __le32 status;
- __le32 md_revision;
- __le32 enabled;
- struct minidump_subsystem subsystems[MAX_NUM_OF_SS];
-};
-
struct qcom_ssr_subsystem {
const char *name;
struct srcu_notifier_head notifier_list;
@@ -90,110 +34,6 @@ struct qcom_ssr_subsystem {
static LIST_HEAD(qcom_ssr_subsystem_list);
static DEFINE_MUTEX(qcom_ssr_subsys_lock);

-static void qcom_minidump_cleanup(struct rproc *rproc)
-{
- struct rproc_dump_segment *entry, *tmp;
-
- list_for_each_entry_safe(entry, tmp, &rproc->dump_segments, node) {
- list_del(&entry->node);
- kfree(entry->priv);
- kfree(entry);
- }
-}
-
-static int qcom_add_minidump_segments(struct rproc *rproc, struct minidump_subsystem *subsystem,
- void (*rproc_dumpfn_t)(struct rproc *rproc, struct rproc_dump_segment *segment,
- void *dest, size_t offset, size_t size))
-{
- struct minidump_region __iomem *ptr;
- struct minidump_region region;
- int seg_cnt, i;
- dma_addr_t da;
- size_t size;
- char *name;
-
- if (WARN_ON(!list_empty(&rproc->dump_segments))) {
- dev_err(&rproc->dev, "dump segment list already populated\n");
- return -EUCLEAN;
- }
-
- seg_cnt = le32_to_cpu(subsystem->region_count);
- ptr = ioremap((unsigned long)le64_to_cpu(subsystem->regions_baseptr),
- seg_cnt * sizeof(struct minidump_region));
- if (!ptr)
- return -EFAULT;
-
- for (i = 0; i < seg_cnt; i++) {
- memcpy_fromio(&region, ptr + i, sizeof(region));
- if (le32_to_cpu(region.valid) == MINIDUMP_REGION_VALID) {
- name = kstrndup(region.name, MAX_REGION_NAME_LENGTH - 1, GFP_KERNEL);
- if (!name) {
- iounmap(ptr);
- return -ENOMEM;
- }
- da = le64_to_cpu(region.address);
- size = le64_to_cpu(region.size);
- rproc_coredump_add_custom_segment(rproc, da, size, rproc_dumpfn_t, name);
- }
- }
-
- iounmap(ptr);
- return 0;
-}
-
-void qcom_minidump(struct rproc *rproc, unsigned int minidump_id,
- void (*rproc_dumpfn_t)(struct rproc *rproc,
- struct rproc_dump_segment *segment, void *dest, size_t offset,
- size_t size))
-{
- int ret;
- struct minidump_subsystem *subsystem;
- struct minidump_global_toc *toc;
-
- /* Get Global minidump ToC*/
- toc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, NULL);
-
- /* check if global table pointer exists and init is set */
- if (IS_ERR(toc) || !toc->status) {
- dev_err(&rproc->dev, "Minidump TOC not found in SMEM\n");
- return;
- }
-
- /* Get subsystem table of contents using the minidump id */
- subsystem = &toc->subsystems[minidump_id];
-
- /**
- * Collect minidump if SS ToC is valid and segment table
- * is initialized in memory and encryption status is set.
- */
- if (subsystem->regions_baseptr == 0 ||
- le32_to_cpu(subsystem->status) != 1 ||
- le32_to_cpu(subsystem->enabled) != MINIDUMP_SS_ENABLED) {
- return rproc_coredump(rproc);
- }
-
- if (le32_to_cpu(subsystem->encryption_status) != MINIDUMP_SS_ENCR_DONE) {
- dev_err(&rproc->dev, "Minidump not ready, skipping\n");
- return;
- }
-
- /**
- * Clear out the dump segments populated by parse_fw before
- * re-populating them with minidump segments.
- */
- rproc_coredump_cleanup(rproc);
-
- ret = qcom_add_minidump_segments(rproc, subsystem, rproc_dumpfn_t);
- if (ret) {
- dev_err(&rproc->dev, "Failed with error: %d while adding minidump entries\n", ret);
- goto clean_minidump;
- }
- rproc_coredump_using_sections(rproc);
-clean_minidump:
- qcom_minidump_cleanup(rproc);
-}
-EXPORT_SYMBOL_GPL(qcom_minidump);
-
static int glink_subdev_start(struct rproc_subdev *subdev)
{
struct qcom_rproc_glink *glink = to_glink_subdev(subdev);
--
2.7.4

2023-11-24 22:23:03

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 09/12] MAINTAINERS: Add entry for minidump related files

Add the entries into maintainer file for all the minidump related
modules.

Signed-off-by: Mukesh Ojha <[email protected]>
---
MAINTAINERS | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f2b2cd60eb20..53b9f22816c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18016,6 +18016,16 @@ S: Maintained
F: Documentation/devicetree/bindings/regulator/vqmmc-ipq4019-regulator.yaml
F: drivers/regulator/vqmmc-ipq4019-regulator.c

+QUALCOMM MINIDUMP DRIVER
+M: Mukesh Ojha <[email protected]>
+L: [email protected]
+S: Maintained
+F: Documentation/admin-guide/qcom_minidump.rst
+F: drivers/soc/qcom/qcom_rproc_minidump.c
+F: drivers/soc/qcom/qcom_minidump.c
+F: drivers/soc/qcom/qcom_ramoops_minidump.c
+
+
QUALCOMM NAND CONTROLLER DRIVER
M: Manivannan Sadhasivam <[email protected]>
L: [email protected]
--
2.7.4

2023-11-24 22:23:04

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 10/12] pstore/ram: Add dynamic ramoops region support through commandline

The reserved memory region for ramoops is assumed to be at a fixed
and known location when read from the devicetree. This may not be
required for something like Qualcomm's minidump which is interested
in knowing addresses of ramoops region but it does not put hard
requirement of address being fixed as most of it's SoC does not
support warm reset and does not use pstorefs at all instead it has
firmware way of collecting ramoops region if it gets to know the
address and register it with apss minidump table which is sitting
in shared memory region in DDR and firmware will have access to
these table during reset and collects it on crash of SoC.

So, add the support of reserving ramoops region to be dynamically
allocated early during boot if it is request through command line
via 'dyn_ramoops_size=<size>' and fill up reserved resource structure
and export the structure, so that it can be read by ramoops driver.

Signed-off-by: Mukesh Ojha <[email protected]>
---
Documentation/admin-guide/ramoops.rst | 7 ++++
fs/pstore/Kconfig | 15 +++++++++
fs/pstore/ram.c | 62 ++++++++++++++++++++++++++++++++---
include/linux/pstore_ram.h | 5 +++
init/main.c | 2 ++
5 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
index e9f85142182d..af737adbf079 100644
--- a/Documentation/admin-guide/ramoops.rst
+++ b/Documentation/admin-guide/ramoops.rst
@@ -33,6 +33,13 @@ memory are implementation defined, and won't work on many ARMs such as omaps.
Setting ``mem_type=2`` attempts to treat the memory region as normal memory,
which enables full cache on it. This can improve the performance.

+Ramoops memory region can also be allocated dynamically for a special case where
+there is no requirement to access the logs from pstorefs on next boot instead there
+is separate backend mechanism like minidump present which has awareness about the
+dynamic ramoops region and can recover the logs. This is enabled via command line
+parameter ``dyn_ramoops_size=<size>`` and should not be used in absence of
+separate backend which knows how to recover this dynamic region.
+
The memory area is divided into ``record_size`` chunks (also rounded down to
power of two) and each kmesg dump writes a ``record_size`` chunk of
information.
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 3acc38600cd1..e13e53d7a225 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -81,6 +81,21 @@ config PSTORE_RAM

For more information, see Documentation/admin-guide/ramoops.rst.

+config PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION
+ bool "Reserve ramoops region dynamically"
+ select PSTORE_RAM
+ help
+ This enables the dynamic reservation of ramoops region for a special case
+ where there is no requirement to access the logs from pstorefs on next boot
+ instead there is separate backend mechanism like minidump present which has
+ awareness about the dynamic ramoops region and can recover the logs. This is
+ enabled via command line parameter dyn_ramoops_size=<size> and should not be
+ used in absence of separate backend which knows how to recover this dynamic
+ region.
+
+ Note whenever this config is selected ramoops driver will be build statically
+ into kernel.
+
config PSTORE_ZONE
tristate
depends on PSTORE
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 88b34fdbf759..a6c0da8cfdd4 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -20,6 +20,7 @@
#include <linux/compiler.h>
#include <linux/of.h>
#include <linux/of_address.h>
+#include <linux/memblock.h>
#include <linux/mm.h>

#include "internal.h"
@@ -103,6 +104,55 @@ struct ramoops_context {
};

static struct platform_device *dummy;
+static int dyn_ramoops_size;
+/* Location of the reserved area for the dynamic ramoops */
+static struct resource dyn_ramoops_res = {
+ .name = "ramoops",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+ .desc = IORES_DESC_NONE,
+};
+
+static int __init parse_dyn_ramoops_size(char *p)
+{
+ char *tmp;
+
+ dyn_ramoops_size = memparse(p, &tmp);
+ if (p == tmp) {
+ pr_err("ramoops: memory size expected\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+early_param("dyn_ramoops_size", parse_dyn_ramoops_size);
+
+#ifdef CONFIG_PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION
+/*
+ * setup_dynamic_ramoops() - reserves memory for dynamic ramoops
+ *
+ * This enable dynamic reserve memory support for ramoops through
+ * command line.
+ */
+void __init setup_dynamic_ramoops(void)
+{
+ unsigned long long ramoops_base;
+ unsigned long long ramoops_size;
+
+ ramoops_base = memblock_phys_alloc_range(dyn_ramoops_size, SMP_CACHE_BYTES,
+ 0, MEMBLOCK_ALLOC_NOLEAKTRACE);
+ if (!ramoops_base) {
+ pr_err("cannot allocate ramoops dynamic memory (size:0x%llx).\n",
+ ramoops_size);
+ return;
+ }
+
+ dyn_ramoops_res.start = ramoops_base;
+ dyn_ramoops_res.end = ramoops_base + dyn_ramoops_size - 1;
+ insert_resource(&iomem_resource, &dyn_ramoops_res);
+}
+#endif

static int ramoops_pstore_open(struct pstore_info *psi)
{
@@ -915,14 +965,18 @@ static void __init ramoops_register_dummy(void)

/*
* Prepare a dummy platform data structure to carry the module
- * parameters. If mem_size isn't set, then there are no module
- * parameters, and we can skip this.
+ * parameters. If mem_size isn't set, check for dynamic ramoops
+ * size and use if it is set.
*/
- if (!mem_size)
+ if (!mem_size && !dyn_ramoops_size)
return;

- pr_info("using module parameters\n");
+ if (dyn_ramoops_size) {
+ mem_size = dyn_ramoops_size;
+ mem_address = dyn_ramoops_res.start;
+ }

+ pr_info("using module parameters\n");
memset(&pdata, 0, sizeof(pdata));
pdata.mem_size = mem_size;
pdata.mem_address = mem_address;
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h
index 9d65ff94e216..b3537336c4e1 100644
--- a/include/linux/pstore_ram.h
+++ b/include/linux/pstore_ram.h
@@ -39,4 +39,9 @@ struct ramoops_platform_data {
struct persistent_ram_ecc_info ecc_info;
};

+#ifdef CONFIG_PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION
+void __init setup_dynamic_ramoops(void);
+#else
+static inline void __init setup_dynamic_ramoops(void) {}
+#endif
#endif
diff --git a/init/main.c b/init/main.c
index e24b0780fdff..32c7d94558ec 100644
--- a/init/main.c
+++ b/init/main.c
@@ -99,6 +99,7 @@
#include <linux/init_syscalls.h>
#include <linux/stackdepot.h>
#include <linux/randomize_kstack.h>
+#include <linux/pstore_ram.h>
#include <net/net_namespace.h>

#include <asm/io.h>
@@ -895,6 +896,7 @@ void start_kernel(void)
pr_notice("%s", linux_banner);
early_security_init();
setup_arch(&command_line);
+ setup_dynamic_ramoops();
setup_boot_config();
setup_command_line(command_line);
setup_nr_cpu_ids();
--
2.7.4

2023-11-24 22:23:37

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 12/12] soc: qcom: register ramoops region with APSS minidump

Register ramoops region with APSS minidump so that
these region gets captured on system crash.

Signed-off-by: Mukesh Ojha <[email protected]>
---
drivers/soc/qcom/qcom_minidump.c | 62 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
index c0f76a51d0e8..9a39b68903fb 100644
--- a/drivers/soc/qcom/qcom_minidump.c
+++ b/drivers/soc/qcom/qcom_minidump.c
@@ -18,8 +18,10 @@
#include <linux/mutex.h>
#include <linux/platform_device.h>
#include <linux/printk.h>
+#include <linux/pstore_ram.h>
#include <linux/soc/qcom/smem.h>
#include <linux/string.h>
+#include <linux/workqueue.h>
#include <soc/qcom/qcom_minidump.h>

#include "qcom_minidump_internal.h"
@@ -56,12 +58,20 @@ struct minidump_elfhdr {
* @dev: Minidump backend device
* @apss_data: APSS driver data
* @md_lock: Lock to protect access to APSS minidump table
+ * @work: Minidump work for any required execution in process context.
*/
struct minidump {
struct minidump_elfhdr elf;
struct device *dev;
struct minidump_ss_data *apss_data;
struct mutex md_lock;
+ struct work_struct work;
+};
+
+static LIST_HEAD(apss_md_rlist);
+struct md_region_list {
+ struct qcom_minidump_region md_region;
+ struct list_head list;
};

/*
@@ -530,6 +540,52 @@ static int qcom_apss_md_table_init(struct minidump *md,
return 0;
}

+static int ramoops_region_md_register(const char *name, int id, void *vaddr,
+ phys_addr_t paddr, size_t size)
+{
+ struct qcom_minidump_region *md_region;
+ struct md_region_list *mdr_list;
+ int ret;
+
+ mdr_list = kzalloc(sizeof(*mdr_list), GFP_KERNEL);
+ if (!mdr_list)
+ return -ENOMEM;
+
+ md_region = &mdr_list->md_region;
+ scnprintf(md_region->name, sizeof(md_region->name), "K%s%d", name, id);
+ md_region->virt_addr = vaddr;
+ md_region->phys_addr = paddr;
+ md_region->size = size;
+ ret = qcom_minidump_region_register(md_region);
+ if (ret < 0) {
+ pr_err("failed to register region in minidump: err: %d\n", ret);
+ return ret;
+ }
+
+ list_add(&mdr_list->list, &apss_md_rlist);
+
+ return 0;
+}
+
+static void ramoops_register_md_callback(struct work_struct *work)
+{
+ register_ramoops_ready_notifier(ramoops_region_md_register);
+}
+
+static void qcom_ramoops_minidump_unregister(void)
+{
+ struct md_region_list *mdr_list;
+ struct md_region_list *tmp;
+
+ list_for_each_entry_safe(mdr_list, tmp, &apss_md_rlist, list) {
+ struct qcom_minidump_region *region;
+
+ region = &mdr_list->md_region;
+ qcom_minidump_region_unregister(region);
+ list_del(&mdr_list->list);
+ }
+}
+
static void qcom_apss_md_table_exit(struct minidump_ss_data *mdss_data)
{
memset(mdss_data->md_ss_toc, cpu_to_le32(0), sizeof(*mdss_data->md_ss_toc));
@@ -575,6 +631,9 @@ static int qcom_apss_minidump_probe(struct platform_device *pdev)

platform_set_drvdata(pdev, md);

+ INIT_WORK(&md->work, ramoops_register_md_callback);
+ schedule_work(&md->work);
+
return ret;
}

@@ -582,6 +641,9 @@ static void qcom_apss_minidump_remove(struct platform_device *pdev)
{
struct minidump *md = platform_get_drvdata(pdev);

+ flush_work(&md->work);
+ qcom_ramoops_minidump_unregister();
+ unregister_ramoops_ready_notifier(ramoops_region_md_register);
qcom_apss_md_table_exit(md->apss_data);
}

--
2.7.4

2023-11-24 22:23:50

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 11/12] pstore/ram: Add ramoops ready notifier support

Client like minidump, is only interested in ramoops
region addresses/size so that it could register them
with its table and also it is only deals with ram
backend and does not use pstorefs to read the records.
Let's introduce a client notifier in ramoops which
gets called when ramoops driver probes successfully
and it passes the ramoops region information to the
passed callback by the client and If the call for
ramoops ready register comes after ramoops probe
than call the callback directly.

Signed-off-by: Mukesh Ojha <[email protected]>
---
fs/pstore/ram.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/pstore_ram.h | 6 ++++
2 files changed, 83 insertions(+)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index a6c0da8cfdd4..72341fd21aec 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -22,6 +22,7 @@
#include <linux/of_address.h>
#include <linux/memblock.h>
#include <linux/mm.h>
+#include <linux/mutex.h>

#include "internal.h"
#include "ram_internal.h"
@@ -101,6 +102,14 @@ struct ramoops_context {
unsigned int ftrace_read_cnt;
unsigned int pmsg_read_cnt;
struct pstore_info pstore;
+ /*
+ * Lock to serialize calls to register_ramoops_ready_notifier,
+ * ramoops_ready_notifier and read/modification of 'ramoops_ready'.
+ */
+ struct mutex lock;
+ bool ramoops_ready;
+ int (*callback)(const char *name, int id, void *vaddr,
+ phys_addr_t paddr, size_t size);
};

static struct platform_device *dummy;
@@ -488,6 +497,7 @@ static int ramoops_pstore_erase(struct pstore_record *record)
}

static struct ramoops_context oops_cxt = {
+ .lock = __MUTEX_INITIALIZER(oops_cxt.lock),
.pstore = {
.owner = THIS_MODULE,
.name = "ramoops",
@@ -662,6 +672,68 @@ static int ramoops_init_prz(const char *name,
return 0;
}

+void ramoops_ready_notifier(struct ramoops_context *cxt)
+{
+ struct persistent_ram_zone *prz;
+ int i;
+
+ if (!cxt->callback)
+ return;
+
+ for (i = 0; i < cxt->max_dump_cnt; i++) {
+ prz = cxt->dprzs[i];
+ cxt->callback("dmesg", i, prz->vaddr, prz->paddr, prz->size);
+ }
+
+ if (cxt->console_size) {
+ prz = cxt->cprz;
+ cxt->callback("console", 0, prz->vaddr, prz->paddr, prz->size);
+ }
+
+ for (i = 0; i < cxt->max_ftrace_cnt; i++) {
+ prz = cxt->fprzs[i];
+ cxt->callback("ftrace", i, prz->vaddr, prz->paddr, prz->size);
+ }
+
+ if (cxt->pmsg_size) {
+ prz = cxt->mprz;
+ cxt->callback("pmsg", 0, prz->vaddr, prz->paddr, prz->size);
+ }
+}
+
+int register_ramoops_ready_notifier(int (*fn)(const char *, int,
+ void *, phys_addr_t, size_t))
+{
+ struct ramoops_context *cxt = &oops_cxt;
+
+ mutex_lock(&cxt->lock);
+ if (cxt->callback) {
+ mutex_unlock(&cxt->lock);
+ return -EEXIST;
+ }
+
+ cxt->callback = fn;
+ if (cxt->ramoops_ready)
+ ramoops_ready_notifier(cxt);
+
+ mutex_unlock(&cxt->lock);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(register_ramoops_ready_notifier);
+
+void unregister_ramoops_ready_notifier(int (*fn)(const char *, int,
+ void *, phys_addr_t, size_t))
+{
+ struct ramoops_context *cxt = &oops_cxt;
+
+ mutex_lock(&cxt->lock);
+ WARN_ON_ONCE(cxt->callback != fn);
+ cxt->callback = NULL;
+ mutex_unlock(&cxt->lock);
+}
+EXPORT_SYMBOL_GPL(unregister_ramoops_ready_notifier);
+
/* Read a u32 from a dt property and make sure it's safe for an int. */
static int ramoops_parse_dt_u32(struct platform_device *pdev,
const char *propname,
@@ -911,6 +983,11 @@ static int ramoops_probe(struct platform_device *pdev)
ramoops_pmsg_size = pdata->pmsg_size;
ramoops_ftrace_size = pdata->ftrace_size;

+ mutex_lock(&cxt->lock);
+ ramoops_ready_notifier(cxt);
+ cxt->ramoops_ready = true;
+ mutex_unlock(&cxt->lock);
+
pr_info("using 0x%lx@0x%llx, ecc: %d\n",
cxt->size, (unsigned long long)cxt->phys_addr,
cxt->ecc_info.ecc_size);
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h
index b3537336c4e1..9745d48ba59e 100644
--- a/include/linux/pstore_ram.h
+++ b/include/linux/pstore_ram.h
@@ -39,6 +39,12 @@ struct ramoops_platform_data {
struct persistent_ram_ecc_info ecc_info;
};

+int register_ramoops_ready_notifier(int (*fn)(const char *name, int id,
+ void *vaddr, phys_addr_t paddr,
+ size_t size));
+void unregister_ramoops_ready_notifier(int (*fn)(const char *name, int id,
+ void *vaddr, phys_addr_t paddr,
+ size_t size));
#ifdef CONFIG_PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION
void __init setup_dynamic_ramoops(void);
#else
--
2.7.4

2023-11-24 22:24:02

by Mukesh Ojha

[permalink] [raw]
Subject: [Patch v6 08/12] soc: qcom: Add Qualcomm APSS minidump kernel driver

Minidump is a best effort mechanism to collect useful and predefined
data for first level of debugging on end user devices running on
Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
or subsystem part of SoC crashes, due to a range of hardware and
software bugs. Hence, the ability to collect accurate data is only
a best-effort. The data collected could be invalid or corrupted,
data collection itself could fail, and so on.

Qualcomm devices in engineering mode provides a mechanism for
generating full system ramdumps for post mortem debugging. But in some
cases it's however not feasible to capture the entire content of RAM.
The minidump mechanism provides the means for selecting region should
be included in the ramdump. The solution supports extracting the
ramdump/minidump produced either over USB or stored to an attached
storage device.

The core of minidump feature is part of Qualcomm's boot firmware code.
It initializes shared memory (SMEM), which is a part of DDR and
allocates a small section of it to minidump table i.e also called
global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
their own table of segments to be included in the minidump, all
references from a descriptor in SMEM (G-ToC). Each segment/region has
some details like name, physical address and it's size etc. and it
could be anywhere scattered in the DDR.

qcom_minidump(core or frontend) driver adds the capability to add inux
region to be dumped as part of ram dump collection. It provides
appropriate symbol to register/unregister client regions.

To simplify post mortem debugging, it creates and maintain an ELF
header as first region that gets updated upon registration
of a new region.

Signed-off-by: Mukesh Ojha <[email protected]>
---
drivers/soc/qcom/Kconfig | 13 +
drivers/soc/qcom/Makefile | 1 +
drivers/soc/qcom/qcom_minidump.c | 606 ++++++++++++++++++++++++++++++
drivers/soc/qcom/qcom_minidump_internal.h | 10 +
drivers/soc/qcom/smem.c | 20 +
include/linux/soc/qcom/smem.h | 2 +
include/soc/qcom/qcom_minidump.h | 18 +
7 files changed, 670 insertions(+)
create mode 100644 drivers/soc/qcom/qcom_minidump.c

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index e507a317c74f..984623c0c1a0 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -264,4 +264,17 @@ config QCOM_RPROC_MINIDUMP
query predefined minidump segments associated with the remote processor
and check its validity and end up collecting the dump on remote processor
crash during its recovery.
+
+config QCOM_MINIDUMP
+ tristate "QCOM APSS Minidump driver"
+ depends on ARCH_QCOM || COMPILE_TEST
+ depends on QCOM_SMEM
+ help
+ This config enables linux core infrastructure for Application
+ processor subsystem (APSS) minidump collection i.e, it enables
+ Linux clients drivers to register their internal data structures
+ and debug messages as part of the apss minidump table and when
+ the SoC is crashed, these selective regions will be dumped
+ instead of the entire DDR dump. This saves significant amount
+ of time and/or storage space.
endmenu
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index 838528e7e30a..4b5f72f78d3c 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_ICC_BWMON) += icc-bwmon.o
qcom_ice-objs += ice.o
obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE) += qcom_ice.o
obj-$(CONFIG_QCOM_RPROC_MINIDUMP) += qcom_rproc_minidump.o
+obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o
diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
new file mode 100644
index 000000000000..c0f76a51d0e8
--- /dev/null
+++ b/drivers/soc/qcom/qcom_minidump.c
@@ -0,0 +1,606 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/elf.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/soc/qcom/smem.h>
+#include <linux/string.h>
+#include <soc/qcom/qcom_minidump.h>
+
+#include "qcom_minidump_internal.h"
+
+/**
+ * struct minidump_ss_data - Minidump subsystem private data
+ * @md_ss_toc: Application Subsystem TOC pointer
+ * @md_regions: Application Subsystem region base pointer
+ */
+struct minidump_ss_data {
+ struct minidump_subsystem *md_ss_toc;
+ struct minidump_region *md_regions;
+};
+
+/**
+ * struct minidump_elfhdr - Minidump table elf header
+ * @ehdr: elf main header
+ * @shdr: Section header
+ * @phdr: Program header
+ * @elf_offset: Section offset in elf
+ * @strtable_idx: String table current index position
+ */
+struct minidump_elfhdr {
+ struct elfhdr *ehdr;
+ struct elf_shdr *shdr;
+ struct elf_phdr *phdr;
+ size_t elf_offset;
+ size_t strtable_idx;
+};
+
+/**
+ * struct minidump - Minidump driver data information
+ * @elf: Minidump elf header
+ * @dev: Minidump backend device
+ * @apss_data: APSS driver data
+ * @md_lock: Lock to protect access to APSS minidump table
+ */
+struct minidump {
+ struct minidump_elfhdr elf;
+ struct device *dev;
+ struct minidump_ss_data *apss_data;
+ struct mutex md_lock;
+};
+
+/*
+ * In some of the Old Qualcomm devices, boot firmware statically allocates 300
+ * as total number of supported region (including all co-processors) in
+ * minidump table out of which linux was using 201. In future, this limitation
+ * from boot firmware might get removed by allocating the region dynamically.
+ * So, keep it compatible with older devices, we can keep the current limit for
+ * Linux to 201.
+ */
+#define MAX_NUM_ENTRIES 201
+#define MAX_STRTBL_SIZE (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH)
+
+static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx)
+{
+ struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff);
+
+ return &eshdr[idx];
+}
+
+static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx)
+{
+ struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff);
+
+ return &ephdr[idx];
+}
+
+static char *elf_str_table_start(struct elfhdr *ehdr)
+{
+ struct elf_shdr *eshdr;
+
+ if (ehdr->e_shstrndx == SHN_UNDEF)
+ return NULL;
+
+ eshdr = elf_shdr_entry_addr(ehdr, ehdr->e_shstrndx);
+
+ return (char *)ehdr + eshdr->sh_offset;
+}
+
+static char *elf_lookup_string(struct minidump *md, struct elfhdr *ehdr, int offset)
+{
+ char *strtab = elf_str_table_start(ehdr);
+
+ if (!strtab || (md->elf.strtable_idx < offset))
+ return NULL;
+
+ return strtab + offset;
+}
+
+static unsigned int append_str_to_strtable(struct minidump *md, const char *name)
+{
+ char *strtab = elf_str_table_start(md->elf.ehdr);
+ unsigned int old_idx = md->elf.strtable_idx;
+ unsigned int ret;
+
+ if (!strtab || !name)
+ return 0;
+
+ ret = old_idx;
+ old_idx += strscpy((strtab + old_idx), name, MAX_REGION_NAME_LENGTH);
+ md->elf.strtable_idx = old_idx + 1;
+
+ return ret;
+}
+
+static int qcom_md_clear_elfheader(struct minidump *md,
+ const struct qcom_minidump_region *region)
+{
+ struct elfhdr *ehdr = md->elf.ehdr;
+ struct elf_shdr *shdr;
+ struct elf_shdr *tmp_shdr;
+ struct elf_phdr *phdr;
+ struct elf_phdr *tmp_phdr;
+ unsigned int phidx;
+ unsigned int shidx;
+ unsigned int len;
+ unsigned int i;
+ char *shname;
+
+ for (i = 0; i < ehdr->e_phnum; i++) {
+ phdr = elf_phdr_entry_addr(ehdr, i);
+ if (phdr->p_paddr == region->phys_addr &&
+ phdr->p_memsz == region->size)
+ break;
+ }
+
+ if (i == ehdr->e_phnum) {
+ dev_err(md->dev, "Cannot find program header entry in elf\n");
+ return -EINVAL;
+ }
+
+ phidx = i;
+ for (i = 0; i < ehdr->e_shnum; i++) {
+ shdr = elf_shdr_entry_addr(ehdr, i);
+ shname = elf_lookup_string(md, ehdr, shdr->sh_name);
+ if (shname && !strcmp(shname, region->name) &&
+ shdr->sh_addr == (elf_addr_t)region->virt_addr &&
+ shdr->sh_size == region->size)
+ break;
+ }
+
+ if (i == ehdr->e_shnum) {
+ dev_err(md->dev, "Cannot find section header entry in elf\n");
+ return -EINVAL;
+ }
+
+ shidx = i;
+ if (shdr->sh_offset != phdr->p_offset) {
+ dev_err(md->dev, "Invalid entry details for region: %s\n", region->name);
+ return -EINVAL;
+ }
+
+ /* Clear name in string table */
+ len = strlen(shname) + 1;
+ memmove(shname, shname + len, md->elf.strtable_idx - shdr->sh_name - len);
+ md->elf.strtable_idx -= len;
+
+ /* Clear program header */
+ tmp_phdr = elf_phdr_entry_addr(ehdr, phidx);
+ for (i = phidx; i < ehdr->e_phnum - 1; i++) {
+ tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1);
+ phdr = elf_phdr_entry_addr(ehdr, i);
+ memcpy(phdr, tmp_phdr, sizeof(*phdr));
+ phdr->p_offset = phdr->p_offset - region->size;
+ }
+ memset(tmp_phdr, 0, sizeof(*tmp_phdr));
+ ehdr->e_phnum--;
+
+ /* Clear section header */
+ tmp_shdr = elf_shdr_entry_addr(ehdr, shidx);
+ for (i = shidx; i < ehdr->e_shnum - 1; i++) {
+ tmp_shdr = elf_shdr_entry_addr(ehdr, i + 1);
+ shdr = elf_shdr_entry_addr(ehdr, i);
+ memcpy(shdr, tmp_shdr, sizeof(*shdr));
+ shdr->sh_offset -= region->size;
+ shdr->sh_name -= len;
+ }
+
+ memset(tmp_shdr, 0, sizeof(*tmp_shdr));
+ ehdr->e_shnum--;
+ md->elf.elf_offset -= region->size;
+
+ return 0;
+}
+
+static void qcom_md_update_elfheader(struct minidump *md,
+ const struct qcom_minidump_region *region)
+{
+ struct elfhdr *ehdr = md->elf.ehdr;
+ struct elf_shdr *shdr;
+ struct elf_phdr *phdr;
+
+ shdr = elf_shdr_entry_addr(ehdr, ehdr->e_shnum++);
+ phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++);
+
+ shdr->sh_type = SHT_PROGBITS;
+ shdr->sh_name = append_str_to_strtable(md, region->name);
+ shdr->sh_addr = (elf_addr_t)region->virt_addr;
+ shdr->sh_size = region->size;
+ shdr->sh_flags = SHF_WRITE;
+ shdr->sh_offset = md->elf.elf_offset;
+ shdr->sh_entsize = 0;
+
+ phdr->p_type = PT_LOAD;
+ phdr->p_offset = md->elf.elf_offset;
+ phdr->p_vaddr = (elf_addr_t)region->virt_addr;
+ phdr->p_paddr = region->phys_addr;
+ phdr->p_filesz = phdr->p_memsz = region->size;
+ phdr->p_flags = PF_R | PF_W;
+ md->elf.elf_offset += shdr->sh_size;
+}
+
+static void qcom_md_add_region(struct minidump_ss_data *mdss_data,
+ const struct qcom_minidump_region *region)
+{
+ struct minidump_subsystem *mdss_toc = mdss_data->md_ss_toc;
+ struct minidump_region *mdr;
+ unsigned int region_cnt;
+
+ region_cnt = le32_to_cpu(mdss_toc->region_count);
+ mdr = &mdss_data->md_regions[region_cnt];
+ strscpy(mdr->name, region->name, sizeof(mdr->name));
+ mdr->address = cpu_to_le64(region->phys_addr);
+ mdr->size = cpu_to_le64(region->size);
+ mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID);
+ region_cnt++;
+ mdss_toc->region_count = cpu_to_le32(region_cnt);
+}
+
+static int qcom_md_get_region_index(struct minidump_ss_data *mdss_data,
+ const struct qcom_minidump_region *region)
+{
+ struct minidump_subsystem *mdss_toc = mdss_data->md_ss_toc;
+ struct minidump_region *mdr;
+ unsigned int i;
+ unsigned int count;
+
+ count = le32_to_cpu(mdss_toc->region_count);
+ for (i = 0; i < count; i++) {
+ mdr = &mdss_data->md_regions[i];
+ if (!strcmp(mdr->name, region->name))
+ return i;
+ }
+
+ return -ENOENT;
+}
+
+static int qcom_md_region_unregister(struct minidump *md,
+ const struct qcom_minidump_region *region)
+{
+ struct minidump_ss_data *mdss_data = md->apss_data;
+ struct minidump_subsystem *mdss_toc = mdss_data->md_ss_toc;
+ struct minidump_region *mdr;
+ unsigned int region_cnt;
+ unsigned int idx;
+ int ret;
+
+ ret = qcom_md_get_region_index(mdss_data, region);
+ if (ret < 0) {
+ dev_err(md->dev, "%s region is not present\n", region->name);
+ return ret;
+ }
+
+ idx = ret;
+ mdr = &mdss_data->md_regions[0];
+ region_cnt = le32_to_cpu(mdss_toc->region_count);
+ /*
+ * Left shift all the regions exist after this removed region
+ * index by 1 to fill the gap and zero out the last region
+ * present at the end.
+ */
+ memmove(&mdr[idx], &mdr[idx + 1], (region_cnt - idx - 1) * sizeof(*mdr));
+ memset(&mdr[region_cnt - 1], 0, sizeof(*mdr));
+ region_cnt--;
+ mdss_toc->region_count = cpu_to_le32(region_cnt);
+
+ return 0;
+}
+
+static int qcom_md_region_register(struct minidump *md,
+ const struct qcom_minidump_region *region)
+{
+ struct minidump_ss_data *mdss_data = md->apss_data;
+ struct minidump_subsystem *mdss_toc = mdss_data->md_ss_toc;
+ unsigned int num_region;
+ int ret;
+
+ ret = qcom_md_get_region_index(mdss_data, region);
+ if (ret >= 0) {
+ dev_info(md->dev, "%s region is already registered\n", region->name);
+ return -EEXIST;
+ }
+
+ /* Check if there is a room for a new entry */
+ num_region = le32_to_cpu(mdss_toc->region_count);
+ if (num_region >= MAX_NUM_ENTRIES) {
+ dev_err(md->dev, "maximum region limit %u reached\n", num_region);
+ return -ENOSPC;
+ }
+
+ qcom_md_add_region(mdss_data, region);
+
+ return 0;
+}
+
+static bool qcom_minidump_valid_region(const struct qcom_minidump_region *region)
+{
+ return region &&
+ strnlen(region->name, MAX_NAME_LENGTH) < MAX_NAME_LENGTH &&
+ region->virt_addr &&
+ region->size &&
+ IS_ALIGNED(region->size, 4);
+}
+
+/**
+ * qcom_minidump_region_register() - Register region in APSS Minidump table.
+ * @region: minidump region.
+ *
+ * Return: On success, it returns 0 and negative error value on failure.
+ */
+static int qcom_minidump_region_register(const struct qcom_minidump_region *region)
+{
+ struct minidump *md;
+ int ret;
+
+ md = qcom_smem_minidump_ready();
+ if (!md)
+ return -EPROBE_DEFER;
+
+ if (!qcom_minidump_valid_region(region))
+ return -EINVAL;
+
+ mutex_lock(&md->md_lock);
+ ret = qcom_md_region_register(md, region);
+ if (ret)
+ goto unlock;
+
+ qcom_md_update_elfheader(md, region);
+unlock:
+ mutex_unlock(&md->md_lock);
+ return ret;
+}
+
+/**
+ * qcom_minidump_region_unregister() - Unregister region from APSS Minidump table.
+ * @region: minidump region.
+ *
+ * Return: On success, it returns 0 and negative error value on failure.
+ */
+static int qcom_minidump_region_unregister(const struct qcom_minidump_region *region)
+{
+ struct minidump *md;
+ int ret;
+
+ md = qcom_smem_minidump_ready();
+ if (!md)
+ return -EPROBE_DEFER;
+
+ if (!qcom_minidump_valid_region(region))
+ return -EINVAL;
+
+ mutex_lock(&md->md_lock);
+ ret = qcom_md_region_unregister(md, region);
+ if (ret)
+ goto unlock;
+
+ ret = qcom_md_clear_elfheader(md, region);
+unlock:
+ mutex_unlock(&md->md_lock);
+ return ret;
+}
+
+static int qcom_md_add_elfheader(struct minidump *md)
+{
+ struct qcom_minidump_region elfregion;
+ struct elfhdr *ehdr;
+ struct elf_shdr *shdr;
+ struct elf_phdr *phdr;
+ unsigned int elfh_size;
+ unsigned int strtbl_off;
+ unsigned int phdr_off;
+ unsigned int banner_len;
+ char *banner;
+
+ banner_len = strlen(linux_banner);
+ /*
+ * Header buffer contains:
+ * ELF header, (MAX_NUM_ENTRIES + 4) of Section and Program ELF headers,
+ * where, 4 additional entries, one for empty header, one for string table
+ * one for minidump table and one for linux banner.
+ *
+ * Linux banner is stored in minidump to aid post mortem tools to determine
+ * the kernel version.
+ */
+ elfh_size = sizeof(*ehdr);
+ elfh_size += MAX_STRTBL_SIZE;
+ elfh_size += banner_len + 1;
+ elfh_size += ((sizeof(*shdr) + sizeof(*phdr)) * (MAX_NUM_ENTRIES + 4));
+ elfh_size = ALIGN(elfh_size, 4);
+
+ md->elf.ehdr = devm_kzalloc(md->dev, elfh_size, GFP_KERNEL);
+ if (!md->elf.ehdr)
+ return -ENOMEM;
+
+ ehdr = md->elf.ehdr;
+ /* Assign Section/Program headers offset */
+ md->elf.shdr = shdr = (struct elf_shdr *)(ehdr + 1);
+ md->elf.phdr = phdr = (struct elf_phdr *)(shdr + MAX_NUM_ENTRIES);
+ phdr_off = sizeof(*ehdr) + (sizeof(*shdr) * MAX_NUM_ENTRIES);
+
+ memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+ ehdr->e_ident[EI_CLASS] = ELF_CLASS;
+ ehdr->e_ident[EI_DATA] = ELF_DATA;
+ ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+ ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+ ehdr->e_type = ET_CORE;
+ ehdr->e_machine = ELF_ARCH;
+ ehdr->e_version = EV_CURRENT;
+ ehdr->e_ehsize = sizeof(*ehdr);
+ ehdr->e_phoff = phdr_off;
+ ehdr->e_phentsize = sizeof(*phdr);
+ ehdr->e_shoff = sizeof(*ehdr);
+ ehdr->e_shentsize = sizeof(*shdr);
+ ehdr->e_shstrndx = 1;
+
+ md->elf.elf_offset = elfh_size;
+ /*
+ * The zeroth index of the section header is reserved and is rarely used.
+ * Set the section header as null (SHN_UNDEF) and move to the next one.
+ * 2nd Section is String table.
+ */
+ md->elf.strtable_idx = 1;
+ strtbl_off = sizeof(*ehdr) + ((sizeof(*phdr) + sizeof(*shdr)) * MAX_NUM_ENTRIES);
+ shdr++;
+ shdr->sh_type = SHT_STRTAB;
+ shdr->sh_offset = (elf_addr_t)strtbl_off;
+ shdr->sh_size = MAX_STRTBL_SIZE;
+ shdr->sh_entsize = 0;
+ shdr->sh_flags = 0;
+ shdr->sh_name = append_str_to_strtable(md, "STR_TBL");
+ shdr++;
+
+ /* 3rd Section is Linux banner */
+ banner = (char *)ehdr + strtbl_off + MAX_STRTBL_SIZE;
+ memcpy(banner, linux_banner, banner_len);
+
+ shdr->sh_type = SHT_PROGBITS;
+ shdr->sh_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
+ shdr->sh_size = banner_len + 1;
+ shdr->sh_addr = (elf_addr_t)linux_banner;
+ shdr->sh_entsize = 0;
+ shdr->sh_flags = SHF_WRITE;
+ shdr->sh_name = append_str_to_strtable(md, "linux_banner");
+
+ phdr->p_type = PT_LOAD;
+ phdr->p_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
+ phdr->p_vaddr = (elf_addr_t)linux_banner;
+ phdr->p_paddr = virt_to_phys(linux_banner);
+ phdr->p_filesz = phdr->p_memsz = banner_len + 1;
+ phdr->p_flags = PF_R | PF_W;
+
+ /*
+ * Above are some prdefined sections/program header used
+ * for debug, update their count here.
+ */
+ ehdr->e_phnum = 1;
+ ehdr->e_shnum = 3;
+
+ /* Register ELF header as first region */
+ strscpy(elfregion.name, "KELF_HEADER", sizeof(elfregion.name));
+ elfregion.virt_addr = md->elf.ehdr;
+ elfregion.phys_addr = virt_to_phys(md->elf.ehdr);
+ elfregion.size = elfh_size;
+
+ return qcom_md_region_register(md, &elfregion);
+}
+
+static int qcom_apss_md_table_init(struct minidump *md,
+ struct minidump_subsystem *mdss_toc)
+{
+ struct minidump_ss_data *mdss_data;
+
+ mdss_data = devm_kzalloc(md->dev, sizeof(*mdss_data), GFP_KERNEL);
+ if (!mdss_data)
+ return -ENOMEM;
+
+ mdss_data->md_ss_toc = mdss_toc;
+ mdss_data->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES,
+ sizeof(*mdss_data->md_regions),
+ GFP_KERNEL);
+ if (!mdss_data->md_regions)
+ return -ENOMEM;
+
+ mdss_toc = mdss_data->md_ss_toc;
+ mdss_toc->regions_baseptr = cpu_to_le64(virt_to_phys(mdss_data->md_regions));
+ mdss_toc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED);
+ mdss_toc->status = cpu_to_le32(1);
+ mdss_toc->region_count = cpu_to_le32(0);
+
+ /* Tell bootloader not to encrypt the regions of this subsystem */
+ mdss_toc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE);
+ mdss_toc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ);
+
+ md->apss_data = mdss_data;
+
+ return 0;
+}
+
+static void qcom_apss_md_table_exit(struct minidump_ss_data *mdss_data)
+{
+ memset(mdss_data->md_ss_toc, cpu_to_le32(0), sizeof(*mdss_data->md_ss_toc));
+}
+
+static int qcom_apss_minidump_probe(struct platform_device *pdev)
+{
+ struct minidump_global_toc *mdgtoc;
+ struct minidump *md;
+ size_t size;
+ int ret;
+
+ md = devm_kzalloc(&pdev->dev, sizeof(*md), GFP_KERNEL);
+ if (!md)
+ return -ENOMEM;
+
+ md->dev = &pdev->dev;
+ mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
+ if (IS_ERR(mdgtoc)) {
+ ret = PTR_ERR(mdgtoc);
+ return dev_err_probe(md->dev, ret,
+ "Couldn't find minidump smem item\n");
+ }
+
+ if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
+ ret = -EINVAL;
+ return dev_err_probe(md->dev, ret,
+ "minidump table is not initialized\n");
+ }
+
+ mutex_init(&md->md_lock);
+ ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]);
+ if (ret)
+ return dev_err_probe(md->dev, ret,
+ "apss minidump initialization failed\n");
+
+ /* First entry would be ELF header */
+ ret = qcom_md_add_elfheader(md);
+ if (ret) {
+ qcom_apss_md_table_exit(md->apss_data);
+ return dev_err_probe(md->dev, ret, "Failed to add elf header\n");
+ }
+
+ platform_set_drvdata(pdev, md);
+
+ return ret;
+}
+
+static void qcom_apss_minidump_remove(struct platform_device *pdev)
+{
+ struct minidump *md = platform_get_drvdata(pdev);
+
+ qcom_apss_md_table_exit(md->apss_data);
+}
+
+static const struct platform_device_id qcom_minidump_id_table[] = {
+ { .name = "qcom_minidump_smem" },
+ {}
+};
+MODULE_DEVICE_TABLE(platform, qcom_minidump_id_table);
+
+static struct platform_driver qcom_minidump_driver = {
+ .probe = qcom_apss_minidump_probe,
+ .remove_new = qcom_apss_minidump_remove,
+ .driver = {
+ .name = "qcom_minidump_smem",
+ },
+ .id_table = qcom_minidump_id_table,
+};
+
+module_platform_driver(qcom_minidump_driver);
+
+MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/soc/qcom/qcom_minidump_internal.h b/drivers/soc/qcom/qcom_minidump_internal.h
index 71709235b196..a2aebe5b690a 100644
--- a/drivers/soc/qcom/qcom_minidump_internal.h
+++ b/drivers/soc/qcom/qcom_minidump_internal.h
@@ -9,10 +9,20 @@
#define MAX_NUM_OF_SS 10
#define MAX_REGION_NAME_LENGTH 16
#define SBL_MINIDUMP_SMEM_ID 602
+
#define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0)
+#define MINIDUMP_REGION_INVALID ('I' << 24 | 'N' << 16 | 'V' << 8 | 'A' << 0)
+#define MINIDUMP_REGION_INIT ('I' << 24 | 'N' << 16 | 'I' << 8 | 'T' << 0)
+#define MINIDUMP_REGION_NOINIT 0
+
+#define MINIDUMP_SS_ENCR_REQ (0 << 24 | 'Y' << 16 | 'E' << 8 | 'S' << 0)
+#define MINIDUMP_SS_ENCR_NOTREQ (0 << 24 | 0 << 16 | 'N' << 8 | 'R' << 0)
+#define MINIDUMP_SS_ENCR_START ('S' << 24 | 'T' << 16 | 'R' << 8 | 'T' << 0)
#define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
#define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0)

+#define MINIDUMP_APSS_DESC 0
+
/**
* struct minidump_region - Minidump region
* @name : Name of the region to be dumped
diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c
index 690afc9a12f4..d6d5f4ade471 100644
--- a/drivers/soc/qcom/smem.c
+++ b/drivers/soc/qcom/smem.c
@@ -270,6 +270,7 @@ struct smem_region {
* @partitions: list of partitions of current processor/host
* @item_count: max accepted item number
* @socinfo: platform device pointer
+ * @minidump: minidump platform device pointer
* @num_regions: number of @regions
* @regions: list of the memory regions defining the shared memory
*/
@@ -280,6 +281,7 @@ struct qcom_smem {

u32 item_count;
struct platform_device *socinfo;
+ struct platform_device *minidump;
struct smem_ptable *ptable;
struct smem_partition global_partition;
struct smem_partition partitions[SMEM_HOST_COUNT];
@@ -806,6 +808,15 @@ int qcom_smem_get_soc_id(u32 *id)
}
EXPORT_SYMBOL_GPL(qcom_smem_get_soc_id);

+void *qcom_smem_minidump_ready(void)
+{
+ if (__smem && __smem->minidump)
+ return platform_get_drvdata(__smem->minidump);
+ else
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(qcom_smem_minidump_ready);
+
static int qcom_smem_get_sbl_version(struct qcom_smem *smem)
{
struct smem_header *header;
@@ -1184,11 +1195,20 @@ static int qcom_smem_probe(struct platform_device *pdev)
if (IS_ERR(smem->socinfo))
dev_dbg(&pdev->dev, "failed to register socinfo device\n");

+ smem->minidump = platform_device_register_data(&pdev->dev, "qcom_minidump_smem",
+ PLATFORM_DEVID_NONE, NULL,
+ 0);
+ if (IS_ERR(smem->minidump)) {
+ dev_dbg(&pdev->dev, "failed to register minidump device\n");
+ smem->minidump = NULL;
+ }
+
return 0;
}

static void qcom_smem_remove(struct platform_device *pdev)
{
+ platform_device_unregister(__smem->minidump);
platform_device_unregister(__smem->socinfo);

hwspin_lock_free(__smem->hwlock);
diff --git a/include/linux/soc/qcom/smem.h b/include/linux/soc/qcom/smem.h
index a36a3b9d4929..08288360a55e 100644
--- a/include/linux/soc/qcom/smem.h
+++ b/include/linux/soc/qcom/smem.h
@@ -14,4 +14,6 @@ phys_addr_t qcom_smem_virt_to_phys(void *p);

int qcom_smem_get_soc_id(u32 *id);

+void *qcom_smem_minidump_ready(void);
+
#endif
diff --git a/include/soc/qcom/qcom_minidump.h b/include/soc/qcom/qcom_minidump.h
index cd87caef919d..9fdf9e9d0af3 100644
--- a/include/soc/qcom/qcom_minidump.h
+++ b/include/soc/qcom/qcom_minidump.h
@@ -6,6 +6,24 @@
#ifndef _QCOM_MINIDUMP_H_
#define _QCOM_MINIDUMP_H_

+#define MAX_NAME_LENGTH 12
+
+/**
+ * struct qcom_minidump_region - APSS Minidump region information
+ *
+ * @name: Entry name, Minidump will dump binary with this name.
+ * @virt_addr: Virtual address of the entry.
+ * @phys_addr: Physical address of the entry to dump.
+ * @size: Number of byte to dump from @address location,
+ * and it should be 4 byte aligned.
+ */
+struct qcom_minidump_region {
+ char name[MAX_NAME_LENGTH];
+ void *virt_addr;
+ phys_addr_t phys_addr;
+ size_t size;
+};
+
struct rproc;
struct rproc_dump_segment;

--
2.7.4

2023-11-27 05:41:23

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [Patch v6 03/12] docs: qcom: Add qualcomm minidump guide

On Sat, Nov 25, 2023 at 03:49:46AM +0530, Mukesh Ojha wrote:
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 43ea35613dfc..251d070486c2 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -120,6 +120,7 @@ configure specific aspects of kernel behavior to your liking.
> perf-security
> pm/index
> pnp
> + qcom_minidump
> rapidio
> ras
> rtc
> diff --git a/Documentation/admin-guide/qcom_minidump.rst b/Documentation/admin-guide/qcom_minidump.rst
> new file mode 100644
> index 000000000000..b492f2b79639
> --- /dev/null
> +++ b/Documentation/admin-guide/qcom_minidump.rst
> @@ -0,0 +1,272 @@
> +Qualcomm minidump feature
> +=========================
> +
> +Introduction
> +------------
> +
> +Minidump is a best effort mechanism to collect useful and predefined
> +data for first level of debugging on end user devices running on
> +Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
> +or subsystem part of SoC crashes, due to a range of hardware and
> +software bugs. Hence, the ability to collect accurate data is only
> +a best-effort. The data collected could be invalid or corrupted, data
> +collection itself could fail, and so on.
> +
> +Qualcomm devices in engineering mode provides a mechanism for generating
> +full system RAM dumps for post-mortem debugging. But in some cases it's
> +however not feasible to capture the entire content of RAM. The minidump
> +mechanism provides the means for selected region should be included in
> +the ramdump.
> +
> +
> +::
> +
> + +-----------------------------------------------+
> + | DDR +-------------+ |
> + | | SS0-ToC| |
> + | +----------------+ +----------------+ | |
> + | |Shared memory | | SS1-ToC| | |
> + | |(SMEM) | | | | |
> + | | | +-->|--------+ | | |
> + | |G-ToC | | | SS-ToC \ | | |
> + | |+-------------+ | | | +-----------+ | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SS0-ToC | | | +-|<|SS1 region1| | | |
> + | ||-------------| | | | | |-----------| | | |
> + | || SS1-ToC |-|>+ | | |SS1 region2| | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SS2-ToC | | | | | ... | | | |
> + | ||-------------| | | | |-----------| | | |
> + | || ... | | |-|<|SS1 regionN| | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SSn-ToC | | | | +-----------+ | | |
> + | |+-------------+ | | | | | |
> + | | | | |----------------| | |
> + | | | +>| regionN | | |
> + | | | | |----------------| | |
> + | +----------------+ | | | | |
> + | | |----------------| | |
> + | +>| region1 | | |
> + | |----------------| | |
> + | | | | |
> + | |----------------|-+ |
> + | | region5 | |
> + | |----------------| |
> + | | | |
> + | Region information +----------------+ |
> + | +---------------+ |
> + | |region name | |
> + | |---------------| |
> + | |region address | |
> + | |---------------| |
> + | |region size | |
> + | +---------------+ |
> + +-----------------------------------------------+
> + G-ToC: Global table of contents
> + SS-ToC: Subsystem table of contents
> + SS0-SSn: Subsystem numbered from 0 to n
> +
> +It depends on SoC where the underlying firmware is keeping the
> +minidump global table taking care of subsystem ToC part for
> +minidump like for above diagram, it is for shared memory sitting
> +in DDR and it is shared among various master however it is possible
> +that this could be implemented via memory mapped regions but the
> +general idea should remain same. Here, various subsystem could be
> +DSP's like ADSP/CDSP/MODEM etc, along with Application processor
> +(APSS) where Linux runs. DSP minidump gets collected when DSP's goes
> +for recovery followed by a crash. The minidump part of code for
> +that resides in ``qcom_rproc_minidump.c``.
> +
> +
> +SMEM as backend
> +----------------
> +
> +In this document, SMEM will be used as the backend implementation
> +of minidump.
> +
> +The core of minidump feature is part of Qualcomm's boot firmware code.
> +It initializes shared memory (SMEM), which is a part of DDR and
> +allocates a small section of it to minidump table, i.e. also called
> +global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
> +its own table of segments to be included in the minidump, all
> +references from a descriptor in SMEM (G-ToC). Each segment/region has
> +some details like name, physical address and its size etc. and it
> +could be anywhere scattered in the DDR.
> +
> +Qualcomm APSS Minidump kernel driver concept
> +--------------------------------------------
> +
> +Qualcomm APSS minidump kernel driver adds the capability to add Linux
> +region to be dumped as part of RAM dump collection. At the moment,
> +shared memory driver creates platform device for minidump driver and
> +give a means to APSS minidump to initialize itself on probe.
> +
> +This driver provides ``qcom_minidump_region_register`` and
> +``qcom_minidump_region_unregister`` API's to register and unregister
> +APSS minidump region. It also supports registration for the clients
> +who came before minidump driver was initialized. It maintains pending
> +list of clients who came before minidump and once minidump is initialized
> +it registers them in one go.
> +
> +To simplify post-mortem debugging, driver creates and maintain an ELF
> +header as first region that gets updated each time a new region gets
> +registered.
> +
> +The solution supports extracting the RAM dump/minidump produced either
> +over USB or stored to an attached storage device.
> +
> +Dependency of minidump kernel driver
> +------------------------------------
> +
> +It is to note that whole of minidump depends on Qualcomm boot firmware
> +whether it supports minidump or not. So, if the minidump SMEM ID is
> +present in shared memory, it indicates that minidump is supported from
> +boot firmware and it is possible to dump Linux (APSS) region as part
> +of minidump collection.
> +
> +How a kernel client driver can register region with minidump
> +------------------------------------------------------------
> +
> +Client driver can use ``qcom_minidump_region_register`` API's to register
> +and ``qcom_minidump_region_unregister`` to unregister their region from
> +minidump driver.
> +
> +Client needs to fill their region by filling ``qcom_minidump_region``
> +structure object which consists of the region name, region's virtual
> +and physical address and its size.
> +
> +Below, is one sample client driver snippet which tries to allocate a
> +region from kernel heap of certain size and it writes a certain known
> +pattern (that can help in verification after collection that we got
> +the exact pattern, what we wrote) and registers it with minidump.
> +
> + .. code-block:: c
> +
> + #include <soc/qcom/qcom_minidump.h>
> + [...]
> +
> +
> + [... inside a function ...]
> + struct qcom_minidump_region region;
> +
> + [...]
> +
> + client_mem_region = kzalloc(region_size, GFP_KERNEL);
> + if (!client_mem_region)
> + return -ENOMEM;
> +
> + [... Just write a pattern ...]
> + memset(client_mem_region, 0xAB, region_size);
> +
> + [... Fill up the region object ...]
> + strlcpy(region.name, "REGION_A", sizeof(region.name));
> + region.virt_addr = client_mem_region;
> + region.phys_addr = virt_to_phys(client_mem_region);
> + region.size = region_size;
> +
> + ret = qcom_minidump_region_register(&region);
> + if (ret < 0) {
> + pr_err("failed to add region in minidump: err: %d\n", ret);
> + return ret;
> + }
> +
> + [...]
> +
> +
> +Test
> +----
> +
> +Existing Qualcomm devices already supports entire RAM dump (also called
> +full dump) by writing appropriate value to Qualcomm's top control and
> +status register (tcsr) in ``driver/firmware/qcom_scm.c`` .
> +
> +SCM device Tree bindings required to support download mode
> +For example (sm8450) ::
> +
> + / {
> +
> + [...]
> +
> + firmware {
> + scm: scm {
> + compatible = "qcom,scm-sm8450", "qcom,scm";
> + [... tcsr register ... ]
> + qcom,dload-mode = <&tcsr 0x13000>;
> +
> + [...]
> + };
> + };
> +
> + [...]
> +
> + soc: soc@0 {
> +
> + [...]
> +
> + tcsr: syscon@1fc0000 {
> + compatible = "qcom,sm8450-tcsr", "syscon";
> + reg = <0x0 0x1fc0000 0x0 0x30000>;
> + };
> +
> + [...]
> + };
> + [...]
> +
> + };
> +
> +User of minidump can pass ``qcom_scm.download_mode="mini"`` to kernel
> +commandline to set the current download mode to minidump.
> +Similarly, ``"full"`` is passed to set the download mode to full dump
> +where entire RAM dump will be collected while setting it ``"full,mini"``
> +will collect minidump along with fulldump.
> +
> +Writing to sysfs node can also be used to set the mode to minidump::
> +
> + echo "mini" > /sys/module/qcom_scm/parameter/download_mode
> +
> +Once the download mode is set, any kind of crash will make the device collect
> +respective dump as per set download mode.
> +
> +Dump collection
> +---------------
> +::
> +
> + +-----------+
> + | |
> + | | +------+
> + | | | |
> + | | +--+---+ Product(Qualcomm SoC)
> + +-----------+ |
> + |+++++++++++|<------------+
> + |+++++++++++| usb cable
> + +-----------+
> + x86_64 PC
> +
> +The solution supports a product running with Qualcomm SoC (where minidump)
> +is supported from the firmware) connected to x86_64 host PC running PCAT
> +tool. It supports downloading the minidump produced from product to the
> +host PC over USB or to save the minidump to the product attached storage
> +device(UFS/eMMC/SD Card) into minidump dedicated partition.
> +
> +By default, dumps are downloaded via USB to the attached x86_64 PC running
> +PCAT (Qualcomm tool) software. Upon download, we will see a set of binary
> +blobs starting with name ``md_*`` in PCAT configured directory in x86_64
> +machine, so for above example from the client it will be ``md_REGION_A.BIN``.
> +This binary blob depends on region content to determine whether it needs
> +external parser support to get the content of the region, so for simple
> +plain ASCII text we don't need any parsing and the content can be seen
> +just opening the binary file.
> +
> +To collect the dump to attached storage type, one needs to write appropriate
> +value to IMEM register, in that case dumps are collected in rawdump
> +partition on the product device itself.
> +
> +One needs to read the entire rawdump partition and pull out content to
> +save it onto the attached x86_64 machine over USB. Later, this rawdump
> +can be passed to another tool (``dexter.exe`` [Qualcomm tool]) which
> +converts this into the similar binary blobs which we have got it when
> +download type was set to USB, i.e. a set of registered regions as blobs
> +and their name starts with ``md_*``.
> +
> +Replacing the ``dexter.exe`` with some open source tool can be added as future
> +scope of this document.

LGTM, thanks!

Reviewed-by: Bagas Sanjaya <[email protected]>

--
An old man doll... just what I always wanted! - Clara


Attachments:
(No filename) (11.99 kB)
signature.asc (235.00 B)
Download all attachments

2023-11-27 10:11:50

by Pavan Kondeti

[permalink] [raw]
Subject: Re: [Patch v6 11/12] pstore/ram: Add ramoops ready notifier support

On Sat, Nov 25, 2023 at 03:49:54AM +0530, Mukesh Ojha wrote:
> Client like minidump, is only interested in ramoops
> region addresses/size so that it could register them
> with its table and also it is only deals with ram
> backend and does not use pstorefs to read the records.
> Let's introduce a client notifier in ramoops which
> gets called when ramoops driver probes successfully
> and it passes the ramoops region information to the
> passed callback by the client and If the call for
> ramoops ready register comes after ramoops probe
> than call the callback directly.
>
> Signed-off-by: Mukesh Ojha <[email protected]>
> ---
> fs/pstore/ram.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pstore_ram.h | 6 ++++
> 2 files changed, 83 insertions(+)
>
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index a6c0da8cfdd4..72341fd21aec 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -22,6 +22,7 @@
> #include <linux/of_address.h>
> #include <linux/memblock.h>
> #include <linux/mm.h>
> +#include <linux/mutex.h>
>
> #include "internal.h"
> #include "ram_internal.h"
> @@ -101,6 +102,14 @@ struct ramoops_context {
> unsigned int ftrace_read_cnt;
> unsigned int pmsg_read_cnt;
> struct pstore_info pstore;
> + /*
> + * Lock to serialize calls to register_ramoops_ready_notifier,
> + * ramoops_ready_notifier and read/modification of 'ramoops_ready'.
> + */
> + struct mutex lock;
> + bool ramoops_ready;
> + int (*callback)(const char *name, int id, void *vaddr,
> + phys_addr_t paddr, size_t size);
> };
>
> static struct platform_device *dummy;
> @@ -488,6 +497,7 @@ static int ramoops_pstore_erase(struct pstore_record *record)
> }
>
> static struct ramoops_context oops_cxt = {
> + .lock = __MUTEX_INITIALIZER(oops_cxt.lock),
> .pstore = {
> .owner = THIS_MODULE,
> .name = "ramoops",
> @@ -662,6 +672,68 @@ static int ramoops_init_prz(const char *name,
> return 0;
> }
>
> +void ramoops_ready_notifier(struct ramoops_context *cxt)
> +{
> + struct persistent_ram_zone *prz;
> + int i;
> +
> + if (!cxt->callback)
> + return;
> +
> + for (i = 0; i < cxt->max_dump_cnt; i++) {
> + prz = cxt->dprzs[i];
> + cxt->callback("dmesg", i, prz->vaddr, prz->paddr, prz->size);
> + }
> +
> + if (cxt->console_size) {
> + prz = cxt->cprz;
> + cxt->callback("console", 0, prz->vaddr, prz->paddr, prz->size);
> + }
> +
> + for (i = 0; i < cxt->max_ftrace_cnt; i++) {
> + prz = cxt->fprzs[i];
> + cxt->callback("ftrace", i, prz->vaddr, prz->paddr, prz->size);
> + }
> +
> + if (cxt->pmsg_size) {
> + prz = cxt->mprz;
> + cxt->callback("pmsg", 0, prz->vaddr, prz->paddr, prz->size);
> + }
> +}
> +
> +int register_ramoops_ready_notifier(int (*fn)(const char *, int,
> + void *, phys_addr_t, size_t))
> +{
> + struct ramoops_context *cxt = &oops_cxt;
> +
> + mutex_lock(&cxt->lock);
> + if (cxt->callback) {
> + mutex_unlock(&cxt->lock);
> + return -EEXIST;
> + }
> +
> + cxt->callback = fn;
> + if (cxt->ramoops_ready)
> + ramoops_ready_notifier(cxt);
> +
> + mutex_unlock(&cxt->lock);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(register_ramoops_ready_notifier);
> +

Can you please elaborate on why do we need this custom notifier logic?

why would not a standard notifier (include/linux/notifier.h) work here?
The notifier_call callback can recieve custom data from the
notifier chain implementer. All we need is to define a custom struct like

struct pstore_ramoops_zone_data {
const char *name;
int id;
void *vaddr;
phys_addr_t paddr;
size_t size;
};

and pass the pointer to array of this struct.


btw, the current logic only supports just one client and this limitation
is not highlighted any where.

Thanks,
Pavan

2023-11-27 11:36:04

by Pavan Kondeti

[permalink] [raw]
Subject: Re: [Patch v6 10/12] pstore/ram: Add dynamic ramoops region support through commandline

On Sat, Nov 25, 2023 at 03:49:53AM +0530, Mukesh Ojha wrote:
> The reserved memory region for ramoops is assumed to be at a fixed
> and known location when read from the devicetree. This may not be
> required for something like Qualcomm's minidump which is interested
> in knowing addresses of ramoops region but it does not put hard
> requirement of address being fixed as most of it's SoC does not
> support warm reset and does not use pstorefs at all instead it has
> firmware way of collecting ramoops region if it gets to know the
> address and register it with apss minidump table which is sitting
> in shared memory region in DDR and firmware will have access to
> these table during reset and collects it on crash of SoC.
>
> So, add the support of reserving ramoops region to be dynamically
> allocated early during boot if it is request through command line
> via 'dyn_ramoops_size=<size>' and fill up reserved resource structure
> and export the structure, so that it can be read by ramoops driver.
>
> Signed-off-by: Mukesh Ojha <[email protected]>
> ---
> Documentation/admin-guide/ramoops.rst | 7 ++++
> fs/pstore/Kconfig | 15 +++++++++
> fs/pstore/ram.c | 62 ++++++++++++++++++++++++++++++++---
> include/linux/pstore_ram.h | 5 +++
> init/main.c | 2 ++
> 5 files changed, 87 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
> index e9f85142182d..af737adbf079 100644
> --- a/Documentation/admin-guide/ramoops.rst
> +++ b/Documentation/admin-guide/ramoops.rst
> @@ -33,6 +33,13 @@ memory are implementation defined, and won't work on many ARMs such as omaps.
> Setting ``mem_type=2`` attempts to treat the memory region as normal memory,
> which enables full cache on it. This can improve the performance.
>
> +Ramoops memory region can also be allocated dynamically for a special case where
> +there is no requirement to access the logs from pstorefs on next boot instead there
> +is separate backend mechanism like minidump present which has awareness about the
> +dynamic ramoops region and can recover the logs. This is enabled via command line
> +parameter ``dyn_ramoops_size=<size>`` and should not be used in absence of
> +separate backend which knows how to recover this dynamic region.
> +
> The memory area is divided into ``record_size`` chunks (also rounded down to
> power of two) and each kmesg dump writes a ``record_size`` chunk of
> information.
> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> index 3acc38600cd1..e13e53d7a225 100644
> --- a/fs/pstore/Kconfig
> +++ b/fs/pstore/Kconfig
> @@ -81,6 +81,21 @@ config PSTORE_RAM
>
> For more information, see Documentation/admin-guide/ramoops.rst.
>
> +config PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION
> + bool "Reserve ramoops region dynamically"
> + select PSTORE_RAM
> + help
> + This enables the dynamic reservation of ramoops region for a special case
> + where there is no requirement to access the logs from pstorefs on next boot
> + instead there is separate backend mechanism like minidump present which has
> + awareness about the dynamic ramoops region and can recover the logs. This is
> + enabled via command line parameter dyn_ramoops_size=<size> and should not be
> + used in absence of separate backend which knows how to recover this dynamic
> + region.
> +
> + Note whenever this config is selected ramoops driver will be build statically
> + into kernel.
> +

Is there any advantage if we decouple this memory reservation from
pstore ram so that pstore ram can still be compiled as module? Asking
because you explicitly mentioned this limitation.

> config PSTORE_ZONE
> tristate
> depends on PSTORE
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index 88b34fdbf759..a6c0da8cfdd4 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -20,6 +20,7 @@
> #include <linux/compiler.h>
> #include <linux/of.h>
> #include <linux/of_address.h>
> +#include <linux/memblock.h>
> #include <linux/mm.h>
>
> #include "internal.h"
> @@ -103,6 +104,55 @@ struct ramoops_context {
> };
>
> static struct platform_device *dummy;
> +static int dyn_ramoops_size;
> +/* Location of the reserved area for the dynamic ramoops */
> +static struct resource dyn_ramoops_res = {
> + .name = "ramoops",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> + .desc = IORES_DESC_NONE,
> +};
> +
> +static int __init parse_dyn_ramoops_size(char *p)
> +{
> + char *tmp;
> +
> + dyn_ramoops_size = memparse(p, &tmp);
> + if (p == tmp) {
> + pr_err("ramoops: memory size expected\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +early_param("dyn_ramoops_size", parse_dyn_ramoops_size);

should not this code be under
CONFIG_PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION?

> +
> +#ifdef CONFIG_PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION
> +/*
> + * setup_dynamic_ramoops() - reserves memory for dynamic ramoops
> + *
> + * This enable dynamic reserve memory support for ramoops through
> + * command line.
> + */
> +void __init setup_dynamic_ramoops(void)
> +{
> + unsigned long long ramoops_base;
> + unsigned long long ramoops_size;
> +
> + ramoops_base = memblock_phys_alloc_range(dyn_ramoops_size, SMP_CACHE_BYTES,
> + 0, MEMBLOCK_ALLOC_NOLEAKTRACE);
> + if (!ramoops_base) {
> + pr_err("cannot allocate ramoops dynamic memory (size:0x%llx).\n",
> + ramoops_size);
> + return;
> + }

This error needs to be propagated to ramoops_register_dummy() since it
rely on !dyn_ramoops_size . one way is to set dyn_ramoops_size to 0.

> +
> + dyn_ramoops_res.start = ramoops_base;
> + dyn_ramoops_res.end = ramoops_base + dyn_ramoops_size - 1;
> + insert_resource(&iomem_resource, &dyn_ramoops_res);
> +}
> +#endif
>
> static int ramoops_pstore_open(struct pstore_info *psi)
> {
> @@ -915,14 +965,18 @@ static void __init ramoops_register_dummy(void)
>
> /*
> * Prepare a dummy platform data structure to carry the module
> - * parameters. If mem_size isn't set, then there are no module
> - * parameters, and we can skip this.
> + * parameters. If mem_size isn't set, check for dynamic ramoops
> + * size and use if it is set.
> */
> - if (!mem_size)
> + if (!mem_size && !dyn_ramoops_size)
> return;
>

If mem_size and dyn_ramoops_size are set, you are taking
dyn_ramoops_size precedence here. The comment is a bit confusing, pls
review it once.

> - pr_info("using module parameters\n");
> + if (dyn_ramoops_size) {
> + mem_size = dyn_ramoops_size;
> + mem_address = dyn_ramoops_res.start;
> + }
>

Overall it Looks good to me. Thanks.

2023-11-28 10:38:17

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [Patch v6 10/12] pstore/ram: Add dynamic ramoops region support through commandline



On 11/27/2023 5:04 PM, Pavan Kondeti wrote:
> On Sat, Nov 25, 2023 at 03:49:53AM +0530, Mukesh Ojha wrote:
>> The reserved memory region for ramoops is assumed to be at a fixed
>> and known location when read from the devicetree. This may not be
>> required for something like Qualcomm's minidump which is interested
>> in knowing addresses of ramoops region but it does not put hard
>> requirement of address being fixed as most of it's SoC does not
>> support warm reset and does not use pstorefs at all instead it has
>> firmware way of collecting ramoops region if it gets to know the
>> address and register it with apss minidump table which is sitting
>> in shared memory region in DDR and firmware will have access to
>> these table during reset and collects it on crash of SoC.
>>
>> So, add the support of reserving ramoops region to be dynamically
>> allocated early during boot if it is request through command line
>> via 'dyn_ramoops_size=<size>' and fill up reserved resource structure
>> and export the structure, so that it can be read by ramoops driver.
>>
>> Signed-off-by: Mukesh Ojha <[email protected]>
>> ---
>> Documentation/admin-guide/ramoops.rst | 7 ++++
>> fs/pstore/Kconfig | 15 +++++++++
>> fs/pstore/ram.c | 62 ++++++++++++++++++++++++++++++++---
>> include/linux/pstore_ram.h | 5 +++
>> init/main.c | 2 ++
>> 5 files changed, 87 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
>> index e9f85142182d..af737adbf079 100644
>> --- a/Documentation/admin-guide/ramoops.rst
>> +++ b/Documentation/admin-guide/ramoops.rst
>> @@ -33,6 +33,13 @@ memory are implementation defined, and won't work on many ARMs such as omaps.
>> Setting ``mem_type=2`` attempts to treat the memory region as normal memory,
>> which enables full cache on it. This can improve the performance.
>>
>> +Ramoops memory region can also be allocated dynamically for a special case where
>> +there is no requirement to access the logs from pstorefs on next boot instead there
>> +is separate backend mechanism like minidump present which has awareness about the
>> +dynamic ramoops region and can recover the logs. This is enabled via command line
>> +parameter ``dyn_ramoops_size=<size>`` and should not be used in absence of
>> +separate backend which knows how to recover this dynamic region.
>> +
>> The memory area is divided into ``record_size`` chunks (also rounded down to
>> power of two) and each kmesg dump writes a ``record_size`` chunk of
>> information.
>> diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
>> index 3acc38600cd1..e13e53d7a225 100644
>> --- a/fs/pstore/Kconfig
>> +++ b/fs/pstore/Kconfig
>> @@ -81,6 +81,21 @@ config PSTORE_RAM
>>
>> For more information, see Documentation/admin-guide/ramoops.rst.
>>
>> +config PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION
>> + bool "Reserve ramoops region dynamically"
>> + select PSTORE_RAM
>> + help
>> + This enables the dynamic reservation of ramoops region for a special case
>> + where there is no requirement to access the logs from pstorefs on next boot
>> + instead there is separate backend mechanism like minidump present which has
>> + awareness about the dynamic ramoops region and can recover the logs. This is
>> + enabled via command line parameter dyn_ramoops_size=<size> and should not be
>> + used in absence of separate backend which knows how to recover this dynamic
>> + region.
>> +
>> + Note whenever this config is selected ramoops driver will be build statically
>> + into kernel.
>> +
>
> Is there any advantage if we decouple this memory reservation from
> pstore ram so that pstore ram can still be compiled as module? Asking
> because you explicitly mentioned this limitation.

This is doable and it will be needing export(may be _NS) of
ramoops resource if ramoops needs to be build as modules.

Thanks for suggestion.
But Let's hear it from other people as well if they have something
to add otherwise, will do it next series.

>
>> config PSTORE_ZONE
>> tristate
>> depends on PSTORE
>> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>> index 88b34fdbf759..a6c0da8cfdd4 100644
>> --- a/fs/pstore/ram.c
>> +++ b/fs/pstore/ram.c
>> @@ -20,6 +20,7 @@
>> #include <linux/compiler.h>
>> #include <linux/of.h>
>> #include <linux/of_address.h>
>> +#include <linux/memblock.h>
>> #include <linux/mm.h>
>>
>> #include "internal.h"
>> @@ -103,6 +104,55 @@ struct ramoops_context {
>> };
>>
>> static struct platform_device *dummy;
>> +static int dyn_ramoops_size;
>> +/* Location of the reserved area for the dynamic ramoops */
>> +static struct resource dyn_ramoops_res = {
>> + .name = "ramoops",
>> + .start = 0,
>> + .end = 0,
>> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> + .desc = IORES_DESC_NONE,
>> +};
>> +
>> +static int __init parse_dyn_ramoops_size(char *p)
>> +{
>> + char *tmp;
>> +
>> + dyn_ramoops_size = memparse(p, &tmp);
>> + if (p == tmp) {
>> + pr_err("ramoops: memory size expected\n");
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +early_param("dyn_ramoops_size", parse_dyn_ramoops_size);
>
> should not this code be under
> CONFIG_PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION?

Yeah, looks to be miss., thanks again..

>
>> +
>> +#ifdef CONFIG_PSTORE_DYNAMIC_RAMOOPS_REGION_RESERVATION
>> +/*
>> + * setup_dynamic_ramoops() - reserves memory for dynamic ramoops
>> + *
>> + * This enable dynamic reserve memory support for ramoops through
>> + * command line.
>> + */
>> +void __init setup_dynamic_ramoops(void)
>> +{
>> + unsigned long long ramoops_base;
>> + unsigned long long ramoops_size;
>> +
>> + ramoops_base = memblock_phys_alloc_range(dyn_ramoops_size, SMP_CACHE_BYTES,
>> + 0, MEMBLOCK_ALLOC_NOLEAKTRACE);
>> + if (!ramoops_base) {
>> + pr_err("cannot allocate ramoops dynamic memory (size:0x%llx).\n",
>> + ramoops_size);
>> + return;
>> + }
>
> This error needs to be propagated to ramoops_register_dummy() since it
> rely on !dyn_ramoops_size . one way is to set dyn_ramoops_size to 0.

Good point, will do that..

>
>> +
>> + dyn_ramoops_res.start = ramoops_base;
>> + dyn_ramoops_res.end = ramoops_base + dyn_ramoops_size - 1;
>> + insert_resource(&iomem_resource, &dyn_ramoops_res);
>> +}
>> +#endif
>>
>> static int ramoops_pstore_open(struct pstore_info *psi)
>> {
>> @@ -915,14 +965,18 @@ static void __init ramoops_register_dummy(void)
>>
>> /*
>> * Prepare a dummy platform data structure to carry the module
>> - * parameters. If mem_size isn't set, then there are no module
>> - * parameters, and we can skip this.
>> + * parameters. If mem_size isn't set, check for dynamic ramoops
>> + * size and use if it is set.
>> */
>> - if (!mem_size)
>> + if (!mem_size && !dyn_ramoops_size)
>> return;
>>
>
> If mem_size and dyn_ramoops_size are set, you are taking
> dyn_ramoops_size precedence here. The comment is a bit confusing, pls
> review it once.

Ideally, both should not be set and there will always be
confusion.

Do you think, if we use mem_size a single variable both for earlier
and dynamic ramoops where based on dyn_ramoops_size=true/on a boolean
it will take dynamic ramoops path and if not mentioned it will take
older path.

-Mukesh
>
>> - pr_info("using module parameters\n");
>> + if (dyn_ramoops_size) {
>> + mem_size = dyn_ramoops_size;
>> + mem_address = dyn_ramoops_res.start;
>> + }
>>
>
> Overall it Looks good to me. Thanks.

2023-11-28 11:11:22

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [Patch v6 11/12] pstore/ram: Add ramoops ready notifier support



On 11/27/2023 3:40 PM, Pavan Kondeti wrote:
> On Sat, Nov 25, 2023 at 03:49:54AM +0530, Mukesh Ojha wrote:
>> Client like minidump, is only interested in ramoops
>> region addresses/size so that it could register them
>> with its table and also it is only deals with ram
>> backend and does not use pstorefs to read the records.
>> Let's introduce a client notifier in ramoops which
>> gets called when ramoops driver probes successfully
>> and it passes the ramoops region information to the
>> passed callback by the client and If the call for
>> ramoops ready register comes after ramoops probe
>> than call the callback directly.
>>
>> Signed-off-by: Mukesh Ojha <[email protected]>
>> ---
>> fs/pstore/ram.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++
>> include/linux/pstore_ram.h | 6 ++++
>> 2 files changed, 83 insertions(+)
>>
>> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>> index a6c0da8cfdd4..72341fd21aec 100644
>> --- a/fs/pstore/ram.c
>> +++ b/fs/pstore/ram.c
>> @@ -22,6 +22,7 @@
>> #include <linux/of_address.h>
>> #include <linux/memblock.h>
>> #include <linux/mm.h>
>> +#include <linux/mutex.h>
>>
>> #include "internal.h"
>> #include "ram_internal.h"
>> @@ -101,6 +102,14 @@ struct ramoops_context {
>> unsigned int ftrace_read_cnt;
>> unsigned int pmsg_read_cnt;
>> struct pstore_info pstore;
>> + /*
>> + * Lock to serialize calls to register_ramoops_ready_notifier,
>> + * ramoops_ready_notifier and read/modification of 'ramoops_ready'.
>> + */
>> + struct mutex lock;
>> + bool ramoops_ready;
>> + int (*callback)(const char *name, int id, void *vaddr,
>> + phys_addr_t paddr, size_t size);
>> };
>>
>> static struct platform_device *dummy;
>> @@ -488,6 +497,7 @@ static int ramoops_pstore_erase(struct pstore_record *record)
>> }
>>
>> static struct ramoops_context oops_cxt = {
>> + .lock = __MUTEX_INITIALIZER(oops_cxt.lock),
>> .pstore = {
>> .owner = THIS_MODULE,
>> .name = "ramoops",
>> @@ -662,6 +672,68 @@ static int ramoops_init_prz(const char *name,
>> return 0;
>> }
>>
>> +void ramoops_ready_notifier(struct ramoops_context *cxt)
>> +{
>> + struct persistent_ram_zone *prz;
>> + int i;
>> +
>> + if (!cxt->callback)
>> + return;
>> +
>> + for (i = 0; i < cxt->max_dump_cnt; i++) {
>> + prz = cxt->dprzs[i];
>> + cxt->callback("dmesg", i, prz->vaddr, prz->paddr, prz->size);
>> + }
>> +
>> + if (cxt->console_size) {
>> + prz = cxt->cprz;
>> + cxt->callback("console", 0, prz->vaddr, prz->paddr, prz->size);
>> + }
>> +
>> + for (i = 0; i < cxt->max_ftrace_cnt; i++) {
>> + prz = cxt->fprzs[i];
>> + cxt->callback("ftrace", i, prz->vaddr, prz->paddr, prz->size);
>> + }
>> +
>> + if (cxt->pmsg_size) {
>> + prz = cxt->mprz;
>> + cxt->callback("pmsg", 0, prz->vaddr, prz->paddr, prz->size);
>> + }
>> +}
>> +
>> +int register_ramoops_ready_notifier(int (*fn)(const char *, int,
>> + void *, phys_addr_t, size_t))
>> +{
>> + struct ramoops_context *cxt = &oops_cxt;
>> +
>> + mutex_lock(&cxt->lock);
>> + if (cxt->callback) {
>> + mutex_unlock(&cxt->lock);
>> + return -EEXIST;
>> + }
>> +
>> + cxt->callback = fn;
>> + if (cxt->ramoops_ready)
>> + ramoops_ready_notifier(cxt);
>> +
>> + mutex_unlock(&cxt->lock);
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(register_ramoops_ready_notifier);
>> +
>
> Can you please elaborate on why do we need this custom notifier logic?
>
> why would not a standard notifier (include/linux/notifier.h) work here?
> The notifier_call callback can recieve custom data from the
> notifier chain implementer. All we need is to define a custom struct like
> struct pstore_ramoops_zone_data {
> const char *name;
> int id;
> void *vaddr;
> phys_addr_t paddr;
> size_t size;
> };
>
> and pass the pointer to array of this struct.
>
>
> btw, the current logic only supports just one client and this limitation
> is not highlighted any where.

I could work on it, was not sure if that will be helpful
for other users .

-Mukesh
>
> Thanks,
> Pavan
>

2023-11-28 13:16:07

by Bryan O'Donoghue

[permalink] [raw]
Subject: Re: [Patch v6 03/12] docs: qcom: Add qualcomm minidump guide

On 24/11/2023 22:19, Mukesh Ojha wrote:
> Add the qualcomm minidump guide for the users which tries to cover
> the dependency, API use and the way to test and collect minidump
> on Qualcomm supported SoCs.
>
> Signed-off-by: Mukesh Ojha <[email protected]>
> ---
> Documentation/admin-guide/index.rst | 1 +
> Documentation/admin-guide/qcom_minidump.rst | 272 ++++++++++++++++++++++++++++
> 2 files changed, 273 insertions(+)
> create mode 100644 Documentation/admin-guide/qcom_minidump.rst

General nit-pick on sequencing of your patches. Its good practice (tm)
to put your documentation first in your series.

> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 43ea35613dfc..251d070486c2 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -120,6 +120,7 @@ configure specific aspects of kernel behavior to your liking.
> perf-security
> pm/index
> pnp
> + qcom_minidump
> rapidio
> ras
> rtc
> diff --git a/Documentation/admin-guide/qcom_minidump.rst b/Documentation/admin-guide/qcom_minidump.rst
> new file mode 100644
> index 000000000000..b492f2b79639
> --- /dev/null
> +++ b/Documentation/admin-guide/qcom_minidump.rst
> @@ -0,0 +1,272 @@
> +Qualcomm minidump feature
> +=========================
> +
> +Introduction
> +------------
> +
> +Minidump is a best effort mechanism to collect useful and predefined
> +data for first level of debugging on end user devices running on
> +Qualcomm SoCs.

What does "first-level debugging" mean here ? You use the term
"post-mortem" later, stick with one term for consistency throughout your
document.

Suggest:

"Minidump is a best-effort mechanism to collect useful predefined data
for post-mortem debugging on a Qualcomm System on Chips (SoCs)."

Generally better to stick with "Qualcomm SoC" and "Minidump" once you
establish the terms upfront and early in your text - instead of
reverting to "device" and "it".

It is built on the premise that System on Chip (SoC)
> +or subsystem part of SoC crashes, due to a range of hardware and
> +software bugs.
Instead of saying "It" say "Minidump"

Hence, the ability to collect accurate data is only
> +a best-effort. The data collected could be invalid or corrupted, data
> +collection itself could fail, and so on.

"and so on" is a redundancy, drop.


Suggest:
Minidump is built on the premise that a hardware or software component
on the SoC has encountered an unexpected fault. This means that the data
collected by minidump cannot be assumed to be correct or even present.

> +
> +Qualcomm devices in engineering mode provides a mechanism for generating
> +full system RAM dumps for post-mortem debugging.

Stick with the established naming convention

"Qualcomm SoCs in engineering mode provide a mechanism for generating
full system RAM dumps for this post-mortem debugging"

But in some cases it's
> +however not feasible to capture the entire content of RAM. The minidump
> +mechanism provides the means for selected region should be included in
> +the ramdump.

Dont' start a sentence with "But"

You're also not being clear what you mean by "the entire content of RAM"
since you obviously can't capture a full snap-shot of DRAM and store in
DRAM.

Better to say IMO

"Minidump captures specific pre-defined regions of RAM and stores those
regions in a reserved Minidump specific buffer."

"The region of RAM used to store Minidump data shall be referred to as
SMEM throughout this document." [1]

> +::
> +
> + +-----------------------------------------------+
> + | DDR +-------------+ |

Instead of saying "DDR" just mark this as RAM or Memory.
Its a terrible nit-pick from me but DDR = "Double Data Rate Synchronous
Dynamic Random-Access Memory" but that's irrelevant to this
specification. We could be living in a gigantic SRAM for argument sake.

> + | | SS0-ToC| |
> + | +----------------+ +----------------+ | |
> + | |Shared memory | | SS1-ToC| | |
> + | |(SMEM) | | | | |
> + | | | +-->|--------+ | | |
> + | |G-ToC | | | SS-ToC \ | | |
> + | |+-------------+ | | | +-----------+ | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SS0-ToC | | | +-|<|SS1 region1| | | |
> + | ||-------------| | | | | |-----------| | | |
> + | || SS1-ToC |-|>+ | | |SS1 region2| | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SS2-ToC | | | | | ... | | | |
> + | ||-------------| | | | |-----------| | | |
> + | || ... | | |-|<|SS1 regionN| | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SSn-ToC | | | | +-----------+ | | |
> + | |+-------------+ | | | | | |
> + | | | | |----------------| | |
> + | | | +>| regionN | | |
> + | | | | |----------------| | |
> + | +----------------+ | | | | |
> + | | |----------------| | |
> + | +>| region1 | | |
> + | |----------------| | |
> + | | | | |
> + | |----------------|-+ |
> + | | region5 | |
> + | |----------------| |
> + | | | |
> + | Region information +----------------+ |
> + | +---------------+ |
> + | |region name | |
> + | |---------------| |
> + | |region address | |
> + | |---------------| |
> + | |region size | |
> + | +---------------+ |
> + +-----------------------------------------------+
> + G-ToC: Global table of contents
> + SS-ToC: Subsystem table of contents
> + SS0-SSn: Subsystem numbered from 0 to n

You need to be more consistent here

SSX-ToC -> SSX-SSn

Where X is an integer from 0 upwards and similarly n is a integer from 0
upwards.

This name SSX-SSn is not especially descriptive, I'm not sure if this is
a name you are choosing here ? If so then let me suggest a new name like
"Subsystem Memory Segment" SSX-MSn

->

G-ToC: Global table of contents
SSX-ToC: Subsystem X table of contents.
X is an integer in the range of 0 to ?
Is there an upper limit ?
Presumably this is an 8, 16, 32 or 64 bit integer
Please define either the size of the integer or the
valid range of values 0..128, 0..256
SSX-MSn: Subsystem numbered from 0 to n
Same comment for the 'n' here.

> +
> +It depends on SoC where the underlying firmware is keeping the
> +minidump global table taking care of subsystem ToC part for
> +minidump like for above diagram, it is for shared memory sitting
> +in DDR and it is shared among various master however it is possible

> +that this could be implemented via memory mapped regions but the
> +general idea should remain same. Here, various subsystem could be
> +DSP's like ADSP/CDSP/MODEM etc, along with Application processor
> +(APSS) where Linux runs.


DSP minidump gets collected when DSP's goes
> +for recovery followed by a crash. The minidump part of code for
> +that resides in ``qcom_rproc_minidump.c``.

This paragraph is difficult to parse.

What you are describing here is a linked list, I think you should have a
paragraph describing how the memory structure works

->

"Minidump determines which areas of DRAM to capture via a Minidump
defined linked-list structure.

At the top level a Global Table of Contents (GTOC) enumerates a variable
number of SubSystem Table Of Contents (SSTOC) structures.

Each SSTOC contains a list of SubSystem Memory Segements which are named
according to the containing SSTOC hence (SSX-MSn) where "X" denotes the
SystemSystem index of the containing SSX-ToC and "n" denotes an
individual Memory Segment within the SystemSystem. Hence SS0-MS0 belongs
to SS0-ToC whereas SS1-MS0 belongs to SS1-ToC."

Then I think you can describe how the crash dump colleciton works and
which agents of the system - DSP ? is responsible for collecting the
crashdump

->

"The Application Processor SubSystem (APSS) runs the Linux kernel and is
therefore not responsible for assembling Minidump data. One of the other
system agents in the SoC will be responsible for capturing the Minidump
data during system reset.

Typically one of the SoC Digital Signal Processors (DSP) will be used
for this purpose.

During reset the DSP will walk the GTOC, SSX-ToCs and SSX-MSns
populating the Minidump RAM area with the indicated memory"

> +
> +
> +SMEM as backend


> +----------------
> +
> +In this document, SMEM will be used as the backend implementation
> +of minidump.

[1] As per the above link, you need to introduce the term SMEM earlier.

It's fine to expand on its meaning later but, do please define it once
upfront before you use it in your awesome ASCII art.

> +The core of minidump feature is part of Qualcomm's boot firmware code.
> +It initializes shared memory (SMEM), which is a part of DDR and
> +allocates a small section of it to minidump table, i.e. also called
> +global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
> +its own table of segments to be included in the minidump, all
> +references from a descriptor in SMEM (G-ToC). Each segment/region has
> +some details like name, physical address and its size etc. and it
> +could be anywhere scattered in the DDR.

->

"The SoC's bootloader must reserve an area of RAM as SMEM prior to
handing over control to the run-time operating system. The bootloader is
responsible to place the GTOC at the starting address of SMEM."

If you want to give more technical details of size, physical address -
then explicitly define those in the section above which talks about the
linked-list structure.

Please try to avoid use of "etc" or "and so on" since it assumes the
reader already knows how the system works and can fill in the blanks
but, what you are doing here is educating a Minidump novice in how
things work.

> +
> +Qualcomm APSS Minidump kernel driver concept
> +--------------------------------------------
> +
> +Qualcomm APSS minidump kernel driver adds the capability to add Linux

So why "Minidump" and then "minidump" choose one.

> +region to be dumped as part of RAM dump collection.


OK so this really is the "meat" of the system. The bootloader/firmware
populates the GTOC.

The Q this document should probably answer is how the kernel driver
knows how/where to place its data.

Assumed to be parsing the DTB.

At the moment,
> +shared memory driver creates platform device for minidump driver and
> +give a means to APSS minidump to initialize itself on probe.

"At the moment" is another drop.

Just make a clear statement

"The minidump platform driver populates the APSS porition of the GTOC"

more interesting to me is - are there defined numbers, identifiers for
the APSS ? or do we just add new entries to the GTOC ?

ie. is there a reserved index or "type" in the GTOC that identifies
where the APSS needs to insert itself ?

> +This driver provides ``qcom_minidump_region_register`` and
> +``qcom_minidump_region_unregister`` API's to register and unregister
> +APSS minidump region.

Why does it do that ? Is it not the case that the driver knows where the
APSS data goes ?

It also supports registration for the clients
> +who came before minidump driver was initialized. It maintains pending
> +list of clients who came before minidump and once minidump is initialized
> +it registers them in one go.

Don't start sentences with "It" -> "The driver" or "Minidump"

As I read this though, the Minidump driver in Linux isn't just
registering / managing the APSS side of things but also "doing stuff"
for other clients ?

How does the Linux driver know what to register ?

> +
> +To simplify post-mortem debugging, driver creates and maintain an ELF

the driver creates and maintains

> +header as first region that gets updated each time a new region gets
> +registered.

as the first region

So - who is registering these regions ? Linux kernel drivers ? aDSP / cDSP ?

If I write a new driver for Venus or GPU can I define my own region(s)
to be captured ?

Presumably. Please give more detail on this.

> +
> +The solution supports extracting the RAM dump/minidump produced either
> +over USB or stored to an attached storage device.

What provides that functionality ? The bootloader ?

How do you trigger / capture that dump from the bootloader ?

No need to go into super-detail but give some idea.

> +
> +Dependency of minidump kernel driver
> +------------------------------------
> +
> +It is to note that whole of minidump depends on Qualcomm boot firmware
> +whether it supports minidump or not.

You can drop this - you've already stated the bootloader/firmware must
setup the initial table so, you're not providing additional information
with this statement.

> So, if the minidump SMEM ID is

Try not to start sentences with "So"

SMEM ID ? This is your first time using this term - please relate it
back to your ASCII diagram and the description you give with that text.

> +present in shared memory, it indicates that minidump is supported from
> +boot firmware and it is possible to dump Linux (APSS) region as part
> +of minidump collection.

If _which_ SMEM ID ?

It seems to me as if we are missing some important information here -
what are the list of SMEM IDs ?

Are these IDs serial and incrementing across SoC versions or SoC specific ?

You need to define that data.

> +How a kernel client driver can register region with minidump
> +------------------------------------------------------------

Answering yes to my earlier question. A driver I write can make use of
the API you are providing here.

Great. Please give some indication of that earlier, even if its a
reference to this description you give here "See X.Y later in this document"

> +
> +Client driver can use ``qcom_minidump_region_register`` API's to register
> +and ``qcom_minidump_region_unregister`` to unregister their region from
> +minidump driver.
> +
> +Client needs to fill their region by filling ``qcom_minidump_region``
> +structure object which consists of the region name, region's virtual
> +and physical address and its size.

Nit pick. You need a definite article here "A client driver" etc.

> +
> +Below, is one sample client driver snippet which tries to allocate a
> +region from kernel heap of certain size and it writes a certain known
> +pattern.

Good

(that can help in verification after collection that we got
> +the exact pattern, what we wrote) and registers it with minidump.

Not necessary to define this. We are all smart here and by now the
intent of the mechanism is defined..

> +
> + .. code-block:: c
> +
> + #include <soc/qcom/qcom_minidump.h>
> + [...]
> +
> +
> + [... inside a function ...]
> + struct qcom_minidump_region region;
> +
> + [...]
> +
> + client_mem_region = kzalloc(region_size, GFP_KERNEL);
> + if (!client_mem_region)
> + return -ENOMEM;
> +
> + [... Just write a pattern ...]
> + memset(client_mem_region, 0xAB, region_size);
> +
> + [... Fill up the region object ...]
> + strlcpy(region.name, "REGION_A", sizeof(region.name));
> + region.virt_addr = client_mem_region;
> + region.phys_addr = virt_to_phys(client_mem_region);
> + region.size = region_size;
> +
> + ret = qcom_minidump_region_register(&region);
> + if (ret < 0) {
> + pr_err("failed to add region in minidump: err: %d\n", ret);
> + return ret;
> + }
> +
> + [...]
> +
> +
> +Test
> +----

Testing

> +
> +Existing Qualcomm devices already supports entire RAM dump (also called
> +full dump) by writing appropriate value to Qualcomm's top control and
> +status register (tcsr) in ``driver/firmware/qcom_scm.c`` .

"Existing Qualcomm SoCs already support dumping the entire RAM to the
SMEM area/segment/whatever"

This is 100% counter-intuitive since SMEM lives in RAM, correct ?

Full dump means what, a full dump of the APSS RAM ? What happens if SMEM
cannot accommodate the full APSS RAM dump ?

> +
> +SCM device Tree bindings required to support download mode
> +For example (sm8450) ::
> +
> + / {
> +
> + [...]
> +
> + firmware {
> + scm: scm {
> + compatible = "qcom,scm-sm8450", "qcom,scm";
> + [... tcsr register ... ]
> + qcom,dload-mode = <&tcsr 0x13000>;
> +
> + [...]
> + };
> + };
> +
> + [...]
> +
> + soc: soc@0 {
> +
> + [...]
> +
> + tcsr: syscon@1fc0000 {
> + compatible = "qcom,sm8450-tcsr", "syscon";
> + reg = <0x0 0x1fc0000 0x0 0x30000>;
> + };
> +
> + [...]
> + };
> + [...]
> +
> + };
> +
> +User of minidump can pass ``qcom_scm.download_mode="mini"`` to kernel
> +commandline to set the current download mode to minidump.

"A kernel command line parameter is provided
``qcom_scm.download_mode="mini"`` to facilitate ... but you aren't
telling us what "minidump" captures "the current download" ? do you mean
the current state ?

Does the system continue to boot up if you pass
qcom_scm.download_mode="mini ? will additional registrations to
SMEM/Minidump work ?

What happens to the minidump data if there is a _subsequent_ real
crashdump ?

Overwritten ?

Also what happens if SMEM runs out of space ? Say I boot with
``qcom_scm.download_mode="mini"`` and then the system crashes - SMEM has
a limit right ?

So the minidump gets overwritten ?

> +Similarly, ``"full"`` is passed to set the download mode to full dump
> +where entire RAM dump will be collected while setting it ``"full,mini"``
> +will collect minidump along with fulldump.

Still not super-clear what the difference between mini and full is here.

> +
> +Writing to sysfs node can also be used to set the mode to minidump::
> +
> + echo "mini" > /sys/module/qcom_scm/parameter/download_mode
> +
> +Once the download mode is set, any kind of crash will make the device collect
> +respective dump as per set download mode.

Nice.

> +
> +Dump collection
> +---------------
> +::
> +
> + +-----------+
> + | |
> + | | +------+
> + | | | |
> + | | +--+---+ Product(Qualcomm SoC)
> + +-----------+ |
> + |+++++++++++|<------------+
> + |+++++++++++| usb cable
> + +-----------+
> + x86_64 PC
> +
> +The solution supports a product running with Qualcomm SoC (where minidump)
> +is supported from the firmware) connected to x86_64 host PC running PCAT
> +tool.

It supports downloading the minidump produced from product to the
> +host PC over USB or to save the minidump to the product attached storage
> +device(UFS/eMMC/SD Card) into minidump dedicated partition.

It would be a good idea to reference this section earlier.

> +
> +By default, dumps are downloaded via USB to the attached x86_64 PC running
> +PCAT (Qualcomm tool) software. Upon download, we will see a set of binary
> +blobs starting with name ``md_*`` in PCAT configured directory in x86_64
> +machine, so for above example from the client it will be ``md_REGION_A.BIN``.
> +This binary blob depends on region content to determine whether it needs
> +external parser support to get the content of the region, so for simple
> +plain ASCII text we don't need any parsing and the content can be seen
> +just opening the binary file.
> +
> +To collect the dump to attached storage type, one needs to write appropriate
> +value to IMEM register, in that case dumps are collected in rawdump
> +partition on the product device itself.
> +
> +One needs to read the entire rawdump partition and pull out content to
> +save it onto the attached x86_64 machine over USB. Later, this rawdump
> +can be passed to another tool (``dexter.exe`` [Qualcomm tool]) which
> +converts this into the similar binary blobs which we have got it when
> +download type was set to USB, i.e. a set of registered regions as blobs
> +and their name starts with ``md_*``.
> +
> +Replacing the ``dexter.exe`` with some open source tool can be added as future
> +scope of this document.

---
bod

2023-11-28 16:04:29

by Pavan Kondeti

[permalink] [raw]
Subject: Re: [Patch v6 10/12] pstore/ram: Add dynamic ramoops region support through commandline

On Tue, Nov 28, 2023 at 04:02:33PM +0530, Mukesh Ojha wrote:
> > > static int ramoops_pstore_open(struct pstore_info *psi)
> > > {
> > > @@ -915,14 +965,18 @@ static void __init ramoops_register_dummy(void)
> > > /*
> > > * Prepare a dummy platform data structure to carry the module
> > > - * parameters. If mem_size isn't set, then there are no module
> > > - * parameters, and we can skip this.
> > > + * parameters. If mem_size isn't set, check for dynamic ramoops
> > > + * size and use if it is set.
> > > */
> > > - if (!mem_size)
> > > + if (!mem_size && !dyn_ramoops_size)
> > > return;
> >
> > If mem_size and dyn_ramoops_size are set, you are taking
> > dyn_ramoops_size precedence here. The comment is a bit confusing, pls
> > review it once.
>
> Ideally, both should not be set and there will always be
> confusion.
>
> Do you think, if we use mem_size a single variable both for earlier
> and dynamic ramoops where based on dyn_ramoops_size=true/on a boolean
> it will take dynamic ramoops path and if not mentioned it will take older
> path.
>

Sounds like a good idea to me. You would need a callback for mem_size
module param handling. Because dyn_ramoops can be passed before mem_size
parameter. Also, we can't make pstore ram as module i.e decoupling
dynamic ramoops from pstore ram module.

Thanks,
Pavan

2023-12-11 13:22:02

by Mukesh Ojha

[permalink] [raw]
Subject: Re: [Patch v6 03/12] docs: qcom: Add qualcomm minidump guide

Thanks you very much Bryan for giving so much time of yours in reviewing
minidump DOC and I apologize for coming so late for your reply.

I will definetely apply your suggestion in next series.

Please find my reply inline for your queries.

On 11/28/2023 6:45 PM, Bryan O'Donoghue wrote:
> On 24/11/2023 22:19, Mukesh Ojha wrote:
>> Add the qualcomm minidump guide for the users which tries to cover
>> the dependency, API use and the way to test and collect minidump
>> on Qualcomm supported SoCs.
>>
>> Signed-off-by: Mukesh Ojha <[email protected]>
>> ---
>>   Documentation/admin-guide/index.rst         |   1 +
>>   Documentation/admin-guide/qcom_minidump.rst | 272
>> ++++++++++++++++++++++++++++
>>   2 files changed, 273 insertions(+)
>>   create mode 100644 Documentation/admin-guide/qcom_minidump.rst
>
> General nit-pick on sequencing of your patches. Its good practice (tm)
> to put your documentation first in your series.
>
>> diff --git a/Documentation/admin-guide/index.rst
>> b/Documentation/admin-guide/index.rst
>> index 43ea35613dfc..251d070486c2 100644
>> --- a/Documentation/admin-guide/index.rst
>> +++ b/Documentation/admin-guide/index.rst
>> @@ -120,6 +120,7 @@ configure specific aspects of kernel behavior to
>> your liking.
>>      perf-security
>>      pm/index
>>      pnp
>> +   qcom_minidump
>>      rapidio
>>      ras
>>      rtc
>> diff --git a/Documentation/admin-guide/qcom_minidump.rst
>> b/Documentation/admin-guide/qcom_minidump.rst
>> new file mode 100644
>> index 000000000000..b492f2b79639
>> --- /dev/null
>> +++ b/Documentation/admin-guide/qcom_minidump.rst
>> @@ -0,0 +1,272 @@
>> +Qualcomm minidump feature
>> +=========================
>> +
>> +Introduction
>> +------------
>> +
>> +Minidump is a best effort mechanism to collect useful and predefined
>> +data for first level of debugging on end user devices running on
>> +Qualcomm SoCs.
>
> What does "first-level debugging" mean here ? You use the term
> "post-mortem" later, stick with one term for consistency throughout your
> document.

First level debugging meant was to mean the amount of data captured is
limited and may be sufficient to root cause an issue and you may need
entire ramdump to debug the issue.

>
> Suggest:
>
> "Minidump is a best-effort mechanism to collect useful predefined data
> for post-mortem debugging on a Qualcomm System on Chips (SoCs)."
>
> Generally better to stick with "Qualcomm SoC" and "Minidump" once you
> establish the terms upfront and early in your text - instead of
> reverting to "device" and "it".
>
> It is built on the premise that System on Chip (SoC)
>> +or subsystem part of SoC crashes, due to a range of hardware and
>> +software bugs.
> Instead of saying "It" say "Minidump"

Sure, will try to be consistent in term usage.

>
> Hence, the ability to collect accurate data is only
>> +a best-effort. The data collected could be invalid or corrupted, data
>> +collection itself could fail, and so on.
>
> "and so on" is a redundancy, drop.
>
>
> Suggest:
> Minidump is built on the premise that a hardware or software component
> on the SoC has encountered an unexpected fault. This means that the data
> collected by minidump cannot be assumed to be correct or even present.
>
>> +
>> +Qualcomm devices in engineering mode provides a mechanism for generating
>> +full system RAM dumps for post-mortem debugging.
>
> Stick with the established naming convention
>
> "Qualcomm SoCs in engineering mode provide a mechanism for generating
> full system RAM dumps for this post-mortem debugging"
>
> But in some cases it's
>> +however not feasible to capture the entire content of RAM. The minidump
>> +mechanism provides the means for selected region should be included in
>> +the ramdump.
>
> Dont' start a sentence with "But"

Sure.

>
> You're also not being clear what you mean by "the entire content of RAM"
> since you obviously can't capture a full snap-shot of DRAM and store in
> DRAM.
>
> Better to say IMO
>
> "Minidump captures specific pre-defined regions of RAM and stores those
> regions in a reserved Minidump specific buffer."
>
> "The region of RAM used to store Minidump data shall be referred to as
> SMEM throughout this document." [1]

ok

>
>> +::
>> +
>> +   +-----------------------------------------------+
>> +   |   DDR                       +-------------+   |
>
> Instead of saying "DDR" just mark this as RAM or Memory.
> Its a terrible nit-pick from me but DDR = "Double Data Rate Synchronous
> Dynamic Random-Access Memory" but that's irrelevant to this
> specification. We could be living in a gigantic SRAM for argument sake.

Sure.

>
>> +   |                             |      SS0-ToC|   |
>> +   | +----------------+     +----------------+ |   |
>> +   | |Shared memory   |     |         SS1-ToC| |   |
>> +   | |(SMEM)          |     |                | |   |
>> +   | |                | +-->|--------+       | |   |
>> +   | |G-ToC           | |   | SS-ToC  \      | |   |
>> +   | |+-------------+ | |   | +-----------+  | |   |
>> +   | ||-------------| | |   | |-----------|  | |   |
>> +   | || SS0-ToC     | | | +-|<|SS1 region1|  | |   |
>> +   | ||-------------| | | | | |-----------|  | |   |
>> +   | || SS1-ToC     |-|>+ | | |SS1 region2|  | |   |
>> +   | ||-------------| |   | | |-----------|  | |   |
>> +   | || SS2-ToC     | |   | | |  ...      |  | |   |
>> +   | ||-------------| |   | | |-----------|  | |   |
>> +   | ||  ...        | |   |-|<|SS1 regionN|  | |   |
>> +   | ||-------------| |   | | |-----------|  | |   |
>> +   | || SSn-ToC     | |   | | +-----------+  | |   |
>> +   | |+-------------+ |   | |                | |   |
>> +   | |                |   | |----------------| |   |
>> +   | |                |   +>|  regionN       | |   |
>> +   | |                |   | |----------------| |   |
>> +   | +----------------+   | |                | |   |
>> +   |                      | |----------------| |   |
>> +   |                      +>|  region1       | |   |
>> +   |                        |----------------| |   |
>> +   |                        |                | |   |
>> +   |                        |----------------|-+   |
>> +   |                        |  region5       |     |
>> +   |                        |----------------|     |
>> +   |                        |                |     |
>> +   |  Region information    +----------------+     |
>> +   | +---------------+                             |
>> +   | |region name    |                             |
>> +   | |---------------|                             |
>> +   | |region address |                             |
>> +   | |---------------|                             |
>> +   | |region size    |                             |
>> +   | +---------------+                             |
>> +   +-----------------------------------------------+
>> +       G-ToC: Global table of contents
>> +       SS-ToC: Subsystem table of contents
>> +       SS0-SSn: Subsystem numbered from 0 to n
>
> You need to be more consistent here
>
> SSX-ToC -> SSX-SSn
>
> Where X is an integer from 0 upwards and similarly n is a integer from 0
> upwards.
>
> This name SSX-SSn is not especially descriptive, I'm not sure if this is
> a name you are choosing here ? If so then let me suggest a new name like
> "Subsystem Memory Segment" SSX-MSn

looks good.

>
> ->
>
>        G-ToC: Global table of contents
>        SSX-ToC: Subsystem X table of contents.
>                 X is an integer in the range of 0 to ?
>                 Is there an upper limit ?

Older Soc had the limit of 10 but the newer SoC firmware this limit has
increased.

This reminds me of creating a patch to make the no of subsystem
configurable through CONFIG_*.

#define MAX_NUM_OF_SS 10

"drivers/remoteproc/qcom_common.c"


>                 Presumably this is an 8, 16, 32 or 64 bit integer
>                 Please define either the size of the integer or the
>                 valid range of values 0..128, 0..256
>        SSX-MSn: Subsystem numbered from 0 to n
>                 Same comment for the 'n' here.

Thanks again..

>
>> +
>> +It depends on SoC where the underlying firmware is keeping the
>> +minidump global table taking care of subsystem ToC part for
>> +minidump like for above diagram, it is for shared memory sitting
>> +in DDR and it is shared among various master however it is possible
>
>> +that this could be implemented via memory mapped regions but the
>> +general idea should remain same. Here, various subsystem could be
>> +DSP's like ADSP/CDSP/MODEM etc, along with Application processor
>> +(APSS) where Linux runs.
>
>
> DSP minidump gets collected when DSP's goes
>> +for recovery followed by a crash. The minidump part of code for
>> +that resides in ``qcom_rproc_minidump.c``.
>
> This paragraph is difficult to parse.
>
> What you are describing here is a linked list, I think you should have a
> paragraph describing how the memory structure works
>
> ->
>
> "Minidump determines which areas of DRAM to capture via a Minidump
> defined linked-list structure.
>
> At the top level a Global Table of Contents (GTOC) enumerates a variable
> number of SubSystem Table Of Contents (SSTOC) structures.
>
> Each SSTOC contains a list of SubSystem Memory Segements which are named
> according to the containing SSTOC hence (SSX-MSn) where "X" denotes the
> SystemSystem index of the containing SSX-ToC and "n" denotes an
> individual Memory Segment within the SystemSystem. Hence SS0-MS0 belongs
> to SS0-ToC whereas SS1-MS0 belongs to SS1-ToC."

Thanks for this.

>
> Then I think you can describe how the crash dump colleciton works and
> which agents of the system - DSP ? is responsible for collecting the
> crashdump
>
> ->
>
> "The Application Processor SubSystem (APSS) runs the Linux kernel and is
> therefore not responsible for assembling Minidump data. One of the other
> system agents in the SoC will be responsible for capturing the Minidump
> data during system reset.
>
> Typically one of the SoC Digital Signal Processors (DSP) will be used
> for this purpose.
>
> During reset the DSP will walk the GTOC, SSX-ToCs and SSX-MSns
> populating the Minidump RAM area with the indicated memory"
>
>> +
>> +
>> +SMEM as backend
>
>
>> +----------------
>> +
>> +In this document, SMEM will be used as the backend implementation
>> +of minidump.
>
> [1] As per the above link, you need to introduce the term SMEM earlier.
>
> It's fine to expand on its meaning later but, do please define it once
> upfront before you use it in your awesome ASCII art.

Sure.

>
>> +The core of minidump feature is part of Qualcomm's boot firmware code.
>> +It initializes shared memory (SMEM), which is a part of DDR and
>> +allocates a small section of it to minidump table, i.e. also called
>> +global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
>> +its own table of segments to be included in the minidump, all
>> +references from a descriptor in SMEM (G-ToC). Each segment/region has
>> +some details like name, physical address and its size etc. and it
>> +could be anywhere scattered in the DDR.
>
> ->
>
> "The SoC's bootloader must reserve an area of RAM as SMEM prior to
> handing over control to the run-time operating system. The bootloader is
> responsible to place the GTOC at the starting address of SMEM."
>
> If you want to give more technical details of size, physical address -
> then explicitly define those in the section above which talks about the
> linked-list structure.
>
> Please try to avoid use of "etc" or "and so on" since it assumes the
> reader already knows how the system works and can fill in the blanks
> but, what you are doing here is educating a Minidump novice in how
> things work.

ok.

>
>> +
>> +Qualcomm APSS Minidump kernel driver concept
>> +--------------------------------------------
>> +
>> +Qualcomm APSS minidump kernel driver adds the capability to add Linux
>
> So why "Minidump" and then "minidump" choose one.
>
>> +region to be dumped as part of RAM dump collection.
>
>
> OK so this really is the "meat" of the system. The bootloader/firmware
> populates the GTOC.
>
> The Q this document should probably answer is how the kernel driver
> knows how/where to place its data.
>
> Assumed to be parsing the DTB.
>
>  At the moment,
>> +shared memory driver creates platform device for minidump driver and
>> +give a means to APSS minidump to initialize itself on probe.
>
> "At the moment" is another drop.
>
> Just make a clear statement
>
> "The minidump platform driver populates the APSS porition of the GTOC"
>
> more interesting to me is - are there defined numbers, identifiers for
> the APSS ? or do we just add new entries to the GTOC ?
>
> ie. is there a reserved index or "type" in the GTOC that identifies
> where the APSS needs to insert itself ?

Correct.

APSS has similar index i.e, 0 similar to minidump index for other
remoteprocs mentioned belowin GTOC.

https://elixir.bootlin.com/linux/v6.7-rc5/source/drivers/remoteproc/qcom_q6v5_pas.c#L1113

https://elixir.bootlin.com/linux/v6.7-rc5/source/drivers/remoteproc/qcom_q6v5_pas.c#L1133

https://elixir.bootlin.com/linux/v6.7-rc5/source/drivers/remoteproc/qcom_q6v5_pas.c#L1153

qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);

Above API is the key, And given SMEM ID it gives the object detail
present in shared memory and whether it is Minidump global table
structure object address in case of minidump or SoC info object
in case of socinfo related information (drivers/soc/qcom/socinfo.c).

>
>> +This driver provides ``qcom_minidump_region_register`` and
>> +``qcom_minidump_region_unregister`` API's to register and unregister
>> +APSS minidump region.
>
> Why does it do that ? Is it not the case that the driver knows where the
> APSS data goes ?

If you mean why does it provide
qcom_minidump_region_register/qcom_minidump_region_unregister

E.g., it is for the linux clients to upfront register their buffer or
object which they want to see to check once there is crash from their
driver and same is true other linux clients.

In the current state of the driver it does not export these symbols
and only captures ramoops frontends data like console, dmesg, ftrace,
pmesg but there is plan to do that when one of the upstream client
driver interested in capturing something from their driver they have
to use this API to register it with APSS minidump table.

>
> It also supports registration for the clients
>> +who came before minidump driver was initialized. It maintains pending
>> +list of clients who came before minidump and once minidump is
>> initialized
>> +it registers them in one go.
>
> Don't start sentences with "It" -> "The driver" or "Minidump"
>
> As I read this though, the Minidump driver in Linux isn't just
> registering / managing the APSS side of things but also "doing stuff"
> for other clients ?
>
> How does the Linux driver know what to register ?

Again, I am reiterating what i said above, it is upto the client driver
interests what they want to check from their driver once they see a
crash from there driver, it could be some buffer content or structure
object or anything.

Minidump is kind of blind what are you capturing is not known to itself.

>
>> +
>> +To simplify post-mortem debugging, driver creates and maintain an ELF
>
> the driver creates and maintains
>
>> +header as first region that gets updated each time a new region gets
>> +registered.
>
> as the first region
>
> So - who is registering these regions ? Linux kernel drivers ? aDSP /
> cDSP ?

For APSS(linux), driver register it through the call made to
qcom_minidump_region_{register|unregister} by the linux clients.
Here, it provide flexiblity to dynamically register the regions.

For ADSP/CDSP/MODEM, they will have their own way which is mostly
statically decided what addresses range they want to capture on
their crash.

>
> If I write a new driver for Venus or GPU can I define my own region(s)
> to be captured ?

I am not an expert of Venus/GPU but AFAIK, currently their firmware side
don't use minidump but you can still use APSS minidump to register
required their host regions from GPU/VENUS linux drivers once APSS
minidump driver exports register/unregister API.

>
> Presumably. Please give more detail on this.
>
>> +
>> +The solution supports extracting the RAM dump/minidump produced either
>> +over USB or stored to an attached storage device.
>
> What provides that functionality ? The bootloader ?

Yes, the bootloader.

>
> How do you trigger / capture that dump from the bootloader ?

This is explained how to set minidump and once set any region of SOC
crash will collect minidump and if not set by default entire ddr
dump is collected.

>
> No need to go into super-detail but give some idea.
>
>> +
>> +Dependency of minidump kernel driver
>> +------------------------------------
>> +
>> +It is to note that whole of minidump depends on Qualcomm boot firmware
>> +whether it supports minidump or not.
>
> You can drop this - you've already stated the bootloader/firmware must
> setup the initial table so, you're not providing additional information
> with this statement.
>
>> So, if the minidump SMEM ID is
>
> Try not to start sentences with "So"
>
> SMEM ID ? This is your first time using this term - please relate it
> back to your ASCII diagram and the description you give with that text.
>
>> +present in shared memory, it indicates that minidump is supported from
>> +boot firmware and it is possible to dump Linux (APSS) region as part
>> +of minidump collection.
>
> If _which_ SMEM ID ?
>
> It seems to me as if we are missing some important information here -
> what are the list of SMEM IDs ?
>
> Are these IDs serial and incrementing across SoC versions or SoC specific ?


I have described above in my explanation probably it should make sense
now, will also try to add it in the doc.

>
> You need to define that data.
>
>> +How a kernel client driver can register region with minidump
>> +------------------------------------------------------------
>
> Answering yes to my earlier question. A driver I write can make use of
> the API you are providing here.
>
> Great. Please give some indication of that earlier, even if its a
> reference to this description you give here "See X.Y later in this
> document"
>
>> +
>> +Client driver can use ``qcom_minidump_region_register`` API's to
>> register
>> +and ``qcom_minidump_region_unregister`` to unregister their region from
>> +minidump driver.
>> +
>> +Client needs to fill their region by filling ``qcom_minidump_region``
>> +structure object which consists of the region name, region's virtual
>> +and physical address and its size.
>
> Nit pick. You need a definite article here "A client driver" etc.
>
>> +
>> +Below, is one sample client driver snippet which tries to allocate a
>> +region from kernel heap of certain size and it writes a certain known
>> +pattern.
>
> Good
>
>  (that can help in verification after collection that we got
>> +the exact pattern, what we wrote) and registers it with minidump.
>
> Not necessary to define this. We are all smart here and by now the
> intent of the mechanism is defined..
>
>> +
>> + .. code-block:: c
>> +
>> +  #include <soc/qcom/qcom_minidump.h>
>> +  [...]
>> +
>> +
>> +  [... inside a function ...]
>> +  struct qcom_minidump_region region;
>> +
>> +  [...]
>> +
>> +  client_mem_region = kzalloc(region_size, GFP_KERNEL);
>> +  if (!client_mem_region)
>> +    return -ENOMEM;
>> +
>> +  [... Just write a pattern ...]
>> +  memset(client_mem_region, 0xAB, region_size);
>> +
>> +  [... Fill up the region object ...]
>> +  strlcpy(region.name, "REGION_A", sizeof(region.name));
>> +  region.virt_addr = client_mem_region;
>> +  region.phys_addr = virt_to_phys(client_mem_region);
>> +  region.size = region_size;
>> +
>> +  ret = qcom_minidump_region_register(&region);
>> +  if (ret < 0) {
>> +    pr_err("failed to add region in minidump: err: %d\n", ret);
>> +    return ret;
>> +  }
>> +
>> +  [...]
>> +
>> +
>> +Test
>> +----
>
> Testing
>
>> +
>> +Existing Qualcomm devices already supports entire RAM dump (also called
>> +full dump) by writing appropriate value to Qualcomm's top control and
>> +status register (tcsr) in ``driver/firmware/qcom_scm.c`` .
>
> "Existing Qualcomm SoCs already support dumping the entire RAM to the
> SMEM area/segment/whatever"
>
> This is 100% counter-intuitive since SMEM lives in RAM, correct ?
>
> Full dump means what, a full dump of the APSS RAM ? What happens if SMEM
> cannot accommodate the full APSS RAM dump ?

Sorry, I never said full dump is captured in SMEM, the only difference
of minidump with full dump is, no region registration is required so no
need of SMEM(where the meta data is there right ?) and entire ddr dump
is collected by firmware/bootloader and it sent to to host connected
machine via USB through Sahara protocol.

>
>> +
>> +SCM device Tree bindings required to support download mode
>> +For example (sm8450) ::
>> +
>> +    / {
>> +
>> +    [...]
>> +
>> +        firmware {
>> +            scm: scm {
>> +                compatible = "qcom,scm-sm8450", "qcom,scm";
>> +                [... tcsr register ... ]
>> +                qcom,dload-mode = <&tcsr 0x13000>;
>> +
>> +                [...]
>> +            };
>> +        };
>> +
>> +    [...]
>> +
>> +        soc: soc@0 {
>> +
>> +            [...]
>> +
>> +            tcsr: syscon@1fc0000 {
>> +                compatible = "qcom,sm8450-tcsr", "syscon";
>> +                reg = <0x0 0x1fc0000 0x0 0x30000>;
>> +            };
>> +
>> +            [...]
>> +        };
>> +    [...]
>> +
>> +    };
>> +
>> +User of minidump can pass ``qcom_scm.download_mode="mini"`` to kernel
>> +commandline to set the current download mode to minidump.
>
> "A kernel command line parameter is provided
> ``qcom_scm.download_mode="mini"`` to facilitate ... but you aren't
> telling us what "minidump" captures "the current download" ? do you mean
> the current state ?

Minidump is a SoC solution and not specific to linux(APSS) so even if
there is no APSS linux driver to capture linux specific regions,
firmware like ADSP/CDSP/MODEM or Trustzone or boot loader still
registers their own regions that get captured and you will get them on
crash. If there is no registration from anywhere it will not dump
anything.

>
> Does the system continue to boot up if you pass
> qcom_scm.download_mode="mini ? will additional registrations to
> SMEM/Minidump work ?

yes, it will continue to boot, this parameter just convert the
capture mode from full dump (entire ddr dump ) to minidump (smem)
registered region dump.

Yes, additional registration of linux region to SMEM will work.

>
> What happens to the minidump data if there is a _subsequent_ real
> crashdump ?

By passing qcom_scm.download_mode="mini" we are converting crash dump
mode from full(complete ddr) to minidump.
>
> Overwritten ?
>
> Also what happens if SMEM runs out of space ? Say I boot with
> ``qcom_scm.download_mode="mini"`` and then the system crashes - SMEM has
> a limit right ?
>
> So the minidump gets overwritten ?

I think, is this after your confusion full dump also gets written to
SMEM ?

if this crashes before qcom_scm.download_mode="mini" mode was set then
full ramdump gets collected which is default gets set from the firmware
however, it is also possible to set minidump by default from the
firmware, but let's no go into that detail.

>
>> +Similarly, ``"full"`` is passed to set the download mode to full dump
>> +where entire RAM dump will be collected while setting it ``"full,mini"``
>> +will collect minidump along with fulldump.
>
> Still not super-clear what the difference between mini and full is here.

It should be clear by now.

>
>> +
>> +Writing to sysfs node can also be used to set the mode to minidump::
>> +
>> +    echo "mini" > /sys/module/qcom_scm/parameter/download_mode
>> +
>> +Once the download mode is set, any kind of crash will make the device
>> collect
>> +respective dump as per set download mode.
>
> Nice.
>
>> +
>> +Dump collection
>> +---------------
>> +::
>> +
>> +    +-----------+
>> +    |           |
>> +    |           |         +------+
>> +    |           |         |      |
>> +    |           |         +--+---+ Product(Qualcomm SoC)
>> +    +-----------+             |
>> +    |+++++++++++|<------------+
>> +    |+++++++++++|    usb cable
>> +    +-----------+
>> +            x86_64 PC
>> +
>> +The solution supports a product running with Qualcomm SoC (where
>> minidump)
>> +is supported from the firmware) connected to x86_64 host PC running PCAT
>> +tool.
>
> It supports downloading the minidump produced from product to the
>> +host PC over USB or to save the minidump to the product attached storage
>> +device(UFS/eMMC/SD Card) into minidump dedicated partition.
>
> It would be a good idea to reference this section earlier.

Ok.

Thanks again, for going into such minute detail and asking these questions.

I really appriciate the interest.

-Mukesh

>
>> +
>> +By default, dumps are downloaded via USB to the attached x86_64 PC
>> running
>> +PCAT (Qualcomm tool) software. Upon download, we will see a set of
>> binary
>> +blobs starting with name ``md_*`` in PCAT configured directory in x86_64
>> +machine, so for above example from the client it will be
>> ``md_REGION_A.BIN``.
>> +This binary blob depends on region content to determine whether it needs
>> +external parser support to get the content of the region, so for simple
>> +plain ASCII text we don't need any parsing and the content can be seen
>> +just opening the binary file.
>> +
>> +To collect the dump to attached storage type, one needs to write
>> appropriate
>> +value to IMEM register, in that case dumps are collected in rawdump
>> +partition on the product device itself.
>> +
>> +One needs to read the entire rawdump partition and pull out content to
>> +save it onto the attached x86_64 machine over USB. Later, this rawdump
>> +can be passed to another tool (``dexter.exe`` [Qualcomm tool]) which
>> +converts this into the similar binary blobs which we have got it when
>> +download type was set to USB, i.e. a set of registered regions as blobs
>> +and their name starts with ``md_*``.
>> +
>> +Replacing the ``dexter.exe`` with some open source tool can be added
>> as future
>> +scope of this document.
>
> ---
> bod

2023-12-25 13:25:13

by Ruipeng Qi

[permalink] [raw]
Subject: Re: [Patch v6 03/12] docs: qcom: Add qualcomm minidump guide

On Sat, 25 Nov 2023, Mukesh Ojha wrote:

<+How a kernel client driver can register region with minidump
<+------------------------------------------------------------
<+
<+Client driver can use ``qcom_minidump_region_register`` API's to register
<+and ``qcom_minidump_region_unregister`` to unregister their region from
<+minidump driver.
<+
<+Client needs to fill their region by filling ``qcom_minidump_region``
<+structure object which consists of the region name, region's virtual
<+and physical address and its size.

Hi, Mukesh, wish you a good holiday :)

I have the following idea, please help me to assess whether this can be
implemented or not. As we all know, most of the kernel objects are
allocated by the slab sub-system.I wonder if we can dump all memory
keeped by the slab sub-system? If so, we got most of the kernel objects
which will be helpful to fix problems when we run with system issues.

How can we do this? From the description above, I think we should
register one region for each slab, for each slab will have some pages,
and the memory between each slab is non-continuous. As we all
know, there are millions of slabs in the system, so if we dump slabs
in this way, it will introduce a heavy overhead.

I am not very familiar with qualcomm minidump, maybe my thought
is wrong. Looking forward to your reply!

Best Regards
Ruipeng