2018-02-16 17:37:31

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 00/11] fw_cfg: add DMA operations & etc/vmcoreinfo support

Hi,

This series adds DMA operations support to the qemu fw_cfg kernel
module and populates "etc/vmcoreinfo" with vmcoreinfo location
details (entry added since qemu 2.11 with -device vmcoreinfo).

v15:
- fix fw_cfg.h uapi header #include
- use BSD license for fw_cfg.h uapi header
- move the uapi defines/structs for DMA & vmcoreinfo in the
corresponding patch
- use cpu_relax() instead of usleep_range(50, 100);
- replace do { } while(true) by for (;;)
- fix the rmb() call location
- add a preliminary patch to handle error from fw_cfg_write_blob()
- rewrite fw_cfg_sel_endianness() to wrap iowrite() calls

v14:
- add "fw_cfg: add a public uapi header"
- fix sparse warnings & don't introduce new warnings
- add memory barriers to force IO ordering
- split fw_cfg_read_blob() in fw_cfg_read_blob_io() and
fw_cfg_read_blob_dma()
- add error handling to fw_cfg_read_blob() callers
- minor stylistic changes

v13:
- reorder patch series, introduce DMA write before DMA read
- do some measurements of DMA read speed-ups

v12:
- fix virt_to_phys(NULL) panic with CONFIG_DEBUG_VIRTUAL=y
- do not use DMA read, except for kmalloc() memory we allocated
ourself (only fw_cfg_register_dir_entries() so far)

v11:
- add #include <linux/crash_core.h> in last patch,
fixing kbuild .config test

Marc-André Lureau (11):
crash: export paddr_vmcoreinfo_note()
fw_cfg: add a public uapi header
fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()
fw_cfg: fix sparse warnings with fw_cfg_file
fw_cfg: fix sparse warning reading FW_CFG_ID
fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read
fw_cfg: remove inline from fw_cfg_read_blob()
fw_cfg: handle fw_cfg_read_blob() error
fw_cfg: add DMA register
fw_cfg: write vmcoreinfo details
RFC: fw_cfg: do DMA read operation

MAINTAINERS | 1 +
drivers/firmware/qemu_fw_cfg.c | 334 +++++++++++++++++++++++++++++++++--------
include/uapi/linux/fw_cfg.h | 97 ++++++++++++
kernel/crash_core.c | 1 +
4 files changed, 369 insertions(+), 64 deletions(-)
create mode 100644 include/uapi/linux/fw_cfg.h

--
2.16.1.73.g5832b7e9f2



2018-02-16 15:29:16

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 02/11] fw_cfg: add a public uapi header

Create a common header file for well-known values and structures to be
shared by the Linux kernel with qemu or other projects.

It is based from qemu/docs/specs/fw_cfg.txt which references
qemu/include/hw/nvram/fw_cfg_keys.h "for the most up-to-date and
authoritative list" & vmcoreinfo.txt. Those files don't have an
explicit license, but qemu/hw/nvram/fw_cfg.c is BSD-license, so
Michael S. Tsirkin suggested to use the same license.

The patch intentionally left out DMA & vmcoreinfo structures &
defines, which are added in the commits making usage of it.

Suggested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Marc-André Lureau <[email protected]>

---

The related qemu patch making use of it, to be submitted:
https://github.com/elmarco/qemu/commit/4884fc9e9c4c4467a371e5a40f3181239e1b70f5
---
MAINTAINERS | 1 +
drivers/firmware/qemu_fw_cfg.c | 22 ++------------
include/uapi/linux/fw_cfg.h | 66 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 20 deletions(-)
create mode 100644 include/uapi/linux/fw_cfg.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..a66b65f62811 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11352,6 +11352,7 @@ M: "Michael S. Tsirkin" <[email protected]>
L: [email protected]
S: Maintained
F: drivers/firmware/qemu_fw_cfg.c
+F: include/uapi/linux/fw_cfg.h

QIB DRIVER
M: Dennis Dalessandro <[email protected]>
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index a41b572eeeb1..42601a3eaed5 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -32,30 +32,12 @@
#include <linux/slab.h>
#include <linux/io.h>
#include <linux/ioport.h>
+#include <uapi/linux/fw_cfg.h>

MODULE_AUTHOR("Gabriel L. Somlo <[email protected]>");
MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
MODULE_LICENSE("GPL");

-/* selector key values for "well-known" fw_cfg entries */
-#define FW_CFG_SIGNATURE 0x00
-#define FW_CFG_ID 0x01
-#define FW_CFG_FILE_DIR 0x19
-
-/* size in bytes of fw_cfg signature */
-#define FW_CFG_SIG_SIZE 4
-
-/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
-#define FW_CFG_MAX_FILE_PATH 56
-
-/* fw_cfg file directory entry type */
-struct fw_cfg_file {
- u32 size;
- u16 select;
- u16 reserved;
- char name[FW_CFG_MAX_FILE_PATH];
-};
-
/* fw_cfg device i/o register addresses */
static bool fw_cfg_is_mmio;
static phys_addr_t fw_cfg_p_base;
@@ -597,7 +579,7 @@ MODULE_DEVICE_TABLE(of, fw_cfg_sysfs_mmio_match);

#ifdef CONFIG_ACPI
static const struct acpi_device_id fw_cfg_sysfs_acpi_match[] = {
- { "QEMU0002", },
+ { FW_CFG_ACPI_DEVICE_ID, },
{},
};
MODULE_DEVICE_TABLE(acpi, fw_cfg_sysfs_acpi_match);
diff --git a/include/uapi/linux/fw_cfg.h b/include/uapi/linux/fw_cfg.h
new file mode 100644
index 000000000000..c698ac3812f6
--- /dev/null
+++ b/include/uapi/linux/fw_cfg.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+#ifndef _LINUX_FW_CFG_H
+#define _LINUX_FW_CFG_H
+
+#include <linux/types.h>
+
+#define FW_CFG_ACPI_DEVICE_ID "QEMU0002"
+
+/* selector key values for "well-known" fw_cfg entries */
+#define FW_CFG_SIGNATURE 0x00
+#define FW_CFG_ID 0x01
+#define FW_CFG_UUID 0x02
+#define FW_CFG_RAM_SIZE 0x03
+#define FW_CFG_NOGRAPHIC 0x04
+#define FW_CFG_NB_CPUS 0x05
+#define FW_CFG_MACHINE_ID 0x06
+#define FW_CFG_KERNEL_ADDR 0x07
+#define FW_CFG_KERNEL_SIZE 0x08
+#define FW_CFG_KERNEL_CMDLINE 0x09
+#define FW_CFG_INITRD_ADDR 0x0a
+#define FW_CFG_INITRD_SIZE 0x0b
+#define FW_CFG_BOOT_DEVICE 0x0c
+#define FW_CFG_NUMA 0x0d
+#define FW_CFG_BOOT_MENU 0x0e
+#define FW_CFG_MAX_CPUS 0x0f
+#define FW_CFG_KERNEL_ENTRY 0x10
+#define FW_CFG_KERNEL_DATA 0x11
+#define FW_CFG_INITRD_DATA 0x12
+#define FW_CFG_CMDLINE_ADDR 0x13
+#define FW_CFG_CMDLINE_SIZE 0x14
+#define FW_CFG_CMDLINE_DATA 0x15
+#define FW_CFG_SETUP_ADDR 0x16
+#define FW_CFG_SETUP_SIZE 0x17
+#define FW_CFG_SETUP_DATA 0x18
+#define FW_CFG_FILE_DIR 0x19
+
+#define FW_CFG_FILE_FIRST 0x20
+#define FW_CFG_FILE_SLOTS_MIN 0x10
+
+#define FW_CFG_WRITE_CHANNEL 0x4000
+#define FW_CFG_ARCH_LOCAL 0x8000
+#define FW_CFG_ENTRY_MASK (~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL))
+
+#define FW_CFG_INVALID 0xffff
+
+/* width in bytes of fw_cfg control register */
+#define FW_CFG_CTL_SIZE 0x02
+
+/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
+#define FW_CFG_MAX_FILE_PATH 56
+
+/* size in bytes of fw_cfg signature */
+#define FW_CFG_SIG_SIZE 4
+
+/* FW_CFG_ID bits */
+#define FW_CFG_VERSION 0x01
+
+/* fw_cfg file directory entry type */
+struct fw_cfg_file {
+ __be32 size;
+ __be16 select;
+ __u16 reserved;
+ char name[FW_CFG_MAX_FILE_PATH];
+};
+
+#endif
--
2.16.1.73.g5832b7e9f2


2018-02-16 15:29:19

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 03/11] fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()

Dispatch to the appropriate iowrite() instead of casting restricted
type to u16.

- if fw_cfg_is_mmio:
before: iowrite16(cpu_to_be16(key))
after: iowrite16be(key)
- if !fw_cfg_is_mmio:
before: iowrite16(cpu_to_le16(key))
after: iowrite16(key)
which is equivalent on little-endian systems, where fw_cfg IO is supported.

Fixes:
$ make C=1 CF=-D__CHECK_ENDIAN__ drivers/firmware/qemu_fw_cfg.o

drivers/firmware/qemu_fw_cfg.c:55:33: warning: restricted __be16 degrades to integer
drivers/firmware/qemu_fw_cfg.c:55:52: warning: restricted __le16 degrades to integer

Signed-off-by: Marc-André Lureau <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 42601a3eaed5..6164731a3c35 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -50,9 +50,12 @@ static void __iomem *fw_cfg_reg_data;
static DEFINE_MUTEX(fw_cfg_dev_lock);

/* pick appropriate endianness for selector key */
-static inline u16 fw_cfg_sel_endianness(u16 key)
+static void fw_cfg_sel_endianness(u16 key)
{
- return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
+ if (fw_cfg_is_mmio)
+ iowrite16be(key, fw_cfg_reg_ctrl);
+ else
+ iowrite16(key, fw_cfg_reg_ctrl);
}

/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
@@ -74,7 +77,7 @@ static inline void fw_cfg_read_blob(u16 key,
}

mutex_lock(&fw_cfg_dev_lock);
- iowrite16(fw_cfg_sel_endianness(key), fw_cfg_reg_ctrl);
+ fw_cfg_sel_endianness(key);
while (pos-- > 0)
ioread8(fw_cfg_reg_data);
ioread8_rep(fw_cfg_reg_data, buf, count);
--
2.16.1.73.g5832b7e9f2


2018-02-16 15:30:20

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 07/11] fw_cfg: remove inline from fw_cfg_read_blob()

The function is not small and getting bigger.

Let the compiler decide instead. No profiling done, hopefully
unnecessary.

Signed-off-by: Marc-André Lureau <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 805372e8e50d..f6f90bef604c 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
}

/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
-static inline void fw_cfg_read_blob(u16 key,
- void *buf, loff_t pos, size_t count)
+static void fw_cfg_read_blob(u16 key,
+ void *buf, loff_t pos, size_t count)
{
u32 glk = -1U;
acpi_status status;
--
2.16.1.73.g5832b7e9f2


2018-02-16 15:30:29

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 09/11] fw_cfg: add DMA register

Add an optional <dma_off> kernel module (or command line) parameter
using the following syntax:

[qemu_fw_cfg.]ioport=<size>@<base>[:<ctrl_off>:<data_off>[:<dma_off>]]
or
[qemu_fw_cfg.]mmio=<size>@<base>[:<ctrl_off>:<data_off>[:<dma_off>]]

and initializes the register address using given or default offset.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Gabriel Somlo <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 53 ++++++++++++++++++++++++++++++++----------
1 file changed, 41 insertions(+), 12 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 5e6e5ac71dab..c28bec4b5663 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -10,20 +10,21 @@
* and select subsets of aarch64), a Device Tree node (on arm), or using
* a kernel module (or command line) parameter with the following syntax:
*
- * [qemu_fw_cfg.]ioport=<size>@<base>[:<ctrl_off>:<data_off>]
+ * [qemu_fw_cfg.]ioport=<size>@<base>[:<ctrl_off>:<data_off>[:<dma_off>]]
* or
- * [qemu_fw_cfg.]mmio=<size>@<base>[:<ctrl_off>:<data_off>]
+ * [qemu_fw_cfg.]mmio=<size>@<base>[:<ctrl_off>:<data_off>[:<dma_off>]]
*
* where:
* <size> := size of ioport or mmio range
* <base> := physical base address of ioport or mmio range
* <ctrl_off> := (optional) offset of control register
* <data_off> := (optional) offset of data register
+ * <dma_off> := (optional) offset of dma register
*
* e.g.:
- * qemu_fw_cfg.ioport=2@0x510:0:1 (the default on x86)
+ * qemu_fw_cfg.ioport=12@0x510:0:1:4 (the default on x86)
* or
- * qemu_fw_cfg.mmio=0xA@0x9020000:8:0 (the default on arm)
+ * qemu_fw_cfg.mmio=16@0x9020000:8:0:16 (the default on arm)
*/

#include <linux/module.h>
@@ -45,6 +46,7 @@ static resource_size_t fw_cfg_p_size;
static void __iomem *fw_cfg_dev_base;
static void __iomem *fw_cfg_reg_ctrl;
static void __iomem *fw_cfg_reg_data;
+static void __iomem *fw_cfg_reg_dma;

/* atomic access to fw_cfg device (potentially slow i/o, so using mutex) */
static DEFINE_MUTEX(fw_cfg_dev_lock);
@@ -104,12 +106,14 @@ static void fw_cfg_io_cleanup(void)
# if (defined(CONFIG_ARM) || defined(CONFIG_ARM64))
# define FW_CFG_CTRL_OFF 0x08
# define FW_CFG_DATA_OFF 0x00
+# define FW_CFG_DMA_OFF 0x10
# elif (defined(CONFIG_PPC_PMAC) || defined(CONFIG_SPARC32)) /* ppc/mac,sun4m */
# define FW_CFG_CTRL_OFF 0x00
# define FW_CFG_DATA_OFF 0x02
# elif (defined(CONFIG_X86) || defined(CONFIG_SPARC64)) /* x86, sun4u */
# define FW_CFG_CTRL_OFF 0x00
# define FW_CFG_DATA_OFF 0x01
+# define FW_CFG_DMA_OFF 0x04
# else
# error "QEMU FW_CFG not available on this architecture!"
# endif
@@ -119,7 +123,7 @@ static void fw_cfg_io_cleanup(void)
static int fw_cfg_do_platform_probe(struct platform_device *pdev)
{
char sig[FW_CFG_SIG_SIZE];
- struct resource *range, *ctrl, *data;
+ struct resource *range, *ctrl, *data, *dma;

/* acquire i/o range details */
fw_cfg_is_mmio = false;
@@ -156,6 +160,7 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
/* were custom register offsets provided (e.g. on the command line)? */
ctrl = platform_get_resource_byname(pdev, IORESOURCE_REG, "ctrl");
data = platform_get_resource_byname(pdev, IORESOURCE_REG, "data");
+ dma = platform_get_resource_byname(pdev, IORESOURCE_REG, "dma");
if (ctrl && data) {
fw_cfg_reg_ctrl = fw_cfg_dev_base + ctrl->start;
fw_cfg_reg_data = fw_cfg_dev_base + data->start;
@@ -165,6 +170,13 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
fw_cfg_reg_data = fw_cfg_dev_base + FW_CFG_DATA_OFF;
}

+ if (dma)
+ fw_cfg_reg_dma = fw_cfg_dev_base + dma->start;
+#ifdef FW_CFG_DMA_OFF
+ else
+ fw_cfg_reg_dma = fw_cfg_dev_base + FW_CFG_DMA_OFF;
+#endif
+
/* verify fw_cfg device signature */
if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
0, FW_CFG_SIG_SIZE) < 0 ||
@@ -630,6 +642,7 @@ static struct platform_device *fw_cfg_cmdline_dev;
/* use special scanf/printf modifier for phys_addr_t, resource_size_t */
#define PH_ADDR_SCAN_FMT "@%" __PHYS_ADDR_PREFIX "i%n" \
":%" __PHYS_ADDR_PREFIX "i" \
+ ":%" __PHYS_ADDR_PREFIX "i%n" \
":%" __PHYS_ADDR_PREFIX "i%n"

#define PH_ADDR_PR_1_FMT "0x%" __PHYS_ADDR_PREFIX "x@" \
@@ -639,12 +652,15 @@ static struct platform_device *fw_cfg_cmdline_dev;
":%" __PHYS_ADDR_PREFIX "u" \
":%" __PHYS_ADDR_PREFIX "u"

+#define PH_ADDR_PR_4_FMT PH_ADDR_PR_3_FMT \
+ ":%" __PHYS_ADDR_PREFIX "u"
+
static int fw_cfg_cmdline_set(const char *arg, const struct kernel_param *kp)
{
- struct resource res[3] = {};
+ struct resource res[4] = {};
char *str;
phys_addr_t base;
- resource_size_t size, ctrl_off, data_off;
+ resource_size_t size, ctrl_off, data_off, dma_off;
int processed, consumed = 0;

/* only one fw_cfg device can exist system-wide, so if one
@@ -660,19 +676,20 @@ static int fw_cfg_cmdline_set(const char *arg, const struct kernel_param *kp)
/* consume "<size>" portion of command line argument */
size = memparse(arg, &str);

- /* get "@<base>[:<ctrl_off>:<data_off>]" chunks */
+ /* get "@<base>[:<ctrl_off>:<data_off>[:<dma_off>]]" chunks */
processed = sscanf(str, PH_ADDR_SCAN_FMT,
&base, &consumed,
- &ctrl_off, &data_off, &consumed);
+ &ctrl_off, &data_off, &consumed,
+ &dma_off, &consumed);

- /* sscanf() must process precisely 1 or 3 chunks:
+ /* sscanf() must process precisely 1, 3 or 4 chunks:
* <base> is mandatory, optionally followed by <ctrl_off>
- * and <data_off>;
+ * and <data_off>, and <dma_off>;
* there must be no extra characters after the last chunk,
* so str[consumed] must be '\0'.
*/
if (str[consumed] ||
- (processed != 1 && processed != 3))
+ (processed != 1 && processed != 3 && processed != 4))
return -EINVAL;

res[0].start = base;
@@ -689,6 +706,11 @@ static int fw_cfg_cmdline_set(const char *arg, const struct kernel_param *kp)
res[2].start = data_off;
res[2].flags = IORESOURCE_REG;
}
+ if (processed > 3) {
+ res[3].name = "dma";
+ res[3].start = dma_off;
+ res[3].flags = IORESOURCE_REG;
+ }

/* "processed" happens to nicely match the number of resources
* we need to pass in to this platform device.
@@ -721,6 +743,13 @@ static int fw_cfg_cmdline_get(char *buf, const struct kernel_param *kp)
fw_cfg_cmdline_dev->resource[0].start,
fw_cfg_cmdline_dev->resource[1].start,
fw_cfg_cmdline_dev->resource[2].start);
+ case 4:
+ return snprintf(buf, PAGE_SIZE, PH_ADDR_PR_4_FMT,
+ resource_size(&fw_cfg_cmdline_dev->resource[0]),
+ fw_cfg_cmdline_dev->resource[0].start,
+ fw_cfg_cmdline_dev->resource[1].start,
+ fw_cfg_cmdline_dev->resource[2].start,
+ fw_cfg_cmdline_dev->resource[3].start);
}

/* Should never get here */
--
2.16.1.73.g5832b7e9f2


2018-02-16 15:30:40

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 10/11] fw_cfg: write vmcoreinfo details

If the "etc/vmcoreinfo" fw_cfg file is present and we are not running
the kdump kernel, write the addr/size of the vmcoreinfo ELF note.

The DMA operation is expected to run synchronously with today qemu,
but the specification states that it may become async, so we run
"control" field check in a loop for eventual changes.

Signed-off-by: Marc-André Lureau <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 143 ++++++++++++++++++++++++++++++++++++++++-
include/uapi/linux/fw_cfg.h | 31 +++++++++
2 files changed, 171 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index c28bec4b5663..3015e77aebca 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -34,11 +34,17 @@
#include <linux/io.h>
#include <linux/ioport.h>
#include <uapi/linux/fw_cfg.h>
+#include <linux/delay.h>
+#include <linux/crash_dump.h>
+#include <linux/crash_core.h>

MODULE_AUTHOR("Gabriel L. Somlo <[email protected]>");
MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
MODULE_LICENSE("GPL");

+/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
+static u32 fw_cfg_rev;
+
/* fw_cfg device i/o register addresses */
static bool fw_cfg_is_mmio;
static phys_addr_t fw_cfg_p_base;
@@ -60,6 +66,64 @@ static void fw_cfg_sel_endianness(u16 key)
iowrite16(key, fw_cfg_reg_ctrl);
}

+static inline bool fw_cfg_dma_enabled(void)
+{
+ return (fw_cfg_rev & FW_CFG_VERSION_DMA) && fw_cfg_reg_dma;
+}
+
+/* qemu fw_cfg device is sync today, but spec says it may become async */
+static void fw_cfg_wait_for_control(struct fw_cfg_dma_access *d)
+{
+ for (;;) {
+ u32 ctrl = be32_to_cpu(READ_ONCE(d->control));
+
+ /* do not reorder the read to d->control */
+ rmb();
+ if ((ctrl & ~FW_CFG_DMA_CTL_ERROR) == 0)
+ return;
+
+ cpu_relax();
+ }
+}
+
+static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
+{
+ phys_addr_t dma;
+ struct fw_cfg_dma_access *d = NULL;
+ ssize_t ret = length;
+
+ d = kmalloc(sizeof(*d), GFP_KERNEL);
+ if (!d) {
+ ret = -ENOMEM;
+ goto end;
+ }
+
+ /* fw_cfg device does not need IOMMU protection, so use physical addresses */
+ *d = (struct fw_cfg_dma_access) {
+ .address = cpu_to_be64(address ? virt_to_phys(address) : 0),
+ .length = cpu_to_be32(length),
+ .control = cpu_to_be32(control)
+ };
+
+ dma = virt_to_phys(d);
+
+ iowrite32be((u64)dma >> 32, fw_cfg_reg_dma);
+ /* force memory to sync before notifying device via MMIO */
+ wmb();
+ iowrite32be(dma, fw_cfg_reg_dma + 4);
+
+ fw_cfg_wait_for_control(d);
+
+ if (be32_to_cpu(READ_ONCE(d->control)) & FW_CFG_DMA_CTL_ERROR) {
+ ret = -EIO;
+ }
+
+end:
+ kfree(d);
+
+ return ret;
+}
+
/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
static ssize_t fw_cfg_read_blob(u16 key,
void *buf, loff_t pos, size_t count)
@@ -89,6 +153,47 @@ static ssize_t fw_cfg_read_blob(u16 key,
return count;
}

+#ifdef CONFIG_CRASH_CORE
+/* write chunk of given fw_cfg blob (caller responsible for sanity-check) */
+static ssize_t fw_cfg_write_blob(u16 key,
+ void *buf, loff_t pos, size_t count)
+{
+ u32 glk = -1U;
+ acpi_status status;
+ ssize_t ret = count;
+
+ /* If we have ACPI, ensure mutual exclusion against any potential
+ * device access by the firmware, e.g. via AML methods:
+ */
+ status = acpi_acquire_global_lock(ACPI_WAIT_FOREVER, &glk);
+ if (ACPI_FAILURE(status) && status != AE_NOT_CONFIGURED) {
+ /* Should never get here */
+ WARN(1, "%s: Failed to lock ACPI!\n", __func__);
+ return -EINVAL;
+ }
+
+ mutex_lock(&fw_cfg_dev_lock);
+ if (pos == 0) {
+ ret = fw_cfg_dma_transfer(buf, count, key << 16
+ | FW_CFG_DMA_CTL_SELECT
+ | FW_CFG_DMA_CTL_WRITE);
+ } else {
+ fw_cfg_sel_endianness(key);
+ ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
+ if (ret < 0)
+ goto end;
+ ret = fw_cfg_dma_transfer(buf, count, FW_CFG_DMA_CTL_WRITE);
+ }
+
+end:
+ mutex_unlock(&fw_cfg_dev_lock);
+
+ acpi_release_global_lock(glk);
+
+ return ret;
+}
+#endif /* CONFIG_CRASH_CORE */
+
/* clean up fw_cfg device i/o */
static void fw_cfg_io_cleanup(void)
{
@@ -188,9 +293,6 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
return 0;
}

-/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
-static u32 fw_cfg_rev;
-
static ssize_t fw_cfg_showrev(struct kobject *k, struct attribute *a, char *buf)
{
return sprintf(buf, "%u\n", fw_cfg_rev);
@@ -213,6 +315,32 @@ struct fw_cfg_sysfs_entry {
struct list_head list;
};

+#ifdef CONFIG_CRASH_CORE
+static ssize_t fw_cfg_write_vmcoreinfo(const struct fw_cfg_file *f)
+{
+ static struct fw_cfg_vmcoreinfo *data;
+ ssize_t ret;
+
+ data = kmalloc(sizeof(struct fw_cfg_vmcoreinfo), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ *data = (struct fw_cfg_vmcoreinfo) {
+ .guest_format = cpu_to_le16(FW_CFG_VMCOREINFO_FORMAT_ELF),
+ .size = cpu_to_le32(VMCOREINFO_NOTE_SIZE),
+ .paddr = cpu_to_le64(paddr_vmcoreinfo_note())
+ };
+ /* spare ourself reading host format support for now since we
+ * don't know what else to format - host may ignore ours
+ */
+ ret = fw_cfg_write_blob(be16_to_cpu(f->select), data,
+ 0, sizeof(struct fw_cfg_vmcoreinfo));
+
+ kfree(data);
+ return ret;
+}
+#endif /* CONFIG_CRASH_CORE */
+
/* get fw_cfg_sysfs_entry from kobject member */
static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
{
@@ -452,6 +580,15 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
int err;
struct fw_cfg_sysfs_entry *entry;

+#ifdef CONFIG_CRASH_CORE
+ if (fw_cfg_dma_enabled() &&
+ strcmp(f->name, FW_CFG_VMCOREINFO_FILENAME) == 0 &&
+ !is_kdump_kernel()) {
+ if (fw_cfg_write_vmcoreinfo(f) < 0)
+ pr_warn("fw_cfg: failed to write vmcoreinfo");
+ }
+#endif
+
/* allocate new entry */
entry = kzalloc(sizeof(*entry), GFP_KERNEL);
if (!entry)
diff --git a/include/uapi/linux/fw_cfg.h b/include/uapi/linux/fw_cfg.h
index c698ac3812f6..e089c0159ec2 100644
--- a/include/uapi/linux/fw_cfg.h
+++ b/include/uapi/linux/fw_cfg.h
@@ -54,6 +54,7 @@

/* FW_CFG_ID bits */
#define FW_CFG_VERSION 0x01
+#define FW_CFG_VERSION_DMA 0x02

/* fw_cfg file directory entry type */
struct fw_cfg_file {
@@ -63,4 +64,34 @@ struct fw_cfg_file {
char name[FW_CFG_MAX_FILE_PATH];
};

+/* FW_CFG_DMA_CONTROL bits */
+#define FW_CFG_DMA_CTL_ERROR 0x01
+#define FW_CFG_DMA_CTL_READ 0x02
+#define FW_CFG_DMA_CTL_SKIP 0x04
+#define FW_CFG_DMA_CTL_SELECT 0x08
+#define FW_CFG_DMA_CTL_WRITE 0x10
+
+#define FW_CFG_DMA_SIGNATURE 0x51454d5520434647ULL /* "QEMU CFG" */
+
+/* Control as first field allows for different structures selected by this
+ * field, which might be useful in the future
+ */
+struct fw_cfg_dma_access {
+ __be32 control;
+ __be32 length;
+ __be64 address;
+};
+
+#define FW_CFG_VMCOREINFO_FILENAME "etc/vmcoreinfo"
+
+#define FW_CFG_VMCOREINFO_FORMAT_NONE 0x0
+#define FW_CFG_VMCOREINFO_FORMAT_ELF 0x1
+
+struct fw_cfg_vmcoreinfo {
+ __le16 host_format;
+ __le16 guest_format;
+ __le32 size;
+ __le64 paddr;
+};
+
#endif
--
2.16.1.73.g5832b7e9f2


2018-02-16 15:30:45

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 05/11] fw_cfg: fix sparse warning reading FW_CFG_ID

Use a restricted type for reading FW_CFG_ID, fixing sparse warning:

drivers/firmware/qemu_fw_cfg.c:540:22: warning: cast to restricted __le32

Signed-off-by: Marc-André Lureau <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 6ee12c9e079a..71672cb8c427 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -512,6 +512,7 @@ static inline void fw_cfg_kobj_cleanup(struct kobject *kobj)
static int fw_cfg_sysfs_probe(struct platform_device *pdev)
{
int err;
+ __le32 rev;

/* NOTE: If we supported multiple fw_cfg devices, we'd first create
* a subdirectory named after e.g. pdev->id, then hang per-device
@@ -537,8 +538,8 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
goto err_probe;

/* get revision number, add matching top-level attribute */
- fw_cfg_read_blob(FW_CFG_ID, &fw_cfg_rev, 0, sizeof(fw_cfg_rev));
- fw_cfg_rev = le32_to_cpu(fw_cfg_rev);
+ fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
+ fw_cfg_rev = le32_to_cpu(rev);
err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
if (err)
goto err_rev;
--
2.16.1.73.g5832b7e9f2


2018-02-16 15:30:48

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

fw_cfg_read_blob() may fail, but does not return error. This may lead
to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
memory. Return an error if ACPI locking failed. Also, the following
DMA read/write extension will add more error paths that should be
handled appropriately.

Signed-off-by: Marc-André Lureau <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index f6f90bef604c..5e6e5ac71dab 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
}

/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
-static void fw_cfg_read_blob(u16 key,
- void *buf, loff_t pos, size_t count)
+static ssize_t fw_cfg_read_blob(u16 key,
+ void *buf, loff_t pos, size_t count)
{
u32 glk = -1U;
acpi_status status;
@@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
/* Should never get here */
WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
memset(buf, 0, count);
- return;
+ return -EINVAL;
}

mutex_lock(&fw_cfg_dev_lock);
@@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
mutex_unlock(&fw_cfg_dev_lock);

acpi_release_global_lock(glk);
+ return count;
}

/* clean up fw_cfg device i/o */
@@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
}

/* verify fw_cfg device signature */
- fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
- if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
+ if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
+ 0, FW_CFG_SIG_SIZE) < 0 ||
+ memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
fw_cfg_io_cleanup();
return -ENODEV;
}
@@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
if (count > entry->size - pos)
count = entry->size - pos;

- fw_cfg_read_blob(entry->select, buf, pos, count);
- return count;
+ return fw_cfg_read_blob(entry->select, buf, pos, count);
}

static struct bin_attribute fw_cfg_sysfs_attr_raw = {
@@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
struct fw_cfg_file *dir;
size_t dir_size;

- fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
+ ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
+ 0, sizeof(files_count));
+ if (ret < 0)
+ return ret;
+
count = be32_to_cpu(files_count);
dir_size = count * sizeof(struct fw_cfg_file);

@@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
if (!dir)
return -ENOMEM;

- fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
+ ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
+ sizeof(files_count), dir_size);
+ if (ret < 0)
+ goto end;

for (i = 0; i < count; i++) {
ret = fw_cfg_register_file(&dir[i]);
@@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
break;
}

+end:
kfree(dir);
return ret;
}
@@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
goto err_probe;

/* get revision number, add matching top-level attribute */
- fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
+ err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
+ if (err < 0)
+ goto err_probe;
+
fw_cfg_rev = le32_to_cpu(rev);
err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
if (err)
--
2.16.1.73.g5832b7e9f2


2018-02-16 17:37:31

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 01/11] crash: export paddr_vmcoreinfo_note()

The following patch is going to use the symbol from the fw_cfg module,
to call the function and write the note location details in the
vmcoreinfo entry, so qemu can produce dumps with the vmcoreinfo note.

CC: Andrew Morton <[email protected]>
CC: Baoquan He <[email protected]>
CC: Dave Young <[email protected]>
CC: Dave Young <[email protected]>
CC: Hari Bathini <[email protected]>
CC: Tony Luck <[email protected]>
CC: Vivek Goyal <[email protected]>
Signed-off-by: Marc-André Lureau <[email protected]>
Acked-by: Gabriel Somlo <[email protected]>
---
kernel/crash_core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 4f63597c824d..a93590cdd9e1 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -376,6 +376,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
{
return __pa(vmcoreinfo_note);
}
+EXPORT_SYMBOL(paddr_vmcoreinfo_note);

static int __init crash_save_vmcoreinfo_init(void)
{
--
2.16.1.73.g5832b7e9f2


2018-02-16 17:38:17

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 04/11] fw_cfg: fix sparse warnings with fw_cfg_file

Modify fw_cfg_sysfs_entry to store entry values, instead of reusing
the restricted types.

Fixes warnings such as:

$ make C=1 CF=-D__CHECK_ENDIAN__ drivers/firmware/qemu_fw_cfg.o

drivers/firmware/qemu_fw_cfg.c:491:29: warning: incorrect type in assignment (different base types)
drivers/firmware/qemu_fw_cfg.c:491:29: expected restricted __be32 [usertype] size
drivers/firmware/qemu_fw_cfg.c:491:29: got unsigned int
drivers/firmware/qemu_fw_cfg.c:492:31: warning: incorrect type in assignment (different base types)
drivers/firmware/qemu_fw_cfg.c:492:31: expected restricted __be16 [usertype] select
drivers/firmware/qemu_fw_cfg.c:492:31: got int

Signed-off-by: Marc-André Lureau <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 6164731a3c35..6ee12c9e079a 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -193,7 +193,9 @@ static const struct {
/* fw_cfg_sysfs_entry type */
struct fw_cfg_sysfs_entry {
struct kobject kobj;
- struct fw_cfg_file f;
+ u32 size;
+ u16 select;
+ char name[FW_CFG_MAX_FILE_PATH];
struct list_head list;
};

@@ -257,17 +259,17 @@ struct fw_cfg_sysfs_attribute fw_cfg_sysfs_attr_##_attr = { \

static ssize_t fw_cfg_sysfs_show_size(struct fw_cfg_sysfs_entry *e, char *buf)
{
- return sprintf(buf, "%u\n", e->f.size);
+ return sprintf(buf, "%u\n", e->size);
}

static ssize_t fw_cfg_sysfs_show_key(struct fw_cfg_sysfs_entry *e, char *buf)
{
- return sprintf(buf, "%u\n", e->f.select);
+ return sprintf(buf, "%u\n", e->select);
}

static ssize_t fw_cfg_sysfs_show_name(struct fw_cfg_sysfs_entry *e, char *buf)
{
- return sprintf(buf, "%s\n", e->f.name);
+ return sprintf(buf, "%s\n", e->name);
}

static FW_CFG_SYSFS_ATTR(size);
@@ -318,13 +320,13 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
{
struct fw_cfg_sysfs_entry *entry = to_entry(kobj);

- if (pos > entry->f.size)
+ if (pos > entry->size)
return -EINVAL;

- if (count > entry->f.size - pos)
- count = entry->f.size - pos;
+ if (count > entry->size - pos)
+ count = entry->size - pos;

- fw_cfg_read_blob(entry->f.select, buf, pos, count);
+ fw_cfg_read_blob(entry->select, buf, pos, count);
return count;
}

@@ -443,11 +445,13 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
return -ENOMEM;

/* set file entry information */
- memcpy(&entry->f, f, sizeof(struct fw_cfg_file));
+ entry->size = be32_to_cpu(f->size);
+ entry->select = be16_to_cpu(f->select);
+ memcpy(entry->name, f->name, FW_CFG_MAX_FILE_PATH);

/* register entry under "/sys/firmware/qemu_fw_cfg/by_key/" */
err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
- fw_cfg_sel_ko, "%d", entry->f.select);
+ fw_cfg_sel_ko, "%d", entry->select);
if (err)
goto err_register;

@@ -457,7 +461,7 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
goto err_add_raw;

/* try adding "/sys/firmware/qemu_fw_cfg/by_name/" symlink */
- fw_cfg_build_symlink(fw_cfg_fname_kset, &entry->kobj, entry->f.name);
+ fw_cfg_build_symlink(fw_cfg_fname_kset, &entry->kobj, entry->name);

/* success, add entry to global cache */
fw_cfg_sysfs_cache_enlist(entry);
@@ -489,8 +493,6 @@ static int fw_cfg_register_dir_entries(void)
fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(count), dir_size);

for (i = 0; i < count; i++) {
- dir[i].size = be32_to_cpu(dir[i].size);
- dir[i].select = be16_to_cpu(dir[i].select);
ret = fw_cfg_register_file(&dir[i]);
if (ret)
break;
--
2.16.1.73.g5832b7e9f2


2018-02-16 17:38:19

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 06/11] fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read

Use struct fw_cfg_files to read the directory size, fixing the sparse
warnings:

drivers/firmware/qemu_fw_cfg.c:485:17: warning: cast to restricted __be32

Signed-off-by: Marc-André Lureau <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 71672cb8c427..805372e8e50d 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -478,19 +478,20 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
static int fw_cfg_register_dir_entries(void)
{
int ret = 0;
+ __be32 files_count;
u32 count, i;
struct fw_cfg_file *dir;
size_t dir_size;

- fw_cfg_read_blob(FW_CFG_FILE_DIR, &count, 0, sizeof(count));
- count = be32_to_cpu(count);
+ fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
+ count = be32_to_cpu(files_count);
dir_size = count * sizeof(struct fw_cfg_file);

dir = kmalloc(dir_size, GFP_KERNEL);
if (!dir)
return -ENOMEM;

- fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(count), dir_size);
+ fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);

for (i = 0; i < count; i++) {
ret = fw_cfg_register_file(&dir[i]);
--
2.16.1.73.g5832b7e9f2


2018-02-16 17:38:28

by Marc-André Lureau

[permalink] [raw]
Subject: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

Modify fw_cfg_read_blob() to use DMA if the device supports it.
Return errors, because the operation may fail.

So far, only one call in fw_cfg_register_dir_entries() is using
kmalloc'ed buf and is thus clearly eligible to DMA read.

Initially, I didn't implement DMA read to speed up boot time, but as a
first step before introducing DMA write (since read operations were
already presents). Even more, I didn't realize fw-cfg entries were
being read by the kernel during boot by default. But actally fw-cfg
entries are being populated during module probe. I knew DMA improved a
lot bios boot time (the main reason the DMA interface was added
afaik). Let see the time it would take to read the whole ACPI
tables (128kb allocated)

# time cat /sys/firmware/qemu_fw_cfg/by_name/etc/acpi/tables/raw
- with DMA: sys 0m0.003s
- without DMA (-global fw_cfg.dma_enabled=off): sys 0m7.674s

FW_CFG_FILE_DIR (0x19) is the only "file" that is read during kernel
boot to populate sysfs qemu_fw_cfg directory, and it is quite
small (1-2kb). Since it does not expose itself, in order to measure
the time it takes to read such small file, I took a comparable sized
file of 2048 bytes and exposed it (-fw_cfg test,file=file with a
modified read_raw enabling DMA)

# perf stat -r 100 cat /sys/firmware/qemu_fw_cfg/by_name/test/raw >/dev/null
- with DMA:
0.636037 task-clock (msec) # 0.141 CPUs utilized ( +- 1.19% )
- without DMA:
6.430128 task-clock (msec) # 0.622 CPUs utilized ( +- 0.22% )

That's a few msec saved during boot by enabling DMA read (the gain
would be more substantial if other & bigger fw-cfg entries are read by
others from sysfs, unfortunately, it's not clear if we can always
enable DMA there)

Signed-off-by: Marc-André Lureau <[email protected]>
---
drivers/firmware/qemu_fw_cfg.c | 61 ++++++++++++++++++++++++++++++++++--------
1 file changed, 50 insertions(+), 11 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 3015e77aebca..94df57e9be66 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -124,12 +124,47 @@ static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
return ret;
}

+/* with acpi & dev locks taken */
+static ssize_t fw_cfg_read_blob_dma(u16 key,
+ void *buf, loff_t pos, size_t count)
+{
+ ssize_t ret;
+
+ if (pos == 0) {
+ ret = fw_cfg_dma_transfer(buf, count, key << 16
+ | FW_CFG_DMA_CTL_SELECT
+ | FW_CFG_DMA_CTL_READ);
+ } else {
+ fw_cfg_sel_endianness(key);
+ ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
+ if (ret < 0)
+ return ret;
+ ret = fw_cfg_dma_transfer(buf, count,
+ FW_CFG_DMA_CTL_READ);
+ }
+
+ return ret;
+}
+
+/* with acpi & dev locks taken */
+static ssize_t fw_cfg_read_blob_io(u16 key,
+ void *buf, loff_t pos, size_t count)
+{
+ fw_cfg_sel_endianness(key);
+ while (pos-- > 0)
+ ioread8(fw_cfg_reg_data);
+ ioread8_rep(fw_cfg_reg_data, buf, count);
+ return count;
+}
+
/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
static ssize_t fw_cfg_read_blob(u16 key,
- void *buf, loff_t pos, size_t count)
+ void *buf, loff_t pos, size_t count,
+ bool dma)
{
u32 glk = -1U;
acpi_status status;
+ ssize_t ret;

/* If we have ACPI, ensure mutual exclusion against any potential
* device access by the firmware, e.g. via AML methods:
@@ -143,14 +178,17 @@ static ssize_t fw_cfg_read_blob(u16 key,
}

mutex_lock(&fw_cfg_dev_lock);
- fw_cfg_sel_endianness(key);
- while (pos-- > 0)
- ioread8(fw_cfg_reg_data);
- ioread8_rep(fw_cfg_reg_data, buf, count);
+ if (dma && fw_cfg_dma_enabled()) {
+ ret = fw_cfg_read_blob_dma(key, buf, pos, count);
+ } else {
+ ret = fw_cfg_read_blob_io(key, buf, pos, count);
+ }
+
mutex_unlock(&fw_cfg_dev_lock);

acpi_release_global_lock(glk);
- return count;
+
+ return ret;
}

#ifdef CONFIG_CRASH_CORE
@@ -284,7 +322,7 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)

/* verify fw_cfg device signature */
if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
- 0, FW_CFG_SIG_SIZE) < 0 ||
+ 0, FW_CFG_SIG_SIZE, false) < 0 ||
memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
fw_cfg_io_cleanup();
return -ENODEV;
@@ -468,7 +506,8 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
if (count > entry->size - pos)
count = entry->size - pos;

- return fw_cfg_read_blob(entry->select, buf, pos, count);
+ /* do not use DMA, virt_to_phys(buf) might not be ok */
+ return fw_cfg_read_blob(entry->select, buf, pos, count, false);
}

static struct bin_attribute fw_cfg_sysfs_attr_raw = {
@@ -634,7 +673,7 @@ static int fw_cfg_register_dir_entries(void)
size_t dir_size;

ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
- 0, sizeof(files_count));
+ 0, sizeof(files_count), false);
if (ret < 0)
return ret;

@@ -646,7 +685,7 @@ static int fw_cfg_register_dir_entries(void)
return -ENOMEM;

ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
- sizeof(files_count), dir_size);
+ sizeof(files_count), dir_size, false);
if (ret < 0)
goto end;

@@ -697,7 +736,7 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
goto err_probe;

/* get revision number, add matching top-level attribute */
- err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
+ err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev), false);
if (err < 0)
goto err_probe;

--
2.16.1.73.g5832b7e9f2


2018-02-27 00:06:03

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

On Thu, Feb 15, 2018 at 10:33:12PM +0100, Marc-Andr? Lureau wrote:
> Modify fw_cfg_read_blob() to use DMA if the device supports it.
> Return errors, because the operation may fail.
>
> So far, only one call in fw_cfg_register_dir_entries() is using
> kmalloc'ed buf and is thus clearly eligible to DMA read.
>
> Initially, I didn't implement DMA read to speed up boot time, but as a
> first step before introducing DMA write (since read operations were
> already presents). Even more, I didn't realize fw-cfg entries were
> being read by the kernel during boot by default. But actally fw-cfg
> entries are being populated during module probe. I knew DMA improved a
> lot bios boot time (the main reason the DMA interface was added
> afaik). Let see the time it would take to read the whole ACPI
> tables (128kb allocated)
>
> # time cat /sys/firmware/qemu_fw_cfg/by_name/etc/acpi/tables/raw
> - with DMA: sys 0m0.003s
> - without DMA (-global fw_cfg.dma_enabled=off): sys 0m7.674s
>
> FW_CFG_FILE_DIR (0x19) is the only "file" that is read during kernel
> boot to populate sysfs qemu_fw_cfg directory, and it is quite
> small (1-2kb). Since it does not expose itself, in order to measure
> the time it takes to read such small file, I took a comparable sized
> file of 2048 bytes and exposed it (-fw_cfg test,file=file with a
> modified read_raw enabling DMA)
>
> # perf stat -r 100 cat /sys/firmware/qemu_fw_cfg/by_name/test/raw >/dev/null
> - with DMA:
> 0.636037 task-clock (msec) # 0.141 CPUs utilized ( +- 1.19% )
> - without DMA:
> 6.430128 task-clock (msec) # 0.622 CPUs utilized ( +- 0.22% )
>
> That's a few msec saved during boot by enabling DMA read (the gain
> would be more substantial if other & bigger fw-cfg entries are read by
> others from sysfs, unfortunately, it's not clear if we can always
> enable DMA there)
>
> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> ---
> drivers/firmware/qemu_fw_cfg.c | 61 ++++++++++++++++++++++++++++++++++--------
> 1 file changed, 50 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index 3015e77aebca..94df57e9be66 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -124,12 +124,47 @@ static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
> return ret;
> }
>
> +/* with acpi & dev locks taken */
> +static ssize_t fw_cfg_read_blob_dma(u16 key,
> + void *buf, loff_t pos, size_t count)
> +{
> + ssize_t ret;
> +
> + if (pos == 0) {
> + ret = fw_cfg_dma_transfer(buf, count, key << 16
> + | FW_CFG_DMA_CTL_SELECT
> + | FW_CFG_DMA_CTL_READ);
> + } else {
> + fw_cfg_sel_endianness(key);
> + ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
> + if (ret < 0)
> + return ret;
> + ret = fw_cfg_dma_transfer(buf, count,
> + FW_CFG_DMA_CTL_READ);
> + }
> +
> + return ret;
> +}
> +
> +/* with acpi & dev locks taken */
> +static ssize_t fw_cfg_read_blob_io(u16 key,
> + void *buf, loff_t pos, size_t count)
> +{
> + fw_cfg_sel_endianness(key);
> + while (pos-- > 0)
> + ioread8(fw_cfg_reg_data);
> + ioread8_rep(fw_cfg_reg_data, buf, count);
> + return count;
> +}
> +
> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> static ssize_t fw_cfg_read_blob(u16 key,
> - void *buf, loff_t pos, size_t count)
> + void *buf, loff_t pos, size_t count,
> + bool dma)
> {
> u32 glk = -1U;
> acpi_status status;
> + ssize_t ret;
>
> /* If we have ACPI, ensure mutual exclusion against any potential
> * device access by the firmware, e.g. via AML methods:

so this adds a dma flag to fw_cfg_read_blob.



> @@ -143,14 +178,17 @@ static ssize_t fw_cfg_read_blob(u16 key,
> }
>
> mutex_lock(&fw_cfg_dev_lock);
> - fw_cfg_sel_endianness(key);
> - while (pos-- > 0)
> - ioread8(fw_cfg_reg_data);
> - ioread8_rep(fw_cfg_reg_data, buf, count);
> + if (dma && fw_cfg_dma_enabled()) {
> + ret = fw_cfg_read_blob_dma(key, buf, pos, count);
> + } else {
> + ret = fw_cfg_read_blob_io(key, buf, pos, count);
> + }
> +
> mutex_unlock(&fw_cfg_dev_lock);
>
> acpi_release_global_lock(glk);
> - return count;
> +
> + return ret;
> }
>
> #ifdef CONFIG_CRASH_CORE

If set to false it does io, if set to true it does dma.

I would prefer passing an accessor function pointer
since that's clearer than true/false.


> @@ -284,7 +322,7 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
>
> /* verify fw_cfg device signature */
> if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> - 0, FW_CFG_SIG_SIZE) < 0 ||
> + 0, FW_CFG_SIG_SIZE, false) < 0 ||
> memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> fw_cfg_io_cleanup();
> return -ENODEV;
> @@ -468,7 +506,8 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> if (count > entry->size - pos)
> count = entry->size - pos;
>
> - return fw_cfg_read_blob(entry->select, buf, pos, count);
> + /* do not use DMA, virt_to_phys(buf) might not be ok */
> + return fw_cfg_read_blob(entry->select, buf, pos, count, false);
> }
>
> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> @@ -634,7 +673,7 @@ static int fw_cfg_register_dir_entries(void)
> size_t dir_size;
>
> ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> - 0, sizeof(files_count));
> + 0, sizeof(files_count), false);
> if (ret < 0)
> return ret;
>
> @@ -646,7 +685,7 @@ static int fw_cfg_register_dir_entries(void)
> return -ENOMEM;
>
> ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> - sizeof(files_count), dir_size);
> + sizeof(files_count), dir_size, false);
> if (ret < 0)
> goto end;
>
> @@ -697,7 +736,7 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> goto err_probe;
>
> /* get revision number, add matching top-level attribute */
> - err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev), false);
> if (err < 0)
> goto err_probe;


Looks like all callers pass in false as parameter.
Given this, how can this speed up any operations?

Are you sure you tested this properly?

> --
> 2.16.1.73.g5832b7e9f2

2018-02-27 00:07:25

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 02/11] fw_cfg: add a public uapi header

On Thu, Feb 15, 2018 at 10:33:03PM +0100, Marc-Andr? Lureau wrote:
> Create a common header file for well-known values and structures to be
> shared by the Linux kernel with qemu or other projects.
>
> It is based from qemu/docs/specs/fw_cfg.txt which references
> qemu/include/hw/nvram/fw_cfg_keys.h "for the most up-to-date and
> authoritative list" & vmcoreinfo.txt. Those files don't have an
> explicit license, but qemu/hw/nvram/fw_cfg.c is BSD-license, so
> Michael S. Tsirkin suggested to use the same license.
>
> The patch intentionally left out DMA & vmcoreinfo structures &
> defines, which are added in the commits making usage of it.
>
> Suggested-by: Michael S. Tsirkin <[email protected]>
> Signed-off-by: Marc-Andr? Lureau <[email protected]>
>
> ---
>
> The related qemu patch making use of it, to be submitted:
> https://github.com/elmarco/qemu/commit/4884fc9e9c4c4467a371e5a40f3181239e1b70f5
> ---
> MAINTAINERS | 1 +
> drivers/firmware/qemu_fw_cfg.c | 22 ++------------
> include/uapi/linux/fw_cfg.h | 66 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 69 insertions(+), 20 deletions(-)
> create mode 100644 include/uapi/linux/fw_cfg.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3bdc260e36b7..a66b65f62811 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11352,6 +11352,7 @@ M: "Michael S. Tsirkin" <[email protected]>
> L: [email protected]
> S: Maintained
> F: drivers/firmware/qemu_fw_cfg.c
> +F: include/uapi/linux/fw_cfg.h
>
> QIB DRIVER
> M: Dennis Dalessandro <[email protected]>

Why fw_cfg.h and not qemu_fw_cfg.h ? fw_cfg.h seems too generic.

> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index a41b572eeeb1..42601a3eaed5 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -32,30 +32,12 @@
> #include <linux/slab.h>
> #include <linux/io.h>
> #include <linux/ioport.h>
> +#include <uapi/linux/fw_cfg.h>
>
> MODULE_AUTHOR("Gabriel L. Somlo <[email protected]>");
> MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
> MODULE_LICENSE("GPL");
>
> -/* selector key values for "well-known" fw_cfg entries */
> -#define FW_CFG_SIGNATURE 0x00
> -#define FW_CFG_ID 0x01
> -#define FW_CFG_FILE_DIR 0x19
> -
> -/* size in bytes of fw_cfg signature */
> -#define FW_CFG_SIG_SIZE 4
> -
> -/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
> -#define FW_CFG_MAX_FILE_PATH 56
> -
> -/* fw_cfg file directory entry type */
> -struct fw_cfg_file {
> - u32 size;
> - u16 select;
> - u16 reserved;
> - char name[FW_CFG_MAX_FILE_PATH];
> -};
> -
> /* fw_cfg device i/o register addresses */
> static bool fw_cfg_is_mmio;
> static phys_addr_t fw_cfg_p_base;
> @@ -597,7 +579,7 @@ MODULE_DEVICE_TABLE(of, fw_cfg_sysfs_mmio_match);
>
> #ifdef CONFIG_ACPI
> static const struct acpi_device_id fw_cfg_sysfs_acpi_match[] = {
> - { "QEMU0002", },
> + { FW_CFG_ACPI_DEVICE_ID, },
> {},
> };
> MODULE_DEVICE_TABLE(acpi, fw_cfg_sysfs_acpi_match);
> diff --git a/include/uapi/linux/fw_cfg.h b/include/uapi/linux/fw_cfg.h
> new file mode 100644
> index 000000000000..c698ac3812f6
> --- /dev/null
> +++ b/include/uapi/linux/fw_cfg.h
> @@ -0,0 +1,66 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +#ifndef _LINUX_FW_CFG_H
> +#define _LINUX_FW_CFG_H
> +
> +#include <linux/types.h>
> +
> +#define FW_CFG_ACPI_DEVICE_ID "QEMU0002"
> +
> +/* selector key values for "well-known" fw_cfg entries */
> +#define FW_CFG_SIGNATURE 0x00
> +#define FW_CFG_ID 0x01
> +#define FW_CFG_UUID 0x02
> +#define FW_CFG_RAM_SIZE 0x03
> +#define FW_CFG_NOGRAPHIC 0x04
> +#define FW_CFG_NB_CPUS 0x05
> +#define FW_CFG_MACHINE_ID 0x06
> +#define FW_CFG_KERNEL_ADDR 0x07
> +#define FW_CFG_KERNEL_SIZE 0x08
> +#define FW_CFG_KERNEL_CMDLINE 0x09
> +#define FW_CFG_INITRD_ADDR 0x0a
> +#define FW_CFG_INITRD_SIZE 0x0b
> +#define FW_CFG_BOOT_DEVICE 0x0c
> +#define FW_CFG_NUMA 0x0d
> +#define FW_CFG_BOOT_MENU 0x0e
> +#define FW_CFG_MAX_CPUS 0x0f
> +#define FW_CFG_KERNEL_ENTRY 0x10
> +#define FW_CFG_KERNEL_DATA 0x11
> +#define FW_CFG_INITRD_DATA 0x12
> +#define FW_CFG_CMDLINE_ADDR 0x13
> +#define FW_CFG_CMDLINE_SIZE 0x14
> +#define FW_CFG_CMDLINE_DATA 0x15
> +#define FW_CFG_SETUP_ADDR 0x16
> +#define FW_CFG_SETUP_SIZE 0x17
> +#define FW_CFG_SETUP_DATA 0x18
> +#define FW_CFG_FILE_DIR 0x19
> +
> +#define FW_CFG_FILE_FIRST 0x20
> +#define FW_CFG_FILE_SLOTS_MIN 0x10
> +
> +#define FW_CFG_WRITE_CHANNEL 0x4000
> +#define FW_CFG_ARCH_LOCAL 0x8000
> +#define FW_CFG_ENTRY_MASK (~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL))
> +
> +#define FW_CFG_INVALID 0xffff
> +
> +/* width in bytes of fw_cfg control register */
> +#define FW_CFG_CTL_SIZE 0x02
> +
> +/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
> +#define FW_CFG_MAX_FILE_PATH 56
> +
> +/* size in bytes of fw_cfg signature */
> +#define FW_CFG_SIG_SIZE 4
> +
> +/* FW_CFG_ID bits */
> +#define FW_CFG_VERSION 0x01
> +
> +/* fw_cfg file directory entry type */
> +struct fw_cfg_file {
> + __be32 size;
> + __be16 select;
> + __u16 reserved;
> + char name[FW_CFG_MAX_FILE_PATH];
> +};
> +
> +#endif
> --
> 2.16.1.73.g5832b7e9f2

2018-02-27 00:21:37

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-Andr? Lureau wrote:
> fw_cfg_read_blob() may fail, but does not return error. This may lead
> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
> memory.

I don't think that's true - there's a memset there that
will initialize the memory. probe is likely the only
case where it returns a slightly incorrect data.

> Return an error if ACPI locking failed. Also, the following
> DMA read/write extension will add more error paths that should be
> handled appropriately.
>
> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> ---
> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index f6f90bef604c..5e6e5ac71dab 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
> }
>
> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> -static void fw_cfg_read_blob(u16 key,
> - void *buf, loff_t pos, size_t count)
> +static ssize_t fw_cfg_read_blob(u16 key,
> + void *buf, loff_t pos, size_t count)
> {
> u32 glk = -1U;
> acpi_status status;
> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
> /* Should never get here */
> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
> memset(buf, 0, count);
> - return;
> + return -EINVAL;
> }
>
> mutex_lock(&fw_cfg_dev_lock);

Wouldn't something like -EBUSY be more appropriate?

> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
> mutex_unlock(&fw_cfg_dev_lock);
>
> acpi_release_global_lock(glk);
> + return count;
> }
>
> /* clean up fw_cfg device i/o */
> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> }
>
> /* verify fw_cfg device signature */
> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> + 0, FW_CFG_SIG_SIZE) < 0 ||
> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> fw_cfg_io_cleanup();
> return -ENODEV;
> }
> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> if (count > entry->size - pos)
> count = entry->size - pos;
>
> - fw_cfg_read_blob(entry->select, buf, pos, count);
> - return count;
> + return fw_cfg_read_blob(entry->select, buf, pos, count);
> }
>
> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
> struct fw_cfg_file *dir;
> size_t dir_size;
>
> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> + 0, sizeof(files_count));
> + if (ret < 0)
> + return ret;
> +
> count = be32_to_cpu(files_count);
> dir_size = count * sizeof(struct fw_cfg_file);
>
> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
> if (!dir)
> return -ENOMEM;
>
> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> + sizeof(files_count), dir_size);
> + if (ret < 0)
> + goto end;
>
> for (i = 0; i < count; i++) {
> ret = fw_cfg_register_file(&dir[i]);
> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
> break;
> }
>
> +end:
> kfree(dir);
> return ret;
> }
> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> goto err_probe;
>
> /* get revision number, add matching top-level attribute */
> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> + if (err < 0)
> + goto err_probe;
> +
> fw_cfg_rev = le32_to_cpu(rev);
> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> if (err)

I think that this is the only case where it's not doing the right thing right now in
that it shows 0 as the revision to the users. Is it worth failing probe
here? We could just skip the attribute, could we not?

> --
> 2.16.1.73.g5832b7e9f2

2018-02-27 00:29:20

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 10/11] fw_cfg: write vmcoreinfo details

On Thu, Feb 15, 2018 at 10:33:11PM +0100, Marc-Andr? Lureau wrote:
> If the "etc/vmcoreinfo" fw_cfg file is present and we are not running
> the kdump kernel, write the addr/size of the vmcoreinfo ELF note.
>
> The DMA operation is expected to run synchronously with today qemu,
> but the specification states that it may become async, so we run
> "control" field check in a loop for eventual changes.
>
> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> ---
> drivers/firmware/qemu_fw_cfg.c | 143 ++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/fw_cfg.h | 31 +++++++++
> 2 files changed, 171 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index c28bec4b5663..3015e77aebca 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -34,11 +34,17 @@
> #include <linux/io.h>
> #include <linux/ioport.h>
> #include <uapi/linux/fw_cfg.h>
> +#include <linux/delay.h>
> +#include <linux/crash_dump.h>
> +#include <linux/crash_core.h>
>
> MODULE_AUTHOR("Gabriel L. Somlo <[email protected]>");
> MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
> MODULE_LICENSE("GPL");
>
> +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
> +static u32 fw_cfg_rev;
> +
> /* fw_cfg device i/o register addresses */
> static bool fw_cfg_is_mmio;
> static phys_addr_t fw_cfg_p_base;
> @@ -60,6 +66,64 @@ static void fw_cfg_sel_endianness(u16 key)
> iowrite16(key, fw_cfg_reg_ctrl);
> }
>
> +static inline bool fw_cfg_dma_enabled(void)
> +{
> + return (fw_cfg_rev & FW_CFG_VERSION_DMA) && fw_cfg_reg_dma;
> +}
> +
> +/* qemu fw_cfg device is sync today, but spec says it may become async */
> +static void fw_cfg_wait_for_control(struct fw_cfg_dma_access *d)
> +{
> + for (;;) {
> + u32 ctrl = be32_to_cpu(READ_ONCE(d->control));
> +
> + /* do not reorder the read to d->control */
> + rmb();
> + if ((ctrl & ~FW_CFG_DMA_CTL_ERROR) == 0)
> + return;
> +
> + cpu_relax();
> + }
> +}
> +
> +static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
> +{
> + phys_addr_t dma;
> + struct fw_cfg_dma_access *d = NULL;
> + ssize_t ret = length;
> +
> + d = kmalloc(sizeof(*d), GFP_KERNEL);
> + if (!d) {
> + ret = -ENOMEM;
> + goto end;
> + }
> +
> + /* fw_cfg device does not need IOMMU protection, so use physical addresses */
> + *d = (struct fw_cfg_dma_access) {
> + .address = cpu_to_be64(address ? virt_to_phys(address) : 0),
> + .length = cpu_to_be32(length),
> + .control = cpu_to_be32(control)
> + };
> +
> + dma = virt_to_phys(d);
> +
> + iowrite32be((u64)dma >> 32, fw_cfg_reg_dma);
> + /* force memory to sync before notifying device via MMIO */
> + wmb();
> + iowrite32be(dma, fw_cfg_reg_dma + 4);
> +
> + fw_cfg_wait_for_control(d);
> +
> + if (be32_to_cpu(READ_ONCE(d->control)) & FW_CFG_DMA_CTL_ERROR) {
> + ret = -EIO;
> + }
> +
> +end:
> + kfree(d);
> +
> + return ret;
> +}
> +
> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> static ssize_t fw_cfg_read_blob(u16 key,
> void *buf, loff_t pos, size_t count)
> @@ -89,6 +153,47 @@ static ssize_t fw_cfg_read_blob(u16 key,
> return count;
> }
>
> +#ifdef CONFIG_CRASH_CORE
> +/* write chunk of given fw_cfg blob (caller responsible for sanity-check) */
> +static ssize_t fw_cfg_write_blob(u16 key,
> + void *buf, loff_t pos, size_t count)
> +{
> + u32 glk = -1U;
> + acpi_status status;
> + ssize_t ret = count;
> +
> + /* If we have ACPI, ensure mutual exclusion against any potential
> + * device access by the firmware, e.g. via AML methods:
> + */
> + status = acpi_acquire_global_lock(ACPI_WAIT_FOREVER, &glk);
> + if (ACPI_FAILURE(status) && status != AE_NOT_CONFIGURED) {
> + /* Should never get here */
> + WARN(1, "%s: Failed to lock ACPI!\n", __func__);
> + return -EINVAL;
> + }
> +
> + mutex_lock(&fw_cfg_dev_lock);
> + if (pos == 0) {
> + ret = fw_cfg_dma_transfer(buf, count, key << 16
> + | FW_CFG_DMA_CTL_SELECT
> + | FW_CFG_DMA_CTL_WRITE);
> + } else {
> + fw_cfg_sel_endianness(key);
> + ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
> + if (ret < 0)
> + goto end;
> + ret = fw_cfg_dma_transfer(buf, count, FW_CFG_DMA_CTL_WRITE);
> + }
> +
> +end:
> + mutex_unlock(&fw_cfg_dev_lock);
> +
> + acpi_release_global_lock(glk);
> +
> + return ret;
> +}
> +#endif /* CONFIG_CRASH_CORE */
> +
> /* clean up fw_cfg device i/o */
> static void fw_cfg_io_cleanup(void)
> {
> @@ -188,9 +293,6 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> return 0;
> }
>
> -/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
> -static u32 fw_cfg_rev;
> -
> static ssize_t fw_cfg_showrev(struct kobject *k, struct attribute *a, char *buf)
> {
> return sprintf(buf, "%u\n", fw_cfg_rev);
> @@ -213,6 +315,32 @@ struct fw_cfg_sysfs_entry {
> struct list_head list;
> };
>
> +#ifdef CONFIG_CRASH_CORE
> +static ssize_t fw_cfg_write_vmcoreinfo(const struct fw_cfg_file *f)
> +{
> + static struct fw_cfg_vmcoreinfo *data;
> + ssize_t ret;
> +
> + data = kmalloc(sizeof(struct fw_cfg_vmcoreinfo), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> +
> + *data = (struct fw_cfg_vmcoreinfo) {
> + .guest_format = cpu_to_le16(FW_CFG_VMCOREINFO_FORMAT_ELF),
> + .size = cpu_to_le32(VMCOREINFO_NOTE_SIZE),
> + .paddr = cpu_to_le64(paddr_vmcoreinfo_note())
> + };
> + /* spare ourself reading host format support for now since we
> + * don't know what else to format - host may ignore ours


qemu documentation probably should mention that it must.

> + */
> + ret = fw_cfg_write_blob(be16_to_cpu(f->select), data,
> + 0, sizeof(struct fw_cfg_vmcoreinfo));
> +
> + kfree(data);
> + return ret;
> +}
> +#endif /* CONFIG_CRASH_CORE */
> +
> /* get fw_cfg_sysfs_entry from kobject member */
> static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
> {
> @@ -452,6 +580,15 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
> int err;
> struct fw_cfg_sysfs_entry *entry;
>
> +#ifdef CONFIG_CRASH_CORE
> + if (fw_cfg_dma_enabled() &&
> + strcmp(f->name, FW_CFG_VMCOREINFO_FILENAME) == 0 &&
> + !is_kdump_kernel()) {
> + if (fw_cfg_write_vmcoreinfo(f) < 0)
> + pr_warn("fw_cfg: failed to write vmcoreinfo");
> + }
> +#endif
> +
> /* allocate new entry */
> entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> if (!entry)
> diff --git a/include/uapi/linux/fw_cfg.h b/include/uapi/linux/fw_cfg.h
> index c698ac3812f6..e089c0159ec2 100644
> --- a/include/uapi/linux/fw_cfg.h
> +++ b/include/uapi/linux/fw_cfg.h
> @@ -54,6 +54,7 @@
>
> /* FW_CFG_ID bits */
> #define FW_CFG_VERSION 0x01
> +#define FW_CFG_VERSION_DMA 0x02
>
> /* fw_cfg file directory entry type */
> struct fw_cfg_file {
> @@ -63,4 +64,34 @@ struct fw_cfg_file {
> char name[FW_CFG_MAX_FILE_PATH];
> };
>
> +/* FW_CFG_DMA_CONTROL bits */
> +#define FW_CFG_DMA_CTL_ERROR 0x01
> +#define FW_CFG_DMA_CTL_READ 0x02
> +#define FW_CFG_DMA_CTL_SKIP 0x04
> +#define FW_CFG_DMA_CTL_SELECT 0x08
> +#define FW_CFG_DMA_CTL_WRITE 0x10
> +
> +#define FW_CFG_DMA_SIGNATURE 0x51454d5520434647ULL /* "QEMU CFG" */
> +
> +/* Control as first field allows for different structures selected by this
> + * field, which might be useful in the future
> + */
> +struct fw_cfg_dma_access {
> + __be32 control;
> + __be32 length;
> + __be64 address;
> +};
> +
> +#define FW_CFG_VMCOREINFO_FILENAME "etc/vmcoreinfo"
> +
> +#define FW_CFG_VMCOREINFO_FORMAT_NONE 0x0
> +#define FW_CFG_VMCOREINFO_FORMAT_ELF 0x1
> +
> +struct fw_cfg_vmcoreinfo {
> + __le16 host_format;
> + __le16 guest_format;
> + __le32 size;
> + __le64 paddr;
> +};
> +
> #endif
> --
> 2.16.1.73.g5832b7e9f2

2018-02-27 00:30:50

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 00/11] fw_cfg: add DMA operations & etc/vmcoreinfo support

On Thu, Feb 15, 2018 at 10:33:01PM +0100, Marc-Andr? Lureau wrote:
> Hi,
>
> This series adds DMA operations support to the qemu fw_cfg kernel
> module and populates "etc/vmcoreinfo" with vmcoreinfo location
> details (entry added since qemu 2.11 with -device vmcoreinfo).

Pls reorder with patches 3-7 first as these are fixes.

> v15:
> - fix fw_cfg.h uapi header #include
> - use BSD license for fw_cfg.h uapi header
> - move the uapi defines/structs for DMA & vmcoreinfo in the
> corresponding patch
> - use cpu_relax() instead of usleep_range(50, 100);
> - replace do { } while(true) by for (;;)
> - fix the rmb() call location
> - add a preliminary patch to handle error from fw_cfg_write_blob()
> - rewrite fw_cfg_sel_endianness() to wrap iowrite() calls
>
> v14:
> - add "fw_cfg: add a public uapi header"
> - fix sparse warnings & don't introduce new warnings
> - add memory barriers to force IO ordering
> - split fw_cfg_read_blob() in fw_cfg_read_blob_io() and
> fw_cfg_read_blob_dma()
> - add error handling to fw_cfg_read_blob() callers
> - minor stylistic changes
>
> v13:
> - reorder patch series, introduce DMA write before DMA read
> - do some measurements of DMA read speed-ups
>
> v12:
> - fix virt_to_phys(NULL) panic with CONFIG_DEBUG_VIRTUAL=y
> - do not use DMA read, except for kmalloc() memory we allocated
> ourself (only fw_cfg_register_dir_entries() so far)
>
> v11:
> - add #include <linux/crash_core.h> in last patch,
> fixing kbuild .config test
>
> Marc-Andr? Lureau (11):
> crash: export paddr_vmcoreinfo_note()
> fw_cfg: add a public uapi header
> fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()
> fw_cfg: fix sparse warnings with fw_cfg_file
> fw_cfg: fix sparse warning reading FW_CFG_ID
> fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read
> fw_cfg: remove inline from fw_cfg_read_blob()
> fw_cfg: handle fw_cfg_read_blob() error
> fw_cfg: add DMA register
> fw_cfg: write vmcoreinfo details
> RFC: fw_cfg: do DMA read operation
>
> MAINTAINERS | 1 +
> drivers/firmware/qemu_fw_cfg.c | 334 +++++++++++++++++++++++++++++++++--------
> include/uapi/linux/fw_cfg.h | 97 ++++++++++++
> kernel/crash_core.c | 1 +
> 4 files changed, 369 insertions(+), 64 deletions(-)
> create mode 100644 include/uapi/linux/fw_cfg.h
>
> --
> 2.16.1.73.g5832b7e9f2

2018-02-28 11:50:37

by Marc-André Lureau

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

Hi

On Tue, Feb 27, 2018 at 1:20 AM, Michael S. Tsirkin <[email protected]> wrote:
> On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-André Lureau wrote:
>> fw_cfg_read_blob() may fail, but does not return error. This may lead
>> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
>> memory.
>
> I don't think that's true - there's a memset there that
> will initialize the memory. probe is likely the only
> case where it returns a slightly incorrect data.

Right, I'll update the commit message.

>> Return an error if ACPI locking failed. Also, the following
>> DMA read/write extension will add more error paths that should be
>> handled appropriately.
>>
>> Signed-off-by: Marc-André Lureau <[email protected]>
>> ---
>> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
>> 1 file changed, 22 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
>> index f6f90bef604c..5e6e5ac71dab 100644
>> --- a/drivers/firmware/qemu_fw_cfg.c
>> +++ b/drivers/firmware/qemu_fw_cfg.c
>> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
>> }
>>
>> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
>> -static void fw_cfg_read_blob(u16 key,
>> - void *buf, loff_t pos, size_t count)
>> +static ssize_t fw_cfg_read_blob(u16 key,
>> + void *buf, loff_t pos, size_t count)
>> {
>> u32 glk = -1U;
>> acpi_status status;
>> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
>> /* Should never get here */
>> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
>> memset(buf, 0, count);
>> - return;
>> + return -EINVAL;
>> }
>>
>> mutex_lock(&fw_cfg_dev_lock);
>
> Wouldn't something like -EBUSY be more appropriate?

In theory, it would be a general failure right? I don't think we want
the caller to retry. I think in EINVAL fits better, but I don't think
it matters much this or EBUSY.

>> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
>> mutex_unlock(&fw_cfg_dev_lock);
>>
>> acpi_release_global_lock(glk);
>> + return count;
>> }
>>
>> /* clean up fw_cfg device i/o */
>> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
>> }
>>
>> /* verify fw_cfg device signature */
>> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
>> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
>> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
>> + 0, FW_CFG_SIG_SIZE) < 0 ||
>> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
>> fw_cfg_io_cleanup();
>> return -ENODEV;
>> }
>> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
>> if (count > entry->size - pos)
>> count = entry->size - pos;
>>
>> - fw_cfg_read_blob(entry->select, buf, pos, count);
>> - return count;
>> + return fw_cfg_read_blob(entry->select, buf, pos, count);
>> }
>>
>> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
>> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
>> struct fw_cfg_file *dir;
>> size_t dir_size;
>>
>> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
>> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
>> + 0, sizeof(files_count));
>> + if (ret < 0)
>> + return ret;
>> +
>> count = be32_to_cpu(files_count);
>> dir_size = count * sizeof(struct fw_cfg_file);
>>
>> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
>> if (!dir)
>> return -ENOMEM;
>>
>> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
>> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
>> + sizeof(files_count), dir_size);
>> + if (ret < 0)
>> + goto end;
>>
>> for (i = 0; i < count; i++) {
>> ret = fw_cfg_register_file(&dir[i]);
>> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
>> break;
>> }
>>
>> +end:
>> kfree(dir);
>> return ret;
>> }
>> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
>> goto err_probe;
>>
>> /* get revision number, add matching top-level attribute */
>> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
>> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
>> + if (err < 0)
>> + goto err_probe;
>> +
>> fw_cfg_rev = le32_to_cpu(rev);
>> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
>> if (err)
>
> I think that this is the only case where it's not doing the right thing right now in
> that it shows 0 as the revision to the users. Is it worth failing probe
> here? We could just skip the attribute, could we not?

I think it's best to fail the probe if we have a read failure at that time.

--
Marc-André Lureau

2018-02-28 11:53:10

by Marc-André Lureau

[permalink] [raw]
Subject: Re: [PATCH v15 02/11] fw_cfg: add a public uapi header

On Tue, Feb 27, 2018 at 1:06 AM, Michael S. Tsirkin <[email protected]> wrote:
> On Thu, Feb 15, 2018 at 10:33:03PM +0100, Marc-André Lureau wrote:
>> Create a common header file for well-known values and structures to be
>> shared by the Linux kernel with qemu or other projects.
>>
>> It is based from qemu/docs/specs/fw_cfg.txt which references
>> qemu/include/hw/nvram/fw_cfg_keys.h "for the most up-to-date and
>> authoritative list" & vmcoreinfo.txt. Those files don't have an
>> explicit license, but qemu/hw/nvram/fw_cfg.c is BSD-license, so
>> Michael S. Tsirkin suggested to use the same license.
>>
>> The patch intentionally left out DMA & vmcoreinfo structures &
>> defines, which are added in the commits making usage of it.
>>
>> Suggested-by: Michael S. Tsirkin <[email protected]>
>> Signed-off-by: Marc-André Lureau <[email protected]>
>>
>> ---
>>
>> The related qemu patch making use of it, to be submitted:
>> https://github.com/elmarco/qemu/commit/4884fc9e9c4c4467a371e5a40f3181239e1b70f5
>> ---
>> MAINTAINERS | 1 +
>> drivers/firmware/qemu_fw_cfg.c | 22 ++------------
>> include/uapi/linux/fw_cfg.h | 66 ++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 69 insertions(+), 20 deletions(-)
>> create mode 100644 include/uapi/linux/fw_cfg.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 3bdc260e36b7..a66b65f62811 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -11352,6 +11352,7 @@ M: "Michael S. Tsirkin" <[email protected]>
>> L: [email protected]
>> S: Maintained
>> F: drivers/firmware/qemu_fw_cfg.c
>> +F: include/uapi/linux/fw_cfg.h
>>
>> QIB DRIVER
>> M: Dennis Dalessandro <[email protected]>
>
> Why fw_cfg.h and not qemu_fw_cfg.h ? fw_cfg.h seems too generic.

ok

>
>> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
>> index a41b572eeeb1..42601a3eaed5 100644
>> --- a/drivers/firmware/qemu_fw_cfg.c
>> +++ b/drivers/firmware/qemu_fw_cfg.c
>> @@ -32,30 +32,12 @@
>> #include <linux/slab.h>
>> #include <linux/io.h>
>> #include <linux/ioport.h>
>> +#include <uapi/linux/fw_cfg.h>
>>
>> MODULE_AUTHOR("Gabriel L. Somlo <[email protected]>");
>> MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
>> MODULE_LICENSE("GPL");
>>
>> -/* selector key values for "well-known" fw_cfg entries */
>> -#define FW_CFG_SIGNATURE 0x00
>> -#define FW_CFG_ID 0x01
>> -#define FW_CFG_FILE_DIR 0x19
>> -
>> -/* size in bytes of fw_cfg signature */
>> -#define FW_CFG_SIG_SIZE 4
>> -
>> -/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
>> -#define FW_CFG_MAX_FILE_PATH 56
>> -
>> -/* fw_cfg file directory entry type */
>> -struct fw_cfg_file {
>> - u32 size;
>> - u16 select;
>> - u16 reserved;
>> - char name[FW_CFG_MAX_FILE_PATH];
>> -};
>> -
>> /* fw_cfg device i/o register addresses */
>> static bool fw_cfg_is_mmio;
>> static phys_addr_t fw_cfg_p_base;
>> @@ -597,7 +579,7 @@ MODULE_DEVICE_TABLE(of, fw_cfg_sysfs_mmio_match);
>>
>> #ifdef CONFIG_ACPI
>> static const struct acpi_device_id fw_cfg_sysfs_acpi_match[] = {
>> - { "QEMU0002", },
>> + { FW_CFG_ACPI_DEVICE_ID, },
>> {},
>> };
>> MODULE_DEVICE_TABLE(acpi, fw_cfg_sysfs_acpi_match);
>> diff --git a/include/uapi/linux/fw_cfg.h b/include/uapi/linux/fw_cfg.h
>> new file mode 100644
>> index 000000000000..c698ac3812f6
>> --- /dev/null
>> +++ b/include/uapi/linux/fw_cfg.h
>> @@ -0,0 +1,66 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause */
>> +#ifndef _LINUX_FW_CFG_H
>> +#define _LINUX_FW_CFG_H
>> +
>> +#include <linux/types.h>
>> +
>> +#define FW_CFG_ACPI_DEVICE_ID "QEMU0002"
>> +
>> +/* selector key values for "well-known" fw_cfg entries */
>> +#define FW_CFG_SIGNATURE 0x00
>> +#define FW_CFG_ID 0x01
>> +#define FW_CFG_UUID 0x02
>> +#define FW_CFG_RAM_SIZE 0x03
>> +#define FW_CFG_NOGRAPHIC 0x04
>> +#define FW_CFG_NB_CPUS 0x05
>> +#define FW_CFG_MACHINE_ID 0x06
>> +#define FW_CFG_KERNEL_ADDR 0x07
>> +#define FW_CFG_KERNEL_SIZE 0x08
>> +#define FW_CFG_KERNEL_CMDLINE 0x09
>> +#define FW_CFG_INITRD_ADDR 0x0a
>> +#define FW_CFG_INITRD_SIZE 0x0b
>> +#define FW_CFG_BOOT_DEVICE 0x0c
>> +#define FW_CFG_NUMA 0x0d
>> +#define FW_CFG_BOOT_MENU 0x0e
>> +#define FW_CFG_MAX_CPUS 0x0f
>> +#define FW_CFG_KERNEL_ENTRY 0x10
>> +#define FW_CFG_KERNEL_DATA 0x11
>> +#define FW_CFG_INITRD_DATA 0x12
>> +#define FW_CFG_CMDLINE_ADDR 0x13
>> +#define FW_CFG_CMDLINE_SIZE 0x14
>> +#define FW_CFG_CMDLINE_DATA 0x15
>> +#define FW_CFG_SETUP_ADDR 0x16
>> +#define FW_CFG_SETUP_SIZE 0x17
>> +#define FW_CFG_SETUP_DATA 0x18
>> +#define FW_CFG_FILE_DIR 0x19
>> +
>> +#define FW_CFG_FILE_FIRST 0x20
>> +#define FW_CFG_FILE_SLOTS_MIN 0x10
>> +
>> +#define FW_CFG_WRITE_CHANNEL 0x4000
>> +#define FW_CFG_ARCH_LOCAL 0x8000
>> +#define FW_CFG_ENTRY_MASK (~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL))
>> +
>> +#define FW_CFG_INVALID 0xffff
>> +
>> +/* width in bytes of fw_cfg control register */
>> +#define FW_CFG_CTL_SIZE 0x02
>> +
>> +/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
>> +#define FW_CFG_MAX_FILE_PATH 56
>> +
>> +/* size in bytes of fw_cfg signature */
>> +#define FW_CFG_SIG_SIZE 4
>> +
>> +/* FW_CFG_ID bits */
>> +#define FW_CFG_VERSION 0x01
>> +
>> +/* fw_cfg file directory entry type */
>> +struct fw_cfg_file {
>> + __be32 size;
>> + __be16 select;
>> + __u16 reserved;
>> + char name[FW_CFG_MAX_FILE_PATH];
>> +};
>> +
>> +#endif
>> --
>> 2.16.1.73.g5832b7e9f2



--
Marc-André Lureau

2018-02-28 12:23:46

by Marc-André Lureau

[permalink] [raw]
Subject: Re: [PATCH v15 10/11] fw_cfg: write vmcoreinfo details

Hi

On Tue, Feb 27, 2018 at 1:28 AM, Michael S. Tsirkin <[email protected]> wrote:
> On Thu, Feb 15, 2018 at 10:33:11PM +0100, Marc-André Lureau wrote:
>> If the "etc/vmcoreinfo" fw_cfg file is present and we are not running
>> the kdump kernel, write the addr/size of the vmcoreinfo ELF note.
>>
>> The DMA operation is expected to run synchronously with today qemu,
>> but the specification states that it may become async, so we run
>> "control" field check in a loop for eventual changes.
>>
>> Signed-off-by: Marc-André Lureau <[email protected]>
>> ---
>> drivers/firmware/qemu_fw_cfg.c | 143 ++++++++++++++++++++++++++++++++++++++++-
>> include/uapi/linux/fw_cfg.h | 31 +++++++++
>> 2 files changed, 171 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
>> index c28bec4b5663..3015e77aebca 100644
>> --- a/drivers/firmware/qemu_fw_cfg.c
>> +++ b/drivers/firmware/qemu_fw_cfg.c
>> @@ -34,11 +34,17 @@
>> #include <linux/io.h>
>> #include <linux/ioport.h>
>> #include <uapi/linux/fw_cfg.h>
>> +#include <linux/delay.h>
>> +#include <linux/crash_dump.h>
>> +#include <linux/crash_core.h>
>>
>> MODULE_AUTHOR("Gabriel L. Somlo <[email protected]>");
>> MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
>> MODULE_LICENSE("GPL");
>>
>> +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
>> +static u32 fw_cfg_rev;
>> +
>> /* fw_cfg device i/o register addresses */
>> static bool fw_cfg_is_mmio;
>> static phys_addr_t fw_cfg_p_base;
>> @@ -60,6 +66,64 @@ static void fw_cfg_sel_endianness(u16 key)
>> iowrite16(key, fw_cfg_reg_ctrl);
>> }
>>
>> +static inline bool fw_cfg_dma_enabled(void)
>> +{
>> + return (fw_cfg_rev & FW_CFG_VERSION_DMA) && fw_cfg_reg_dma;
>> +}
>> +
>> +/* qemu fw_cfg device is sync today, but spec says it may become async */
>> +static void fw_cfg_wait_for_control(struct fw_cfg_dma_access *d)
>> +{
>> + for (;;) {
>> + u32 ctrl = be32_to_cpu(READ_ONCE(d->control));
>> +
>> + /* do not reorder the read to d->control */
>> + rmb();
>> + if ((ctrl & ~FW_CFG_DMA_CTL_ERROR) == 0)
>> + return;
>> +
>> + cpu_relax();
>> + }
>> +}
>> +
>> +static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
>> +{
>> + phys_addr_t dma;
>> + struct fw_cfg_dma_access *d = NULL;
>> + ssize_t ret = length;
>> +
>> + d = kmalloc(sizeof(*d), GFP_KERNEL);
>> + if (!d) {
>> + ret = -ENOMEM;
>> + goto end;
>> + }
>> +
>> + /* fw_cfg device does not need IOMMU protection, so use physical addresses */
>> + *d = (struct fw_cfg_dma_access) {
>> + .address = cpu_to_be64(address ? virt_to_phys(address) : 0),
>> + .length = cpu_to_be32(length),
>> + .control = cpu_to_be32(control)
>> + };
>> +
>> + dma = virt_to_phys(d);
>> +
>> + iowrite32be((u64)dma >> 32, fw_cfg_reg_dma);
>> + /* force memory to sync before notifying device via MMIO */
>> + wmb();
>> + iowrite32be(dma, fw_cfg_reg_dma + 4);
>> +
>> + fw_cfg_wait_for_control(d);
>> +
>> + if (be32_to_cpu(READ_ONCE(d->control)) & FW_CFG_DMA_CTL_ERROR) {
>> + ret = -EIO;
>> + }
>> +
>> +end:
>> + kfree(d);
>> +
>> + return ret;
>> +}
>> +
>> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
>> static ssize_t fw_cfg_read_blob(u16 key,
>> void *buf, loff_t pos, size_t count)
>> @@ -89,6 +153,47 @@ static ssize_t fw_cfg_read_blob(u16 key,
>> return count;
>> }
>>
>> +#ifdef CONFIG_CRASH_CORE
>> +/* write chunk of given fw_cfg blob (caller responsible for sanity-check) */
>> +static ssize_t fw_cfg_write_blob(u16 key,
>> + void *buf, loff_t pos, size_t count)
>> +{
>> + u32 glk = -1U;
>> + acpi_status status;
>> + ssize_t ret = count;
>> +
>> + /* If we have ACPI, ensure mutual exclusion against any potential
>> + * device access by the firmware, e.g. via AML methods:
>> + */
>> + status = acpi_acquire_global_lock(ACPI_WAIT_FOREVER, &glk);
>> + if (ACPI_FAILURE(status) && status != AE_NOT_CONFIGURED) {
>> + /* Should never get here */
>> + WARN(1, "%s: Failed to lock ACPI!\n", __func__);
>> + return -EINVAL;
>> + }
>> +
>> + mutex_lock(&fw_cfg_dev_lock);
>> + if (pos == 0) {
>> + ret = fw_cfg_dma_transfer(buf, count, key << 16
>> + | FW_CFG_DMA_CTL_SELECT
>> + | FW_CFG_DMA_CTL_WRITE);
>> + } else {
>> + fw_cfg_sel_endianness(key);
>> + ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
>> + if (ret < 0)
>> + goto end;
>> + ret = fw_cfg_dma_transfer(buf, count, FW_CFG_DMA_CTL_WRITE);
>> + }
>> +
>> +end:
>> + mutex_unlock(&fw_cfg_dev_lock);
>> +
>> + acpi_release_global_lock(glk);
>> +
>> + return ret;
>> +}
>> +#endif /* CONFIG_CRASH_CORE */
>> +
>> /* clean up fw_cfg device i/o */
>> static void fw_cfg_io_cleanup(void)
>> {
>> @@ -188,9 +293,6 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
>> return 0;
>> }
>>
>> -/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
>> -static u32 fw_cfg_rev;
>> -
>> static ssize_t fw_cfg_showrev(struct kobject *k, struct attribute *a, char *buf)
>> {
>> return sprintf(buf, "%u\n", fw_cfg_rev);
>> @@ -213,6 +315,32 @@ struct fw_cfg_sysfs_entry {
>> struct list_head list;
>> };
>>
>> +#ifdef CONFIG_CRASH_CORE
>> +static ssize_t fw_cfg_write_vmcoreinfo(const struct fw_cfg_file *f)
>> +{
>> + static struct fw_cfg_vmcoreinfo *data;
>> + ssize_t ret;
>> +
>> + data = kmalloc(sizeof(struct fw_cfg_vmcoreinfo), GFP_KERNEL);
>> + if (!data)
>> + return -ENOMEM;
>> +
>> + *data = (struct fw_cfg_vmcoreinfo) {
>> + .guest_format = cpu_to_le16(FW_CFG_VMCOREINFO_FORMAT_ELF),
>> + .size = cpu_to_le32(VMCOREINFO_NOTE_SIZE),
>> + .paddr = cpu_to_le64(paddr_vmcoreinfo_note())
>> + };
>> + /* spare ourself reading host format support for now since we
>> + * don't know what else to format - host may ignore ours
>
>
> qemu documentation probably should mention that it must.
>

Can you be more explicit?
thanks

>> + */
>> + ret = fw_cfg_write_blob(be16_to_cpu(f->select), data,
>> + 0, sizeof(struct fw_cfg_vmcoreinfo));
>> +
>> + kfree(data);
>> + return ret;
>> +}
>> +#endif /* CONFIG_CRASH_CORE */
>> +
>> /* get fw_cfg_sysfs_entry from kobject member */
>> static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
>> {
>> @@ -452,6 +580,15 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
>> int err;
>> struct fw_cfg_sysfs_entry *entry;
>>
>> +#ifdef CONFIG_CRASH_CORE
>> + if (fw_cfg_dma_enabled() &&
>> + strcmp(f->name, FW_CFG_VMCOREINFO_FILENAME) == 0 &&
>> + !is_kdump_kernel()) {
>> + if (fw_cfg_write_vmcoreinfo(f) < 0)
>> + pr_warn("fw_cfg: failed to write vmcoreinfo");
>> + }
>> +#endif
>> +
>> /* allocate new entry */
>> entry = kzalloc(sizeof(*entry), GFP_KERNEL);
>> if (!entry)
>> diff --git a/include/uapi/linux/fw_cfg.h b/include/uapi/linux/fw_cfg.h
>> index c698ac3812f6..e089c0159ec2 100644
>> --- a/include/uapi/linux/fw_cfg.h
>> +++ b/include/uapi/linux/fw_cfg.h
>> @@ -54,6 +54,7 @@
>>
>> /* FW_CFG_ID bits */
>> #define FW_CFG_VERSION 0x01
>> +#define FW_CFG_VERSION_DMA 0x02
>>
>> /* fw_cfg file directory entry type */
>> struct fw_cfg_file {
>> @@ -63,4 +64,34 @@ struct fw_cfg_file {
>> char name[FW_CFG_MAX_FILE_PATH];
>> };
>>
>> +/* FW_CFG_DMA_CONTROL bits */
>> +#define FW_CFG_DMA_CTL_ERROR 0x01
>> +#define FW_CFG_DMA_CTL_READ 0x02
>> +#define FW_CFG_DMA_CTL_SKIP 0x04
>> +#define FW_CFG_DMA_CTL_SELECT 0x08
>> +#define FW_CFG_DMA_CTL_WRITE 0x10
>> +
>> +#define FW_CFG_DMA_SIGNATURE 0x51454d5520434647ULL /* "QEMU CFG" */
>> +
>> +/* Control as first field allows for different structures selected by this
>> + * field, which might be useful in the future
>> + */
>> +struct fw_cfg_dma_access {
>> + __be32 control;
>> + __be32 length;
>> + __be64 address;
>> +};
>> +
>> +#define FW_CFG_VMCOREINFO_FILENAME "etc/vmcoreinfo"
>> +
>> +#define FW_CFG_VMCOREINFO_FORMAT_NONE 0x0
>> +#define FW_CFG_VMCOREINFO_FORMAT_ELF 0x1
>> +
>> +struct fw_cfg_vmcoreinfo {
>> + __le16 host_format;
>> + __le16 guest_format;
>> + __le32 size;
>> + __le64 paddr;
>> +};
>> +
>> #endif
>> --
>> 2.16.1.73.g5832b7e9f2

2018-02-28 12:28:04

by Marc-André Lureau

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

Hi

On Tue, Feb 27, 2018 at 1:04 AM, Michael S. Tsirkin <[email protected]> wrote:
> On Thu, Feb 15, 2018 at 10:33:12PM +0100, Marc-André Lureau wrote:
>> Modify fw_cfg_read_blob() to use DMA if the device supports it.
>> Return errors, because the operation may fail.
>>
>> So far, only one call in fw_cfg_register_dir_entries() is using
>> kmalloc'ed buf and is thus clearly eligible to DMA read.
>>
>> Initially, I didn't implement DMA read to speed up boot time, but as a
>> first step before introducing DMA write (since read operations were
>> already presents). Even more, I didn't realize fw-cfg entries were
>> being read by the kernel during boot by default. But actally fw-cfg
>> entries are being populated during module probe. I knew DMA improved a
>> lot bios boot time (the main reason the DMA interface was added
>> afaik). Let see the time it would take to read the whole ACPI
>> tables (128kb allocated)
>>
>> # time cat /sys/firmware/qemu_fw_cfg/by_name/etc/acpi/tables/raw
>> - with DMA: sys 0m0.003s
>> - without DMA (-global fw_cfg.dma_enabled=off): sys 0m7.674s
>>
>> FW_CFG_FILE_DIR (0x19) is the only "file" that is read during kernel
>> boot to populate sysfs qemu_fw_cfg directory, and it is quite
>> small (1-2kb). Since it does not expose itself, in order to measure
>> the time it takes to read such small file, I took a comparable sized
>> file of 2048 bytes and exposed it (-fw_cfg test,file=file with a
>> modified read_raw enabling DMA)
>>
>> # perf stat -r 100 cat /sys/firmware/qemu_fw_cfg/by_name/test/raw >/dev/null
>> - with DMA:
>> 0.636037 task-clock (msec) # 0.141 CPUs utilized ( +- 1.19% )
>> - without DMA:
>> 6.430128 task-clock (msec) # 0.622 CPUs utilized ( +- 0.22% )
>>
>> That's a few msec saved during boot by enabling DMA read (the gain
>> would be more substantial if other & bigger fw-cfg entries are read by
>> others from sysfs, unfortunately, it's not clear if we can always
>> enable DMA there)
>>
>> Signed-off-by: Marc-André Lureau <[email protected]>
>> ---
>> drivers/firmware/qemu_fw_cfg.c | 61 ++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 50 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
>> index 3015e77aebca..94df57e9be66 100644
>> --- a/drivers/firmware/qemu_fw_cfg.c
>> +++ b/drivers/firmware/qemu_fw_cfg.c
>> @@ -124,12 +124,47 @@ static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
>> return ret;
>> }
>>
>> +/* with acpi & dev locks taken */
>> +static ssize_t fw_cfg_read_blob_dma(u16 key,
>> + void *buf, loff_t pos, size_t count)
>> +{
>> + ssize_t ret;
>> +
>> + if (pos == 0) {
>> + ret = fw_cfg_dma_transfer(buf, count, key << 16
>> + | FW_CFG_DMA_CTL_SELECT
>> + | FW_CFG_DMA_CTL_READ);
>> + } else {
>> + fw_cfg_sel_endianness(key);
>> + ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
>> + if (ret < 0)
>> + return ret;
>> + ret = fw_cfg_dma_transfer(buf, count,
>> + FW_CFG_DMA_CTL_READ);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +/* with acpi & dev locks taken */
>> +static ssize_t fw_cfg_read_blob_io(u16 key,
>> + void *buf, loff_t pos, size_t count)
>> +{
>> + fw_cfg_sel_endianness(key);
>> + while (pos-- > 0)
>> + ioread8(fw_cfg_reg_data);
>> + ioread8_rep(fw_cfg_reg_data, buf, count);
>> + return count;
>> +}
>> +
>> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
>> static ssize_t fw_cfg_read_blob(u16 key,
>> - void *buf, loff_t pos, size_t count)
>> + void *buf, loff_t pos, size_t count,
>> + bool dma)
>> {
>> u32 glk = -1U;
>> acpi_status status;
>> + ssize_t ret;
>>
>> /* If we have ACPI, ensure mutual exclusion against any potential
>> * device access by the firmware, e.g. via AML methods:
>
> so this adds a dma flag to fw_cfg_read_blob.
>
>
>
>> @@ -143,14 +178,17 @@ static ssize_t fw_cfg_read_blob(u16 key,
>> }
>>
>> mutex_lock(&fw_cfg_dev_lock);
>> - fw_cfg_sel_endianness(key);
>> - while (pos-- > 0)
>> - ioread8(fw_cfg_reg_data);
>> - ioread8_rep(fw_cfg_reg_data, buf, count);
>> + if (dma && fw_cfg_dma_enabled()) {
>> + ret = fw_cfg_read_blob_dma(key, buf, pos, count);
>> + } else {
>> + ret = fw_cfg_read_blob_io(key, buf, pos, count);
>> + }
>> +
>> mutex_unlock(&fw_cfg_dev_lock);
>>
>> acpi_release_global_lock(glk);
>> - return count;
>> +
>> + return ret;
>> }
>>
>> #ifdef CONFIG_CRASH_CORE
>
> If set to false it does io, if set to true it does dma.
>
> I would prefer passing an accessor function pointer
> since that's clearer than true/false.

ok

>
>> @@ -284,7 +322,7 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
>>
>> /* verify fw_cfg device signature */
>> if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
>> - 0, FW_CFG_SIG_SIZE) < 0 ||
>> + 0, FW_CFG_SIG_SIZE, false) < 0 ||
>> memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
>> fw_cfg_io_cleanup();
>> return -ENODEV;
>> @@ -468,7 +506,8 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
>> if (count > entry->size - pos)
>> count = entry->size - pos;
>>
>> - return fw_cfg_read_blob(entry->select, buf, pos, count);
>> + /* do not use DMA, virt_to_phys(buf) might not be ok */
>> + return fw_cfg_read_blob(entry->select, buf, pos, count, false);
>> }
>>
>> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
>> @@ -634,7 +673,7 @@ static int fw_cfg_register_dir_entries(void)
>> size_t dir_size;
>>
>> ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
>> - 0, sizeof(files_count));
>> + 0, sizeof(files_count), false);
>> if (ret < 0)
>> return ret;
>>
>> @@ -646,7 +685,7 @@ static int fw_cfg_register_dir_entries(void)
>> return -ENOMEM;
>>
>> ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
>> - sizeof(files_count), dir_size);
>> + sizeof(files_count), dir_size, false);
>> if (ret < 0)
>> goto end;
>>
>> @@ -697,7 +736,7 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
>> goto err_probe;
>>
>> /* get revision number, add matching top-level attribute */
>> - err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
>> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev), false);
>> if (err < 0)
>> goto err_probe;
>
>
> Looks like all callers pass in false as parameter.
> Given this, how can this speed up any operations?
>
> Are you sure you tested this properly?


I did modify read_raw to conduct testing ( the part "with a
modified read_raw enabling DMA" should be before, updating commit message).

>
>> --
>> 2.16.1.73.g5832b7e9f2

2018-02-28 13:27:23

by Gabriel L. Somlo

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

On Wed, Feb 28, 2018 at 12:49:35PM +0100, Marc-Andr? Lureau wrote:
> Hi
>
> On Tue, Feb 27, 2018 at 1:20 AM, Michael S. Tsirkin <[email protected]> wrote:
> > On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-Andr? Lureau wrote:
> >> fw_cfg_read_blob() may fail, but does not return error. This may lead
> >> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
> >> memory.
> >
> > I don't think that's true - there's a memset there that
> > will initialize the memory. probe is likely the only
> > case where it returns a slightly incorrect data.
>
> Right, I'll update the commit message.
>
> >> Return an error if ACPI locking failed. Also, the following
> >> DMA read/write extension will add more error paths that should be
> >> handled appropriately.
> >>
> >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> >> ---
> >> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
> >> 1 file changed, 22 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> >> index f6f90bef604c..5e6e5ac71dab 100644
> >> --- a/drivers/firmware/qemu_fw_cfg.c
> >> +++ b/drivers/firmware/qemu_fw_cfg.c
> >> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
> >> }
> >>
> >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> >> -static void fw_cfg_read_blob(u16 key,
> >> - void *buf, loff_t pos, size_t count)
> >> +static ssize_t fw_cfg_read_blob(u16 key,
> >> + void *buf, loff_t pos, size_t count)
> >> {
> >> u32 glk = -1U;
> >> acpi_status status;
> >> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
> >> /* Should never get here */
> >> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
> >> memset(buf, 0, count);
> >> - return;
> >> + return -EINVAL;
> >> }
> >>
> >> mutex_lock(&fw_cfg_dev_lock);
> >
> > Wouldn't something like -EBUSY be more appropriate?
>
> In theory, it would be a general failure right? I don't think we want
> the caller to retry. I think in EINVAL fits better, but I don't think
> it matters much this or EBUSY.

The original thought behind EINVAL was that this is a "should never
happen", "man-bites-dog" condition. Hence also the WARN statement.

>
> >> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
> >> mutex_unlock(&fw_cfg_dev_lock);
> >>
> >> acpi_release_global_lock(glk);
> >> + return count;
> >> }
> >>
> >> /* clean up fw_cfg device i/o */
> >> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> >> }
> >>
> >> /* verify fw_cfg device signature */
> >> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> >> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> >> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> >> + 0, FW_CFG_SIG_SIZE) < 0 ||
> >> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> >> fw_cfg_io_cleanup();
> >> return -ENODEV;
> >> }
> >> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> >> if (count > entry->size - pos)
> >> count = entry->size - pos;
> >>
> >> - fw_cfg_read_blob(entry->select, buf, pos, count);
> >> - return count;
> >> + return fw_cfg_read_blob(entry->select, buf, pos, count);
> >> }
> >>
> >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> >> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
> >> struct fw_cfg_file *dir;
> >> size_t dir_size;
> >>
> >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> >> + 0, sizeof(files_count));
> >> + if (ret < 0)
> >> + return ret;
> >> +
> >> count = be32_to_cpu(files_count);
> >> dir_size = count * sizeof(struct fw_cfg_file);
> >>
> >> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
> >> if (!dir)
> >> return -ENOMEM;
> >>
> >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> >> + sizeof(files_count), dir_size);
> >> + if (ret < 0)
> >> + goto end;
> >>
> >> for (i = 0; i < count; i++) {
> >> ret = fw_cfg_register_file(&dir[i]);
> >> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
> >> break;
> >> }
> >>
> >> +end:
> >> kfree(dir);
> >> return ret;
> >> }
> >> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> >> goto err_probe;
> >>
> >> /* get revision number, add matching top-level attribute */
> >> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> >> + if (err < 0)
> >> + goto err_probe;
> >> +
> >> fw_cfg_rev = le32_to_cpu(rev);
> >> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> >> if (err)
> >
> > I think that this is the only case where it's not doing the right thing right now in
> > that it shows 0 as the revision to the users. Is it worth failing probe
> > here? We could just skip the attribute, could we not?
>
> I think it's best to fail the probe if we have a read failure at that time.
>
> --
> Marc-Andr? Lureau

2018-02-28 15:35:07

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

On Wed, Feb 28, 2018 at 12:49:35PM +0100, Marc-Andr? Lureau wrote:
> Hi
>
> On Tue, Feb 27, 2018 at 1:20 AM, Michael S. Tsirkin <[email protected]> wrote:
> > On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-Andr? Lureau wrote:
> >> fw_cfg_read_blob() may fail, but does not return error. This may lead
> >> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
> >> memory.
> >
> > I don't think that's true - there's a memset there that
> > will initialize the memory. probe is likely the only
> > case where it returns a slightly incorrect data.
>
> Right, I'll update the commit message.
>
> >> Return an error if ACPI locking failed. Also, the following
> >> DMA read/write extension will add more error paths that should be
> >> handled appropriately.
> >>
> >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> >> ---
> >> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
> >> 1 file changed, 22 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> >> index f6f90bef604c..5e6e5ac71dab 100644
> >> --- a/drivers/firmware/qemu_fw_cfg.c
> >> +++ b/drivers/firmware/qemu_fw_cfg.c
> >> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
> >> }
> >>
> >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> >> -static void fw_cfg_read_blob(u16 key,
> >> - void *buf, loff_t pos, size_t count)
> >> +static ssize_t fw_cfg_read_blob(u16 key,
> >> + void *buf, loff_t pos, size_t count)
> >> {
> >> u32 glk = -1U;
> >> acpi_status status;
> >> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
> >> /* Should never get here */
> >> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
> >> memset(buf, 0, count);
> >> - return;
> >> + return -EINVAL;
> >> }
> >>
> >> mutex_lock(&fw_cfg_dev_lock);
> >
> > Wouldn't something like -EBUSY be more appropriate?
>
> In theory, it would be a general failure right? I don't think we want
> the caller to retry. I think in EINVAL fits better, but I don't think
> it matters much this or EBUSY.
>
> >> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
> >> mutex_unlock(&fw_cfg_dev_lock);
> >>
> >> acpi_release_global_lock(glk);
> >> + return count;
> >> }
> >>
> >> /* clean up fw_cfg device i/o */
> >> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> >> }
> >>
> >> /* verify fw_cfg device signature */
> >> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> >> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> >> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> >> + 0, FW_CFG_SIG_SIZE) < 0 ||
> >> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> >> fw_cfg_io_cleanup();
> >> return -ENODEV;
> >> }
> >> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> >> if (count > entry->size - pos)
> >> count = entry->size - pos;
> >>
> >> - fw_cfg_read_blob(entry->select, buf, pos, count);
> >> - return count;
> >> + return fw_cfg_read_blob(entry->select, buf, pos, count);
> >> }
> >>
> >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> >> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
> >> struct fw_cfg_file *dir;
> >> size_t dir_size;
> >>
> >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> >> + 0, sizeof(files_count));
> >> + if (ret < 0)
> >> + return ret;
> >> +
> >> count = be32_to_cpu(files_count);
> >> dir_size = count * sizeof(struct fw_cfg_file);
> >>
> >> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
> >> if (!dir)
> >> return -ENOMEM;
> >>
> >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> >> + sizeof(files_count), dir_size);
> >> + if (ret < 0)
> >> + goto end;
> >>
> >> for (i = 0; i < count; i++) {
> >> ret = fw_cfg_register_file(&dir[i]);
> >> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
> >> break;
> >> }
> >>
> >> +end:
> >> kfree(dir);
> >> return ret;
> >> }
> >> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> >> goto err_probe;
> >>
> >> /* get revision number, add matching top-level attribute */
> >> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> >> + if (err < 0)
> >> + goto err_probe;
> >> +
> >> fw_cfg_rev = le32_to_cpu(rev);
> >> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> >> if (err)
> >
> > I think that this is the only case where it's not doing the right thing right now in
> > that it shows 0 as the revision to the users. Is it worth failing probe
> > here? We could just skip the attribute, could we not?
>
> I think it's best to fail the probe if we have a read failure at that time.

I'd rather we just dropped this attribute completely.
Why is it there?
Does any userspace actually need it?
Gabriel?

> --
> Marc-Andr? Lureau

2018-02-28 15:36:31

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 10/11] fw_cfg: write vmcoreinfo details

On Wed, Feb 28, 2018 at 01:22:33PM +0100, Marc-Andr? Lureau wrote:
> Hi
>
> On Tue, Feb 27, 2018 at 1:28 AM, Michael S. Tsirkin <[email protected]> wrote:
> > On Thu, Feb 15, 2018 at 10:33:11PM +0100, Marc-Andr? Lureau wrote:
> >> If the "etc/vmcoreinfo" fw_cfg file is present and we are not running
> >> the kdump kernel, write the addr/size of the vmcoreinfo ELF note.
> >>
> >> The DMA operation is expected to run synchronously with today qemu,
> >> but the specification states that it may become async, so we run
> >> "control" field check in a loop for eventual changes.
> >>
> >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> >> ---
> >> drivers/firmware/qemu_fw_cfg.c | 143 ++++++++++++++++++++++++++++++++++++++++-
> >> include/uapi/linux/fw_cfg.h | 31 +++++++++
> >> 2 files changed, 171 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> >> index c28bec4b5663..3015e77aebca 100644
> >> --- a/drivers/firmware/qemu_fw_cfg.c
> >> +++ b/drivers/firmware/qemu_fw_cfg.c
> >> @@ -34,11 +34,17 @@
> >> #include <linux/io.h>
> >> #include <linux/ioport.h>
> >> #include <uapi/linux/fw_cfg.h>
> >> +#include <linux/delay.h>
> >> +#include <linux/crash_dump.h>
> >> +#include <linux/crash_core.h>
> >>
> >> MODULE_AUTHOR("Gabriel L. Somlo <[email protected]>");
> >> MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
> >> MODULE_LICENSE("GPL");
> >>
> >> +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
> >> +static u32 fw_cfg_rev;
> >> +
> >> /* fw_cfg device i/o register addresses */
> >> static bool fw_cfg_is_mmio;
> >> static phys_addr_t fw_cfg_p_base;
> >> @@ -60,6 +66,64 @@ static void fw_cfg_sel_endianness(u16 key)
> >> iowrite16(key, fw_cfg_reg_ctrl);
> >> }
> >>
> >> +static inline bool fw_cfg_dma_enabled(void)
> >> +{
> >> + return (fw_cfg_rev & FW_CFG_VERSION_DMA) && fw_cfg_reg_dma;
> >> +}
> >> +
> >> +/* qemu fw_cfg device is sync today, but spec says it may become async */
> >> +static void fw_cfg_wait_for_control(struct fw_cfg_dma_access *d)
> >> +{
> >> + for (;;) {
> >> + u32 ctrl = be32_to_cpu(READ_ONCE(d->control));
> >> +
> >> + /* do not reorder the read to d->control */
> >> + rmb();
> >> + if ((ctrl & ~FW_CFG_DMA_CTL_ERROR) == 0)
> >> + return;
> >> +
> >> + cpu_relax();
> >> + }
> >> +}
> >> +
> >> +static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
> >> +{
> >> + phys_addr_t dma;
> >> + struct fw_cfg_dma_access *d = NULL;
> >> + ssize_t ret = length;
> >> +
> >> + d = kmalloc(sizeof(*d), GFP_KERNEL);
> >> + if (!d) {
> >> + ret = -ENOMEM;
> >> + goto end;
> >> + }
> >> +
> >> + /* fw_cfg device does not need IOMMU protection, so use physical addresses */
> >> + *d = (struct fw_cfg_dma_access) {
> >> + .address = cpu_to_be64(address ? virt_to_phys(address) : 0),
> >> + .length = cpu_to_be32(length),
> >> + .control = cpu_to_be32(control)
> >> + };
> >> +
> >> + dma = virt_to_phys(d);
> >> +
> >> + iowrite32be((u64)dma >> 32, fw_cfg_reg_dma);
> >> + /* force memory to sync before notifying device via MMIO */
> >> + wmb();
> >> + iowrite32be(dma, fw_cfg_reg_dma + 4);
> >> +
> >> + fw_cfg_wait_for_control(d);
> >> +
> >> + if (be32_to_cpu(READ_ONCE(d->control)) & FW_CFG_DMA_CTL_ERROR) {
> >> + ret = -EIO;
> >> + }
> >> +
> >> +end:
> >> + kfree(d);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> >> static ssize_t fw_cfg_read_blob(u16 key,
> >> void *buf, loff_t pos, size_t count)
> >> @@ -89,6 +153,47 @@ static ssize_t fw_cfg_read_blob(u16 key,
> >> return count;
> >> }
> >>
> >> +#ifdef CONFIG_CRASH_CORE
> >> +/* write chunk of given fw_cfg blob (caller responsible for sanity-check) */
> >> +static ssize_t fw_cfg_write_blob(u16 key,
> >> + void *buf, loff_t pos, size_t count)
> >> +{
> >> + u32 glk = -1U;
> >> + acpi_status status;
> >> + ssize_t ret = count;
> >> +
> >> + /* If we have ACPI, ensure mutual exclusion against any potential
> >> + * device access by the firmware, e.g. via AML methods:
> >> + */
> >> + status = acpi_acquire_global_lock(ACPI_WAIT_FOREVER, &glk);
> >> + if (ACPI_FAILURE(status) && status != AE_NOT_CONFIGURED) {
> >> + /* Should never get here */
> >> + WARN(1, "%s: Failed to lock ACPI!\n", __func__);
> >> + return -EINVAL;
> >> + }
> >> +
> >> + mutex_lock(&fw_cfg_dev_lock);
> >> + if (pos == 0) {
> >> + ret = fw_cfg_dma_transfer(buf, count, key << 16
> >> + | FW_CFG_DMA_CTL_SELECT
> >> + | FW_CFG_DMA_CTL_WRITE);
> >> + } else {
> >> + fw_cfg_sel_endianness(key);
> >> + ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
> >> + if (ret < 0)
> >> + goto end;
> >> + ret = fw_cfg_dma_transfer(buf, count, FW_CFG_DMA_CTL_WRITE);
> >> + }
> >> +
> >> +end:
> >> + mutex_unlock(&fw_cfg_dev_lock);
> >> +
> >> + acpi_release_global_lock(glk);
> >> +
> >> + return ret;
> >> +}
> >> +#endif /* CONFIG_CRASH_CORE */
> >> +
> >> /* clean up fw_cfg device i/o */
> >> static void fw_cfg_io_cleanup(void)
> >> {
> >> @@ -188,9 +293,6 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> >> return 0;
> >> }
> >>
> >> -/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
> >> -static u32 fw_cfg_rev;
> >> -
> >> static ssize_t fw_cfg_showrev(struct kobject *k, struct attribute *a, char *buf)
> >> {
> >> return sprintf(buf, "%u\n", fw_cfg_rev);
> >> @@ -213,6 +315,32 @@ struct fw_cfg_sysfs_entry {
> >> struct list_head list;
> >> };
> >>
> >> +#ifdef CONFIG_CRASH_CORE
> >> +static ssize_t fw_cfg_write_vmcoreinfo(const struct fw_cfg_file *f)
> >> +{
> >> + static struct fw_cfg_vmcoreinfo *data;
> >> + ssize_t ret;
> >> +
> >> + data = kmalloc(sizeof(struct fw_cfg_vmcoreinfo), GFP_KERNEL);
> >> + if (!data)
> >> + return -ENOMEM;
> >> +
> >> + *data = (struct fw_cfg_vmcoreinfo) {
> >> + .guest_format = cpu_to_le16(FW_CFG_VMCOREINFO_FORMAT_ELF),
> >> + .size = cpu_to_le32(VMCOREINFO_NOTE_SIZE),
> >> + .paddr = cpu_to_le64(paddr_vmcoreinfo_note())
> >> + };
> >> + /* spare ourself reading host format support for now since we
> >> + * don't know what else to format - host may ignore ours
> >
> >
> > qemu documentation probably should mention that it must.
> >
>
> Can you be more explicit?

Well you make assumptions about host behaviour (unknown formats
are safe to send, they are ignored), so it's a good idea
to document them in host code.


> thanks
>
> >> + */
> >> + ret = fw_cfg_write_blob(be16_to_cpu(f->select), data,
> >> + 0, sizeof(struct fw_cfg_vmcoreinfo));
> >> +
> >> + kfree(data);
> >> + return ret;
> >> +}
> >> +#endif /* CONFIG_CRASH_CORE */
> >> +
> >> /* get fw_cfg_sysfs_entry from kobject member */
> >> static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
> >> {
> >> @@ -452,6 +580,15 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
> >> int err;
> >> struct fw_cfg_sysfs_entry *entry;
> >>
> >> +#ifdef CONFIG_CRASH_CORE
> >> + if (fw_cfg_dma_enabled() &&
> >> + strcmp(f->name, FW_CFG_VMCOREINFO_FILENAME) == 0 &&
> >> + !is_kdump_kernel()) {
> >> + if (fw_cfg_write_vmcoreinfo(f) < 0)
> >> + pr_warn("fw_cfg: failed to write vmcoreinfo");
> >> + }
> >> +#endif
> >> +
> >> /* allocate new entry */
> >> entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> >> if (!entry)
> >> diff --git a/include/uapi/linux/fw_cfg.h b/include/uapi/linux/fw_cfg.h
> >> index c698ac3812f6..e089c0159ec2 100644
> >> --- a/include/uapi/linux/fw_cfg.h
> >> +++ b/include/uapi/linux/fw_cfg.h
> >> @@ -54,6 +54,7 @@
> >>
> >> /* FW_CFG_ID bits */
> >> #define FW_CFG_VERSION 0x01
> >> +#define FW_CFG_VERSION_DMA 0x02
> >>
> >> /* fw_cfg file directory entry type */
> >> struct fw_cfg_file {
> >> @@ -63,4 +64,34 @@ struct fw_cfg_file {
> >> char name[FW_CFG_MAX_FILE_PATH];
> >> };
> >>
> >> +/* FW_CFG_DMA_CONTROL bits */
> >> +#define FW_CFG_DMA_CTL_ERROR 0x01
> >> +#define FW_CFG_DMA_CTL_READ 0x02
> >> +#define FW_CFG_DMA_CTL_SKIP 0x04
> >> +#define FW_CFG_DMA_CTL_SELECT 0x08
> >> +#define FW_CFG_DMA_CTL_WRITE 0x10
> >> +
> >> +#define FW_CFG_DMA_SIGNATURE 0x51454d5520434647ULL /* "QEMU CFG" */
> >> +
> >> +/* Control as first field allows for different structures selected by this
> >> + * field, which might be useful in the future
> >> + */
> >> +struct fw_cfg_dma_access {
> >> + __be32 control;
> >> + __be32 length;
> >> + __be64 address;
> >> +};
> >> +
> >> +#define FW_CFG_VMCOREINFO_FILENAME "etc/vmcoreinfo"
> >> +
> >> +#define FW_CFG_VMCOREINFO_FORMAT_NONE 0x0
> >> +#define FW_CFG_VMCOREINFO_FORMAT_ELF 0x1
> >> +
> >> +struct fw_cfg_vmcoreinfo {
> >> + __le16 host_format;
> >> + __le16 guest_format;
> >> + __le32 size;
> >> + __le64 paddr;
> >> +};
> >> +
> >> #endif
> >> --
> >> 2.16.1.73.g5832b7e9f2

2018-02-28 15:37:20

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

On Wed, Feb 28, 2018 at 01:27:02PM +0100, Marc-Andr? Lureau wrote:
> Hi
>
> On Tue, Feb 27, 2018 at 1:04 AM, Michael S. Tsirkin <[email protected]> wrote:
> > On Thu, Feb 15, 2018 at 10:33:12PM +0100, Marc-Andr? Lureau wrote:
> >> Modify fw_cfg_read_blob() to use DMA if the device supports it.
> >> Return errors, because the operation may fail.
> >>
> >> So far, only one call in fw_cfg_register_dir_entries() is using
> >> kmalloc'ed buf and is thus clearly eligible to DMA read.
> >>
> >> Initially, I didn't implement DMA read to speed up boot time, but as a
> >> first step before introducing DMA write (since read operations were
> >> already presents). Even more, I didn't realize fw-cfg entries were
> >> being read by the kernel during boot by default. But actally fw-cfg
> >> entries are being populated during module probe. I knew DMA improved a
> >> lot bios boot time (the main reason the DMA interface was added
> >> afaik). Let see the time it would take to read the whole ACPI
> >> tables (128kb allocated)
> >>
> >> # time cat /sys/firmware/qemu_fw_cfg/by_name/etc/acpi/tables/raw
> >> - with DMA: sys 0m0.003s
> >> - without DMA (-global fw_cfg.dma_enabled=off): sys 0m7.674s
> >>
> >> FW_CFG_FILE_DIR (0x19) is the only "file" that is read during kernel
> >> boot to populate sysfs qemu_fw_cfg directory, and it is quite
> >> small (1-2kb). Since it does not expose itself, in order to measure
> >> the time it takes to read such small file, I took a comparable sized
> >> file of 2048 bytes and exposed it (-fw_cfg test,file=file with a
> >> modified read_raw enabling DMA)
> >>
> >> # perf stat -r 100 cat /sys/firmware/qemu_fw_cfg/by_name/test/raw >/dev/null
> >> - with DMA:
> >> 0.636037 task-clock (msec) # 0.141 CPUs utilized ( +- 1.19% )
> >> - without DMA:
> >> 6.430128 task-clock (msec) # 0.622 CPUs utilized ( +- 0.22% )
> >>
> >> That's a few msec saved during boot by enabling DMA read (the gain
> >> would be more substantial if other & bigger fw-cfg entries are read by
> >> others from sysfs, unfortunately, it's not clear if we can always
> >> enable DMA there)
> >>
> >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> >> ---
> >> drivers/firmware/qemu_fw_cfg.c | 61 ++++++++++++++++++++++++++++++++++--------
> >> 1 file changed, 50 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> >> index 3015e77aebca..94df57e9be66 100644
> >> --- a/drivers/firmware/qemu_fw_cfg.c
> >> +++ b/drivers/firmware/qemu_fw_cfg.c
> >> @@ -124,12 +124,47 @@ static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
> >> return ret;
> >> }
> >>
> >> +/* with acpi & dev locks taken */
> >> +static ssize_t fw_cfg_read_blob_dma(u16 key,
> >> + void *buf, loff_t pos, size_t count)
> >> +{
> >> + ssize_t ret;
> >> +
> >> + if (pos == 0) {
> >> + ret = fw_cfg_dma_transfer(buf, count, key << 16
> >> + | FW_CFG_DMA_CTL_SELECT
> >> + | FW_CFG_DMA_CTL_READ);
> >> + } else {
> >> + fw_cfg_sel_endianness(key);
> >> + ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
> >> + if (ret < 0)
> >> + return ret;
> >> + ret = fw_cfg_dma_transfer(buf, count,
> >> + FW_CFG_DMA_CTL_READ);
> >> + }
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +/* with acpi & dev locks taken */
> >> +static ssize_t fw_cfg_read_blob_io(u16 key,
> >> + void *buf, loff_t pos, size_t count)
> >> +{
> >> + fw_cfg_sel_endianness(key);
> >> + while (pos-- > 0)
> >> + ioread8(fw_cfg_reg_data);
> >> + ioread8_rep(fw_cfg_reg_data, buf, count);
> >> + return count;
> >> +}
> >> +
> >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> >> static ssize_t fw_cfg_read_blob(u16 key,
> >> - void *buf, loff_t pos, size_t count)
> >> + void *buf, loff_t pos, size_t count,
> >> + bool dma)
> >> {
> >> u32 glk = -1U;
> >> acpi_status status;
> >> + ssize_t ret;
> >>
> >> /* If we have ACPI, ensure mutual exclusion against any potential
> >> * device access by the firmware, e.g. via AML methods:
> >
> > so this adds a dma flag to fw_cfg_read_blob.
> >
> >
> >
> >> @@ -143,14 +178,17 @@ static ssize_t fw_cfg_read_blob(u16 key,
> >> }
> >>
> >> mutex_lock(&fw_cfg_dev_lock);
> >> - fw_cfg_sel_endianness(key);
> >> - while (pos-- > 0)
> >> - ioread8(fw_cfg_reg_data);
> >> - ioread8_rep(fw_cfg_reg_data, buf, count);
> >> + if (dma && fw_cfg_dma_enabled()) {
> >> + ret = fw_cfg_read_blob_dma(key, buf, pos, count);
> >> + } else {
> >> + ret = fw_cfg_read_blob_io(key, buf, pos, count);
> >> + }
> >> +
> >> mutex_unlock(&fw_cfg_dev_lock);
> >>
> >> acpi_release_global_lock(glk);
> >> - return count;
> >> +
> >> + return ret;
> >> }
> >>
> >> #ifdef CONFIG_CRASH_CORE
> >
> > If set to false it does io, if set to true it does dma.
> >
> > I would prefer passing an accessor function pointer
> > since that's clearer than true/false.
>
> ok
>
> >
> >> @@ -284,7 +322,7 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> >>
> >> /* verify fw_cfg device signature */
> >> if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> >> - 0, FW_CFG_SIG_SIZE) < 0 ||
> >> + 0, FW_CFG_SIG_SIZE, false) < 0 ||
> >> memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> >> fw_cfg_io_cleanup();
> >> return -ENODEV;
> >> @@ -468,7 +506,8 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> >> if (count > entry->size - pos)
> >> count = entry->size - pos;
> >>
> >> - return fw_cfg_read_blob(entry->select, buf, pos, count);
> >> + /* do not use DMA, virt_to_phys(buf) might not be ok */
> >> + return fw_cfg_read_blob(entry->select, buf, pos, count, false);
> >> }
> >>
> >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> >> @@ -634,7 +673,7 @@ static int fw_cfg_register_dir_entries(void)
> >> size_t dir_size;
> >>
> >> ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> >> - 0, sizeof(files_count));
> >> + 0, sizeof(files_count), false);
> >> if (ret < 0)
> >> return ret;
> >>
> >> @@ -646,7 +685,7 @@ static int fw_cfg_register_dir_entries(void)
> >> return -ENOMEM;
> >>
> >> ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> >> - sizeof(files_count), dir_size);
> >> + sizeof(files_count), dir_size, false);
> >> if (ret < 0)
> >> goto end;
> >>
> >> @@ -697,7 +736,7 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> >> goto err_probe;
> >>
> >> /* get revision number, add matching top-level attribute */
> >> - err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev), false);
> >> if (err < 0)
> >> goto err_probe;
> >
> >
> > Looks like all callers pass in false as parameter.
> > Given this, how can this speed up any operations?
> >
> > Are you sure you tested this properly?
>
>
> I did modify read_raw to conduct testing ( the part "with a
> modified read_raw enabling DMA" should be before, updating commit message).

So this patch does nothing, it's just infrastructure so DMA can be
enabled in the future - is that right?

> >
> >> --
> >> 2.16.1.73.g5832b7e9f2

2018-02-28 15:43:44

by Marc-André Lureau

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

On Wed, Feb 28, 2018 at 4:35 PM, Michael S. Tsirkin <[email protected]> wrote:
> On Wed, Feb 28, 2018 at 01:27:02PM +0100, Marc-André Lureau wrote:
>> Hi
>>
>> On Tue, Feb 27, 2018 at 1:04 AM, Michael S. Tsirkin <[email protected]> wrote:
>> > On Thu, Feb 15, 2018 at 10:33:12PM +0100, Marc-André Lureau wrote:
>> >> Modify fw_cfg_read_blob() to use DMA if the device supports it.
>> >> Return errors, because the operation may fail.
>> >>
>> >> So far, only one call in fw_cfg_register_dir_entries() is using
>> >> kmalloc'ed buf and is thus clearly eligible to DMA read.
>> >>
>> >> Initially, I didn't implement DMA read to speed up boot time, but as a
>> >> first step before introducing DMA write (since read operations were
>> >> already presents). Even more, I didn't realize fw-cfg entries were
>> >> being read by the kernel during boot by default. But actally fw-cfg
>> >> entries are being populated during module probe. I knew DMA improved a
>> >> lot bios boot time (the main reason the DMA interface was added
>> >> afaik). Let see the time it would take to read the whole ACPI
>> >> tables (128kb allocated)
>> >>
>> >> # time cat /sys/firmware/qemu_fw_cfg/by_name/etc/acpi/tables/raw
>> >> - with DMA: sys 0m0.003s
>> >> - without DMA (-global fw_cfg.dma_enabled=off): sys 0m7.674s
>> >>
>> >> FW_CFG_FILE_DIR (0x19) is the only "file" that is read during kernel
>> >> boot to populate sysfs qemu_fw_cfg directory, and it is quite
>> >> small (1-2kb). Since it does not expose itself, in order to measure
>> >> the time it takes to read such small file, I took a comparable sized
>> >> file of 2048 bytes and exposed it (-fw_cfg test,file=file with a
>> >> modified read_raw enabling DMA)
>> >>
>> >> # perf stat -r 100 cat /sys/firmware/qemu_fw_cfg/by_name/test/raw >/dev/null
>> >> - with DMA:
>> >> 0.636037 task-clock (msec) # 0.141 CPUs utilized ( +- 1.19% )
>> >> - without DMA:
>> >> 6.430128 task-clock (msec) # 0.622 CPUs utilized ( +- 0.22% )
>> >>
>> >> That's a few msec saved during boot by enabling DMA read (the gain
>> >> would be more substantial if other & bigger fw-cfg entries are read by
>> >> others from sysfs, unfortunately, it's not clear if we can always
>> >> enable DMA there)
>> >>
>> >> Signed-off-by: Marc-André Lureau <[email protected]>
>> >> ---
>> >> drivers/firmware/qemu_fw_cfg.c | 61 ++++++++++++++++++++++++++++++++++--------
>> >> 1 file changed, 50 insertions(+), 11 deletions(-)
>> >>
>> >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
>> >> index 3015e77aebca..94df57e9be66 100644
>> >> --- a/drivers/firmware/qemu_fw_cfg.c
>> >> +++ b/drivers/firmware/qemu_fw_cfg.c
>> >> @@ -124,12 +124,47 @@ static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
>> >> return ret;
>> >> }
>> >>
>> >> +/* with acpi & dev locks taken */
>> >> +static ssize_t fw_cfg_read_blob_dma(u16 key,
>> >> + void *buf, loff_t pos, size_t count)
>> >> +{
>> >> + ssize_t ret;
>> >> +
>> >> + if (pos == 0) {
>> >> + ret = fw_cfg_dma_transfer(buf, count, key << 16
>> >> + | FW_CFG_DMA_CTL_SELECT
>> >> + | FW_CFG_DMA_CTL_READ);
>> >> + } else {
>> >> + fw_cfg_sel_endianness(key);
>> >> + ret = fw_cfg_dma_transfer(NULL, pos, FW_CFG_DMA_CTL_SKIP);
>> >> + if (ret < 0)
>> >> + return ret;
>> >> + ret = fw_cfg_dma_transfer(buf, count,
>> >> + FW_CFG_DMA_CTL_READ);
>> >> + }
>> >> +
>> >> + return ret;
>> >> +}
>> >> +
>> >> +/* with acpi & dev locks taken */
>> >> +static ssize_t fw_cfg_read_blob_io(u16 key,
>> >> + void *buf, loff_t pos, size_t count)
>> >> +{
>> >> + fw_cfg_sel_endianness(key);
>> >> + while (pos-- > 0)
>> >> + ioread8(fw_cfg_reg_data);
>> >> + ioread8_rep(fw_cfg_reg_data, buf, count);
>> >> + return count;
>> >> +}
>> >> +
>> >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
>> >> static ssize_t fw_cfg_read_blob(u16 key,
>> >> - void *buf, loff_t pos, size_t count)
>> >> + void *buf, loff_t pos, size_t count,
>> >> + bool dma)
>> >> {
>> >> u32 glk = -1U;
>> >> acpi_status status;
>> >> + ssize_t ret;
>> >>
>> >> /* If we have ACPI, ensure mutual exclusion against any potential
>> >> * device access by the firmware, e.g. via AML methods:
>> >
>> > so this adds a dma flag to fw_cfg_read_blob.
>> >
>> >
>> >
>> >> @@ -143,14 +178,17 @@ static ssize_t fw_cfg_read_blob(u16 key,
>> >> }
>> >>
>> >> mutex_lock(&fw_cfg_dev_lock);
>> >> - fw_cfg_sel_endianness(key);
>> >> - while (pos-- > 0)
>> >> - ioread8(fw_cfg_reg_data);
>> >> - ioread8_rep(fw_cfg_reg_data, buf, count);
>> >> + if (dma && fw_cfg_dma_enabled()) {
>> >> + ret = fw_cfg_read_blob_dma(key, buf, pos, count);
>> >> + } else {
>> >> + ret = fw_cfg_read_blob_io(key, buf, pos, count);
>> >> + }
>> >> +
>> >> mutex_unlock(&fw_cfg_dev_lock);
>> >>
>> >> acpi_release_global_lock(glk);
>> >> - return count;
>> >> +
>> >> + return ret;
>> >> }
>> >>
>> >> #ifdef CONFIG_CRASH_CORE
>> >
>> > If set to false it does io, if set to true it does dma.
>> >
>> > I would prefer passing an accessor function pointer
>> > since that's clearer than true/false.
>>
>> ok
>>
>> >
>> >> @@ -284,7 +322,7 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
>> >>
>> >> /* verify fw_cfg device signature */
>> >> if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
>> >> - 0, FW_CFG_SIG_SIZE) < 0 ||
>> >> + 0, FW_CFG_SIG_SIZE, false) < 0 ||
>> >> memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
>> >> fw_cfg_io_cleanup();
>> >> return -ENODEV;
>> >> @@ -468,7 +506,8 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
>> >> if (count > entry->size - pos)
>> >> count = entry->size - pos;
>> >>
>> >> - return fw_cfg_read_blob(entry->select, buf, pos, count);
>> >> + /* do not use DMA, virt_to_phys(buf) might not be ok */
>> >> + return fw_cfg_read_blob(entry->select, buf, pos, count, false);
>> >> }
>> >>
>> >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
>> >> @@ -634,7 +673,7 @@ static int fw_cfg_register_dir_entries(void)
>> >> size_t dir_size;
>> >>
>> >> ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
>> >> - 0, sizeof(files_count));
>> >> + 0, sizeof(files_count), false);
>> >> if (ret < 0)
>> >> return ret;
>> >>
>> >> @@ -646,7 +685,7 @@ static int fw_cfg_register_dir_entries(void)
>> >> return -ENOMEM;
>> >>
>> >> ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
>> >> - sizeof(files_count), dir_size);
>> >> + sizeof(files_count), dir_size, false);
>> >> if (ret < 0)
>> >> goto end;
>> >>
>> >> @@ -697,7 +736,7 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
>> >> goto err_probe;
>> >>
>> >> /* get revision number, add matching top-level attribute */
>> >> - err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
>> >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev), false);
>> >> if (err < 0)
>> >> goto err_probe;
>> >
>> >
>> > Looks like all callers pass in false as parameter.
>> > Given this, how can this speed up any operations?
>> >
>> > Are you sure you tested this properly?
>>
>>
>> I did modify read_raw to conduct testing ( the part "with a
>> modified read_raw enabling DMA" should be before, updating commit message).
>
> So this patch does nothing, it's just infrastructure so DMA can be
> enabled in the future - is that right?

There is a small nit in v15, where I passed dma=false for in
read(FW_CFG_FILE_DIR), it should have been true. It is fixed in v16.

I don't know if it's always safe to enable dma in read_raw(), how
could we know? Is there a check we could use to choose one or ther
other (and thus avoiding explicit dma/readfn argument)?

2018-02-28 15:49:13

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

On Wed, Feb 28, 2018 at 04:41:51PM +0100, Marc-Andr? Lureau wrote:
> I don't know if it's always safe to enable dma in read_raw(), how
> could we know? Is there a check we could use to choose one or ther
> other (and thus avoiding explicit dma/readfn argument)?

I'm not sure - but does it really matter? Is anyone reading large files
like this in production where speed matters?
Why even bother with DMA?

--
MST

2018-02-28 16:02:26

by Marc-André Lureau

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

Hi

On Wed, Feb 28, 2018 at 4:48 PM, Michael S. Tsirkin <[email protected]> wrote:
> On Wed, Feb 28, 2018 at 04:41:51PM +0100, Marc-André Lureau wrote:
>> I don't know if it's always safe to enable dma in read_raw(), how
>> could we know? Is there a check we could use to choose one or ther
>> other (and thus avoiding explicit dma/readfn argument)?
>
> I'm not sure - but does it really matter? Is anyone reading large files
> like this in production where speed matters?
> Why even bother with DMA?

The difference is quite significante for not so small files, as shown above.

And if they access the fw_cfg entries at boot time, or when starting
things etc, this may speed things up.

2018-02-28 17:18:47

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

On Wed, Feb 28, 2018 at 05:00:58PM +0100, Marc-Andr? Lureau wrote:
> Hi
>
> On Wed, Feb 28, 2018 at 4:48 PM, Michael S. Tsirkin <[email protected]> wrote:
> > On Wed, Feb 28, 2018 at 04:41:51PM +0100, Marc-Andr? Lureau wrote:
> >> I don't know if it's always safe to enable dma in read_raw(), how
> >> could we know? Is there a check we could use to choose one or ther
> >> other (and thus avoiding explicit dma/readfn argument)?
> >
> > I'm not sure - but does it really matter? Is anyone reading large files
> > like this in production where speed matters?
> > Why even bother with DMA?
>
> The difference is quite significante for not so small files, as shown above.
>
> And if they access the fw_cfg entries at boot time, or when starting
> things etc, this may speed things up.

Question would be whether anyone at all does this.

--
MST

2018-02-28 17:19:46

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

On Wed, Feb 28, 2018 at 04:41:51PM +0100, Marc-Andr? Lureau wrote:
> I don't know if it's always safe to enable dma in read_raw(), how
> could we know? Is there a check we could use to choose one or ther
> other (and thus avoiding explicit dma/readfn argument)?

IMHO the way to go is not to try to do zero copy.
Allocate a buffer and DMA there, then copy.

--
MST

2018-02-28 17:24:07

by Marc-André Lureau

[permalink] [raw]
Subject: Re: [PATCH v15 11/11] RFC: fw_cfg: do DMA read operation

Hi

On Wed, Feb 28, 2018 at 6:17 PM, Michael S. Tsirkin <[email protected]> wrote:
> On Wed, Feb 28, 2018 at 04:41:51PM +0100, Marc-André Lureau wrote:
>> I don't know if it's always safe to enable dma in read_raw(), how
>> could we know? Is there a check we could use to choose one or ther
>> other (and thus avoiding explicit dma/readfn argument)?
>
> IMHO the way to go is not to try to do zero copy.
> Allocate a buffer and DMA there, then copy.

Sounds fine to me, I'll resend this patch separately if the rest from
v16 is applied.

thanks

2018-02-28 23:27:10

by Gabriel L. Somlo

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

On Wed, Feb 28, 2018 at 05:32:52PM +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 28, 2018 at 12:49:35PM +0100, Marc-Andr? Lureau wrote:
> > Hi
> >
> > On Tue, Feb 27, 2018 at 1:20 AM, Michael S. Tsirkin <[email protected]> wrote:
> > > On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-Andr? Lureau wrote:
> > >> fw_cfg_read_blob() may fail, but does not return error. This may lead
> > >> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
> > >> memory.
> > >
> > > I don't think that's true - there's a memset there that
> > > will initialize the memory. probe is likely the only
> > > case where it returns a slightly incorrect data.
> >
> > Right, I'll update the commit message.
> >
> > >> Return an error if ACPI locking failed. Also, the following
> > >> DMA read/write extension will add more error paths that should be
> > >> handled appropriately.
> > >>
> > >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> > >> ---
> > >> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
> > >> 1 file changed, 22 insertions(+), 10 deletions(-)
> > >>
> > >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> > >> index f6f90bef604c..5e6e5ac71dab 100644
> > >> --- a/drivers/firmware/qemu_fw_cfg.c
> > >> +++ b/drivers/firmware/qemu_fw_cfg.c
> > >> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
> > >> }
> > >>
> > >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > >> -static void fw_cfg_read_blob(u16 key,
> > >> - void *buf, loff_t pos, size_t count)
> > >> +static ssize_t fw_cfg_read_blob(u16 key,
> > >> + void *buf, loff_t pos, size_t count)
> > >> {
> > >> u32 glk = -1U;
> > >> acpi_status status;
> > >> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
> > >> /* Should never get here */
> > >> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
> > >> memset(buf, 0, count);
> > >> - return;
> > >> + return -EINVAL;
> > >> }
> > >>
> > >> mutex_lock(&fw_cfg_dev_lock);
> > >
> > > Wouldn't something like -EBUSY be more appropriate?
> >
> > In theory, it would be a general failure right? I don't think we want
> > the caller to retry. I think in EINVAL fits better, but I don't think
> > it matters much this or EBUSY.
> >
> > >> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
> > >> mutex_unlock(&fw_cfg_dev_lock);
> > >>
> > >> acpi_release_global_lock(glk);
> > >> + return count;
> > >> }
> > >>
> > >> /* clean up fw_cfg device i/o */
> > >> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> > >> }
> > >>
> > >> /* verify fw_cfg device signature */
> > >> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> > >> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > >> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> > >> + 0, FW_CFG_SIG_SIZE) < 0 ||
> > >> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > >> fw_cfg_io_cleanup();
> > >> return -ENODEV;
> > >> }
> > >> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> > >> if (count > entry->size - pos)
> > >> count = entry->size - pos;
> > >>
> > >> - fw_cfg_read_blob(entry->select, buf, pos, count);
> > >> - return count;
> > >> + return fw_cfg_read_blob(entry->select, buf, pos, count);
> > >> }
> > >>
> > >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> > >> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
> > >> struct fw_cfg_file *dir;
> > >> size_t dir_size;
> > >>
> > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> > >> + 0, sizeof(files_count));
> > >> + if (ret < 0)
> > >> + return ret;
> > >> +
> > >> count = be32_to_cpu(files_count);
> > >> dir_size = count * sizeof(struct fw_cfg_file);
> > >>
> > >> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
> > >> if (!dir)
> > >> return -ENOMEM;
> > >>
> > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> > >> + sizeof(files_count), dir_size);
> > >> + if (ret < 0)
> > >> + goto end;
> > >>
> > >> for (i = 0; i < count; i++) {
> > >> ret = fw_cfg_register_file(&dir[i]);
> > >> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
> > >> break;
> > >> }
> > >>
> > >> +end:
> > >> kfree(dir);
> > >> return ret;
> > >> }
> > >> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> > >> goto err_probe;
> > >>
> > >> /* get revision number, add matching top-level attribute */
> > >> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > >> + if (err < 0)
> > >> + goto err_probe;
> > >> +
> > >> fw_cfg_rev = le32_to_cpu(rev);
> > >> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> > >> if (err)
> > >
> > > I think that this is the only case where it's not doing the right thing right now in
> > > that it shows 0 as the revision to the users. Is it worth failing probe
> > > here? We could just skip the attribute, could we not?
> >
> > I think it's best to fail the probe if we have a read failure at that time.
>
> I'd rather we just dropped this attribute completely.
> Why is it there?
> Does any userspace actually need it?
> Gabriel?

I'd recommend keeping it: `cat /sys/firmware/qemu_fw_cfg/rev` is how
you can easily tell if, e.g., dma is supported :)

If you end up with a '0' there, it's because locking ACPI with
WAIT_FOREVER failed with something other than AE_NOT_CONFIGURED, which
should never happen, so we're off the reservation, having morally
failed the equivalent of an assertion.

Perhaps memset to something other than 0 (all-Fs, so we'd get a -1 for
rev)?

Thanks,
--Gabriel
>
> > --
> > Marc-Andr? Lureau

2018-02-28 23:59:00

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

On Wed, Feb 28, 2018 at 06:25:58PM -0500, Gabriel Somlo wrote:
> On Wed, Feb 28, 2018 at 05:32:52PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Feb 28, 2018 at 12:49:35PM +0100, Marc-Andr? Lureau wrote:
> > > Hi
> > >
> > > On Tue, Feb 27, 2018 at 1:20 AM, Michael S. Tsirkin <[email protected]> wrote:
> > > > On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-Andr? Lureau wrote:
> > > >> fw_cfg_read_blob() may fail, but does not return error. This may lead
> > > >> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
> > > >> memory.
> > > >
> > > > I don't think that's true - there's a memset there that
> > > > will initialize the memory. probe is likely the only
> > > > case where it returns a slightly incorrect data.
> > >
> > > Right, I'll update the commit message.
> > >
> > > >> Return an error if ACPI locking failed. Also, the following
> > > >> DMA read/write extension will add more error paths that should be
> > > >> handled appropriately.
> > > >>
> > > >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> > > >> ---
> > > >> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
> > > >> 1 file changed, 22 insertions(+), 10 deletions(-)
> > > >>
> > > >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> > > >> index f6f90bef604c..5e6e5ac71dab 100644
> > > >> --- a/drivers/firmware/qemu_fw_cfg.c
> > > >> +++ b/drivers/firmware/qemu_fw_cfg.c
> > > >> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
> > > >> }
> > > >>
> > > >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > > >> -static void fw_cfg_read_blob(u16 key,
> > > >> - void *buf, loff_t pos, size_t count)
> > > >> +static ssize_t fw_cfg_read_blob(u16 key,
> > > >> + void *buf, loff_t pos, size_t count)
> > > >> {
> > > >> u32 glk = -1U;
> > > >> acpi_status status;
> > > >> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
> > > >> /* Should never get here */
> > > >> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
> > > >> memset(buf, 0, count);
> > > >> - return;
> > > >> + return -EINVAL;
> > > >> }
> > > >>
> > > >> mutex_lock(&fw_cfg_dev_lock);
> > > >
> > > > Wouldn't something like -EBUSY be more appropriate?
> > >
> > > In theory, it would be a general failure right? I don't think we want
> > > the caller to retry. I think in EINVAL fits better, but I don't think
> > > it matters much this or EBUSY.
> > >
> > > >> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
> > > >> mutex_unlock(&fw_cfg_dev_lock);
> > > >>
> > > >> acpi_release_global_lock(glk);
> > > >> + return count;
> > > >> }
> > > >>
> > > >> /* clean up fw_cfg device i/o */
> > > >> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> > > >> }
> > > >>
> > > >> /* verify fw_cfg device signature */
> > > >> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> > > >> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > > >> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> > > >> + 0, FW_CFG_SIG_SIZE) < 0 ||
> > > >> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > > >> fw_cfg_io_cleanup();
> > > >> return -ENODEV;
> > > >> }
> > > >> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> > > >> if (count > entry->size - pos)
> > > >> count = entry->size - pos;
> > > >>
> > > >> - fw_cfg_read_blob(entry->select, buf, pos, count);
> > > >> - return count;
> > > >> + return fw_cfg_read_blob(entry->select, buf, pos, count);
> > > >> }
> > > >>
> > > >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> > > >> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
> > > >> struct fw_cfg_file *dir;
> > > >> size_t dir_size;
> > > >>
> > > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> > > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> > > >> + 0, sizeof(files_count));
> > > >> + if (ret < 0)
> > > >> + return ret;
> > > >> +
> > > >> count = be32_to_cpu(files_count);
> > > >> dir_size = count * sizeof(struct fw_cfg_file);
> > > >>
> > > >> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
> > > >> if (!dir)
> > > >> return -ENOMEM;
> > > >>
> > > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> > > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> > > >> + sizeof(files_count), dir_size);
> > > >> + if (ret < 0)
> > > >> + goto end;
> > > >>
> > > >> for (i = 0; i < count; i++) {
> > > >> ret = fw_cfg_register_file(&dir[i]);
> > > >> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
> > > >> break;
> > > >> }
> > > >>
> > > >> +end:
> > > >> kfree(dir);
> > > >> return ret;
> > > >> }
> > > >> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> > > >> goto err_probe;
> > > >>
> > > >> /* get revision number, add matching top-level attribute */
> > > >> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > > >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > > >> + if (err < 0)
> > > >> + goto err_probe;
> > > >> +
> > > >> fw_cfg_rev = le32_to_cpu(rev);
> > > >> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> > > >> if (err)
> > > >
> > > > I think that this is the only case where it's not doing the right thing right now in
> > > > that it shows 0 as the revision to the users. Is it worth failing probe
> > > > here? We could just skip the attribute, could we not?
> > >
> > > I think it's best to fail the probe if we have a read failure at that time.
> >
> > I'd rather we just dropped this attribute completely.
> > Why is it there?
> > Does any userspace actually need it?
> > Gabriel?
>
> I'd recommend keeping it: `cat /sys/firmware/qemu_fw_cfg/rev` is how
> you can easily tell if, e.g., dma is supported :)
>
> If you end up with a '0' there, it's because locking ACPI with
> WAIT_FOREVER failed with something other than AE_NOT_CONFIGURED, which
> should never happen, so we're off the reservation, having morally
> failed the equivalent of an assertion.
>
> Perhaps memset to something other than 0 (all-Fs, so we'd get a -1 for
> rev)?
>
> Thanks,
> --Gabriel

Why not just skip adding the attribute on this error?

> >
> > > --
> > > Marc-Andr? Lureau

2018-02-28 23:59:34

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

On Thu, Mar 01, 2018 at 01:58:01AM +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 28, 2018 at 06:25:58PM -0500, Gabriel Somlo wrote:
> > On Wed, Feb 28, 2018 at 05:32:52PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Feb 28, 2018 at 12:49:35PM +0100, Marc-Andr? Lureau wrote:
> > > > Hi
> > > >
> > > > On Tue, Feb 27, 2018 at 1:20 AM, Michael S. Tsirkin <[email protected]> wrote:
> > > > > On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-Andr? Lureau wrote:
> > > > >> fw_cfg_read_blob() may fail, but does not return error. This may lead
> > > > >> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
> > > > >> memory.
> > > > >
> > > > > I don't think that's true - there's a memset there that
> > > > > will initialize the memory. probe is likely the only
> > > > > case where it returns a slightly incorrect data.
> > > >
> > > > Right, I'll update the commit message.
> > > >
> > > > >> Return an error if ACPI locking failed. Also, the following
> > > > >> DMA read/write extension will add more error paths that should be
> > > > >> handled appropriately.
> > > > >>
> > > > >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> > > > >> ---
> > > > >> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
> > > > >> 1 file changed, 22 insertions(+), 10 deletions(-)
> > > > >>
> > > > >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> > > > >> index f6f90bef604c..5e6e5ac71dab 100644
> > > > >> --- a/drivers/firmware/qemu_fw_cfg.c
> > > > >> +++ b/drivers/firmware/qemu_fw_cfg.c
> > > > >> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
> > > > >> }
> > > > >>
> > > > >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > > > >> -static void fw_cfg_read_blob(u16 key,
> > > > >> - void *buf, loff_t pos, size_t count)
> > > > >> +static ssize_t fw_cfg_read_blob(u16 key,
> > > > >> + void *buf, loff_t pos, size_t count)
> > > > >> {
> > > > >> u32 glk = -1U;
> > > > >> acpi_status status;
> > > > >> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
> > > > >> /* Should never get here */
> > > > >> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
> > > > >> memset(buf, 0, count);
> > > > >> - return;
> > > > >> + return -EINVAL;
> > > > >> }
> > > > >>
> > > > >> mutex_lock(&fw_cfg_dev_lock);
> > > > >
> > > > > Wouldn't something like -EBUSY be more appropriate?
> > > >
> > > > In theory, it would be a general failure right? I don't think we want
> > > > the caller to retry. I think in EINVAL fits better, but I don't think
> > > > it matters much this or EBUSY.
> > > >
> > > > >> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
> > > > >> mutex_unlock(&fw_cfg_dev_lock);
> > > > >>
> > > > >> acpi_release_global_lock(glk);
> > > > >> + return count;
> > > > >> }
> > > > >>
> > > > >> /* clean up fw_cfg device i/o */
> > > > >> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> > > > >> }
> > > > >>
> > > > >> /* verify fw_cfg device signature */
> > > > >> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> > > > >> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > > > >> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> > > > >> + 0, FW_CFG_SIG_SIZE) < 0 ||
> > > > >> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > > > >> fw_cfg_io_cleanup();
> > > > >> return -ENODEV;
> > > > >> }
> > > > >> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> > > > >> if (count > entry->size - pos)
> > > > >> count = entry->size - pos;
> > > > >>
> > > > >> - fw_cfg_read_blob(entry->select, buf, pos, count);
> > > > >> - return count;
> > > > >> + return fw_cfg_read_blob(entry->select, buf, pos, count);
> > > > >> }
> > > > >>
> > > > >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> > > > >> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
> > > > >> struct fw_cfg_file *dir;
> > > > >> size_t dir_size;
> > > > >>
> > > > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> > > > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> > > > >> + 0, sizeof(files_count));
> > > > >> + if (ret < 0)
> > > > >> + return ret;
> > > > >> +
> > > > >> count = be32_to_cpu(files_count);
> > > > >> dir_size = count * sizeof(struct fw_cfg_file);
> > > > >>
> > > > >> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
> > > > >> if (!dir)
> > > > >> return -ENOMEM;
> > > > >>
> > > > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> > > > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> > > > >> + sizeof(files_count), dir_size);
> > > > >> + if (ret < 0)
> > > > >> + goto end;
> > > > >>
> > > > >> for (i = 0; i < count; i++) {
> > > > >> ret = fw_cfg_register_file(&dir[i]);
> > > > >> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
> > > > >> break;
> > > > >> }
> > > > >>
> > > > >> +end:
> > > > >> kfree(dir);
> > > > >> return ret;
> > > > >> }
> > > > >> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> > > > >> goto err_probe;
> > > > >>
> > > > >> /* get revision number, add matching top-level attribute */
> > > > >> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > > > >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > > > >> + if (err < 0)
> > > > >> + goto err_probe;
> > > > >> +
> > > > >> fw_cfg_rev = le32_to_cpu(rev);
> > > > >> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> > > > >> if (err)
> > > > >
> > > > > I think that this is the only case where it's not doing the right thing right now in
> > > > > that it shows 0 as the revision to the users. Is it worth failing probe
> > > > > here? We could just skip the attribute, could we not?
> > > >
> > > > I think it's best to fail the probe if we have a read failure at that time.
> > >
> > > I'd rather we just dropped this attribute completely.
> > > Why is it there?
> > > Does any userspace actually need it?
> > > Gabriel?
> >
> > I'd recommend keeping it: `cat /sys/firmware/qemu_fw_cfg/rev` is how
> > you can easily tell if, e.g., dma is supported :)

Does user ever care?

> > If you end up with a '0' there, it's because locking ACPI with
> > WAIT_FOREVER failed with something other than AE_NOT_CONFIGURED, which
> > should never happen, so we're off the reservation, having morally
> > failed the equivalent of an assertion.
> >
> > Perhaps memset to something other than 0 (all-Fs, so we'd get a -1 for
> > rev)?
> >
> > Thanks,
> > --Gabriel
>
> Why not just skip adding the attribute on this error?
>
> > >
> > > > --
> > > > Marc-Andr? Lureau

2018-03-01 00:50:40

by Gabriel L. Somlo

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

On Thu, Mar 01, 2018 at 01:58:22AM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 01, 2018 at 01:58:01AM +0200, Michael S. Tsirkin wrote:
> > On Wed, Feb 28, 2018 at 06:25:58PM -0500, Gabriel Somlo wrote:
> > > On Wed, Feb 28, 2018 at 05:32:52PM +0200, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 28, 2018 at 12:49:35PM +0100, Marc-Andr? Lureau wrote:
> > > > > Hi
> > > > >
> > > > > On Tue, Feb 27, 2018 at 1:20 AM, Michael S. Tsirkin <[email protected]> wrote:
> > > > > > On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-Andr? Lureau wrote:
> > > > > >> fw_cfg_read_blob() may fail, but does not return error. This may lead
> > > > > >> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
> > > > > >> memory.
> > > > > >
> > > > > > I don't think that's true - there's a memset there that
> > > > > > will initialize the memory. probe is likely the only
> > > > > > case where it returns a slightly incorrect data.
> > > > >
> > > > > Right, I'll update the commit message.
> > > > >
> > > > > >> Return an error if ACPI locking failed. Also, the following
> > > > > >> DMA read/write extension will add more error paths that should be
> > > > > >> handled appropriately.
> > > > > >>
> > > > > >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> > > > > >> ---
> > > > > >> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
> > > > > >> 1 file changed, 22 insertions(+), 10 deletions(-)
> > > > > >>
> > > > > >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> > > > > >> index f6f90bef604c..5e6e5ac71dab 100644
> > > > > >> --- a/drivers/firmware/qemu_fw_cfg.c
> > > > > >> +++ b/drivers/firmware/qemu_fw_cfg.c
> > > > > >> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
> > > > > >> }
> > > > > >>
> > > > > >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > > > > >> -static void fw_cfg_read_blob(u16 key,
> > > > > >> - void *buf, loff_t pos, size_t count)
> > > > > >> +static ssize_t fw_cfg_read_blob(u16 key,
> > > > > >> + void *buf, loff_t pos, size_t count)
> > > > > >> {
> > > > > >> u32 glk = -1U;
> > > > > >> acpi_status status;
> > > > > >> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
> > > > > >> /* Should never get here */
> > > > > >> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
> > > > > >> memset(buf, 0, count);
> > > > > >> - return;
> > > > > >> + return -EINVAL;
> > > > > >> }
> > > > > >>
> > > > > >> mutex_lock(&fw_cfg_dev_lock);
> > > > > >
> > > > > > Wouldn't something like -EBUSY be more appropriate?
> > > > >
> > > > > In theory, it would be a general failure right? I don't think we want
> > > > > the caller to retry. I think in EINVAL fits better, but I don't think
> > > > > it matters much this or EBUSY.
> > > > >
> > > > > >> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
> > > > > >> mutex_unlock(&fw_cfg_dev_lock);
> > > > > >>
> > > > > >> acpi_release_global_lock(glk);
> > > > > >> + return count;
> > > > > >> }
> > > > > >>
> > > > > >> /* clean up fw_cfg device i/o */
> > > > > >> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> > > > > >> }
> > > > > >>
> > > > > >> /* verify fw_cfg device signature */
> > > > > >> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> > > > > >> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > > > > >> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> > > > > >> + 0, FW_CFG_SIG_SIZE) < 0 ||
> > > > > >> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > > > > >> fw_cfg_io_cleanup();
> > > > > >> return -ENODEV;
> > > > > >> }
> > > > > >> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> > > > > >> if (count > entry->size - pos)
> > > > > >> count = entry->size - pos;
> > > > > >>
> > > > > >> - fw_cfg_read_blob(entry->select, buf, pos, count);
> > > > > >> - return count;
> > > > > >> + return fw_cfg_read_blob(entry->select, buf, pos, count);
> > > > > >> }
> > > > > >>
> > > > > >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> > > > > >> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
> > > > > >> struct fw_cfg_file *dir;
> > > > > >> size_t dir_size;
> > > > > >>
> > > > > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> > > > > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> > > > > >> + 0, sizeof(files_count));
> > > > > >> + if (ret < 0)
> > > > > >> + return ret;
> > > > > >> +
> > > > > >> count = be32_to_cpu(files_count);
> > > > > >> dir_size = count * sizeof(struct fw_cfg_file);
> > > > > >>
> > > > > >> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
> > > > > >> if (!dir)
> > > > > >> return -ENOMEM;
> > > > > >>
> > > > > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> > > > > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> > > > > >> + sizeof(files_count), dir_size);
> > > > > >> + if (ret < 0)
> > > > > >> + goto end;
> > > > > >>
> > > > > >> for (i = 0; i < count; i++) {
> > > > > >> ret = fw_cfg_register_file(&dir[i]);
> > > > > >> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
> > > > > >> break;
> > > > > >> }
> > > > > >>
> > > > > >> +end:
> > > > > >> kfree(dir);
> > > > > >> return ret;
> > > > > >> }
> > > > > >> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> > > > > >> goto err_probe;
> > > > > >>
> > > > > >> /* get revision number, add matching top-level attribute */
> > > > > >> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > > > > >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > > > > >> + if (err < 0)
> > > > > >> + goto err_probe;
> > > > > >> +
> > > > > >> fw_cfg_rev = le32_to_cpu(rev);
> > > > > >> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> > > > > >> if (err)
> > > > > >
> > > > > > I think that this is the only case where it's not doing the right thing right now in
> > > > > > that it shows 0 as the revision to the users. Is it worth failing probe
> > > > > > here? We could just skip the attribute, could we not?
> > > > >
> > > > > I think it's best to fail the probe if we have a read failure at that time.
> > > >
> > > > I'd rather we just dropped this attribute completely.
> > > > Why is it there?
> > > > Does any userspace actually need it?
> > > > Gabriel?
> > >
> > > I'd recommend keeping it: `cat /sys/firmware/qemu_fw_cfg/rev` is how
> > > you can easily tell if, e.g., dma is supported :)
>
> Does user ever care?

I'd rather not presume they *don't*

For instance, yourself: you add DMA support, then one day in the
future on some system you're on, things feel sluggish. You're curious
-- does this thing claim to support DMA? The answer is one easy `cat`
away. If you then found out someone decided you're not supposed to care,
and forced you to dig through the sources for a guess instead, you'd
probably be less than happy about it... :)

That said, if there's a better reason to leave it out than handling
some never-to-occur error condition elegantly, I can be persuaded...

As I replied to Michael earlier, leaving it out if that impossible
error condition *does* occur is OK with me (since it "never happens"
in practice... :)

Thanks,
--G

> > > If you end up with a '0' there, it's because locking ACPI with
> > > WAIT_FOREVER failed with something other than AE_NOT_CONFIGURED, which
> > > should never happen, so we're off the reservation, having morally
> > > failed the equivalent of an assertion.
> > >
> > > Perhaps memset to something other than 0 (all-Fs, so we'd get a -1 for
> > > rev)?
> > >
> > > Thanks,
> > > --Gabriel
> >
> > Why not just skip adding the attribute on this error?
> >
> > > >
> > > > > --
> > > > > Marc-Andr? Lureau

2018-03-01 01:00:39

by Gabriel L. Somlo

[permalink] [raw]
Subject: Re: [PATCH v15 08/11] fw_cfg: handle fw_cfg_read_blob() error

On Thu, Mar 01, 2018 at 01:58:01AM +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 28, 2018 at 06:25:58PM -0500, Gabriel Somlo wrote:
> > On Wed, Feb 28, 2018 at 05:32:52PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Feb 28, 2018 at 12:49:35PM +0100, Marc-Andr? Lureau wrote:
> > > > Hi
> > > >
> > > > On Tue, Feb 27, 2018 at 1:20 AM, Michael S. Tsirkin <[email protected]> wrote:
> > > > > On Thu, Feb 15, 2018 at 10:33:09PM +0100, Marc-Andr? Lureau wrote:
> > > > >> fw_cfg_read_blob() may fail, but does not return error. This may lead
> > > > >> to undefined behaviours, such as a memcmp(sig, "QEMU") on uninitilized
> > > > >> memory.
> > > > >
> > > > > I don't think that's true - there's a memset there that
> > > > > will initialize the memory. probe is likely the only
> > > > > case where it returns a slightly incorrect data.
> > > >
> > > > Right, I'll update the commit message.
> > > >
> > > > >> Return an error if ACPI locking failed. Also, the following
> > > > >> DMA read/write extension will add more error paths that should be
> > > > >> handled appropriately.
> > > > >>
> > > > >> Signed-off-by: Marc-Andr? Lureau <[email protected]>
> > > > >> ---
> > > > >> drivers/firmware/qemu_fw_cfg.c | 32 ++++++++++++++++++++++----------
> > > > >> 1 file changed, 22 insertions(+), 10 deletions(-)
> > > > >>
> > > > >> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> > > > >> index f6f90bef604c..5e6e5ac71dab 100644
> > > > >> --- a/drivers/firmware/qemu_fw_cfg.c
> > > > >> +++ b/drivers/firmware/qemu_fw_cfg.c
> > > > >> @@ -59,8 +59,8 @@ static void fw_cfg_sel_endianness(u16 key)
> > > > >> }
> > > > >>
> > > > >> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > > > >> -static void fw_cfg_read_blob(u16 key,
> > > > >> - void *buf, loff_t pos, size_t count)
> > > > >> +static ssize_t fw_cfg_read_blob(u16 key,
> > > > >> + void *buf, loff_t pos, size_t count)
> > > > >> {
> > > > >> u32 glk = -1U;
> > > > >> acpi_status status;
> > > > >> @@ -73,7 +73,7 @@ static void fw_cfg_read_blob(u16 key,
> > > > >> /* Should never get here */
> > > > >> WARN(1, "fw_cfg_read_blob: Failed to lock ACPI!\n");
> > > > >> memset(buf, 0, count);
> > > > >> - return;
> > > > >> + return -EINVAL;
> > > > >> }
> > > > >>
> > > > >> mutex_lock(&fw_cfg_dev_lock);
> > > > >
> > > > > Wouldn't something like -EBUSY be more appropriate?
> > > >
> > > > In theory, it would be a general failure right? I don't think we want
> > > > the caller to retry. I think in EINVAL fits better, but I don't think
> > > > it matters much this or EBUSY.
> > > >
> > > > >> @@ -84,6 +84,7 @@ static void fw_cfg_read_blob(u16 key,
> > > > >> mutex_unlock(&fw_cfg_dev_lock);
> > > > >>
> > > > >> acpi_release_global_lock(glk);
> > > > >> + return count;
> > > > >> }
> > > > >>
> > > > >> /* clean up fw_cfg device i/o */
> > > > >> @@ -165,8 +166,9 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
> > > > >> }
> > > > >>
> > > > >> /* verify fw_cfg device signature */
> > > > >> - fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> > > > >> - if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > > > >> + if (fw_cfg_read_blob(FW_CFG_SIGNATURE, sig,
> > > > >> + 0, FW_CFG_SIG_SIZE) < 0 ||
> > > > >> + memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> > > > >> fw_cfg_io_cleanup();
> > > > >> return -ENODEV;
> > > > >> }
> > > > >> @@ -326,8 +328,7 @@ static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> > > > >> if (count > entry->size - pos)
> > > > >> count = entry->size - pos;
> > > > >>
> > > > >> - fw_cfg_read_blob(entry->select, buf, pos, count);
> > > > >> - return count;
> > > > >> + return fw_cfg_read_blob(entry->select, buf, pos, count);
> > > > >> }
> > > > >>
> > > > >> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> > > > >> @@ -483,7 +484,11 @@ static int fw_cfg_register_dir_entries(void)
> > > > >> struct fw_cfg_file *dir;
> > > > >> size_t dir_size;
> > > > >>
> > > > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count, 0, sizeof(files_count));
> > > > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, &files_count,
> > > > >> + 0, sizeof(files_count));
> > > > >> + if (ret < 0)
> > > > >> + return ret;
> > > > >> +
> > > > >> count = be32_to_cpu(files_count);
> > > > >> dir_size = count * sizeof(struct fw_cfg_file);
> > > > >>
> > > > >> @@ -491,7 +496,10 @@ static int fw_cfg_register_dir_entries(void)
> > > > >> if (!dir)
> > > > >> return -ENOMEM;
> > > > >>
> > > > >> - fw_cfg_read_blob(FW_CFG_FILE_DIR, dir, sizeof(files_count), dir_size);
> > > > >> + ret = fw_cfg_read_blob(FW_CFG_FILE_DIR, dir,
> > > > >> + sizeof(files_count), dir_size);
> > > > >> + if (ret < 0)
> > > > >> + goto end;
> > > > >>
> > > > >> for (i = 0; i < count; i++) {
> > > > >> ret = fw_cfg_register_file(&dir[i]);
> > > > >> @@ -499,6 +507,7 @@ static int fw_cfg_register_dir_entries(void)
> > > > >> break;
> > > > >> }
> > > > >>
> > > > >> +end:
> > > > >> kfree(dir);
> > > > >> return ret;
> > > > >> }
> > > > >> @@ -539,7 +548,10 @@ static int fw_cfg_sysfs_probe(struct platform_device *pdev)
> > > > >> goto err_probe;
> > > > >>
> > > > >> /* get revision number, add matching top-level attribute */
> > > > >> - fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > > > >> + err = fw_cfg_read_blob(FW_CFG_ID, &rev, 0, sizeof(rev));
> > > > >> + if (err < 0)
> > > > >> + goto err_probe;
> > > > >> +
> > > > >> fw_cfg_rev = le32_to_cpu(rev);
> > > > >> err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> > > > >> if (err)
> > > > >
> > > > > I think that this is the only case where it's not doing the right thing right now in
> > > > > that it shows 0 as the revision to the users. Is it worth failing probe
> > > > > here? We could just skip the attribute, could we not?
> > > >
> > > > I think it's best to fail the probe if we have a read failure at that time.
> > >
> > > I'd rather we just dropped this attribute completely.
> > > Why is it there?
> > > Does any userspace actually need it?
> > > Gabriel?
> >
> > I'd recommend keeping it: `cat /sys/firmware/qemu_fw_cfg/rev` is how
> > you can easily tell if, e.g., dma is supported :)
> >
> > If you end up with a '0' there, it's because locking ACPI with
> > WAIT_FOREVER failed with something other than AE_NOT_CONFIGURED, which
> > should never happen, so we're off the reservation, having morally
> > failed the equivalent of an assertion.
> >
> > Perhaps memset to something other than 0 (all-Fs, so we'd get a -1 for
> > rev)?
> >
> > Thanks,
> > --Gabriel
>
> Why not just skip adding the attribute on this error?

Skipping only on error should be OK (since it's "never supposed to
happen" anyway).

--G