2021-06-04 16:37:41

by Pratik R. Sampat

[permalink] [raw]
Subject: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

Adds a generic interface to represent the energy and frequency related
PAPR attributes on the system using the new H_CALL
"H_GET_ENERGY_SCALE_INFO".

H_GET_EM_PARMS H_CALL was previously responsible for exporting this
information in the lparcfg, however the H_GET_EM_PARMS H_CALL
will be deprecated P10 onwards.

The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
hcall(
uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
uint64 flags, // Per the flag request
uint64 firstAttributeId,// The attribute id
uint64 bufferAddress, // The logical address of the output buffer
uint64 bufferSize // The size in bytes of the output buffer
);

This H_CALL can query either all the attributes at once with
firstAttributeId = 0, flags = 0 as well as query only one attribute
at a time with firstAttributeId = id

The output buffer consists of the following
1. number of attributes - 8 bytes
2. array offset to the data location - 8 bytes
3. version info - 1 byte
4. A data array of size num attributes, which contains the following:
a. attribute ID - 8 bytes
b. attribute value in number - 8 bytes
c. attribute name in string - 64 bytes
d. attribute value in string - 64 bytes

The new H_CALL exports information in direct string value format, hence
a new interface has been introduced in /sys/firmware/papr to export
this information to userspace in an extensible pass-through format.
The H_CALL returns the name, numeric value and string value. As string
values are in human readable format, therefore if the string value
exists then that is given precedence over the numeric value.

The format of exposing the sysfs information is as follows:
/sys/firmware/papr/
|-- attr_0_name
|-- attr_0_val
|-- attr_1_name
|-- attr_1_val
...

The energy information that is exported is useful for userspace tools
such as powerpc-utils. Currently these tools infer the
"power_mode_data" value in the lparcfg, which in turn is obtained from
the to be deprecated H_GET_EM_PARMS H_CALL.
On future platforms, such userspace utilities will have to look at the
data returned from the new H_CALL being populated in this new sysfs
interface and report this information directly without the need of
interpretation.

Signed-off-by: Pratik R. Sampat <[email protected]>
---
Documentation/ABI/testing/sysfs-firmware-papr | 24 +++
arch/powerpc/include/asm/hvcall.h | 21 +-
arch/powerpc/kvm/trace_hv.h | 1 +
arch/powerpc/platforms/pseries/Makefile | 3 +-
.../pseries/papr_platform_attributes.c | 203 ++++++++++++++++++
5 files changed, 250 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr
create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-papr b/Documentation/ABI/testing/sysfs-firmware-papr
new file mode 100644
index 000000000000..1c040b44ac3b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-papr
@@ -0,0 +1,24 @@
+What: /sys/firmware/papr
+Date: June 2021
+Contact: Linux for PowerPC mailing list <[email protected]>
+Description : Director hosting a set of platform attributes on Linux
+ running as a PAPR guest.
+
+ Each file in a directory contains a platform
+ attribute pertaining to performance/energy-savings
+ mode and processor frequency.
+
+What: /sys/firmware/papr/attr_X_name
+ /sys/firmware/papr/attr_X_val
+Date: June 2021
+Contact: Linux for PowerPC mailing list <[email protected]>
+Description: PAPR attributes directory for POWERVM servers
+
+ This directory provides PAPR information. It
+ contains below sysfs attributes:
+
+ - attr_X_name: File contains the name of
+ attribute X
+
+ - attr_X_val: Numeric/string value of
+ attribute X
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e3b29eda8074..19a2a8c77a49 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -316,7 +316,8 @@
#define H_SCM_PERFORMANCE_STATS 0x418
#define H_RPT_INVALIDATE 0x448
#define H_SCM_FLUSH 0x44C
-#define MAX_HCALL_OPCODE H_SCM_FLUSH
+#define H_GET_ENERGY_SCALE_INFO 0x450
+#define MAX_HCALL_OPCODE H_GET_ENERGY_SCALE_INFO

/* Scope args for H_SCM_UNBIND_ALL */
#define H_UNBIND_SCOPE_ALL (0x1)
@@ -631,6 +632,24 @@ struct hv_gpci_request_buffer {
uint8_t bytes[HGPCI_MAX_DATA_BYTES];
} __packed;

+#define MAX_EM_ATTRS 10
+#define MAX_EM_DATA_BYTES \
+ (sizeof(struct energy_scale_attributes) * MAX_EM_ATTRS)
+struct energy_scale_attributes {
+ __be64 attr_id;
+ __be64 attr_value;
+ unsigned char attr_desc[64];
+ unsigned char attr_value_desc[64];
+} __packed;
+
+struct hv_energy_scale_buffer {
+ __be64 num_attr;
+ __be64 array_offset;
+ __u8 data_header_version;
+ unsigned char data[MAX_EM_DATA_BYTES];
+} __packed;
+
+
#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_HVCALL_H */
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 830a126e095d..38cd0ed0a617 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -115,6 +115,7 @@
{H_VASI_STATE, "H_VASI_STATE"}, \
{H_ENABLE_CRQ, "H_ENABLE_CRQ"}, \
{H_GET_EM_PARMS, "H_GET_EM_PARMS"}, \
+ {H_GET_ENERGY_SCALE_INFO, "H_GET_ENERGY_SCALE_INFO"}, \
{H_SET_MPP, "H_SET_MPP"}, \
{H_GET_MPP, "H_GET_MPP"}, \
{H_HOME_NODE_ASSOCIATIVITY, "H_HOME_NODE_ASSOCIATIVITY"}, \
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index c8a2b0b05ac0..d14fca89ac25 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,7 +6,8 @@ obj-y := lpar.o hvCall.o nvram.o reconfig.o \
of_helpers.o \
setup.o iommu.o event_sources.o ras.o \
firmware.o power.o dlpar.o mobility.o rng.o \
- pci.o pci_dlpar.o eeh_pseries.o msi.o
+ pci.o pci_dlpar.o eeh_pseries.o msi.o \
+ papr_platform_attributes.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_SCANLOG) += scanlog.o
obj-$(CONFIG_KEXEC_CORE) += kexec.o
diff --git a/arch/powerpc/platforms/pseries/papr_platform_attributes.c b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
new file mode 100644
index 000000000000..8818877ff47e
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * PowerPC64 LPAR PAPR Information Driver
+ *
+ * This driver creates a sys file at /sys/firmware/papr/ which contains
+ * files keyword - value pairs that specify energy configuration of the system.
+ *
+ * Copyright 2021 IBM Corp.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/hugetlb.h>
+#include <asm/lppaca.h>
+#include <asm/hvcall.h>
+#include <asm/firmware.h>
+#include <asm/time.h>
+#include <asm/prom.h>
+#include <asm/vdso_datapage.h>
+#include <asm/vio.h>
+#include <asm/mmu.h>
+#include <asm/machdep.h>
+#include <asm/drmem.h>
+
+#include "pseries.h"
+
+#define MAX_KOBJ_ATTRS 2
+
+struct papr_attr {
+ u64 id;
+ struct kobj_attribute attr;
+} *pgattrs;
+
+struct kobject *papr_kobj;
+struct hv_energy_scale_buffer *em_buf;
+struct energy_scale_attributes *ea;
+
+static ssize_t papr_show_name(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
+ int idx, ret = 0;
+
+ /*
+ * We do not expect the name to change, hence use the old value
+ * and save a HCALL
+ */
+ for (idx = 0; idx < be64_to_cpu(em_buf->num_attr); idx++) {
+ if (pattr->id == be64_to_cpu(ea[idx].attr_id)) {
+ ret = sprintf(buf, "%s\n", ea[idx].attr_desc);
+ if (ret < 0)
+ ret = -EIO;
+ break;
+ }
+ }
+
+ return ret;
+}
+
+static ssize_t papr_show_val(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
+ struct hv_energy_scale_buffer *t_buf;
+ struct energy_scale_attributes *t_ea;
+ int data_offset, ret = 0;
+
+ t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
+ if (t_buf == NULL)
+ return -ENOMEM;
+
+ ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
+ pattr->id, virt_to_phys(t_buf),
+ sizeof(*t_buf));
+
+ if (ret != H_SUCCESS) {
+ pr_warn("hcall faiiled: H_GET_ENERGY_SCALE_INFO");
+ goto out;
+ }
+
+ data_offset = be64_to_cpu(t_buf->array_offset) -
+ (sizeof(t_buf->num_attr) +
+ sizeof(t_buf->array_offset) +
+ sizeof(t_buf->data_header_version));
+
+ t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
+
+ /* Prioritize string values over numerical */
+ if (strlen(t_ea->attr_value_desc) != 0)
+ ret = sprintf(buf, "%s\n", t_ea->attr_value_desc);
+ else
+ ret = sprintf(buf, "%llu\n", be64_to_cpu(t_ea->attr_value));
+ if (ret < 0)
+ ret = -EIO;
+out:
+ kfree(t_buf);
+ return ret;
+}
+
+static struct papr_ops_info {
+ const char *attr_name;
+ ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf);
+} ops_info[MAX_KOBJ_ATTRS] = {
+ { "name", papr_show_name },
+ { "val", papr_show_val },
+};
+
+static int __init papr_init(void)
+{
+ uint64_t num_attr;
+ int ret, idx, i, data_offset;
+
+ em_buf = kmalloc(sizeof(*em_buf), GFP_KERNEL);
+ if (em_buf == NULL)
+ return -ENOMEM;
+ /*
+ * hcall(
+ * uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
+ * uint64 flags, // Per the flag request
+ * uint64 firstAttributeId, // The attribute id
+ * uint64 bufferAddress, // The logical address of the output buffer
+ * uint64 bufferSize); // The size in bytes of the output buffer
+ */
+ ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0, 0,
+ virt_to_phys(em_buf), sizeof(*em_buf));
+
+ if (!firmware_has_feature(FW_FEATURE_LPAR) || ret != H_SUCCESS ||
+ em_buf->data_header_version != 0x1) {
+ pr_warn("hcall faiiled: H_GET_ENERGY_SCALE_INFO");
+ goto out;
+ }
+
+ num_attr = be64_to_cpu(em_buf->num_attr);
+
+ /*
+ * Typecast the energy buffer to the attribute structure at the offset
+ * specified in the buffer
+ */
+ data_offset = be64_to_cpu(em_buf->array_offset) -
+ (sizeof(em_buf->num_attr) +
+ sizeof(em_buf->array_offset) +
+ sizeof(em_buf->data_header_version));
+
+ ea = (struct energy_scale_attributes *) &em_buf->data[data_offset];
+
+ papr_kobj = kobject_create_and_add("papr", firmware_kobj);
+ if (!papr_kobj) {
+ pr_warn("kobject_create_and_add papr failed\n");
+ goto out_kobj;
+ }
+
+ for (idx = 0; idx < num_attr; idx++) {
+ pgattrs = kcalloc(MAX_KOBJ_ATTRS,
+ sizeof(*pgattrs),
+ GFP_KERNEL);
+ if (!pgattrs)
+ goto out_kobj;
+
+ /*
+ * Create the sysfs attribute hierarchy for each PAPR
+ * property found
+ */
+ for (i = 0; i < MAX_KOBJ_ATTRS; i++) {
+ char buf[20];
+
+ pgattrs[i].id = be64_to_cpu(ea[idx].attr_id);
+ sysfs_attr_init(&pgattrs[i].attr.attr);
+ sprintf(buf, "%s_%d_%s", "attr", idx,
+ ops_info[i].attr_name);
+ pgattrs[i].attr.attr.name = buf;
+ pgattrs[i].attr.attr.mode = 0444;
+ pgattrs[i].attr.show = ops_info[i].show;
+
+ if (sysfs_create_file(papr_kobj, &pgattrs[i].attr.attr)) {
+ pr_warn("Failed to create papr file %s\n",
+ pgattrs[i].attr.attr.name);
+ goto out_pgattrs;
+ }
+ }
+ }
+
+ return 0;
+
+out_pgattrs:
+ for (i = 0; i < MAX_KOBJ_ATTRS; i++)
+ kfree(pgattrs);
+out_kobj:
+ kobject_put(papr_kobj);
+out:
+ kfree(em_buf);
+
+ return -ENOMEM;
+}
+
+machine_device_initcall(pseries, papr_init);
--
2.30.2


2021-06-09 06:19:27

by Pratik R. Sampat

[permalink] [raw]
Subject: Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

I've implemented a POC using this interface for the powerpc-utils'
ppc64_cpu --frequency command-line tool to utilize this information
in userspace.

The POC has been hosted here:
https://github.com/pratiksampat/powerpc-utils/tree/H_GET_ENERGY_SCALE_INFO
and based on comments I suggestions I can further improve the
parsing logic from this initial implementation.

Sample output from the powerpc-utils tool is as follows:

# ppc64_cpu --frequency
Power and Performance Mode: XXXX
Idle Power Saver Status : XXXX
Processor Folding Status : XXXX --> Printed if Idle power save status is supported

Platform reported frequencies --> Frequencies reported from the platform's H_CALL i.e PAPR interface
min        :    NNNN GHz
max        :    NNNN GHz
static    :    NNNN GHz

Tool Computed frequencies
min        :    NNNN GHz (cpu XX)
max        :    NNNN GHz (cpu XX)
avg        :    NNNN GHz


On 04/06/21 10:05 pm, Pratik R. Sampat wrote:
> Adds a generic interface to represent the energy and frequency related
> PAPR attributes on the system using the new H_CALL
> "H_GET_ENERGY_SCALE_INFO".
>
> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> will be deprecated P10 onwards.
>
> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> hcall(
> uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
> uint64 flags, // Per the flag request
> uint64 firstAttributeId,// The attribute id
> uint64 bufferAddress, // The logical address of the output buffer
> uint64 bufferSize // The size in bytes of the output buffer
> );
>
> This H_CALL can query either all the attributes at once with
> firstAttributeId = 0, flags = 0 as well as query only one attribute
> at a time with firstAttributeId = id
>
> The output buffer consists of the following
> 1. number of attributes - 8 bytes
> 2. array offset to the data location - 8 bytes
> 3. version info - 1 byte
> 4. A data array of size num attributes, which contains the following:
> a. attribute ID - 8 bytes
> b. attribute value in number - 8 bytes
> c. attribute name in string - 64 bytes
> d. attribute value in string - 64 bytes
>
> The new H_CALL exports information in direct string value format, hence
> a new interface has been introduced in /sys/firmware/papr to export
> this information to userspace in an extensible pass-through format.
> The H_CALL returns the name, numeric value and string value. As string
> values are in human readable format, therefore if the string value
> exists then that is given precedence over the numeric value.
>
> The format of exposing the sysfs information is as follows:
> /sys/firmware/papr/
> |-- attr_0_name
> |-- attr_0_val
> |-- attr_1_name
> |-- attr_1_val
> ...
>
> The energy information that is exported is useful for userspace tools
> such as powerpc-utils. Currently these tools infer the
> "power_mode_data" value in the lparcfg, which in turn is obtained from
> the to be deprecated H_GET_EM_PARMS H_CALL.
> On future platforms, such userspace utilities will have to look at the
> data returned from the new H_CALL being populated in this new sysfs
> interface and report this information directly without the need of
> interpretation.
>
> Signed-off-by: Pratik R. Sampat <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-firmware-papr | 24 +++
> arch/powerpc/include/asm/hvcall.h | 21 +-
> arch/powerpc/kvm/trace_hv.h | 1 +
> arch/powerpc/platforms/pseries/Makefile | 3 +-
> .../pseries/papr_platform_attributes.c | 203 ++++++++++++++++++
> 5 files changed, 250 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr
> create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
>
> diff --git a/Documentation/ABI/testing/sysfs-firmware-papr b/Documentation/ABI/testing/sysfs-firmware-papr
> new file mode 100644
> index 000000000000..1c040b44ac3b
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-firmware-papr
> @@ -0,0 +1,24 @@
> +What: /sys/firmware/papr
> +Date: June 2021
> +Contact: Linux for PowerPC mailing list <[email protected]>
> +Description : Director hosting a set of platform attributes on Linux
> + running as a PAPR guest.
> +
> + Each file in a directory contains a platform
> + attribute pertaining to performance/energy-savings
> + mode and processor frequency.
> +
> +What: /sys/firmware/papr/attr_X_name
> + /sys/firmware/papr/attr_X_val
> +Date: June 2021
> +Contact: Linux for PowerPC mailing list <[email protected]>
> +Description: PAPR attributes directory for POWERVM servers
> +
> + This directory provides PAPR information. It
> + contains below sysfs attributes:
> +
> + - attr_X_name: File contains the name of
> + attribute X
> +
> + - attr_X_val: Numeric/string value of
> + attribute X
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index e3b29eda8074..19a2a8c77a49 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -316,7 +316,8 @@
> #define H_SCM_PERFORMANCE_STATS 0x418
> #define H_RPT_INVALIDATE 0x448
> #define H_SCM_FLUSH 0x44C
> -#define MAX_HCALL_OPCODE H_SCM_FLUSH
> +#define H_GET_ENERGY_SCALE_INFO 0x450
> +#define MAX_HCALL_OPCODE H_GET_ENERGY_SCALE_INFO
>
> /* Scope args for H_SCM_UNBIND_ALL */
> #define H_UNBIND_SCOPE_ALL (0x1)
> @@ -631,6 +632,24 @@ struct hv_gpci_request_buffer {
> uint8_t bytes[HGPCI_MAX_DATA_BYTES];
> } __packed;
>
> +#define MAX_EM_ATTRS 10
> +#define MAX_EM_DATA_BYTES \
> + (sizeof(struct energy_scale_attributes) * MAX_EM_ATTRS)
> +struct energy_scale_attributes {
> + __be64 attr_id;
> + __be64 attr_value;
> + unsigned char attr_desc[64];
> + unsigned char attr_value_desc[64];
> +} __packed;
> +
> +struct hv_energy_scale_buffer {
> + __be64 num_attr;
> + __be64 array_offset;
> + __u8 data_header_version;
> + unsigned char data[MAX_EM_DATA_BYTES];
> +} __packed;
> +
> +
> #endif /* __ASSEMBLY__ */
> #endif /* __KERNEL__ */
> #endif /* _ASM_POWERPC_HVCALL_H */
> diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
> index 830a126e095d..38cd0ed0a617 100644
> --- a/arch/powerpc/kvm/trace_hv.h
> +++ b/arch/powerpc/kvm/trace_hv.h
> @@ -115,6 +115,7 @@
> {H_VASI_STATE, "H_VASI_STATE"}, \
> {H_ENABLE_CRQ, "H_ENABLE_CRQ"}, \
> {H_GET_EM_PARMS, "H_GET_EM_PARMS"}, \
> + {H_GET_ENERGY_SCALE_INFO, "H_GET_ENERGY_SCALE_INFO"}, \
> {H_SET_MPP, "H_SET_MPP"}, \
> {H_GET_MPP, "H_GET_MPP"}, \
> {H_HOME_NODE_ASSOCIATIVITY, "H_HOME_NODE_ASSOCIATIVITY"}, \
> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> index c8a2b0b05ac0..d14fca89ac25 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -6,7 +6,8 @@ obj-y := lpar.o hvCall.o nvram.o reconfig.o \
> of_helpers.o \
> setup.o iommu.o event_sources.o ras.o \
> firmware.o power.o dlpar.o mobility.o rng.o \
> - pci.o pci_dlpar.o eeh_pseries.o msi.o
> + pci.o pci_dlpar.o eeh_pseries.o msi.o \
> + papr_platform_attributes.o
> obj-$(CONFIG_SMP) += smp.o
> obj-$(CONFIG_SCANLOG) += scanlog.o
> obj-$(CONFIG_KEXEC_CORE) += kexec.o
> diff --git a/arch/powerpc/platforms/pseries/papr_platform_attributes.c b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
> new file mode 100644
> index 000000000000..8818877ff47e
> --- /dev/null
> +++ b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
> @@ -0,0 +1,203 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * PowerPC64 LPAR PAPR Information Driver
> + *
> + * This driver creates a sys file at /sys/firmware/papr/ which contains
> + * files keyword - value pairs that specify energy configuration of the system.
> + *
> + * Copyright 2021 IBM Corp.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/init.h>
> +#include <linux/seq_file.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/hugetlb.h>
> +#include <asm/lppaca.h>
> +#include <asm/hvcall.h>
> +#include <asm/firmware.h>
> +#include <asm/time.h>
> +#include <asm/prom.h>
> +#include <asm/vdso_datapage.h>
> +#include <asm/vio.h>
> +#include <asm/mmu.h>
> +#include <asm/machdep.h>
> +#include <asm/drmem.h>
> +
> +#include "pseries.h"
> +
> +#define MAX_KOBJ_ATTRS 2
> +
> +struct papr_attr {
> + u64 id;
> + struct kobj_attribute attr;
> +} *pgattrs;
> +
> +struct kobject *papr_kobj;
> +struct hv_energy_scale_buffer *em_buf;
> +struct energy_scale_attributes *ea;
> +
> +static ssize_t papr_show_name(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + char *buf)
> +{
> + struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> + int idx, ret = 0;
> +
> + /*
> + * We do not expect the name to change, hence use the old value
> + * and save a HCALL
> + */
> + for (idx = 0; idx < be64_to_cpu(em_buf->num_attr); idx++) {
> + if (pattr->id == be64_to_cpu(ea[idx].attr_id)) {
> + ret = sprintf(buf, "%s\n", ea[idx].attr_desc);
> + if (ret < 0)
> + ret = -EIO;
> + break;
> + }
> + }
> +
> + return ret;
> +}
> +
> +static ssize_t papr_show_val(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + char *buf)
> +{
> + struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> + struct hv_energy_scale_buffer *t_buf;
> + struct energy_scale_attributes *t_ea;
> + int data_offset, ret = 0;
> +
> + t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
> + if (t_buf == NULL)
> + return -ENOMEM;
> +
> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
> + pattr->id, virt_to_phys(t_buf),
> + sizeof(*t_buf));
> +
> + if (ret != H_SUCCESS) {
> + pr_warn("hcall faiiled: H_GET_ENERGY_SCALE_INFO");
> + goto out;
> + }
> +
> + data_offset = be64_to_cpu(t_buf->array_offset) -
> + (sizeof(t_buf->num_attr) +
> + sizeof(t_buf->array_offset) +
> + sizeof(t_buf->data_header_version));
> +
> + t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
> +
> + /* Prioritize string values over numerical */
> + if (strlen(t_ea->attr_value_desc) != 0)
> + ret = sprintf(buf, "%s\n", t_ea->attr_value_desc);
> + else
> + ret = sprintf(buf, "%llu\n", be64_to_cpu(t_ea->attr_value));
> + if (ret < 0)
> + ret = -EIO;
> +out:
> + kfree(t_buf);
> + return ret;
> +}
> +
> +static struct papr_ops_info {
> + const char *attr_name;
> + ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
> + char *buf);
> +} ops_info[MAX_KOBJ_ATTRS] = {
> + { "name", papr_show_name },
> + { "val", papr_show_val },
> +};
> +
> +static int __init papr_init(void)
> +{
> + uint64_t num_attr;
> + int ret, idx, i, data_offset;
> +
> + em_buf = kmalloc(sizeof(*em_buf), GFP_KERNEL);
> + if (em_buf == NULL)
> + return -ENOMEM;
> + /*
> + * hcall(
> + * uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
> + * uint64 flags, // Per the flag request
> + * uint64 firstAttributeId, // The attribute id
> + * uint64 bufferAddress, // The logical address of the output buffer
> + * uint64 bufferSize); // The size in bytes of the output buffer
> + */
> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0, 0,
> + virt_to_phys(em_buf), sizeof(*em_buf));
> +
> + if (!firmware_has_feature(FW_FEATURE_LPAR) || ret != H_SUCCESS ||
> + em_buf->data_header_version != 0x1) {
> + pr_warn("hcall faiiled: H_GET_ENERGY_SCALE_INFO");
> + goto out;
> + }
> +
> + num_attr = be64_to_cpu(em_buf->num_attr);
> +
> + /*
> + * Typecast the energy buffer to the attribute structure at the offset
> + * specified in the buffer
> + */
> + data_offset = be64_to_cpu(em_buf->array_offset) -
> + (sizeof(em_buf->num_attr) +
> + sizeof(em_buf->array_offset) +
> + sizeof(em_buf->data_header_version));
> +
> + ea = (struct energy_scale_attributes *) &em_buf->data[data_offset];
> +
> + papr_kobj = kobject_create_and_add("papr", firmware_kobj);
> + if (!papr_kobj) {
> + pr_warn("kobject_create_and_add papr failed\n");
> + goto out_kobj;
> + }
> +
> + for (idx = 0; idx < num_attr; idx++) {
> + pgattrs = kcalloc(MAX_KOBJ_ATTRS,
> + sizeof(*pgattrs),
> + GFP_KERNEL);
> + if (!pgattrs)
> + goto out_kobj;
> +
> + /*
> + * Create the sysfs attribute hierarchy for each PAPR
> + * property found
> + */
> + for (i = 0; i < MAX_KOBJ_ATTRS; i++) {
> + char buf[20];
> +
> + pgattrs[i].id = be64_to_cpu(ea[idx].attr_id);
> + sysfs_attr_init(&pgattrs[i].attr.attr);
> + sprintf(buf, "%s_%d_%s", "attr", idx,
> + ops_info[i].attr_name);
> + pgattrs[i].attr.attr.name = buf;
> + pgattrs[i].attr.attr.mode = 0444;
> + pgattrs[i].attr.show = ops_info[i].show;
> +
> + if (sysfs_create_file(papr_kobj, &pgattrs[i].attr.attr)) {
> + pr_warn("Failed to create papr file %s\n",
> + pgattrs[i].attr.attr.name);
> + goto out_pgattrs;
> + }
> + }
> + }
> +
> + return 0;
> +
> +out_pgattrs:
> + for (i = 0; i < MAX_KOBJ_ATTRS; i++)
> + kfree(pgattrs);
> +out_kobj:
> + kobject_put(papr_kobj);
> +out:
> + kfree(em_buf);
> +
> + return -ENOMEM;
> +}
> +
> +machine_device_initcall(pseries, papr_init);

2021-06-09 14:20:00

by Pratik R. Sampat

[permalink] [raw]
Subject: Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

Hello,
Thank you for your comments on the design.

On 09/06/21 3:43 am, Fabiano Rosas wrote:
> "Pratik R. Sampat" <[email protected]> writes:
>
> Hi, I have some general comments and questions, mostly trying to
> understand design of the hcall and use cases of the sysfs data:
>
>> Adds a generic interface to represent the energy and frequency related
>> PAPR attributes on the system using the new H_CALL
>> "H_GET_ENERGY_SCALE_INFO".
>>
>> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
>> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
>> will be deprecated P10 onwards.
>>
>> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
>> hcall(
>> uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
>> uint64 flags, // Per the flag request
>> uint64 firstAttributeId,// The attribute id
>> uint64 bufferAddress, // The logical address of the output buffer
> Instead of logical address, guest address or guest physical address
> would be more precise.

Yes, the name guest physical address makes more sense for this attribute.
The term logical address had me confused too when I first read it in the ACR,
however that isn't the case.

I'll change it to guest physical address here. Thanks for pointing out.

>
>> uint64 bufferSize // The size in bytes of the output buffer
>> );
>>
>> This H_CALL can query either all the attributes at once with
>> firstAttributeId = 0, flags = 0 as well as query only one attribute
>> at a time with firstAttributeId = id
>>
>> The output buffer consists of the following
>> 1. number of attributes - 8 bytes
>> 2. array offset to the data location - 8 bytes
> The offset is from the start of the buffer, isn't it? So not the array
> offset.

Yes,the offset carries information that is to the start of the data buffer.

>> 3. version info - 1 byte
>> 4. A data array of size num attributes, which contains the following:
>> a. attribute ID - 8 bytes
>> b. attribute value in number - 8 bytes
>> c. attribute name in string - 64 bytes
>> d. attribute value in string - 64 bytes
> Is this new hypercall already present in the spec? These seem a bit
> underspecified to me.

Yes, it is present in the spec. I probably summarized a little more than needed
here and I could expand upon below.

The input buffer recives the following data:

1. “flags”:
a. Bit 0: singleAttribute
If set to 1, only return the single attribute matching firstAttributeId.
b. Bits 1-63: Reserved
2. “firstAttributeId”: The first attribute to retrieve
3. “bufferAddress”: The logical real address of the start of the output buffer
4. “bufferSize”: The size in bytes of the output buffer


From the document, the format of the output buffer is as follows:

Table 1 --> output buffer
================================================================================
| Field Name | Byte | Length | Description
| | Offset | in Bytes |
================================================================================
| NumberOf | | | Number of Attributes in Buffer
| AttributesInBuffer | 0x000 | 0x08 |
--------------------------------------------------------------------------------
| AttributeArrayOffset | 0x008 | 0x08 | Byte offset to start of Array
| | | | of Attributes
| | | |
--------------------------------------------------------------------------------
| OutputBufferData | | | Version of the Header.
| HeaderVersion | 0x010 | 0x01 | The header will be always
| AttributesInBuffer | | | backward compatible, and changes
| | | | will not impact the Array of
| | | | attributes.
| | | | Current version = 0x01
--------------------------------------------------------------------------------
| ArrayOfAttributes | | | The array will contain
| | | | "NumberOfAttributesInBuffer"
| | | | array elements not to exceed
| | | | the size of the buffer.
| | | | Layout of the array is
| | | | detailed in Table 2.
--------------------------------------------------------------------------------


Table 2 --> Array of attributes
================================================================================
| Field Name | Byte | Length | Description
| | Offset | in Bytes |
================================================================================
| 1st AttributeId | 0x000 | 0x08 | The ID of the Attribute
--------------------------------------------------------------------------------
| 1st AttributeValue | 0x008 | 0x08 | The numerical value of
| | | | the attribute
--------------------------------------------------------------------------------
| 1st AttributeString | 0x010 | 0x40 | The ASCII string
| Description | | | description of the
| | | | attribute, up to 63
| | | | characters plus a NULL
| | | | terminator.
--------------------------------------------------------------------------------
| 1st AttributeValue | 0x050 | 0x40 | The ASCII string
| StringDescription | | | description of the
| | | | attribute value, up to 63
| | | | characters plus a NULL
| | | | terminator. If this
| | | | contains only a NULL
| | | | terminator, then there is
| | | | no ASCII string
| | | | associated with AttributeValue.
--------------------------------------------------------------------------------
| .... | | |


>
>> The new H_CALL exports information in direct string value format, hence
>> a new interface has been introduced in /sys/firmware/papr to export
> Hm.. Maybe this should be something less generic than "papr"?

The interface naming was inspired from /sys/firmware/opal's naming convention.
We believed the name PAPR could serve as more generic name to be used by both
Linux running on PHYP and linux on KVM.

If you have something more concrete in mind, please let me know. I'm open to
suggestions.

>
>> this information to userspace in an extensible pass-through format.
>> The H_CALL returns the name, numeric value and string value. As string
>> values are in human readable format, therefore if the string value
>> exists then that is given precedence over the numeric value.
> So the hypervisor could simply not send the string representation? How
> will the userspace tell the difference since they are reading everything
> from a file?
>
> Overall I'd say we should give the data in a more structured way and let
> the user-facing tool do the formatting and presentation.

That's a valid concern, the design for this was inspired from hwmon's interface
to housing the sensor information.

One alternative to add more structure to this format could be to introduce:
attr_X_name, attr_X_num_val, attr_X_str_val

However, in some cases like min/max frequency the string value is empty. In
that case the file attr_X_str_val will also be empty.
Is that an acceptable format of having empty files that in some cases will
never be populated?
We also went ahead to confirm with the SPEC team that if a string value exists
in their buffer, that must be given precedence.

Another alternative format could to keep attr_X_name, attr_X_val intact but
change what X means. Currently X is just an iteratively increasing number. But
X can also serve as an ID which we get from H_CALL output buffer.

In this case, we should also include some versioning so that the tool now also
has cognizance of contents of each file.

>> The format of exposing the sysfs information is as follows:
>> /sys/firmware/papr/
>> |-- attr_0_name
>> |-- attr_0_val
>> |-- attr_1_name
>> |-- attr_1_val
>> ...
> How do we keep a stable interface with userspace? Say the hypervisor
> decides to add or remove attributes, change their order, string
> representation, etc? It will inform us via the version field, but that
> is lost when we output this to sysfs.
>
> I get that if the userspace just iterate over the contents of the
> directory then nothing breaks, but there is not much else it could do it
> seems.

Fair point, having the version exposed to the sysfs does seem crucial.

Currently in ppc-utils we iterate over all the information, however as you
rightly pointed out there may be other tools needing just specific information.
The alternative I suggested a few sentences above to include ID based attribute
naming and versioning maybe a more elegant way of solving this problem.

What are your thoughts on a design like this?

>> The energy information that is exported is useful for userspace tools
>> such as powerpc-utils. Currently these tools infer the
>> "power_mode_data" value in the lparcfg, which in turn is obtained from
>> the to be deprecated H_GET_EM_PARMS H_CALL.
>> On future platforms, such userspace utilities will have to look at the
>> data returned from the new H_CALL being populated in this new sysfs
>> interface and report this information directly without the need of
>> interpretation.
>>
>> Signed-off-by: Pratik R. Sampat <[email protected]>

Thanks
Pratik

2021-06-09 16:53:01

by Fabiano Rosas

[permalink] [raw]
Subject: Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

"Pratik R. Sampat" <[email protected]> writes:

Hi, I have some general comments and questions, mostly trying to
understand design of the hcall and use cases of the sysfs data:

> Adds a generic interface to represent the energy and frequency related
> PAPR attributes on the system using the new H_CALL
> "H_GET_ENERGY_SCALE_INFO".
>
> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> will be deprecated P10 onwards.
>
> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> hcall(
> uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
> uint64 flags, // Per the flag request
> uint64 firstAttributeId,// The attribute id
> uint64 bufferAddress, // The logical address of the output buffer

Instead of logical address, guest address or guest physical address
would be more precise.

> uint64 bufferSize // The size in bytes of the output buffer
> );
>
> This H_CALL can query either all the attributes at once with
> firstAttributeId = 0, flags = 0 as well as query only one attribute
> at a time with firstAttributeId = id
>
> The output buffer consists of the following
> 1. number of attributes - 8 bytes
> 2. array offset to the data location - 8 bytes

The offset is from the start of the buffer, isn't it? So not the array
offset.

> 3. version info - 1 byte
> 4. A data array of size num attributes, which contains the following:
> a. attribute ID - 8 bytes
> b. attribute value in number - 8 bytes
> c. attribute name in string - 64 bytes
> d. attribute value in string - 64 bytes

Is this new hypercall already present in the spec? These seem a bit
underspecified to me.

>
> The new H_CALL exports information in direct string value format, hence
> a new interface has been introduced in /sys/firmware/papr to export

Hm.. Maybe this should be something less generic than "papr"?

> this information to userspace in an extensible pass-through format.
> The H_CALL returns the name, numeric value and string value. As string
> values are in human readable format, therefore if the string value
> exists then that is given precedence over the numeric value.

So the hypervisor could simply not send the string representation? How
will the userspace tell the difference since they are reading everything
from a file?

Overall I'd say we should give the data in a more structured way and let
the user-facing tool do the formatting and presentation.

>
> The format of exposing the sysfs information is as follows:
> /sys/firmware/papr/
> |-- attr_0_name
> |-- attr_0_val
> |-- attr_1_name
> |-- attr_1_val
> ...

How do we keep a stable interface with userspace? Say the hypervisor
decides to add or remove attributes, change their order, string
representation, etc? It will inform us via the version field, but that
is lost when we output this to sysfs.

I get that if the userspace just iterate over the contents of the
directory then nothing breaks, but there is not much else it could do it
seems.

>
> The energy information that is exported is useful for userspace tools
> such as powerpc-utils. Currently these tools infer the
> "power_mode_data" value in the lparcfg, which in turn is obtained from
> the to be deprecated H_GET_EM_PARMS H_CALL.
> On future platforms, such userspace utilities will have to look at the
> data returned from the new H_CALL being populated in this new sysfs
> interface and report this information directly without the need of
> interpretation.
>
> Signed-off-by: Pratik R. Sampat <[email protected]>

2021-06-10 00:05:48

by Fabiano Rosas

[permalink] [raw]
Subject: Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

Pratik Sampat <[email protected]> writes:

>>> 3. version info - 1 byte
>>> 4. A data array of size num attributes, which contains the following:
>>> a. attribute ID - 8 bytes
>>> b. attribute value in number - 8 bytes
>>> c. attribute name in string - 64 bytes
>>> d. attribute value in string - 64 bytes
>> Is this new hypercall already present in the spec? These seem a bit
>> underspecified to me.
>
> Yes, it is present in the spec. I probably summarized a little more than needed
> here and I could expand upon below.
>
> The input buffer recives the following data:
>
> 1. “flags”:
> a. Bit 0: singleAttribute
> If set to 1, only return the single attribute matching firstAttributeId.
> b. Bits 1-63: Reserved
> 2. “firstAttributeId”: The first attribute to retrieve
> 3. “bufferAddress”: The logical real address of the start of the output buffer
> 4. “bufferSize”: The size in bytes of the output buffer
>
>
> From the document, the format of the output buffer is as follows:
>
> Table 1 --> output buffer
> ================================================================================
> | Field Name | Byte | Length | Description
> | | Offset | in Bytes |
> ================================================================================
> | NumberOf | | | Number of Attributes in Buffer
> | AttributesInBuffer | 0x000 | 0x08 |
> --------------------------------------------------------------------------------
> | AttributeArrayOffset | 0x008 | 0x08 | Byte offset to start of Array
> | | | | of Attributes
> | | | |
> --------------------------------------------------------------------------------
> | OutputBufferData | | | Version of the Header.
> | HeaderVersion | 0x010 | 0x01 | The header will be always
> | AttributesInBuffer | | | backward compatible, and changes
> | | | | will not impact the Array of
> | | | | attributes.
> | | | | Current version = 0x01

This is not clear to me. In the event of a header version change, is the
total set of attributes guaranteed to remain the same? Or only the array
layout? We might not need to expose the version information after all.

> --------------------------------------------------------------------------------
> | ArrayOfAttributes | | | The array will contain
> | | | | "NumberOfAttributesInBuffer"
> | | | | array elements not to exceed
> | | | | the size of the buffer.
> | | | | Layout of the array is
> | | | | detailed in Table 2.
> --------------------------------------------------------------------------------
>
>
> Table 2 --> Array of attributes
> ================================================================================
> | Field Name | Byte | Length | Description
> | | Offset | in Bytes |
> ================================================================================
> | 1st AttributeId | 0x000 | 0x08 | The ID of the Attribute
> --------------------------------------------------------------------------------
> | 1st AttributeValue | 0x008 | 0x08 | The numerical value of
> | | | | the attribute
> --------------------------------------------------------------------------------
> | 1st AttributeString | 0x010 | 0x40 | The ASCII string
> | Description | | | description of the
> | | | | attribute, up to 63
> | | | | characters plus a NULL
> | | | | terminator.

There is a slight disconnect in that this is called "description" by the
spec, which makes me think they could eventually have something more
verbose than what you'd expect from "name".

So they could give us either: "Frequency" or "The Frequency in GigaHertz".

> --------------------------------------------------------------------------------
> | 1st AttributeValue | 0x050 | 0x40 | The ASCII string
> | StringDescription | | | description of the
> | | | | attribute value, up to 63
> | | | | characters plus a NULL
> | | | | terminator. If this
> | | | | contains only a NULL
> | | | | terminator, then there is
> | | | | no ASCII string
> | | | | associated with AttributeValue.
> --------------------------------------------------------------------------------
> | .... | | |
>
>
>>
>>> The new H_CALL exports information in direct string value format, hence
>>> a new interface has been introduced in /sys/firmware/papr to export
>> Hm.. Maybe this should be something less generic than "papr"?
>
> The interface naming was inspired from /sys/firmware/opal's naming convention.
> We believed the name PAPR could serve as more generic name to be used by both
> Linux running on PHYP and linux on KVM.

Right, I agree with that rationale, but /opal has identifiable elements
in it whereas /papr would have the generic "attr_X_name", which does not
give much hint about what they are.

We also expect people to iterate the "attr_X_*" files, so if we decide
to add something else under /papr in the future, that would potentially
cause issues with any tool that just lists the content of the directory.

So maybe we should be proactive and put the hcall stuff inside a
subdirectory already. /papr/energy_scale_attrs comes to mind, but I
don't have a strong opinion on the particular name.

>
> If you have something more concrete in mind, please let me know. I'm open to
> suggestions.
>
>>
>>> this information to userspace in an extensible pass-through format.
>>> The H_CALL returns the name, numeric value and string value. As string
>>> values are in human readable format, therefore if the string value
>>> exists then that is given precedence over the numeric value.
>> So the hypervisor could simply not send the string representation? How
>> will the userspace tell the difference since they are reading everything
>> from a file?
>>
>> Overall I'd say we should give the data in a more structured way and let
>> the user-facing tool do the formatting and presentation.
>
> That's a valid concern, the design for this was inspired from hwmon's interface
> to housing the sensor information.
>
> One alternative to add more structure to this format could be to introduce:
> attr_X_name, attr_X_num_val, attr_X_str_val
>
> However, in some cases like min/max frequency the string value is empty. In
> that case the file attr_X_str_val will also be empty.
> Is that an acceptable format of having empty files that in some cases will
> never be populated?

I'm thinking yes, but I'm not sure. Let's see if someone else has a say
in this.

> We also went ahead to confirm with the SPEC team that if a string value exists
> in their buffer, that must be given precedence.

Huh.. That must be a recommendation only. The hypervisor has no control
over how people present the information in userspace.

>
> Another alternative format could to keep attr_X_name, attr_X_val intact but
> change what X means. Currently X is just an iteratively increasing number. But
> X can also serve as an ID which we get from H_CALL output buffer.

This seems like a good idea. It makes it easier to correlate the
attribute with what is in PAPR.

>
> In this case, we should also include some versioning so that the tool now also
> has cognizance of contents of each file.
>
>>> The format of exposing the sysfs information is as follows:
>>> /sys/firmware/papr/
>>> |-- attr_0_name
>>> |-- attr_0_val
>>> |-- attr_1_name
>>> |-- attr_1_val
>>> ...
>> How do we keep a stable interface with userspace? Say the hypervisor
>> decides to add or remove attributes, change their order, string
>> representation, etc? It will inform us via the version field, but that
>> is lost when we output this to sysfs.
>>
>> I get that if the userspace just iterate over the contents of the
>> directory then nothing breaks, but there is not much else it could do it
>> seems.
>
> Fair point, having the version exposed to the sysfs does seem crucial.
>
> Currently in ppc-utils we iterate over all the information, however as you
> rightly pointed out there may be other tools needing just specific information.
> The alternative I suggested a few sentences above to include ID based attribute
> naming and versioning maybe a more elegant way of solving this problem.
>
> What are your thoughts on a design like this?
>

Based on all the new information you provided, I'd say present all the
data and group it under the ID:

/sys/firmware/papr/energy_scale_attrs/
|-- <id>/
|-- desc
|-- value
|-- value_desc
|-- <id>/
|-- desc
|-- value
|-- value_desc

Is that workable?

>>> The energy information that is exported is useful for userspace tools
>>> such as powerpc-utils. Currently these tools infer the
>>> "power_mode_data" value in the lparcfg, which in turn is obtained from
>>> the to be deprecated H_GET_EM_PARMS H_CALL.
>>> On future platforms, such userspace utilities will have to look at the
>>> data returned from the new H_CALL being populated in this new sysfs
>>> interface and report this information directly without the need of
>>> interpretation.
>>>
>>> Signed-off-by: Pratik R. Sampat <[email protected]>
>
> Thanks
> Pratik

2021-06-10 08:04:31

by Pratik R. Sampat

[permalink] [raw]
Subject: Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes



On 10/06/21 5:33 am, Fabiano Rosas wrote:
> Pratik Sampat <[email protected]> writes:
>
>>>> 3. version info - 1 byte
>>>> 4. A data array of size num attributes, which contains the following:
>>>> a. attribute ID - 8 bytes
>>>> b. attribute value in number - 8 bytes
>>>> c. attribute name in string - 64 bytes
>>>> d. attribute value in string - 64 bytes
>>> Is this new hypercall already present in the spec? These seem a bit
>>> underspecified to me.
>> Yes, it is present in the spec. I probably summarized a little more than needed
>> here and I could expand upon below.
>>
>> The input buffer recives the following data:
>>
>> 1. “flags”:
>> a. Bit 0: singleAttribute
>> If set to 1, only return the single attribute matching firstAttributeId.
>> b. Bits 1-63: Reserved
>> 2. “firstAttributeId”: The first attribute to retrieve
>> 3. “bufferAddress”: The logical real address of the start of the output buffer
>> 4. “bufferSize”: The size in bytes of the output buffer
>>
>>
>> From the document, the format of the output buffer is as follows:
>>
>> Table 1 --> output buffer
>> ================================================================================
>> | Field Name | Byte | Length | Description
>> | | Offset | in Bytes |
>> ================================================================================
>> | NumberOf | | | Number of Attributes in Buffer
>> | AttributesInBuffer | 0x000 | 0x08 |
>> --------------------------------------------------------------------------------
>> | AttributeArrayOffset | 0x008 | 0x08 | Byte offset to start of Array
>> | | | | of Attributes
>> | | | |
>> --------------------------------------------------------------------------------
>> | OutputBufferData | | | Version of the Header.
>> | HeaderVersion | 0x010 | 0x01 | The header will be always
>> | AttributesInBuffer | | | backward compatible, and changes
>> | | | | will not impact the Array of
>> | | | | attributes.
>> | | | | Current version = 0x01
> This is not clear to me. In the event of a header version change, is the
> total set of attributes guaranteed to remain the same? Or only the array
> layout? We might not need to expose the version information after all.

I believe, the way versioning currently works is that if any new attribute is
added/modified to the list, this will entail a new version.

Regardless, the older attributes and their ids will not change and will still
be backwards compatible.

If the versioning does change, this patch does introduce a version check and
will fail to populate the sysfs and, a tool like powerpc-utils will not read
incorrect/non-coherent information.

So I'm inclined also believe now that verisoning information may not be needed
to expose to userspace.

>> --------------------------------------------------------------------------------
>> | ArrayOfAttributes | | | The array will contain
>> | | | | "NumberOfAttributesInBuffer"
>> | | | | array elements not to exceed
>> | | | | the size of the buffer.
>> | | | | Layout of the array is
>> | | | | detailed in Table 2.
>> --------------------------------------------------------------------------------
>>
>>
>> Table 2 --> Array of attributes
>> ================================================================================
>> | Field Name | Byte | Length | Description
>> | | Offset | in Bytes |
>> ================================================================================
>> | 1st AttributeId | 0x000 | 0x08 | The ID of the Attribute
>> --------------------------------------------------------------------------------
>> | 1st AttributeValue | 0x008 | 0x08 | The numerical value of
>> | | | | the attribute
>> --------------------------------------------------------------------------------
>> | 1st AttributeString | 0x010 | 0x40 | The ASCII string
>> | Description | | | description of the
>> | | | | attribute, up to 63
>> | | | | characters plus a NULL
>> | | | | terminator.
> There is a slight disconnect in that this is called "description" by the
> spec, which makes me think they could eventually have something more
> verbose than what you'd expect from "name".
>
> So they could give us either: "Frequency" or "The Frequency in GigaHertz".

Yes, the description can be more verbose, like I can see attributes with the
description as "Minimum Frequency (MHz)". That's probably why parsing based on
IDs is a better approach.

>
>> --------------------------------------------------------------------------------
>> | 1st AttributeValue | 0x050 | 0x40 | The ASCII string
>> | StringDescription | | | description of the
>> | | | | attribute value, up to 63
>> | | | | characters plus a NULL
>> | | | | terminator. If this
>> | | | | contains only a NULL
>> | | | | terminator, then there is
>> | | | | no ASCII string
>> | | | | associated with AttributeValue.
>> --------------------------------------------------------------------------------
>> | .... | | |
>>
>>
>>>> The new H_CALL exports information in direct string value format, hence
>>>> a new interface has been introduced in /sys/firmware/papr to export
>>> Hm.. Maybe this should be something less generic than "papr"?
>> The interface naming was inspired from /sys/firmware/opal's naming convention.
>> We believed the name PAPR could serve as more generic name to be used by both
>> Linux running on PHYP and linux on KVM.
> Right, I agree with that rationale, but /opal has identifiable elements
> in it whereas /papr would have the generic "attr_X_name", which does not
> give much hint about what they are.
>
> We also expect people to iterate the "attr_X_*" files, so if we decide
> to add something else under /papr in the future, that would potentially
> cause issues with any tool that just lists the content of the directory.
>
> So maybe we should be proactive and put the hcall stuff inside a
> subdirectory already. /papr/energy_scale_attrs comes to mind, but I
> don't have a strong opinion on the particular name.

Encapsulating it within another directory like energy_scale_attrs does make
sense and keeps the PAPR directory open to more such information going forward.

>> If you have something more concrete in mind, please let me know. I'm open to
>> suggestions.
>>
>>>> this information to userspace in an extensible pass-through format.
>>>> The H_CALL returns the name, numeric value and string value. As string
>>>> values are in human readable format, therefore if the string value
>>>> exists then that is given precedence over the numeric value.
>>> So the hypervisor could simply not send the string representation? How
>>> will the userspace tell the difference since they are reading everything
>>> from a file?
>>>
>>> Overall I'd say we should give the data in a more structured way and let
>>> the user-facing tool do the formatting and presentation.
>> That's a valid concern, the design for this was inspired from hwmon's interface
>> to housing the sensor information.
>>
>> One alternative to add more structure to this format could be to introduce:
>> attr_X_name, attr_X_num_val, attr_X_str_val
>>
>> However, in some cases like min/max frequency the string value is empty. In
>> that case the file attr_X_str_val will also be empty.
>> Is that an acceptable format of having empty files that in some cases will
>> never be populated?
> I'm thinking yes, but I'm not sure. Let's see if someone else has a say
> in this.

Sure, if we can have empty sysfs files, then this presents a coherent interface.

@mpe, can you weigh in here, can we have an interface where we have the following structure:
/sys/firmware/papr/energy_scale_attrs/
|-- <id>/
|-- desc
|-- value
|-- value_desc
where value_desc can be empty in some case?
If so, can we leave them empty or do we need to have them populated with a
string "NULL"/"NONE"?

>
>> We also went ahead to confirm with the SPEC team that if a string value exists
>> in their buffer, that must be given precedence.
> Huh.. That must be a recommendation only. The hypervisor has no control
> over how people present the information in userspace.
>
>> Another alternative format could to keep attr_X_name, attr_X_val intact but
>> change what X means. Currently X is just an iteratively increasing number. But
>> X can also serve as an ID which we get from H_CALL output buffer.
> This seems like a good idea. It makes it easier to correlate the
> attribute with what is in PAPR.
>
>> In this case, we should also include some versioning so that the tool now also
>> has cognizance of contents of each file.
>>
>>>> The format of exposing the sysfs information is as follows:
>>>> /sys/firmware/papr/
>>>> |-- attr_0_name
>>>> |-- attr_0_val
>>>> |-- attr_1_name
>>>> |-- attr_1_val
>>>> ...
>>> How do we keep a stable interface with userspace? Say the hypervisor
>>> decides to add or remove attributes, change their order, string
>>> representation, etc? It will inform us via the version field, but that
>>> is lost when we output this to sysfs.
>>>
>>> I get that if the userspace just iterate over the contents of the
>>> directory then nothing breaks, but there is not much else it could do it
>>> seems.
>> Fair point, having the version exposed to the sysfs does seem crucial.
>>
>> Currently in ppc-utils we iterate over all the information, however as you
>> rightly pointed out there may be other tools needing just specific information.
>> The alternative I suggested a few sentences above to include ID based attribute
>> naming and versioning maybe a more elegant way of solving this problem.
>>
>> What are your thoughts on a design like this?
>>
> Based on all the new information you provided, I'd say present all the
> data and group it under the ID:
>
> /sys/firmware/papr/energy_scale_attrs/
> |-- <id>/
> |-- desc
> |-- value
> |-- value_desc
> |-- <id>/
> |-- desc
> |-- value
> |-- value_desc
>
> Is that workable?

If we can confirm if value descriptions can be empty, then I too think this is
a good interface to introduce for energy attributes.

Thanks for your feedback.
Pratik

>>>> The energy information that is exported is useful for userspace tools
>>>> such as powerpc-utils. Currently these tools infer the
>>>> "power_mode_data" value in the lparcfg, which in turn is obtained from
>>>> the to be deprecated H_GET_EM_PARMS H_CALL.
>>>> On future platforms, such userspace utilities will have to look at the
>>>> data returned from the new H_CALL being populated in this new sysfs
>>>> interface and report this information directly without the need of
>>>> interpretation.
>>>>
>>>> Signed-off-by: Pratik R. Sampat <[email protected]>
>> Thanks
>> Pratik

2021-06-16 00:05:26

by Michael Ellerman

[permalink] [raw]
Subject: Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

Fabiano Rosas <[email protected]> writes:
> Pratik Sampat <[email protected]> writes:
...
>>>
>>>> The new H_CALL exports information in direct string value format, hence
>>>> a new interface has been introduced in /sys/firmware/papr to export
>>> Hm.. Maybe this should be something less generic than "papr"?
>>
>> The interface naming was inspired from /sys/firmware/opal's naming convention.
>> We believed the name PAPR could serve as more generic name to be used by both
>> Linux running on PHYP and linux on KVM.
>
> Right, I agree with that rationale, but /opal has identifiable elements
> in it whereas /papr would have the generic "attr_X_name", which does not
> give much hint about what they are.
>
> We also expect people to iterate the "attr_X_*" files, so if we decide
> to add something else under /papr in the future, that would potentially
> cause issues with any tool that just lists the content of the directory.
>
> So maybe we should be proactive and put the hcall stuff inside a
> subdirectory already. /papr/energy_scale_attrs comes to mind, but I
> don't have a strong opinion on the particular name.

Maybe we should use the descriptive part of the hcall.

So H_GET_ENERGY_SCALE_INFO -> ../papr/energy_scale_info/

That should help avoid any naming confusion, because every hcall should
have a unique name.

In future if there's ever a H_GET_ENERGY_SCALE_INFO_2 we would then have
to decide if we expose that as a separate directory, or more likely we
would handle that in the kernel and continue to use the existing sysfs
name.

...

> Based on all the new information you provided, I'd say present all the
> data and group it under the ID:
>
> /sys/firmware/papr/energy_scale_attrs/
> |-- <id>/
> |-- desc
> |-- value
> |-- value_desc
> |-- <id>/
> |-- desc
> |-- value
> |-- value_desc

Yeah that seems reasonable.

I'd think we should just omit the value_desc if it's empty.

cheers