2021-11-15 21:59:43

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 00/13] Introduce PECI subsystem

Hi,

This is a third round of patches introducing PECI subsystem.
Sorry for the delay between v2 and v3.

The Platform Environment Control Interface (PECI) is a communication
interface between Intel processors and management controllers (e.g.
Baseboard Management Controller, BMC).

This series adds a PECI subsystem and introduces drivers which run in
the Linux instance on the management controller (not the main Intel
processor) and is intended to be used by the OpenBMC [1], a Linux
distribution for BMC devices.
The information exposed over PECI (like processor and DIMM
temperature) refers to the Intel processor and can be consumed by
daemons running on the BMC to, for example, display the processor
temperature in its web interface.

The PECI bus is collection of code that provides interface support
between PECI devices (that actually represent processors) and PECI
controllers (such as the "peci-aspeed" controller) that allow to
access physical PECI interface. PECI devices are bound to PECI
drivers that provides access to PECI services. This series introduces
a generic "peci-cpu" driver that exposes hardware monitoring "cputemp"
and "dimmtemp" using the auxiliary bus.

Exposing "raw" PECI to userspace, either to write userspace drivers or
for debug/testing purpose was left out of this series to encourage
writing kernel drivers instead, but may be pursued in the future.

Introducing PECI to upstream Linux was already attempted before [2].
Since it's been over a year since last revision, and the series
changed quite a bit in the meantime, I've decided to start from v1.

I would also like to give credit to everyone who helped me with
different aspects of preliminary review:
- Pierre-Louis Bossart,
- Tony Luck,
- Andy Shevchenko,
- Dave Hansen.

[1] https://github.com/openbmc/openbmc
[2] https://lore.kernel.org/openbmc/[email protected]/

Changes v2 -> v3:

* Dropped x86/cpu patches (Boris)
* Dropped pr_fmt() for PECI module (Dan)
* Fixed releasing peci controller device flow (Dan)
* Improved peci-aspeed commit-msg and Kconfig help (Dan)
* Fixed aspeed_peci_xfer() to use the proper spin_lock function (Dan)
* Wrapped print_hex_dump_bytes() in CONFIG_DYNAMIC_DEBUG (Dan)
* Removed debug status logs from aspeed_peci_irq_handler() (Dan)
* Renamed functions using devres to start with "devm" (Dan)
* Changed request to be allocated on stack in peci_detect (Dan)
* Removed redundant WARN_ON on invalid PECI addr (Dan)
* Changed peci_device_create() to use device_initialize() + device_add() pattern (Dan)
* Fixed peci_device_destroy() to use kill_device() avoiding double-free (Dan)
* Renamed functions that perform xfer using "peci_xfer_*" prefix (Dan)
* Renamed peci_request_data_dib(temp) -> peci_request_dib(temp)_read (Dan)
* Fixed thermal margin readings for older Intel processors (Zev)
* Misc hwmon simplifications (Guenter)
* Used BIT_PER_TYPE to verify macro value constrains (Guenter)
* Improved WARN_ON message to print chan_rank_max and idx_dimm_max (Guenter)
* Improved dimmtemp to not reattempt probe if no dimms are populated

Changes v1 -> v2:

Biggest changes when it comes to diffstat are locking in HWMON
(I decided to clean things up a bit while adding it), switching to
devres usage in more places and exposing sysfs interface in separate patch.

* Moved extending X86 ARCHITECTURE MAINTAINERS earlier in series (Dan)
* Removed "default n" for GENERIC_LIB_X86 (Dan)
* Added vendor prefix for peci-aspeed specific properties (Rob)
* Refactored PECI to use devres consistently (Dan)
* Added missing sysfs documentation and excluded adding peci-sysfs to
separate patch (Dan)
* Used module_init() instead of subsys_init() for peci module initialization (Dan)
* Removed redundant struct peci_device member (Dan)
* Improved PECI Kconfig help (Randy/Dan)
* Fixed/removed log messages (Dan, Guenter)
* Refactored peci-cputemp and peci-dimmtemp and added missing locks (Guenter)
* Removed unused dev_set_drvdata() in peci-cputemp and peci-dimmtemp (Guenter)
* Fixed used types, names, fixed broken and added additional comments
to peci-hwmon (Guenter, Zev)
* Refactored peci-dimmtemp to not return -ETIMEDOUT (Guenter)
* Added sanity check for min_peci_revision in peci-hwmon drivers (Zev)
* Added assert for DIMM_NUMS_MAX and additional warning in peci-dimmtemp (Zev)
* Fixed macro names in peci-aspeed (Zev)
* Refactored peci-aspeed sanitizing properties to a single helper function (Zev)
* Fixed peci_cpu_device_ids definition for Broadwell Xeon D (David)
* Refactor peci_request to use a single allocation (Zev)
* Used min_t() to improve code readability (Zev)
* Added macro for PECI_RDENDPTCFG_MMIO_WR_LEN_BASE and fixed adev type
array name to more descriptive (Zev)
* Fixed peci-hwmon commit-msg and documentation (Zev)

Thanks
-Iwona

Iwona Winiarska (11):
dt-bindings: Add generic bindings for PECI
dt-bindings: Add bindings for peci-aspeed
ARM: dts: aspeed: Add PECI controller nodes
peci: Add core infrastructure
peci: Add device detection
peci: Add sysfs interface for PECI bus
peci: Add support for PECI device drivers
peci: Add peci-cpu driver
hwmon: peci: Add cputemp driver
hwmon: peci: Add dimmtemp driver
docs: Add PECI documentation

Jae Hyun Yoo (2):
peci: Add peci-aspeed controller driver
docs: hwmon: Document PECI drivers

Documentation/ABI/testing/sysfs-bus-peci | 16 +
.../devicetree/bindings/peci/peci-aspeed.yaml | 109 +++
.../bindings/peci/peci-controller.yaml | 33 +
Documentation/hwmon/index.rst | 2 +
Documentation/hwmon/peci-cputemp.rst | 90 +++
Documentation/hwmon/peci-dimmtemp.rst | 57 ++
Documentation/index.rst | 1 +
Documentation/peci/index.rst | 16 +
Documentation/peci/peci.rst | 51 ++
MAINTAINERS | 29 +
arch/arm/boot/dts/aspeed-g4.dtsi | 14 +
arch/arm/boot/dts/aspeed-g5.dtsi | 14 +
arch/arm/boot/dts/aspeed-g6.dtsi | 14 +
drivers/Kconfig | 3 +
drivers/Makefile | 1 +
drivers/hwmon/Kconfig | 2 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/peci/Kconfig | 31 +
drivers/hwmon/peci/Makefile | 7 +
drivers/hwmon/peci/common.h | 58 ++
drivers/hwmon/peci/cputemp.c | 592 ++++++++++++++++
drivers/hwmon/peci/dimmtemp.c | 630 ++++++++++++++++++
drivers/peci/Kconfig | 36 +
drivers/peci/Makefile | 10 +
drivers/peci/controller/Kconfig | 17 +
drivers/peci/controller/Makefile | 3 +
drivers/peci/controller/peci-aspeed.c | 429 ++++++++++++
drivers/peci/core.c | 236 +++++++
drivers/peci/cpu.c | 343 ++++++++++
drivers/peci/device.c | 249 +++++++
drivers/peci/internal.h | 136 ++++
drivers/peci/request.c | 482 ++++++++++++++
drivers/peci/sysfs.c | 82 +++
include/linux/peci-cpu.h | 40 ++
include/linux/peci.h | 110 +++
35 files changed, 3944 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-bus-peci
create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
create mode 100644 Documentation/hwmon/peci-cputemp.rst
create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
create mode 100644 Documentation/peci/index.rst
create mode 100644 Documentation/peci/peci.rst
create mode 100644 drivers/hwmon/peci/Kconfig
create mode 100644 drivers/hwmon/peci/Makefile
create mode 100644 drivers/hwmon/peci/common.h
create mode 100644 drivers/hwmon/peci/cputemp.c
create mode 100644 drivers/hwmon/peci/dimmtemp.c
create mode 100644 drivers/peci/Kconfig
create mode 100644 drivers/peci/Makefile
create mode 100644 drivers/peci/controller/Kconfig
create mode 100644 drivers/peci/controller/Makefile
create mode 100644 drivers/peci/controller/peci-aspeed.c
create mode 100644 drivers/peci/core.c
create mode 100644 drivers/peci/cpu.c
create mode 100644 drivers/peci/device.c
create mode 100644 drivers/peci/internal.h
create mode 100644 drivers/peci/request.c
create mode 100644 drivers/peci/sysfs.c
create mode 100644 include/linux/peci-cpu.h
create mode 100644 include/linux/peci.h

--
2.31.1



2021-11-15 22:11:46

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 02/13] dt-bindings: Add bindings for peci-aspeed

Add device tree bindings for the peci-aspeed controller driver.

Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
---
.../devicetree/bindings/peci/peci-aspeed.yaml | 109 ++++++++++++++++++
1 file changed, 109 insertions(+)
create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml

diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.yaml b/Documentation/devicetree/bindings/peci/peci-aspeed.yaml
new file mode 100644
index 000000000000..2929d1e000d8
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-aspeed.yaml
@@ -0,0 +1,109 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/peci/peci-aspeed.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Aspeed PECI Bus Device Tree Bindings
+
+maintainers:
+ - Iwona Winiarska <[email protected]>
+ - Jae Hyun Yoo <[email protected]>
+
+allOf:
+ - $ref: peci-controller.yaml#
+
+properties:
+ compatible:
+ enum:
+ - aspeed,ast2400-peci
+ - aspeed,ast2500-peci
+ - aspeed,ast2600-peci
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ description:
+ Clock source for PECI controller. Should reference the external
+ oscillator clock.
+ maxItems: 1
+
+ resets:
+ maxItems: 1
+
+ cmd-timeout-ms:
+ minimum: 1
+ maximum: 1000
+ default: 1000
+
+ aspeed,clock-divider:
+ description:
+ This value determines PECI controller internal clock dividing
+ rate. The divider will be calculated as 2 raised to the power of
+ the given value.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 7
+ default: 0
+
+ aspeed,msg-timing:
+ description:
+ Message timing negotiation period. This value will determine the period
+ of message timing negotiation to be issued by PECI controller. The unit
+ of the programmed value is four times of PECI clock period.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 255
+ default: 1
+
+ aspeed,addr-timing:
+ description:
+ Address timing negotiation period. This value will determine the period
+ of address timing negotiation to be issued by PECI controller. The unit
+ of the programmed value is four times of PECI clock period.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 255
+ default: 1
+
+ aspeed,rd-sampling-point:
+ description:
+ Read sampling point selection. The whole period of a bit time will be
+ divided into 16 time frames. This value will determine the time frame
+ in which the controller will sample PECI signal for data read back.
+ Usually in the middle of a bit time is the best.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 15
+ default: 8
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - resets
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/clock/ast2600-clock.h>
+ peci-controller@1e78b000 {
+ compatible = "aspeed,ast2600-peci";
+ reg = <0x1e78b000 0x100>;
+ interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&syscon ASPEED_CLK_GATE_REF0CLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ cmd-timeout-ms = <1000>;
+ aspeed,clock-divider = <0>;
+ aspeed,msg-timing = <1>;
+ aspeed,addr-timing = <1>;
+ aspeed,rd-sampling-point = <8>;
+ };
+...
--
2.31.1


2021-11-15 22:11:47

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 08/13] peci: Add support for PECI device drivers

Add support for PECI device drivers, which unlike PECI controller
drivers are actually able to provide functionalities to userspace.

Also, extend peci_request API to allow querying more details about PECI
device (e.g. model/family), that's going to be used to find a compatible
peci_driver.

Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
drivers/peci/core.c | 44 +++++++++
drivers/peci/device.c | 130 ++++++++++++++++++++++++
drivers/peci/internal.h | 74 ++++++++++++++
drivers/peci/request.c | 214 ++++++++++++++++++++++++++++++++++++++++
include/linux/peci.h | 19 ++++
5 files changed, 481 insertions(+)

diff --git a/drivers/peci/core.c b/drivers/peci/core.c
index e993615cf521..9c8cf07e51c7 100644
--- a/drivers/peci/core.c
+++ b/drivers/peci/core.c
@@ -160,8 +160,52 @@ struct peci_controller *devm_peci_controller_add(struct device *dev,
}
EXPORT_SYMBOL_NS_GPL(devm_peci_controller_add, PECI);

+static const struct peci_device_id *
+peci_bus_match_device_id(const struct peci_device_id *id, struct peci_device *device)
+{
+ while (id->family != 0) {
+ if (id->family == device->info.family &&
+ id->model == device->info.model)
+ return id;
+ id++;
+ }
+
+ return NULL;
+}
+
+static int peci_bus_device_match(struct device *dev, struct device_driver *drv)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *peci_drv = to_peci_driver(drv);
+
+ if (dev->type != &peci_device_type)
+ return 0;
+
+ return !!peci_bus_match_device_id(peci_drv->id_table, device);
+}
+
+static int peci_bus_device_probe(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *driver = to_peci_driver(dev->driver);
+
+ return driver->probe(device, peci_bus_match_device_id(driver->id_table, device));
+}
+
+static void peci_bus_device_remove(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *driver = to_peci_driver(dev->driver);
+
+ if (driver->remove)
+ driver->remove(device);
+}
+
struct bus_type peci_bus_type = {
.name = "peci",
+ .match = peci_bus_device_match,
+ .probe = peci_bus_device_probe,
+ .remove = peci_bus_device_remove,
.bus_groups = peci_bus_groups,
};

diff --git a/drivers/peci/device.c b/drivers/peci/device.c
index 91b1522ce6f7..0a7660f4fc9b 100644
--- a/drivers/peci/device.c
+++ b/drivers/peci/device.c
@@ -1,11 +1,110 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (c) 2018-2021 Intel Corporation

+#include <linux/bitfield.h>
#include <linux/peci.h>
#include <linux/slab.h>

#include "internal.h"

+#define REVISION_NUM_MASK GENMASK(15, 8)
+static int peci_get_revision(struct peci_device *device, u8 *revision)
+{
+ struct peci_request *req;
+ u64 dib;
+
+ req = peci_xfer_get_dib(device);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ /*
+ * PECI device may be in a state where it is unable to return a proper
+ * DIB, in which case it returns 0 as DIB value.
+ * Let's treat this as an error to avoid carrying on with the detection
+ * using invalid revision.
+ */
+ dib = peci_request_dib_read(req);
+ if (dib == 0) {
+ peci_request_free(req);
+ return -EIO;
+ }
+
+ *revision = FIELD_GET(REVISION_NUM_MASK, dib);
+
+ peci_request_free(req);
+
+ return 0;
+}
+
+static int peci_get_cpu_id(struct peci_device *device, u32 *cpu_id)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_pkg_cfg_readl(device, PECI_PCS_PKG_ID, PECI_PKG_ID_CPU_ID);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *cpu_id = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+
+static unsigned int peci_x86_cpu_family(unsigned int sig)
+{
+ unsigned int x86;
+
+ x86 = (sig >> 8) & 0xf;
+
+ if (x86 == 0xf)
+ x86 += (sig >> 20) & 0xff;
+
+ return x86;
+}
+
+static unsigned int peci_x86_cpu_model(unsigned int sig)
+{
+ unsigned int fam, model;
+
+ fam = peci_x86_cpu_family(sig);
+
+ model = (sig >> 4) & 0xf;
+
+ if (fam >= 0x6)
+ model += ((sig >> 16) & 0xf) << 4;
+
+ return model;
+}
+
+static int peci_device_info_init(struct peci_device *device)
+{
+ u8 revision;
+ u32 cpu_id;
+ int ret;
+
+ ret = peci_get_cpu_id(device, &cpu_id);
+ if (ret)
+ return ret;
+
+ device->info.family = peci_x86_cpu_family(cpu_id);
+ device->info.model = peci_x86_cpu_model(cpu_id);
+
+ ret = peci_get_revision(device, &revision);
+ if (ret)
+ return ret;
+ device->info.peci_revision = revision;
+
+ device->info.socket_id = device->addr - PECI_BASE_ADDR;
+
+ return 0;
+}
+
static int peci_detect(struct peci_controller *controller, u8 addr)
{
/*
@@ -75,6 +174,10 @@ int peci_device_create(struct peci_controller *controller, u8 addr)
device->dev.bus = &peci_bus_type;
device->dev.type = &peci_device_type;

+ ret = peci_device_info_init(device);
+ if (ret)
+ goto err_put;
+
ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
if (ret)
goto err_put;
@@ -105,6 +208,33 @@ void peci_device_destroy(struct peci_device *device)
device_unregister(&device->dev);
}

+int __peci_driver_register(struct peci_driver *driver, struct module *owner,
+ const char *mod_name)
+{
+ driver->driver.bus = &peci_bus_type;
+ driver->driver.owner = owner;
+ driver->driver.mod_name = mod_name;
+
+ if (!driver->probe) {
+ pr_err("peci: trying to register driver without probe callback\n");
+ return -EINVAL;
+ }
+
+ if (!driver->id_table) {
+ pr_err("peci: trying to register driver without device id table\n");
+ return -EINVAL;
+ }
+
+ return driver_register(&driver->driver);
+}
+EXPORT_SYMBOL_NS_GPL(__peci_driver_register, PECI);
+
+void peci_driver_unregister(struct peci_driver *driver)
+{
+ driver_unregister(&driver->driver);
+}
+EXPORT_SYMBOL_NS_GPL(peci_driver_unregister, PECI);
+
static void peci_device_release(struct device *dev)
{
struct peci_device *device = to_peci_device(dev);
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 978e12c8e1d3..52c02e12874f 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -19,6 +19,35 @@ struct peci_request;
struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
void peci_request_free(struct peci_request *req);

+int peci_request_status(struct peci_request *req);
+
+u64 peci_request_dib_read(struct peci_request *req);
+
+u8 peci_request_data_readb(struct peci_request *req);
+u16 peci_request_data_readw(struct peci_request *req);
+u32 peci_request_data_readl(struct peci_request *req);
+u64 peci_request_data_readq(struct peci_request *req);
+
+struct peci_request *peci_xfer_get_dib(struct peci_device *device);
+struct peci_request *peci_xfer_get_temp(struct peci_device *device);
+
+struct peci_request *peci_xfer_pkg_cfg_readb(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_xfer_pkg_cfg_readw(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_xfer_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_xfer_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
+
+/**
+ * struct peci_device_id - PECI device data to match
+ * @data: pointer to driver private data specific to device
+ * @family: device family
+ * @model: device model
+ */
+struct peci_device_id {
+ const void *data;
+ u16 family;
+ u8 model;
+};
+
extern struct device_type peci_device_type;
extern const struct attribute_group *peci_device_groups[];

@@ -28,6 +57,51 @@ void peci_device_destroy(struct peci_device *device);
extern struct bus_type peci_bus_type;
extern const struct attribute_group *peci_bus_groups[];

+/**
+ * struct peci_driver - PECI driver
+ * @driver: inherit device driver
+ * @probe: probe callback
+ * @remove: remove callback
+ * @id_table: PECI device match table to decide which device to bind
+ */
+struct peci_driver {
+ struct device_driver driver;
+ int (*probe)(struct peci_device *device, const struct peci_device_id *id);
+ void (*remove)(struct peci_device *device);
+ const struct peci_device_id *id_table;
+};
+
+static inline struct peci_driver *to_peci_driver(struct device_driver *d)
+{
+ return container_of(d, struct peci_driver, driver);
+}
+
+int __peci_driver_register(struct peci_driver *driver, struct module *owner,
+ const char *mod_name);
+/**
+ * peci_driver_register() - register PECI driver
+ * @driver: the driver to be registered
+ *
+ * PECI drivers that don't need to do anything special in module init should
+ * use the convenience "module_peci_driver" macro instead
+ *
+ * Return: zero on success, else a negative error code.
+ */
+#define peci_driver_register(driver) \
+ __peci_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
+void peci_driver_unregister(struct peci_driver *driver);
+
+/**
+ * module_peci_driver() - helper macro for registering a modular PECI driver
+ * @__peci_driver: peci_driver struct
+ *
+ * Helper macro for PECI drivers which do not do anything special in module
+ * init/exit. This eliminates a lot of boilerplate. Each module may only
+ * use this macro once, and calling it replaces module_init() and module_exit()
+ */
+#define module_peci_driver(__peci_driver) \
+ module_driver(__peci_driver, peci_driver_register, peci_driver_unregister)
+
extern struct device_type peci_controller_type;

int peci_controller_scan_devices(struct peci_controller *controller);
diff --git a/drivers/peci/request.c b/drivers/peci/request.c
index 7dee51c50dd2..9bd1d81c6093 100644
--- a/drivers/peci/request.c
+++ b/drivers/peci/request.c
@@ -1,13 +1,140 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (c) 2021 Intel Corporation

+#include <linux/bug.h>
#include <linux/export.h>
#include <linux/peci.h>
#include <linux/slab.h>
#include <linux/types.h>

+#include <asm/unaligned.h>
+
#include "internal.h"

+#define PECI_GET_DIB_CMD 0xf7
+#define PECI_GET_DIB_WR_LEN 1
+#define PECI_GET_DIB_RD_LEN 8
+
+#define PECI_RDPKGCFG_CMD 0xa1
+#define PECI_RDPKGCFG_WR_LEN 5
+#define PECI_RDPKGCFG_RD_LEN_BASE 1
+#define PECI_WRPKGCFG_CMD 0xa5
+#define PECI_WRPKGCFG_WR_LEN_BASE 6
+#define PECI_WRPKGCFG_RD_LEN 1
+
+/* Device Specific Completion Code (CC) Definition */
+#define PECI_CC_SUCCESS 0x40
+#define PECI_CC_NEED_RETRY 0x80
+#define PECI_CC_OUT_OF_RESOURCE 0x81
+#define PECI_CC_UNAVAIL_RESOURCE 0x82
+#define PECI_CC_INVALID_REQ 0x90
+#define PECI_CC_MCA_ERROR 0x91
+#define PECI_CC_CATASTROPHIC_MCA_ERROR 0x93
+#define PECI_CC_FATAL_MCA_ERROR 0x94
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB 0x98
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR 0x9B
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA 0x9C
+
+#define PECI_RETRY_BIT BIT(0)
+
+#define PECI_RETRY_TIMEOUT msecs_to_jiffies(700)
+#define PECI_RETRY_INTERVAL_MIN msecs_to_jiffies(1)
+#define PECI_RETRY_INTERVAL_MAX msecs_to_jiffies(128)
+
+static u8 peci_request_data_cc(struct peci_request *req)
+{
+ return req->rx.buf[0];
+}
+
+/**
+ * peci_request_status() - return -errno based on PECI completion code
+ * @req: the PECI request that contains response data with completion code
+ *
+ * It can't be used for Ping(), GetDIB() and GetTemp() - for those commands we
+ * don't expect completion code in the response.
+ *
+ * Return: -errno
+ */
+int peci_request_status(struct peci_request *req)
+{
+ u8 cc = peci_request_data_cc(req);
+
+ if (cc != PECI_CC_SUCCESS)
+ dev_dbg(&req->device->dev, "ret: %#02x\n", cc);
+
+ switch (cc) {
+ case PECI_CC_SUCCESS:
+ return 0;
+ case PECI_CC_NEED_RETRY:
+ case PECI_CC_OUT_OF_RESOURCE:
+ case PECI_CC_UNAVAIL_RESOURCE:
+ return -EAGAIN;
+ case PECI_CC_INVALID_REQ:
+ return -EINVAL;
+ case PECI_CC_MCA_ERROR:
+ case PECI_CC_CATASTROPHIC_MCA_ERROR:
+ case PECI_CC_FATAL_MCA_ERROR:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA:
+ return -EIO;
+ }
+
+ WARN_ONCE(1, "Unknown PECI completion code: %#02x\n", cc);
+
+ return -EIO;
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_status, PECI);
+
+static int peci_request_xfer(struct peci_request *req)
+{
+ struct peci_device *device = req->device;
+ struct peci_controller *controller = to_peci_controller(device->dev.parent);
+ int ret;
+
+ mutex_lock(&controller->bus_lock);
+ ret = controller->ops->xfer(controller, device->addr, req);
+ mutex_unlock(&controller->bus_lock);
+
+ return ret;
+}
+
+static int peci_request_xfer_retry(struct peci_request *req)
+{
+ long wait_interval = PECI_RETRY_INTERVAL_MIN;
+ struct peci_device *device = req->device;
+ struct peci_controller *controller = to_peci_controller(device->dev.parent);
+ unsigned long start = jiffies;
+ int ret;
+
+ /* Don't try to use it for ping */
+ if (WARN_ON(!req->rx.buf))
+ return 0;
+
+ do {
+ ret = peci_request_xfer(req);
+ if (ret) {
+ dev_dbg(&controller->dev, "xfer error: %d\n", ret);
+ return ret;
+ }
+
+ if (peci_request_status(req) != -EAGAIN)
+ return 0;
+
+ /* Set the retry bit to indicate a retry attempt */
+ req->tx.buf[1] |= PECI_RETRY_BIT;
+
+ if (schedule_timeout_interruptible(wait_interval))
+ return -ERESTARTSYS;
+
+ wait_interval = min_t(long, wait_interval * 2, PECI_RETRY_INTERVAL_MAX);
+ } while (time_before(jiffies, start + PECI_RETRY_TIMEOUT));
+
+ dev_dbg(&controller->dev, "request timed out\n");
+
+ return -ETIMEDOUT;
+}
+
/**
* peci_request_alloc() - allocate &struct peci_requests
* @device: PECI device to which request is going to be sent
@@ -53,3 +180,90 @@ void peci_request_free(struct peci_request *req)
kfree(req);
}
EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
+
+struct peci_request *peci_xfer_get_dib(struct peci_device *device)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_GET_DIB_WR_LEN, PECI_GET_DIB_RD_LEN);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_GET_DIB_CMD;
+
+ ret = peci_request_xfer(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+EXPORT_SYMBOL_NS_GPL(peci_xfer_get_dib, PECI);
+
+static struct peci_request *
+__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDPKGCFG_WR_LEN, PECI_RDPKGCFG_RD_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_RDPKGCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = index;
+ put_unaligned_le16(param, &req->tx.buf[3]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+u8 peci_request_data_readb(struct peci_request *req)
+{
+ return req->rx.buf[1];
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readb, PECI);
+
+u16 peci_request_data_readw(struct peci_request *req)
+{
+ return get_unaligned_le16(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readw, PECI);
+
+u32 peci_request_data_readl(struct peci_request *req)
+{
+ return get_unaligned_le32(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readl, PECI);
+
+u64 peci_request_data_readq(struct peci_request *req)
+{
+ return get_unaligned_le64(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readq, PECI);
+
+u64 peci_request_dib_read(struct peci_request *req)
+{
+ return get_unaligned_le64(&req->rx.buf[0]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_dib_read, PECI);
+
+#define __read_pkg_config(x, type) \
+struct peci_request *peci_xfer_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
+{ \
+ return __pkg_cfg_read(device, index, param, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_xfer_pkg_cfg_##x, PECI)
+
+__read_pkg_config(readb, u8);
+__read_pkg_config(readw, u16);
+__read_pkg_config(readl, u32);
+__read_pkg_config(readq, u64);
diff --git a/include/linux/peci.h b/include/linux/peci.h
index 26e0a4e73b50..dcf1c53f4e40 100644
--- a/include/linux/peci.h
+++ b/include/linux/peci.h
@@ -14,6 +14,14 @@
*/
#define PECI_REQUEST_MAX_BUF_SIZE 32

+#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
+#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
+#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
+#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
+#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
+#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
+#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
+
struct peci_controller;
struct peci_request;

@@ -59,6 +67,11 @@ static inline struct peci_controller *to_peci_controller(void *d)
* struct peci_device - PECI device
* @dev: device object to register PECI device to the device model
* @controller: manages the bus segment hosting this PECI device
+ * @info: PECI device characteristics
+ * @info.family: device family
+ * @info.model: device model
+ * @info.peci_revision: PECI revision supported by the PECI device
+ * @info.socket_id: the socket ID represented by the PECI device
* @addr: address used on the PECI bus connected to the parent controller
*
* A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
@@ -67,6 +80,12 @@ static inline struct peci_controller *to_peci_controller(void *d)
*/
struct peci_device {
struct device dev;
+ struct {
+ u16 family;
+ u8 model;
+ u8 peci_revision;
+ u8 socket_id;
+ } info;
u8 addr;
};

--
2.31.1


2021-11-15 22:29:23

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 07/13] peci: Add sysfs interface for PECI bus

PECI devices may not be discoverable at the time when PECI controller is
being added (e.g. BMC can boot up when the Host system is still in S5).
Since we currently don't have the capabilities to figure out the Host
system state inside the PECI subsystem itself, we have to rely on
userspace to do it for us.

In the future, PECI subsystem may be expanded with mechanisms that allow
us to avoid depending on userspace interaction (e.g. CPU presence could
be detected using GPIO, and the information on whether it's discoverable
could be obtained over IPMI).
Unfortunately, those methods may ultimately not be available (support
will vary from platform to platform), which means that we still need
platform independent method triggered by userspace.

Signed-off-by: Iwona Winiarska <[email protected]>
---
Documentation/ABI/testing/sysfs-bus-peci | 16 +++++
drivers/peci/Makefile | 2 +-
drivers/peci/core.c | 3 +-
drivers/peci/device.c | 1 +
drivers/peci/internal.h | 5 ++
drivers/peci/sysfs.c | 82 ++++++++++++++++++++++++
6 files changed, 107 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-peci
create mode 100644 drivers/peci/sysfs.c

diff --git a/Documentation/ABI/testing/sysfs-bus-peci b/Documentation/ABI/testing/sysfs-bus-peci
new file mode 100644
index 000000000000..56c2b2216bbd
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-peci
@@ -0,0 +1,16 @@
+What: /sys/bus/peci/rescan
+Date: July 2021
+KernelVersion: 5.15
+Contact: Iwona Winiarska <[email protected]>
+Description:
+ Writing a non-zero value to this attribute will
+ initiate scan for PECI devices on all PECI controllers
+ in the system.
+
+What: /sys/bus/peci/devices/<controller_id>-<device_addr>/remove
+Date: July 2021
+KernelVersion: 5.15
+Contact: Iwona Winiarska <[email protected]>
+Description:
+ Writing a non-zero value to this attribute will
+ remove the PECI device and any of its children.
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index c5f9d3fe21bb..917f689e147a 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only

# Core functionality
-peci-y := core.o request.o device.o
+peci-y := core.o request.o device.o sysfs.o
obj-$(CONFIG_PECI) += peci.o

# Hardware specific bus drivers
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
index c3361e6e043a..e993615cf521 100644
--- a/drivers/peci/core.c
+++ b/drivers/peci/core.c
@@ -29,7 +29,7 @@ struct device_type peci_controller_type = {
.release = peci_controller_dev_release,
};

-static int peci_controller_scan_devices(struct peci_controller *controller)
+int peci_controller_scan_devices(struct peci_controller *controller)
{
int ret;
u8 addr;
@@ -162,6 +162,7 @@ EXPORT_SYMBOL_NS_GPL(devm_peci_controller_add, PECI);

struct bus_type peci_bus_type = {
.name = "peci",
+ .bus_groups = peci_bus_groups,
};

static int __init peci_init(void)
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
index 39f7f6409391..91b1522ce6f7 100644
--- a/drivers/peci/device.c
+++ b/drivers/peci/device.c
@@ -113,5 +113,6 @@ static void peci_device_release(struct device *dev)
}

struct device_type peci_device_type = {
+ .groups = peci_device_groups,
.release = peci_device_release,
};
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 57d11a902c5d..978e12c8e1d3 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -8,6 +8,7 @@
#include <linux/types.h>

struct peci_controller;
+struct attribute_group;
struct peci_device;
struct peci_request;

@@ -19,12 +20,16 @@ struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u
void peci_request_free(struct peci_request *req);

extern struct device_type peci_device_type;
+extern const struct attribute_group *peci_device_groups[];

int peci_device_create(struct peci_controller *controller, u8 addr);
void peci_device_destroy(struct peci_device *device);

extern struct bus_type peci_bus_type;
+extern const struct attribute_group *peci_bus_groups[];

extern struct device_type peci_controller_type;

+int peci_controller_scan_devices(struct peci_controller *controller);
+
#endif /* __PECI_INTERNAL_H */
diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
new file mode 100644
index 000000000000..db9ef05776e3
--- /dev/null
+++ b/drivers/peci/sysfs.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/peci.h>
+
+#include "internal.h"
+
+static int rescan_controller(struct device *dev, void *data)
+{
+ if (dev->type != &peci_controller_type)
+ return 0;
+
+ return peci_controller_scan_devices(to_peci_controller(dev));
+}
+
+static ssize_t rescan_store(struct bus_type *bus, const char *buf, size_t count)
+{
+ bool res;
+ int ret;
+
+ ret = kstrtobool(buf, &res);
+ if (ret)
+ return ret;
+
+ if (!res)
+ return count;
+
+ ret = bus_for_each_dev(&peci_bus_type, NULL, NULL, rescan_controller);
+ if (ret)
+ return ret;
+
+ return count;
+}
+static BUS_ATTR_WO(rescan);
+
+static struct attribute *peci_bus_attrs[] = {
+ &bus_attr_rescan.attr,
+ NULL
+};
+
+static const struct attribute_group peci_bus_group = {
+ .attrs = peci_bus_attrs,
+};
+
+const struct attribute_group *peci_bus_groups[] = {
+ &peci_bus_group,
+ NULL
+};
+
+static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct peci_device *device = to_peci_device(dev);
+ bool res;
+ int ret;
+
+ ret = kstrtobool(buf, &res);
+ if (ret)
+ return ret;
+
+ if (res && device_remove_file_self(dev, attr))
+ peci_device_destroy(device);
+
+ return count;
+}
+static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, remove_store);
+
+static struct attribute *peci_device_attrs[] = {
+ &dev_attr_remove.attr,
+ NULL
+};
+
+static const struct attribute_group peci_device_group = {
+ .attrs = peci_device_attrs,
+};
+
+const struct attribute_group *peci_device_groups[] = {
+ &peci_device_group,
+ NULL
+};
--
2.31.1


2021-11-15 22:29:23

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 04/13] peci: Add core infrastructure

Intel processors provide access for various services designed to support
processor and DRAM thermal management, platform manageability and
processor interface tuning and diagnostics.
Those services are available via the Platform Environment Control
Interface (PECI) that provides a communication channel between the
processor and the Baseboard Management Controller (BMC) or other
platform management device.

This change introduces PECI subsystem by adding the initial core module
and API for controller drivers.

Co-developed-by: Jason M Bills <[email protected]>
Signed-off-by: Jason M Bills <[email protected]>
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
MAINTAINERS | 9 +++
drivers/Kconfig | 3 +
drivers/Makefile | 1 +
drivers/peci/Kconfig | 15 ++++
drivers/peci/Makefile | 5 ++
drivers/peci/core.c | 158 ++++++++++++++++++++++++++++++++++++++++
drivers/peci/internal.h | 16 ++++
include/linux/peci.h | 99 +++++++++++++++++++++++++
8 files changed, 306 insertions(+)
create mode 100644 drivers/peci/Kconfig
create mode 100644 drivers/peci/Makefile
create mode 100644 drivers/peci/core.c
create mode 100644 drivers/peci/internal.h
create mode 100644 include/linux/peci.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7a2345ce8521..11a6c0869010 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14911,6 +14911,15 @@ L: [email protected]
S: Maintained
F: drivers/platform/x86/peaq-wmi.c

+PECI SUBSYSTEM
+M: Iwona Winiarska <[email protected]>
+R: Jae Hyun Yoo <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+S: Supported
+F: Documentation/devicetree/bindings/peci/
+F: drivers/peci/
+F: include/linux/peci.h
+
PENSANDO ETHERNET DRIVERS
M: Shannon Nelson <[email protected]>
M: [email protected]
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 0d399ddaa185..8d6cd5d08722 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -236,4 +236,7 @@ source "drivers/interconnect/Kconfig"
source "drivers/counter/Kconfig"

source "drivers/most/Kconfig"
+
+source "drivers/peci/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index be5d40ae1488..ed84d5102302 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -188,3 +188,4 @@ obj-$(CONFIG_GNSS) += gnss/
obj-$(CONFIG_INTERCONNECT) += interconnect/
obj-$(CONFIG_COUNTER) += counter/
obj-$(CONFIG_MOST) += most/
+obj-$(CONFIG_PECI) += peci/
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
new file mode 100644
index 000000000000..71a4ad81225a
--- /dev/null
+++ b/drivers/peci/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menuconfig PECI
+ tristate "PECI support"
+ help
+ The Platform Environment Control Interface (PECI) is an interface
+ that provides a communication channel to Intel processors and
+ chipset components from external monitoring or control devices.
+
+ If you are building a Baseboard Management Controller (BMC) kernel
+ for Intel platform say Y here and also to the specific driver for
+ your adapter(s) below. If unsure say N.
+
+ This support is also available as a module. If so, the module
+ will be called peci.
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
new file mode 100644
index 000000000000..e789a354e842
--- /dev/null
+++ b/drivers/peci/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+# Core functionality
+peci-y := core.o
+obj-$(CONFIG_PECI) += peci.o
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
new file mode 100644
index 000000000000..73ad0a47fa9d
--- /dev/null
+++ b/drivers/peci/core.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/bug.h>
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/idr.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/peci.h>
+#include <linux/pm_runtime.h>
+#include <linux/property.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+static DEFINE_IDA(peci_controller_ida);
+
+static void peci_controller_dev_release(struct device *dev)
+{
+ struct peci_controller *controller = to_peci_controller(dev);
+
+ mutex_destroy(&controller->bus_lock);
+ ida_free(&peci_controller_ida, controller->id);
+ kfree(controller);
+}
+
+struct device_type peci_controller_type = {
+ .release = peci_controller_dev_release,
+};
+
+static struct peci_controller *peci_controller_alloc(struct device *dev,
+ struct peci_controller_ops *ops)
+{
+ struct peci_controller *controller;
+ int ret;
+
+ if (!ops->xfer)
+ return ERR_PTR(-EINVAL);
+
+ controller = kzalloc(sizeof(*controller), GFP_KERNEL);
+ if (!controller)
+ return ERR_PTR(-ENOMEM);
+
+ ret = ida_alloc_max(&peci_controller_ida, U8_MAX, GFP_KERNEL);
+ if (ret < 0)
+ goto err;
+ controller->id = ret;
+
+ controller->ops = ops;
+
+ controller->dev.parent = dev;
+ controller->dev.bus = &peci_bus_type;
+ controller->dev.type = &peci_controller_type;
+
+ device_initialize(&controller->dev);
+
+ mutex_init(&controller->bus_lock);
+
+ return controller;
+
+err:
+ kfree(controller);
+ return ERR_PTR(ret);
+}
+
+static void unregister_controller(void *_controller)
+{
+ struct peci_controller *controller = _controller;
+
+ device_unregister(&controller->dev);
+
+ fwnode_handle_put(controller->dev.fwnode);
+
+ pm_runtime_disable(&controller->dev);
+}
+
+/**
+ * devm_peci_controller_add() - add PECI controller
+ * @dev: device for devm operations
+ * @ops: pointer to controller specific methods
+ *
+ * In final stage of its probe(), peci_controller driver calls
+ * devm_peci_controller_add() to register itself with the PECI bus.
+ *
+ * Return: Pointer to the newly allocated controller or ERR_PTR() in case of failure.
+ */
+struct peci_controller *devm_peci_controller_add(struct device *dev,
+ struct peci_controller_ops *ops)
+{
+ struct peci_controller *controller;
+ int ret;
+
+ controller = peci_controller_alloc(dev, ops);
+ if (IS_ERR(controller))
+ return controller;
+
+ ret = dev_set_name(&controller->dev, "peci-%d", controller->id);
+ if (ret)
+ goto err_put;
+
+ pm_runtime_no_callbacks(&controller->dev);
+ pm_suspend_ignore_children(&controller->dev, true);
+ pm_runtime_enable(&controller->dev);
+
+ device_set_node(&controller->dev, fwnode_handle_get(dev_fwnode(dev)));
+
+ ret = device_add(&controller->dev);
+ if (ret)
+ goto err_fwnode;
+
+ ret = devm_add_action_or_reset(dev, unregister_controller, controller);
+ if (ret)
+ return ERR_PTR(ret);
+
+ return controller;
+
+err_fwnode:
+ fwnode_handle_put(controller->dev.fwnode);
+
+ pm_runtime_disable(&controller->dev);
+
+err_put:
+ put_device(&controller->dev);
+
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_NS_GPL(devm_peci_controller_add, PECI);
+
+struct bus_type peci_bus_type = {
+ .name = "peci",
+};
+
+static int __init peci_init(void)
+{
+ int ret;
+
+ ret = bus_register(&peci_bus_type);
+ if (ret < 0) {
+ pr_err("peci: failed to register PECI bus type!\n");
+ return ret;
+ }
+
+ return 0;
+}
+module_init(peci_init);
+
+static void __exit peci_exit(void)
+{
+ bus_unregister(&peci_bus_type);
+}
+module_exit(peci_exit);
+
+MODULE_AUTHOR("Jason M Bills <[email protected]>");
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI bus core module");
+MODULE_LICENSE("GPL");
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
new file mode 100644
index 000000000000..918dea745a86
--- /dev/null
+++ b/drivers/peci/internal.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2018-2021 Intel Corporation */
+
+#ifndef __PECI_INTERNAL_H
+#define __PECI_INTERNAL_H
+
+#include <linux/device.h>
+#include <linux/types.h>
+
+struct peci_controller;
+
+extern struct bus_type peci_bus_type;
+
+extern struct device_type peci_controller_type;
+
+#endif /* __PECI_INTERNAL_H */
diff --git a/include/linux/peci.h b/include/linux/peci.h
new file mode 100644
index 000000000000..26e0a4e73b50
--- /dev/null
+++ b/include/linux/peci.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2018-2021 Intel Corporation */
+
+#ifndef __LINUX_PECI_H
+#define __LINUX_PECI_H
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+/*
+ * Currently we don't support any PECI command over 32 bytes.
+ */
+#define PECI_REQUEST_MAX_BUF_SIZE 32
+
+struct peci_controller;
+struct peci_request;
+
+/**
+ * struct peci_controller_ops - PECI controller specific methods
+ * @xfer: PECI transfer function
+ *
+ * PECI controllers may have different hardware interfaces - the drivers
+ * implementing PECI controllers can use this structure to abstract away those
+ * differences by exposing a common interface for PECI core.
+ */
+struct peci_controller_ops {
+ int (*xfer)(struct peci_controller *controller, u8 addr, struct peci_request *req);
+};
+
+/**
+ * struct peci_controller - PECI controller
+ * @dev: device object to register PECI controller to the device model
+ * @ops: pointer to device specific controller operations
+ * @bus_lock: lock used to protect multiple callers
+ * @id: PECI controller ID
+ *
+ * PECI controllers usually connect to their drivers using non-PECI bus,
+ * such as the platform bus.
+ * Each PECI controller can communicate with one or more PECI devices.
+ */
+struct peci_controller {
+ struct device dev;
+ struct peci_controller_ops *ops;
+ struct mutex bus_lock; /* held for the duration of xfer */
+ u8 id;
+};
+
+struct peci_controller *devm_peci_controller_add(struct device *parent,
+ struct peci_controller_ops *ops);
+
+static inline struct peci_controller *to_peci_controller(void *d)
+{
+ return container_of(d, struct peci_controller, dev);
+}
+
+/**
+ * struct peci_device - PECI device
+ * @dev: device object to register PECI device to the device model
+ * @controller: manages the bus segment hosting this PECI device
+ * @addr: address used on the PECI bus connected to the parent controller
+ *
+ * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
+ * The behaviour exposed to the rest of the system is defined by the PECI driver
+ * managing the device.
+ */
+struct peci_device {
+ struct device dev;
+ u8 addr;
+};
+
+static inline struct peci_device *to_peci_device(struct device *d)
+{
+ return container_of(d, struct peci_device, dev);
+}
+
+/**
+ * struct peci_request - PECI request
+ * @device: PECI device to which the request is sent
+ * @tx: TX buffer specific data
+ * @tx.buf: TX buffer
+ * @tx.len: transfer data length in bytes
+ * @rx: RX buffer specific data
+ * @rx.buf: RX buffer
+ * @rx.len: received data length in bytes
+ *
+ * A peci_request represents a request issued by PECI originator (TX) and
+ * a response received from PECI responder (RX).
+ */
+struct peci_request {
+ struct peci_device *device;
+ struct {
+ u8 buf[PECI_REQUEST_MAX_BUF_SIZE];
+ u8 len;
+ } rx, tx;
+};
+
+#endif /* __LINUX_PECI_H */
--
2.31.1


2021-11-15 22:29:24

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 11/13] hwmon: peci: Add dimmtemp driver

Add peci-dimmtemp driver for Temperature Sensor on DIMM readings that
are accessible via the processor PECI interface.

The main use case for the driver (and PECI interface) is out-of-band
management, where we're able to obtain thermal readings from an external
entity connected with PECI, e.g. BMC on server platforms.

Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
drivers/hwmon/peci/Kconfig | 13 +
drivers/hwmon/peci/Makefile | 2 +
drivers/hwmon/peci/dimmtemp.c | 630 ++++++++++++++++++++++++++++++++++
3 files changed, 645 insertions(+)
create mode 100644 drivers/hwmon/peci/dimmtemp.c

diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
index e10eed68d70a..9d32a57badfe 100644
--- a/drivers/hwmon/peci/Kconfig
+++ b/drivers/hwmon/peci/Kconfig
@@ -14,5 +14,18 @@ config SENSORS_PECI_CPUTEMP
This driver can also be built as a module. If so, the module
will be called peci-cputemp.

+config SENSORS_PECI_DIMMTEMP
+ tristate "PECI DIMM temperature monitoring client"
+ depends on PECI
+ select SENSORS_PECI
+ select PECI_CPU
+ help
+ If you say yes here you get support for the generic Intel PECI hwmon
+ driver which provides Temperature Sensor on DIMM readings that are
+ accessible via the processor PECI interface.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-dimmtemp.
+
config SENSORS_PECI
tristate
diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
index e8a0ada5ab1f..191cfa0227f3 100644
--- a/drivers/hwmon/peci/Makefile
+++ b/drivers/hwmon/peci/Makefile
@@ -1,5 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only

peci-cputemp-y := cputemp.o
+peci-dimmtemp-y := dimmtemp.o

obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
+obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o
diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
new file mode 100644
index 000000000000..ec435b03e401
--- /dev/null
+++ b/drivers/hwmon/peci/dimmtemp.c
@@ -0,0 +1,630 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/hwmon.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/units.h>
+#include <linux/workqueue.h>
+
+#include "common.h"
+
+#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
+
+/* Max number of channel ranks and DIMM index per channel */
+#define CHAN_RANK_MAX_ON_HSX 8
+#define DIMM_IDX_MAX_ON_HSX 3
+#define CHAN_RANK_MAX_ON_BDX 4
+#define DIMM_IDX_MAX_ON_BDX 3
+#define CHAN_RANK_MAX_ON_BDXD 2
+#define DIMM_IDX_MAX_ON_BDXD 2
+#define CHAN_RANK_MAX_ON_SKX 6
+#define DIMM_IDX_MAX_ON_SKX 2
+#define CHAN_RANK_MAX_ON_ICX 8
+#define DIMM_IDX_MAX_ON_ICX 2
+#define CHAN_RANK_MAX_ON_ICXD 4
+#define DIMM_IDX_MAX_ON_ICXD 2
+
+#define CHAN_RANK_MAX CHAN_RANK_MAX_ON_HSX
+#define DIMM_IDX_MAX DIMM_IDX_MAX_ON_HSX
+#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)
+
+#define CPU_SEG_MASK GENMASK(23, 16)
+#define GET_CPU_SEG(x) (((x) & CPU_SEG_MASK) >> 16)
+#define CPU_BUS_MASK GENMASK(7, 0)
+#define GET_CPU_BUS(x) ((x) & CPU_BUS_MASK)
+
+#define DIMM_TEMP_MAX GENMASK(15, 8)
+#define DIMM_TEMP_CRIT GENMASK(23, 16)
+#define GET_TEMP_MAX(x) (((x) & DIMM_TEMP_MAX) >> 8)
+#define GET_TEMP_CRIT(x) (((x) & DIMM_TEMP_CRIT) >> 16)
+
+#define NO_DIMM_RETRY_COUNT_MAX 5
+
+struct peci_dimmtemp;
+
+struct dimm_info {
+ int chan_rank_max;
+ int dimm_idx_max;
+ u8 min_peci_revision;
+ int (*read_thresholds)(struct peci_dimmtemp *priv, int dimm_order,
+ int chan_rank, u32 *data);
+};
+
+struct peci_dimm_thresholds {
+ long temp_max;
+ long temp_crit;
+ struct peci_sensor_state state;
+};
+
+enum peci_dimm_threshold_type {
+ temp_max_type,
+ temp_crit_type,
+};
+
+struct peci_dimmtemp {
+ struct peci_device *peci_dev;
+ struct device *dev;
+ const char *name;
+ const struct dimm_info *gen_info;
+ struct delayed_work detect_work;
+ struct {
+ struct peci_sensor_data temp;
+ struct peci_dimm_thresholds thresholds;
+ } dimm[DIMM_NUMS_MAX];
+ char **dimmtemp_label;
+ DECLARE_BITMAP(dimm_mask, DIMM_NUMS_MAX);
+ u8 no_dimm_retry_count;
+};
+
+static u8 __dimm_temp(u32 reg, int dimm_order)
+{
+ return (reg >> (dimm_order * 8)) & 0xff;
+}
+
+static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no, long *val)
+{
+ int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
+ int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
+ int ret = 0;
+ u32 data;
+
+ mutex_lock(&priv->dimm[dimm_no].temp.state.lock);
+ if (!peci_sensor_need_update(&priv->dimm[dimm_no].temp.state))
+ goto skip_update;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &data);
+ if (ret)
+ goto unlock;
+
+ priv->dimm[dimm_no].temp.value = __dimm_temp(data, dimm_order) * MILLIDEGREE_PER_DEGREE;
+
+ peci_sensor_mark_updated(&priv->dimm[dimm_no].temp.state);
+
+skip_update:
+ *val = priv->dimm[dimm_no].temp.value;
+unlock:
+ mutex_unlock(&priv->dimm[dimm_no].temp.state.lock);
+ return ret;
+}
+
+static int update_thresholds(struct peci_dimmtemp *priv, int dimm_no)
+{
+ int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
+ int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
+ u32 data;
+ int ret;
+
+ if (!peci_sensor_need_update(&priv->dimm[dimm_no].thresholds.state))
+ return 0;
+
+ ret = priv->gen_info->read_thresholds(priv, dimm_order, chan_rank, &data);
+ if (ret == -ENODATA) /* Use default or previous value */
+ return 0;
+ if (ret)
+ return ret;
+
+ priv->dimm[dimm_no].thresholds.temp_max = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
+ priv->dimm[dimm_no].thresholds.temp_crit = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
+
+ peci_sensor_mark_updated(&priv->dimm[dimm_no].thresholds.state);
+
+ return 0;
+}
+
+static int get_dimm_thresholds(struct peci_dimmtemp *priv, enum peci_dimm_threshold_type type,
+ int dimm_no, long *val)
+{
+ int ret;
+
+ mutex_lock(&priv->dimm[dimm_no].thresholds.state.lock);
+ ret = update_thresholds(priv, dimm_no);
+ if (ret)
+ goto unlock;
+
+ switch (type) {
+ case temp_max_type:
+ *val = priv->dimm[dimm_no].thresholds.temp_max;
+ break;
+ case temp_crit_type:
+ *val = priv->dimm[dimm_no].thresholds.temp_crit;
+ break;
+ default:
+ ret = -EOPNOTSUPP;
+ break;
+ }
+unlock:
+ mutex_unlock(&priv->dimm[dimm_no].thresholds.state.lock);
+
+ return ret;
+}
+
+static int dimmtemp_read_string(struct device *dev,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel, const char **str)
+{
+ struct peci_dimmtemp *priv = dev_get_drvdata(dev);
+
+ if (attr != hwmon_temp_label)
+ return -EOPNOTSUPP;
+
+ *str = (const char *)priv->dimmtemp_label[channel];
+
+ return 0;
+}
+
+static int dimmtemp_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
+{
+ struct peci_dimmtemp *priv = dev_get_drvdata(dev);
+
+ switch (attr) {
+ case hwmon_temp_input:
+ return get_dimm_temp(priv, channel, val);
+ case hwmon_temp_max:
+ return get_dimm_thresholds(priv, temp_max_type, channel, val);
+ case hwmon_temp_crit:
+ return get_dimm_thresholds(priv, temp_crit_type, channel, val);
+ default:
+ break;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static umode_t dimmtemp_is_visible(const void *data, enum hwmon_sensor_types type,
+ u32 attr, int channel)
+{
+ const struct peci_dimmtemp *priv = data;
+
+ if (test_bit(channel, priv->dimm_mask))
+ return 0444;
+
+ return 0;
+}
+
+static const struct hwmon_ops peci_dimmtemp_ops = {
+ .is_visible = dimmtemp_is_visible,
+ .read_string = dimmtemp_read_string,
+ .read = dimmtemp_read,
+};
+
+static int check_populated_dimms(struct peci_dimmtemp *priv)
+{
+ int chan_rank_max = priv->gen_info->chan_rank_max;
+ int dimm_idx_max = priv->gen_info->dimm_idx_max;
+ u32 chan_rank_empty = 0;
+ u64 dimm_mask = 0;
+ int chan_rank, dimm_idx, ret;
+ u32 pcs;
+
+ BUILD_BUG_ON(BITS_PER_TYPE(chan_rank_empty) < CHAN_RANK_MAX);
+ BUILD_BUG_ON(BITS_PER_TYPE(dimm_mask) < DIMM_NUMS_MAX);
+ if (chan_rank_max * dimm_idx_max > DIMM_NUMS_MAX) {
+ WARN_ONCE(1, "Unsupported number of DIMMs - chan_rank_max: %d, dimm_idx_max: %d",
+ chan_rank_max, dimm_idx_max);
+ return -EINVAL;
+ }
+
+ for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &pcs);
+ if (ret) {
+ /*
+ * Overall, we expect either success or -EINVAL in
+ * order to determine whether DIMM is populated or not.
+ * For anything else we fall back to deferring the
+ * detection to be performed at a later point in time.
+ */
+ if (ret == -EINVAL) {
+ chan_rank_empty |= BIT(chan_rank);
+ continue;
+ }
+
+ return -EAGAIN;
+ }
+
+ for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
+ if (__dimm_temp(pcs, dimm_idx))
+ dimm_mask |= BIT(chan_rank * dimm_idx_max + dimm_idx);
+ }
+
+ /*
+ * If we got all -EINVALs, it means that the CPU doesn't have any
+ * DIMMs. Unfortunately, it may also happen at the very start of
+ * host platform boot. Retrying a couple of times lets us make sure
+ * that the state is persistent.
+ */
+ if (chan_rank_empty == GENMASK(chan_rank_max - 1, 0)) {
+ if (priv->no_dimm_retry_count < NO_DIMM_RETRY_COUNT_MAX) {
+ priv->no_dimm_retry_count++;
+
+ return -EAGAIN;
+ } else {
+ return -ENODEV;
+ }
+ }
+
+ /*
+ * It's possible that memory training is not done yet. In this case we
+ * defer the detection to be performed at a later point in time.
+ */
+ if (!dimm_mask) {
+ priv->no_dimm_retry_count = 0;
+ return -EAGAIN;
+ }
+
+ dev_dbg(priv->dev, "Scanned populated DIMMs: %#llx\n", dimm_mask);
+
+ bitmap_from_u64(priv->dimm_mask, dimm_mask);
+
+ return 0;
+}
+
+static int create_dimm_temp_label(struct peci_dimmtemp *priv, int chan)
+{
+ int rank = chan / priv->gen_info->dimm_idx_max;
+ int idx = chan % priv->gen_info->dimm_idx_max;
+
+ priv->dimmtemp_label[chan] = devm_kasprintf(priv->dev, GFP_KERNEL,
+ "DIMM %c%d", 'A' + rank,
+ idx + 1);
+ if (!priv->dimmtemp_label[chan])
+ return -ENOMEM;
+
+ return 0;
+}
+
+static const u32 peci_dimmtemp_temp_channel_config[] = {
+ [0 ... DIMM_NUMS_MAX - 1] = HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT,
+ 0
+};
+
+static const struct hwmon_channel_info peci_dimmtemp_temp_channel = {
+ .type = hwmon_temp,
+ .config = peci_dimmtemp_temp_channel_config,
+};
+
+static const struct hwmon_channel_info *peci_dimmtemp_temp_info[] = {
+ &peci_dimmtemp_temp_channel,
+ NULL
+};
+
+static const struct hwmon_chip_info peci_dimmtemp_chip_info = {
+ .ops = &peci_dimmtemp_ops,
+ .info = peci_dimmtemp_temp_info,
+};
+
+static int create_dimm_temp_info(struct peci_dimmtemp *priv)
+{
+ int ret, i, channels;
+ struct device *dev;
+
+ /*
+ * We expect to either find populated DIMMs and carry on with creating
+ * sensors, or find out that there are no DIMMs populated.
+ * All other states mean that the platform never reached the state that
+ * allows to check DIMM state - causing us to retry later on.
+ */
+ ret = check_populated_dimms(priv);
+ if (ret == -ENODEV) {
+ dev_dbg(priv->dev, "No DIMMs found\n");
+ return 0;
+ } else if (ret) {
+ schedule_delayed_work(&priv->detect_work, DIMM_MASK_CHECK_DELAY_JIFFIES);
+ dev_dbg(priv->dev, "Deferred populating DIMM temp info\n");
+ return ret;
+ }
+
+ channels = priv->gen_info->chan_rank_max * priv->gen_info->dimm_idx_max;
+
+ priv->dimmtemp_label = devm_kzalloc(priv->dev, channels * sizeof(char *), GFP_KERNEL);
+ if (!priv->dimmtemp_label)
+ return -ENOMEM;
+
+ for_each_set_bit(i, priv->dimm_mask, DIMM_NUMS_MAX) {
+ ret = create_dimm_temp_label(priv, i);
+ if (ret)
+ return ret;
+ mutex_init(&priv->dimm[i].thresholds.state.lock);
+ mutex_init(&priv->dimm[i].temp.state.lock);
+ }
+
+ dev = devm_hwmon_device_register_with_info(priv->dev, priv->name, priv,
+ &peci_dimmtemp_chip_info, NULL);
+ if (IS_ERR(dev)) {
+ dev_err(priv->dev, "Failed to register hwmon device\n");
+ return PTR_ERR(dev);
+ }
+
+ dev_dbg(priv->dev, "%s: sensor '%s'\n", dev_name(dev), priv->name);
+
+ return 0;
+}
+
+static void create_dimm_temp_info_delayed(struct work_struct *work)
+{
+ struct peci_dimmtemp *priv = container_of(to_delayed_work(work),
+ struct peci_dimmtemp,
+ detect_work);
+ int ret;
+
+ ret = create_dimm_temp_info(priv);
+ if (ret && ret != -EAGAIN)
+ dev_err(priv->dev, "Failed to populate DIMM temp info\n");
+}
+
+static void remove_delayed_work(void *_priv)
+{
+ struct peci_dimmtemp *priv = _priv;
+
+ cancel_delayed_work_sync(&priv->detect_work);
+}
+
+static int peci_dimmtemp_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
+{
+ struct device *dev = &adev->dev;
+ struct peci_device *peci_dev = to_peci_device(dev->parent);
+ struct peci_dimmtemp *priv;
+ int ret;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_dimmtemp.cpu%d",
+ peci_dev->info.socket_id);
+ if (!priv->name)
+ return -ENOMEM;
+
+ priv->dev = dev;
+ priv->peci_dev = peci_dev;
+ priv->gen_info = (const struct dimm_info *)id->driver_data;
+
+ /*
+ * This is just a sanity check. Since we're using commands that are
+ * guaranteed to be supported on a given platform, we should never see
+ * revision lower than expected.
+ */
+ if (peci_dev->info.peci_revision < priv->gen_info->min_peci_revision)
+ dev_warn(priv->dev,
+ "Unexpected PECI revision %#x, some features may be unavailable\n",
+ peci_dev->info.peci_revision);
+
+ INIT_DELAYED_WORK(&priv->detect_work, create_dimm_temp_info_delayed);
+
+ ret = devm_add_action_or_reset(priv->dev, remove_delayed_work, priv);
+ if (ret)
+ return ret;
+
+ ret = create_dimm_temp_info(priv);
+ if (ret && ret != -EAGAIN) {
+ dev_err(dev, "Failed to populate DIMM temp info\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static int
+read_thresholds_hsx(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u32 *data)
+{
+ u8 dev, func;
+ u16 reg;
+ int ret;
+
+ /*
+ * Device 20, Function 0: IMC 0 channel 0 -> rank 0
+ * Device 20, Function 1: IMC 0 channel 1 -> rank 1
+ * Device 21, Function 0: IMC 0 channel 2 -> rank 2
+ * Device 21, Function 1: IMC 0 channel 3 -> rank 3
+ * Device 23, Function 0: IMC 1 channel 0 -> rank 4
+ * Device 23, Function 1: IMC 1 channel 1 -> rank 5
+ * Device 24, Function 0: IMC 1 channel 2 -> rank 6
+ * Device 24, Function 1: IMC 1 channel 3 -> rank 7
+ */
+ dev = 20 + chan_rank / 2 + chan_rank / 4;
+ func = chan_rank % 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(priv->peci_dev, 1, dev, func, reg, data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int
+read_thresholds_bdxd(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u32 *data)
+{
+ u8 dev, func;
+ u16 reg;
+ int ret;
+
+ /*
+ * Device 10, Function 2: IMC 0 channel 0 -> rank 0
+ * Device 10, Function 6: IMC 0 channel 1 -> rank 1
+ * Device 12, Function 2: IMC 1 channel 0 -> rank 2
+ * Device 12, Function 6: IMC 1 channel 1 -> rank 3
+ */
+ dev = 10 + chan_rank / 2 * 2;
+ func = (chan_rank % 2) ? 6 : 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(priv->peci_dev, 2, dev, func, reg, data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int
+read_thresholds_skx(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u32 *data)
+{
+ u8 dev, func;
+ u16 reg;
+ int ret;
+
+ /*
+ * Device 10, Function 2: IMC 0 channel 0 -> rank 0
+ * Device 10, Function 6: IMC 0 channel 1 -> rank 1
+ * Device 11, Function 2: IMC 0 channel 2 -> rank 2
+ * Device 12, Function 2: IMC 1 channel 0 -> rank 3
+ * Device 12, Function 6: IMC 1 channel 1 -> rank 4
+ * Device 13, Function 2: IMC 1 channel 2 -> rank 5
+ */
+ dev = 10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0);
+ func = chan_rank % 3 == 1 ? 6 : 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(priv->peci_dev, 2, dev, func, reg, data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int
+read_thresholds_icx(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u32 *data)
+{
+ u32 reg_val;
+ u64 offset;
+ int ret;
+ u8 dev;
+
+ ret = peci_ep_pci_local_read(priv->peci_dev, 0, 13, 0, 2, 0xd4, &reg_val);
+ if (ret || !(reg_val & BIT(31)))
+ return -ENODATA; /* Use default or previous value */
+
+ ret = peci_ep_pci_local_read(priv->peci_dev, 0, 13, 0, 2, 0xd0, &reg_val);
+ if (ret)
+ return -ENODATA; /* Use default or previous value */
+
+ /*
+ * Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
+ * Device 26, Offset 264e0: IMC 0 channel 1 -> rank 1
+ * Device 27, Offset 224e0: IMC 1 channel 0 -> rank 2
+ * Device 27, Offset 264e0: IMC 1 channel 1 -> rank 3
+ * Device 28, Offset 224e0: IMC 2 channel 0 -> rank 4
+ * Device 28, Offset 264e0: IMC 2 channel 1 -> rank 5
+ * Device 29, Offset 224e0: IMC 3 channel 0 -> rank 6
+ * Device 29, Offset 264e0: IMC 3 channel 1 -> rank 7
+ */
+ dev = 26 + chan_rank / 2;
+ offset = 0x224e0 + dimm_order * 4 + (chan_rank % 2) * 0x4000;
+
+ ret = peci_mmio_read(priv->peci_dev, 0, GET_CPU_SEG(reg_val), GET_CPU_BUS(reg_val),
+ dev, 0, offset, data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static const struct dimm_info dimm_hsx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_HSX,
+ .min_peci_revision = 0x33,
+ .read_thresholds = &read_thresholds_hsx,
+};
+
+static const struct dimm_info dimm_bdx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_BDX,
+ .min_peci_revision = 0x33,
+ .read_thresholds = &read_thresholds_hsx,
+};
+
+static const struct dimm_info dimm_bdxd = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_BDXD,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_BDXD,
+ .min_peci_revision = 0x33,
+ .read_thresholds = &read_thresholds_bdxd,
+};
+
+static const struct dimm_info dimm_skx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_SKX,
+ .min_peci_revision = 0x33,
+ .read_thresholds = &read_thresholds_skx,
+};
+
+static const struct dimm_info dimm_icx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_ICX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_ICX,
+ .min_peci_revision = 0x40,
+ .read_thresholds = &read_thresholds_icx,
+};
+
+static const struct dimm_info dimm_icxd = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_ICXD,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_ICXD,
+ .min_peci_revision = 0x40,
+ .read_thresholds = &read_thresholds_icx,
+};
+
+static const struct auxiliary_device_id peci_dimmtemp_ids[] = {
+ {
+ .name = "peci_cpu.dimmtemp.hsx",
+ .driver_data = (kernel_ulong_t)&dimm_hsx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.bdx",
+ .driver_data = (kernel_ulong_t)&dimm_bdx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.bdxd",
+ .driver_data = (kernel_ulong_t)&dimm_bdxd,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.skx",
+ .driver_data = (kernel_ulong_t)&dimm_skx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.icx",
+ .driver_data = (kernel_ulong_t)&dimm_icx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.icxd",
+ .driver_data = (kernel_ulong_t)&dimm_icxd,
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(auxiliary, peci_dimmtemp_ids);
+
+static struct auxiliary_driver peci_dimmtemp_driver = {
+ .probe = peci_dimmtemp_probe,
+ .id_table = peci_dimmtemp_ids,
+};
+
+module_auxiliary_driver(peci_dimmtemp_driver);
+
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI dimmtemp driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI_CPU);
--
2.31.1


2021-11-15 22:33:08

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 13/13] docs: Add PECI documentation

Add a brief overview of PECI and PECI wire interface.
The documentation also contains kernel-doc for PECI subsystem internals
and PECI CPU Driver API.

Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
Documentation/index.rst | 1 +
Documentation/peci/index.rst | 16 +++++++++++
Documentation/peci/peci.rst | 51 ++++++++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
4 files changed, 69 insertions(+)
create mode 100644 Documentation/peci/index.rst
create mode 100644 Documentation/peci/peci.rst

diff --git a/Documentation/index.rst b/Documentation/index.rst
index 54ce34fd6fbd..7671f2cd474f 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -137,6 +137,7 @@ needed).
misc-devices/index
scheduler/index
mhi/index
+ peci/index

Architecture-agnostic documentation
-----------------------------------
diff --git a/Documentation/peci/index.rst b/Documentation/peci/index.rst
new file mode 100644
index 000000000000..989de10416e7
--- /dev/null
+++ b/Documentation/peci/index.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+====================
+Linux PECI Subsystem
+====================
+
+.. toctree::
+
+ peci
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
diff --git a/Documentation/peci/peci.rst b/Documentation/peci/peci.rst
new file mode 100644
index 000000000000..331b1ec00e22
--- /dev/null
+++ b/Documentation/peci/peci.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+========
+Overview
+========
+
+The Platform Environment Control Interface (PECI) is a communication
+interface between Intel processor and management controllers
+(e.g. Baseboard Management Controller, BMC).
+PECI provides services that allow the management controller to
+configure, monitor and debug platform by accessing various registers.
+It defines a dedicated command protocol, where the management
+controller is acting as a PECI originator and the processor - as
+a PECI responder.
+PECI can be used in both single processor and multiple-processor based
+systems.
+
+NOTE:
+Intel PECI specification is not released as a dedicated document,
+instead it is a part of External Design Specification (EDS) for given
+Intel CPU. External Design Specifications are usually not publicly
+available.
+
+PECI Wire
+---------
+
+PECI Wire interface uses a single wire for self-clocking and data
+transfer. It does not require any additional control lines - the
+physical layer is a self-clocked one-wire bus signal that begins each
+bit with a driven, rising edge from an idle near zero volts. The
+duration of the signal driven high allows to determine whether the bit
+value is logic '0' or logic '1'. PECI Wire also includes variable data
+rate established with every message.
+
+For PECI Wire, each processor package will utilize unique, fixed
+addresses within a defined range and that address should
+have a fixed relationship with the processor socket ID - if one of the
+processors is removed, it does not affect addresses of remaining
+processors.
+
+PECI subsystem internals
+------------------------
+
+.. kernel-doc:: include/linux/peci.h
+.. kernel-doc:: drivers/peci/internal.h
+.. kernel-doc:: drivers/peci/core.c
+.. kernel-doc:: drivers/peci/request.c
+
+PECI CPU Driver API
+-------------------
+.. kernel-doc:: drivers/peci/cpu.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 0f7216644bd5..4db4979db60a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14935,6 +14935,7 @@ R: Jae Hyun Yoo <[email protected]>
L: [email protected] (moderated for non-subscribers)
S: Supported
F: Documentation/devicetree/bindings/peci/
+F: Documentation/peci/
F: drivers/peci/
F: include/linux/peci-cpu.h
F: include/linux/peci.h
--
2.31.1


2021-11-15 22:33:09

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 03/13] ARM: dts: aspeed: Add PECI controller nodes

Add PECI controller nodes with all required information.

Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
---
arch/arm/boot/dts/aspeed-g4.dtsi | 14 ++++++++++++++
arch/arm/boot/dts/aspeed-g5.dtsi | 14 ++++++++++++++
arch/arm/boot/dts/aspeed-g6.dtsi | 14 ++++++++++++++
3 files changed, 42 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed-g4.dtsi b/arch/arm/boot/dts/aspeed-g4.dtsi
index b313a1cf5f73..cdaf6db4f07d 100644
--- a/arch/arm/boot/dts/aspeed-g4.dtsi
+++ b/arch/arm/boot/dts/aspeed-g4.dtsi
@@ -391,6 +391,20 @@ uart_routing: uart-routing@9c {
};
};

+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2400-peci";
+ reg = <0x1e78b000 0x60>;
+ interrupts = <15>;
+ clocks = <&syscon ASPEED_CLK_GATE_REFCLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ cmd-timeout-ms = <1000>;
+ aspeed,clock-divider = <0>;
+ aspeed,msg-timing = <1>;
+ aspeed,addr-timing = <1>;
+ aspeed,rd-sampling-point = <8>;
+ status = "disabled";
+ };
+
uart2: serial@1e78d000 {
compatible = "ns16550a";
reg = <0x1e78d000 0x20>;
diff --git a/arch/arm/boot/dts/aspeed-g5.dtsi b/arch/arm/boot/dts/aspeed-g5.dtsi
index c7049454c7cb..1ff71c182c95 100644
--- a/arch/arm/boot/dts/aspeed-g5.dtsi
+++ b/arch/arm/boot/dts/aspeed-g5.dtsi
@@ -511,6 +511,20 @@ ibt: ibt@140 {
};
};

+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2500-peci";
+ reg = <0x1e78b000 0x60>;
+ interrupts = <15>;
+ clocks = <&syscon ASPEED_CLK_GATE_REFCLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ cmd-timeout-ms = <1000>;
+ aspeed,clock-divider = <0>;
+ aspeed,msg-timing = <1>;
+ aspeed,addr-timing = <1>;
+ aspeed,rd-sampling-point = <8>;
+ status = "disabled";
+ };
+
uart2: serial@1e78d000 {
compatible = "ns16550a";
reg = <0x1e78d000 0x20>;
diff --git a/arch/arm/boot/dts/aspeed-g6.dtsi b/arch/arm/boot/dts/aspeed-g6.dtsi
index 5106a424f1ce..efd87ceb40b6 100644
--- a/arch/arm/boot/dts/aspeed-g6.dtsi
+++ b/arch/arm/boot/dts/aspeed-g6.dtsi
@@ -507,6 +507,20 @@ wdt4: watchdog@1e7850c0 {
status = "disabled";
};

+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2600-peci";
+ reg = <0x1e78b000 0x100>;
+ interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&syscon ASPEED_CLK_GATE_REF0CLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ cmd-timeout-ms = <1000>;
+ aspeed,clock-divider = <0>;
+ aspeed,msg-timing = <1>;
+ aspeed,addr-timing = <1>;
+ aspeed,rd-sampling-point = <8>;
+ status = "disabled";
+ };
+
lpc: lpc@1e789000 {
compatible = "aspeed,ast2600-lpc-v2", "simple-mfd", "syscon";
reg = <0x1e789000 0x1000>;
--
2.31.1


2021-11-15 22:36:59

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 09/13] peci: Add peci-cpu driver

PECI is an interface that may be used by different types of devices.
Add a peci-cpu driver compatible with Intel processors. The driver is
responsible for handling auxiliary devices that can subsequently be used
by other drivers (e.g. hwmons).

Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
MAINTAINERS | 1 +
drivers/peci/Kconfig | 15 ++
drivers/peci/Makefile | 2 +
drivers/peci/cpu.c | 343 +++++++++++++++++++++++++++++++++++++++
drivers/peci/device.c | 1 +
drivers/peci/internal.h | 27 +++
drivers/peci/request.c | 213 ++++++++++++++++++++++++
include/linux/peci-cpu.h | 40 +++++
include/linux/peci.h | 8 -
9 files changed, 642 insertions(+), 8 deletions(-)
create mode 100644 drivers/peci/cpu.c
create mode 100644 include/linux/peci-cpu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a4da3bba09ba..d751590f32e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14927,6 +14927,7 @@ L: [email protected] (moderated for non-subscribers)
S: Supported
F: Documentation/devicetree/bindings/peci/
F: drivers/peci/
+F: include/linux/peci-cpu.h
F: include/linux/peci.h

PENSANDO ETHERNET DRIVERS
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
index 99279df97a78..89872ad83320 100644
--- a/drivers/peci/Kconfig
+++ b/drivers/peci/Kconfig
@@ -16,6 +16,21 @@ menuconfig PECI

if PECI

+config PECI_CPU
+ tristate "PECI CPU"
+ select AUXILIARY_BUS
+ help
+ This option enables peci-cpu driver for Intel processors. It is
+ responsible for creating auxiliary devices that can subsequently
+ be used by other drivers in order to perform various
+ functionalities such as e.g. temperature monitoring.
+
+ Additional drivers must be enabled in order to use the functionality
+ of the device.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-cpu.
+
source "drivers/peci/controller/Kconfig"

endif # PECI
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 917f689e147a..7de18137e738 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -3,6 +3,8 @@
# Core functionality
peci-y := core.o request.o device.o sysfs.o
obj-$(CONFIG_PECI) += peci.o
+peci-cpu-y := cpu.o
+obj-$(CONFIG_PECI_CPU) += peci-cpu.o

# Hardware specific bus drivers
obj-y += controller/
diff --git a/drivers/peci/cpu.c b/drivers/peci/cpu.c
new file mode 100644
index 000000000000..68eb61c65d34
--- /dev/null
+++ b/drivers/peci/cpu.c
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+/**
+ * peci_temp_read() - read the maximum die temperature from PECI target device
+ * @device: PECI device to which request is going to be sent
+ * @temp_raw: where to store the read temperature
+ *
+ * It uses GetTemp PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_temp_read(struct peci_device *device, s16 *temp_raw)
+{
+ struct peci_request *req;
+
+ req = peci_xfer_get_temp(device);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ *temp_raw = peci_request_temp_read(req);
+
+ peci_request_free(req);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(peci_temp_read, PECI_CPU);
+
+/**
+ * peci_pcs_read() - read PCS register
+ * @device: PECI device to which request is going to be sent
+ * @index: PCS index
+ * @param: PCS parameter
+ * @data: where to store the read data
+ *
+ * It uses RdPkgConfig PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_pcs_read(struct peci_device *device, u8 index, u16 param, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_pkg_cfg_readl(device, index, param);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_pcs_read, PECI_CPU);
+
+/**
+ * peci_pci_local_read() - read 32-bit memory location using raw address
+ * @device: PECI device to which request is going to be sent
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @reg: register
+ * @data: where to store the read data
+ *
+ * It uses RdPCIConfigLocal PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func,
+ u16 reg, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_pci_cfg_local_readl(device, bus, dev, func, reg);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_pci_local_read, PECI_CPU);
+
+/**
+ * peci_ep_pci_local_read() - read 32-bit memory location using raw address
+ * @device: PECI device to which request is going to be sent
+ * @seg: PCI segment
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @reg: register
+ * @data: where to store the read data
+ *
+ * Like &peci_pci_local_read, but it uses RdEndpointConfig PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_ep_pci_cfg_local_readl(device, seg, bus, dev, func, reg);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_ep_pci_local_read, PECI_CPU);
+
+/**
+ * peci_mmio_read() - read 32-bit memory location using 64-bit bar offset address
+ * @device: PECI device to which request is going to be sent
+ * @bar: PCI bar
+ * @seg: PCI segment
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @address: 64-bit MMIO address
+ * @data: where to store the read data
+ *
+ * It uses RdEndpointConfig PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 address, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_ep_mmio64_readl(device, bar, seg, bus, dev, func, address);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_mmio_read, PECI_CPU);
+
+static const char * const peci_adev_types[] = {
+ "cputemp",
+ "dimmtemp",
+};
+
+struct peci_cpu {
+ struct peci_device *device;
+ const struct peci_device_id *id;
+};
+
+static void adev_release(struct device *dev)
+{
+ struct auxiliary_device *adev = to_auxiliary_dev(dev);
+
+ auxiliary_device_uninit(adev);
+
+ kfree(adev->name);
+ kfree(adev);
+}
+
+static struct auxiliary_device *adev_alloc(struct peci_cpu *priv, int idx)
+{
+ struct peci_controller *controller = to_peci_controller(priv->device->dev.parent);
+ struct auxiliary_device *adev;
+ const char *name;
+ int ret;
+
+ adev = kzalloc(sizeof(*adev), GFP_KERNEL);
+ if (!adev)
+ return ERR_PTR(-ENOMEM);
+
+ name = kasprintf(GFP_KERNEL, "%s.%s", peci_adev_types[idx], (const char *)priv->id->data);
+ if (!name) {
+ ret = -ENOMEM;
+ goto free_adev;
+ }
+
+ adev->name = name;
+ adev->dev.parent = &priv->device->dev;
+ adev->dev.release = adev_release;
+ adev->id = (controller->id << 16) | (priv->device->addr);
+
+ ret = auxiliary_device_init(adev);
+ if (ret)
+ goto free_name;
+
+ return adev;
+
+free_name:
+ kfree(name);
+free_adev:
+ kfree(adev);
+ return ERR_PTR(ret);
+}
+
+static void unregister_adev(void *_adev)
+{
+ struct auxiliary_device *adev = _adev;
+
+ auxiliary_device_delete(adev);
+}
+
+static int devm_adev_add(struct device *dev, int idx)
+{
+ struct peci_cpu *priv = dev_get_drvdata(dev);
+ struct auxiliary_device *adev;
+ int ret;
+
+ adev = adev_alloc(priv, idx);
+ if (IS_ERR(adev))
+ return PTR_ERR(adev);
+
+ ret = auxiliary_device_add(adev);
+ if (ret) {
+ auxiliary_device_uninit(adev);
+ return ret;
+ }
+
+ ret = devm_add_action_or_reset(&priv->device->dev, unregister_adev, adev);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static void peci_cpu_add_adevices(struct peci_cpu *priv)
+{
+ struct device *dev = &priv->device->dev;
+ int ret, i;
+
+ for (i = 0; i < ARRAY_SIZE(peci_adev_types); i++) {
+ ret = devm_adev_add(dev, i);
+ if (ret) {
+ dev_warn(dev, "Failed to register PECI auxiliary: %s, ret = %d\n",
+ peci_adev_types[i], ret);
+ continue;
+ }
+ }
+}
+
+static int
+peci_cpu_probe(struct peci_device *device, const struct peci_device_id *id)
+{
+ struct device *dev = &device->dev;
+ struct peci_cpu *priv;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ dev_set_drvdata(dev, priv);
+ priv->device = device;
+ priv->id = id;
+
+ peci_cpu_add_adevices(priv);
+
+ return 0;
+}
+
+static const struct peci_device_id peci_cpu_device_ids[] = {
+ { /* Haswell Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_HASWELL_X,
+ .data = "hsx",
+ },
+ { /* Broadwell Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_BROADWELL_X,
+ .data = "bdx",
+ },
+ { /* Broadwell Xeon D */
+ .family = 6,
+ .model = INTEL_FAM6_BROADWELL_D,
+ .data = "bdxd",
+ },
+ { /* Skylake Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_SKYLAKE_X,
+ .data = "skx",
+ },
+ { /* Icelake Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_ICELAKE_X,
+ .data = "icx",
+ },
+ { /* Icelake Xeon D */
+ .family = 6,
+ .model = INTEL_FAM6_ICELAKE_D,
+ .data = "icxd",
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(peci, peci_cpu_device_ids);
+
+static struct peci_driver peci_cpu_driver = {
+ .probe = peci_cpu_probe,
+ .id_table = peci_cpu_device_ids,
+ .driver = {
+ .name = "peci-cpu",
+ },
+};
+module_peci_driver(peci_cpu_driver);
+
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI CPU driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI);
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
index 0a7660f4fc9b..235815710e5b 100644
--- a/drivers/peci/device.c
+++ b/drivers/peci/device.c
@@ -3,6 +3,7 @@

#include <linux/bitfield.h>
#include <linux/peci.h>
+#include <linux/peci-cpu.h>
#include <linux/slab.h>

#include "internal.h"
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 52c02e12874f..9d75ea54504c 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -22,6 +22,7 @@ void peci_request_free(struct peci_request *req);
int peci_request_status(struct peci_request *req);

u64 peci_request_dib_read(struct peci_request *req);
+s16 peci_request_temp_read(struct peci_request *req);

u8 peci_request_data_readb(struct peci_request *req);
u16 peci_request_data_readw(struct peci_request *req);
@@ -36,6 +37,32 @@ struct peci_request *peci_xfer_pkg_cfg_readw(struct peci_device *device, u8 inde
struct peci_request *peci_xfer_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
struct peci_request *peci_xfer_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);

+struct peci_request *peci_xfer_pci_cfg_local_readb(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_pci_cfg_local_readw(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_pci_cfg_local_readl(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_xfer_ep_pci_cfg_local_readb(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_ep_pci_cfg_local_readw(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_ep_pci_cfg_local_readl(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_xfer_ep_pci_cfg_readb(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_ep_pci_cfg_readw(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_ep_pci_cfg_readl(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_xfer_ep_mmio32_readl(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset);
+
+struct peci_request *peci_xfer_ep_mmio64_readl(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset);
/**
* struct peci_device_id - PECI device data to match
* @data: pointer to driver private data specific to device
diff --git a/drivers/peci/request.c b/drivers/peci/request.c
index 9bd1d81c6093..33da90d801d5 100644
--- a/drivers/peci/request.c
+++ b/drivers/peci/request.c
@@ -3,6 +3,7 @@

#include <linux/bug.h>
#include <linux/export.h>
+#include <linux/pci.h>
#include <linux/peci.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -15,6 +16,10 @@
#define PECI_GET_DIB_WR_LEN 1
#define PECI_GET_DIB_RD_LEN 8

+#define PECI_GET_TEMP_CMD 0x01
+#define PECI_GET_TEMP_WR_LEN 1
+#define PECI_GET_TEMP_RD_LEN 2
+
#define PECI_RDPKGCFG_CMD 0xa1
#define PECI_RDPKGCFG_WR_LEN 5
#define PECI_RDPKGCFG_RD_LEN_BASE 1
@@ -22,6 +27,45 @@
#define PECI_WRPKGCFG_WR_LEN_BASE 6
#define PECI_WRPKGCFG_RD_LEN 1

+#define PECI_RDIAMSR_CMD 0xb1
+#define PECI_RDIAMSR_WR_LEN 5
+#define PECI_RDIAMSR_RD_LEN 9
+#define PECI_WRIAMSR_CMD 0xb5
+#define PECI_RDIAMSREX_CMD 0xd1
+#define PECI_RDIAMSREX_WR_LEN 6
+#define PECI_RDIAMSREX_RD_LEN 9
+
+#define PECI_RDPCICFG_CMD 0x61
+#define PECI_RDPCICFG_WR_LEN 6
+#define PECI_RDPCICFG_RD_LEN 5
+#define PECI_RDPCICFG_RD_LEN_MAX 24
+#define PECI_WRPCICFG_CMD 0x65
+
+#define PECI_RDPCICFGLOCAL_CMD 0xe1
+#define PECI_RDPCICFGLOCAL_WR_LEN 5
+#define PECI_RDPCICFGLOCAL_RD_LEN_BASE 1
+#define PECI_WRPCICFGLOCAL_CMD 0xe5
+#define PECI_WRPCICFGLOCAL_WR_LEN_BASE 6
+#define PECI_WRPCICFGLOCAL_RD_LEN 1
+
+#define PECI_ENDPTCFG_TYPE_LOCAL_PCI 0x03
+#define PECI_ENDPTCFG_TYPE_PCI 0x04
+#define PECI_ENDPTCFG_TYPE_MMIO 0x05
+#define PECI_ENDPTCFG_ADDR_TYPE_PCI 0x04
+#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_D 0x05
+#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q 0x06
+#define PECI_RDENDPTCFG_CMD 0xc1
+#define PECI_RDENDPTCFG_PCI_WR_LEN 12
+#define PECI_RDENDPTCFG_MMIO_WR_LEN_BASE 10
+#define PECI_RDENDPTCFG_MMIO_D_WR_LEN 14
+#define PECI_RDENDPTCFG_MMIO_Q_WR_LEN 18
+#define PECI_RDENDPTCFG_RD_LEN_BASE 1
+#define PECI_WRENDPTCFG_CMD 0xc5
+#define PECI_WRENDPTCFG_PCI_WR_LEN_BASE 13
+#define PECI_WRENDPTCFG_MMIO_D_WR_LEN_BASE 15
+#define PECI_WRENDPTCFG_MMIO_Q_WR_LEN_BASE 19
+#define PECI_WRENDPTCFG_RD_LEN 1
+
/* Device Specific Completion Code (CC) Definition */
#define PECI_CC_SUCCESS 0x40
#define PECI_CC_NEED_RETRY 0x80
@@ -202,6 +246,27 @@ struct peci_request *peci_xfer_get_dib(struct peci_device *device)
}
EXPORT_SYMBOL_NS_GPL(peci_xfer_get_dib, PECI);

+struct peci_request *peci_xfer_get_temp(struct peci_device *device)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_GET_TEMP_WR_LEN, PECI_GET_TEMP_RD_LEN);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_GET_TEMP_CMD;
+
+ ret = peci_request_xfer(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+EXPORT_SYMBOL_NS_GPL(peci_xfer_get_temp, PECI);
+
static struct peci_request *
__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
{
@@ -226,6 +291,108 @@ __pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
return req;
}

+static u32 __get_pci_addr(u8 bus, u8 dev, u8 func, u16 reg)
+{
+ return reg | PCI_DEVID(bus, PCI_DEVFN(dev, func)) << 12;
+}
+
+static struct peci_request *
+__pci_cfg_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg, u8 len)
+{
+ struct peci_request *req;
+ u32 pci_addr;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDPCICFGLOCAL_WR_LEN,
+ PECI_RDPCICFGLOCAL_RD_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ pci_addr = __get_pci_addr(bus, dev, func, reg);
+
+ req->tx.buf[0] = PECI_RDPCICFGLOCAL_CMD;
+ req->tx.buf[1] = 0;
+ put_unaligned_le24(pci_addr, &req->tx.buf[2]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+static struct peci_request *
+__ep_pci_cfg_read(struct peci_device *device, u8 msg_type, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u8 len)
+{
+ struct peci_request *req;
+ u32 pci_addr;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDENDPTCFG_PCI_WR_LEN,
+ PECI_RDENDPTCFG_RD_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ pci_addr = __get_pci_addr(bus, dev, func, reg);
+
+ req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = msg_type;
+ req->tx.buf[3] = 0;
+ req->tx.buf[4] = 0;
+ req->tx.buf[5] = 0;
+ req->tx.buf[6] = PECI_ENDPTCFG_ADDR_TYPE_PCI;
+ req->tx.buf[7] = seg; /* PCI Segment */
+ put_unaligned_le32(pci_addr, &req->tx.buf[8]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+static struct peci_request *
+__ep_mmio_read(struct peci_device *device, u8 bar, u8 addr_type, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset, u8 tx_len, u8 len)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, tx_len, PECI_RDENDPTCFG_RD_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = PECI_ENDPTCFG_TYPE_MMIO;
+ req->tx.buf[3] = 0; /* Endpoint ID */
+ req->tx.buf[4] = 0; /* Reserved */
+ req->tx.buf[5] = bar;
+ req->tx.buf[6] = addr_type;
+ req->tx.buf[7] = seg; /* PCI Segment */
+ req->tx.buf[8] = PCI_DEVFN(dev, func);
+ req->tx.buf[9] = bus; /* PCI Bus */
+
+ if (addr_type == PECI_ENDPTCFG_ADDR_TYPE_MMIO_D)
+ put_unaligned_le32(offset, &req->tx.buf[10]);
+ else
+ put_unaligned_le64(offset, &req->tx.buf[10]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
u8 peci_request_data_readb(struct peci_request *req)
{
return req->rx.buf[1];
@@ -256,6 +423,12 @@ u64 peci_request_dib_read(struct peci_request *req)
}
EXPORT_SYMBOL_NS_GPL(peci_request_dib_read, PECI);

+s16 peci_request_temp_read(struct peci_request *req)
+{
+ return get_unaligned_le16(&req->rx.buf[0]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_temp_read, PECI);
+
#define __read_pkg_config(x, type) \
struct peci_request *peci_xfer_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
{ \
@@ -267,3 +440,43 @@ __read_pkg_config(readb, u8);
__read_pkg_config(readw, u16);
__read_pkg_config(readl, u32);
__read_pkg_config(readq, u64);
+
+#define __read_pci_config_local(x, type) \
+struct peci_request * \
+peci_xfer_pci_cfg_local_##x(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg) \
+{ \
+ return __pci_cfg_local_read(device, bus, dev, func, reg, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_xfer_pci_cfg_local_##x, PECI)
+
+__read_pci_config_local(readb, u8);
+__read_pci_config_local(readw, u16);
+__read_pci_config_local(readl, u32);
+
+#define __read_ep_pci_config(x, msg_type, type) \
+struct peci_request * \
+peci_xfer_ep_pci_cfg_##x(struct peci_device *device, u8 seg, u8 bus, u8 dev, u8 func, u16 reg) \
+{ \
+ return __ep_pci_cfg_read(device, msg_type, seg, bus, dev, func, reg, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_xfer_ep_pci_cfg_##x, PECI)
+
+__read_ep_pci_config(local_readb, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u8);
+__read_ep_pci_config(local_readw, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u16);
+__read_ep_pci_config(local_readl, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u32);
+__read_ep_pci_config(readb, PECI_ENDPTCFG_TYPE_PCI, u8);
+__read_ep_pci_config(readw, PECI_ENDPTCFG_TYPE_PCI, u16);
+__read_ep_pci_config(readl, PECI_ENDPTCFG_TYPE_PCI, u32);
+
+#define __read_ep_mmio(x, y, addr_type, type1, type2) \
+struct peci_request *peci_xfer_ep_mmio##y##_##x(struct peci_device *device, u8 bar, u8 seg, \
+ u8 bus, u8 dev, u8 func, u64 offset) \
+{ \
+ return __ep_mmio_read(device, bar, addr_type, seg, bus, dev, func, \
+ offset, PECI_RDENDPTCFG_MMIO_WR_LEN_BASE + sizeof(type1), \
+ sizeof(type2)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_xfer_ep_mmio##y##_##x, PECI)
+
+__read_ep_mmio(readl, 32, PECI_ENDPTCFG_ADDR_TYPE_MMIO_D, u32, u32);
+__read_ep_mmio(readl, 64, PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q, u64, u32);
diff --git a/include/linux/peci-cpu.h b/include/linux/peci-cpu.h
new file mode 100644
index 000000000000..ff8ae9c26c80
--- /dev/null
+++ b/include/linux/peci-cpu.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2021 Intel Corporation */
+
+#ifndef __LINUX_PECI_CPU_H
+#define __LINUX_PECI_CPU_H
+
+#include <linux/types.h>
+
+#include "../../arch/x86/include/asm/intel-family.h"
+
+#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
+#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
+#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
+#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
+#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
+#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
+#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
+#define PECI_PCS_MODULE_TEMP 9 /* Per Core DTS Temperature Read */
+#define PECI_PCS_THERMAL_MARGIN 10 /* DTS thermal margin */
+#define PECI_PCS_DDR_DIMM_TEMP 14 /* DDR DIMM Temperature */
+#define PECI_PCS_TEMP_TARGET 16 /* Temperature Target Read */
+#define PECI_PCS_TDP_UNITS 30 /* Units for power/energy registers */
+
+struct peci_device;
+
+int peci_temp_read(struct peci_device *device, s16 *temp_raw);
+
+int peci_pcs_read(struct peci_device *device, u8 index,
+ u16 param, u32 *data);
+
+int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev,
+ u8 func, u16 reg, u32 *data);
+
+int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u32 *data);
+
+int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 address, u32 *data);
+
+#endif /* __LINUX_PECI_CPU_H */
diff --git a/include/linux/peci.h b/include/linux/peci.h
index dcf1c53f4e40..ce43705eaac4 100644
--- a/include/linux/peci.h
+++ b/include/linux/peci.h
@@ -14,14 +14,6 @@
*/
#define PECI_REQUEST_MAX_BUF_SIZE 32

-#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
-#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
-#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
-#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
-#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
-#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
-#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
-
struct peci_controller;
struct peci_request;

--
2.31.1


2021-11-16 01:32:46

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 06/13] peci: Add device detection

Since PECI devices are discoverable, we can dynamically detect devices
that are actually available in the system.

This change complements the earlier implementation by rescanning PECI
bus to detect available devices. For this purpose, it also introduces the
minimal API for PECI requests.

Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
drivers/peci/Makefile | 2 +-
drivers/peci/core.c | 33 ++++++++++++
drivers/peci/device.c | 117 ++++++++++++++++++++++++++++++++++++++++
drivers/peci/internal.h | 14 +++++
drivers/peci/request.c | 55 +++++++++++++++++++
5 files changed, 220 insertions(+), 1 deletion(-)
create mode 100644 drivers/peci/device.c
create mode 100644 drivers/peci/request.c

diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 926d8df15cbd..c5f9d3fe21bb 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only

# Core functionality
-peci-y := core.o
+peci-y := core.o request.o device.o
obj-$(CONFIG_PECI) += peci.o

# Hardware specific bus drivers
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
index 73ad0a47fa9d..c3361e6e043a 100644
--- a/drivers/peci/core.c
+++ b/drivers/peci/core.c
@@ -29,6 +29,20 @@ struct device_type peci_controller_type = {
.release = peci_controller_dev_release,
};

+static int peci_controller_scan_devices(struct peci_controller *controller)
+{
+ int ret;
+ u8 addr;
+
+ for (addr = PECI_BASE_ADDR; addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX; addr++) {
+ ret = peci_device_create(controller, addr);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
static struct peci_controller *peci_controller_alloc(struct device *dev,
struct peci_controller_ops *ops)
{
@@ -64,10 +78,23 @@ static struct peci_controller *peci_controller_alloc(struct device *dev,
return ERR_PTR(ret);
}

+static int unregister_child(struct device *dev, void *dummy)
+{
+ peci_device_destroy(to_peci_device(dev));
+
+ return 0;
+}
+
static void unregister_controller(void *_controller)
{
struct peci_controller *controller = _controller;

+ /*
+ * Detach any active PECI devices. This can't fail, thus we do not
+ * check the returned value.
+ */
+ device_for_each_child_reverse(&controller->dev, NULL, unregister_child);
+
device_unregister(&controller->dev);

fwnode_handle_put(controller->dev.fwnode);
@@ -113,6 +140,12 @@ struct peci_controller *devm_peci_controller_add(struct device *dev,
if (ret)
return ERR_PTR(ret);

+ /*
+ * Ignoring retval since failures during scan are non-critical for
+ * controller itself.
+ */
+ peci_controller_scan_devices(controller);
+
return controller;

err_fwnode:
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
new file mode 100644
index 000000000000..39f7f6409391
--- /dev/null
+++ b/drivers/peci/device.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/peci.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+static int peci_detect(struct peci_controller *controller, u8 addr)
+{
+ /*
+ * PECI Ping is a command encoded by tx_len = 0, rx_len = 0.
+ * We expect correct Write FCS if the device at the target address
+ * is able to respond.
+ */
+ struct peci_request req = { 0 };
+ int ret;
+
+ mutex_lock(&controller->bus_lock);
+ ret = controller->ops->xfer(controller, addr, &req);
+ mutex_unlock(&controller->bus_lock);
+
+ return ret;
+}
+
+static bool peci_addr_valid(u8 addr)
+{
+ return addr >= PECI_BASE_ADDR && addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX;
+}
+
+static int peci_dev_exists(struct device *dev, void *data)
+{
+ struct peci_device *device = to_peci_device(dev);
+ u8 *addr = data;
+
+ if (device->addr == *addr)
+ return -EBUSY;
+
+ return 0;
+}
+
+int peci_device_create(struct peci_controller *controller, u8 addr)
+{
+ struct peci_device *device;
+ int ret;
+
+ if (!peci_addr_valid(addr))
+ return -EINVAL;
+
+ /* Check if we have already detected this device before. */
+ ret = device_for_each_child(&controller->dev, &addr, peci_dev_exists);
+ if (ret)
+ return 0;
+
+ ret = peci_detect(controller, addr);
+ if (ret) {
+ /*
+ * Device not present or host state doesn't allow successful
+ * detection at this time.
+ */
+ if (ret == -EIO || ret == -ETIMEDOUT)
+ return 0;
+
+ return ret;
+ }
+
+ device = kzalloc(sizeof(*device), GFP_KERNEL);
+ if (!device)
+ return -ENOMEM;
+
+ device_initialize(&device->dev);
+
+ device->addr = addr;
+ device->dev.parent = &controller->dev;
+ device->dev.bus = &peci_bus_type;
+ device->dev.type = &peci_device_type;
+
+ ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
+ if (ret)
+ goto err_put;
+
+ ret = device_add(&device->dev);
+ if (ret)
+ goto err_put;
+
+ return 0;
+
+err_put:
+ put_device(&device->dev);
+
+ return ret;
+}
+
+void peci_device_destroy(struct peci_device *device)
+{
+ bool killed;
+
+ device_lock(&device->dev);
+ killed = kill_device(&device->dev);
+ device_unlock(&device->dev);
+
+ if (!killed)
+ return;
+
+ device_unregister(&device->dev);
+}
+
+static void peci_device_release(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+
+ kfree(device);
+}
+
+struct device_type peci_device_type = {
+ .release = peci_device_release,
+};
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 918dea745a86..57d11a902c5d 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -8,6 +8,20 @@
#include <linux/types.h>

struct peci_controller;
+struct peci_device;
+struct peci_request;
+
+/* PECI CPU address range 0x30-0x37 */
+#define PECI_BASE_ADDR 0x30
+#define PECI_DEVICE_NUM_MAX 8
+
+struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
+void peci_request_free(struct peci_request *req);
+
+extern struct device_type peci_device_type;
+
+int peci_device_create(struct peci_controller *controller, u8 addr);
+void peci_device_destroy(struct peci_device *device);

extern struct bus_type peci_bus_type;

diff --git a/drivers/peci/request.c b/drivers/peci/request.c
new file mode 100644
index 000000000000..7dee51c50dd2
--- /dev/null
+++ b/drivers/peci/request.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/export.h>
+#include <linux/peci.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "internal.h"
+
+/**
+ * peci_request_alloc() - allocate &struct peci_requests
+ * @device: PECI device to which request is going to be sent
+ * @tx_len: TX length
+ * @rx_len: RX length
+ *
+ * Return: A pointer to a newly allocated &struct peci_request on success or NULL otherwise.
+ */
+struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len)
+{
+ struct peci_request *req;
+
+ /*
+ * TX and RX buffers are fixed length members of peci_request, this is
+ * just a warn for developers to make sure to expand the buffers (or
+ * change the allocation method) if we go over the current limit.
+ */
+ if (WARN_ON_ONCE(tx_len > PECI_REQUEST_MAX_BUF_SIZE || rx_len > PECI_REQUEST_MAX_BUF_SIZE))
+ return NULL;
+ /*
+ * PECI controllers that we are using now don't support DMA, this
+ * should be converted to DMA API once support for controllers that do
+ * allow it is added to avoid an extra copy.
+ */
+ req = kzalloc(sizeof(*req), GFP_KERNEL);
+ if (!req)
+ return NULL;
+
+ req->device = device;
+ req->tx.len = tx_len;
+ req->rx.len = rx_len;
+
+ return req;
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_alloc, PECI);
+
+/**
+ * peci_request_free() - free peci_request
+ * @req: the PECI request to be freed
+ */
+void peci_request_free(struct peci_request *req)
+{
+ kfree(req);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
--
2.31.1


2021-11-16 01:35:00

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 12/13] docs: hwmon: Document PECI drivers

From: Jae Hyun Yoo <[email protected]>

Add documentation for peci-cputemp driver that provides DTS thermal
readings for CPU packages and CPU cores, and peci-dimmtemp driver that
provides Temperature Sensor on DIMM readings.

Signed-off-by: Jae Hyun Yoo <[email protected]>
Co-developed-by: Iwona Winiarska <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
Documentation/hwmon/index.rst | 2 +
Documentation/hwmon/peci-cputemp.rst | 90 +++++++++++++++++++++++++++
Documentation/hwmon/peci-dimmtemp.rst | 57 +++++++++++++++++
MAINTAINERS | 2 +
4 files changed, 151 insertions(+)
create mode 100644 Documentation/hwmon/peci-cputemp.rst
create mode 100644 Documentation/hwmon/peci-dimmtemp.rst

diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 7046bf1870d9..6ebd73a55c26 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -156,6 +156,8 @@ Hardware Monitoring Kernel Drivers
pcf8591
pim4328
pm6764tr
+ peci-cputemp
+ peci-dimmtemp
pmbus
powr1220
pxe1610
diff --git a/Documentation/hwmon/peci-cputemp.rst b/Documentation/hwmon/peci-cputemp.rst
new file mode 100644
index 000000000000..fe0422248dc5
--- /dev/null
+++ b/Documentation/hwmon/peci-cputemp.rst
@@ -0,0 +1,90 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+Kernel driver peci-cputemp
+==========================
+
+Supported chips:
+ One of Intel server CPUs listed below which is connected to a PECI bus.
+ * Intel Xeon E5/E7 v3 server processors
+ Intel Xeon E5-14xx v3 family
+ Intel Xeon E5-24xx v3 family
+ Intel Xeon E5-16xx v3 family
+ Intel Xeon E5-26xx v3 family
+ Intel Xeon E5-46xx v3 family
+ Intel Xeon E7-48xx v3 family
+ Intel Xeon E7-88xx v3 family
+ * Intel Xeon E5/E7 v4 server processors
+ Intel Xeon E5-16xx v4 family
+ Intel Xeon E5-26xx v4 family
+ Intel Xeon E5-46xx v4 family
+ Intel Xeon E7-48xx v4 family
+ Intel Xeon E7-88xx v4 family
+ * Intel Xeon Scalable server processors
+ Intel Xeon D family
+ Intel Xeon Bronze family
+ Intel Xeon Silver family
+ Intel Xeon Gold family
+ Intel Xeon Platinum family
+
+ Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author: Jae Hyun Yoo <[email protected]>
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides Digital
+Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
+accessible via the processor PECI interface.
+
+All temperature values are given in millidegree Celsius and will be measurable
+only when the target CPU is powered on.
+
+Sysfs interface
+-------------------
+
+======================= =======================================================
+temp1_label "Die"
+temp1_input Provides current die temperature of the CPU package.
+temp1_max Provides thermal control temperature of the CPU package
+ which is also known as Tcontrol.
+temp1_crit Provides shutdown temperature of the CPU package which
+ is also known as the maximum processor junction
+ temperature, Tjmax or Tprochot.
+temp1_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
+ the CPU package.
+
+temp2_label "DTS"
+temp2_input Provides current temperature of the CPU package scaled
+ to match DTS thermal profile.
+temp2_max Provides thermal control temperature of the CPU package
+ which is also known as Tcontrol.
+temp2_crit Provides shutdown temperature of the CPU package which
+ is also known as the maximum processor junction
+ temperature, Tjmax or Tprochot.
+temp2_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
+ the CPU package.
+
+temp3_label "Tcontrol"
+temp3_input Provides current Tcontrol temperature of the CPU
+ package which is also known as Fan Temperature target.
+ Indicates the relative value from thermal monitor trip
+ temperature at which fans should be engaged.
+temp3_crit Provides Tcontrol critical value of the CPU package
+ which is same to Tjmax.
+
+temp4_label "Tthrottle"
+temp4_input Provides current Tthrottle temperature of the CPU
+ package. Used for throttling temperature. If this value
+ is allowed and lower than Tjmax - the throttle will
+ occur and reported at lower than Tjmax.
+
+temp5_label "Tjmax"
+temp5_input Provides the maximum junction temperature, Tjmax of the
+ CPU package.
+
+temp[6-N]_label Provides string "Core X", where X is resolved core
+ number.
+temp[6-N]_input Provides current temperature of each core.
+
+======================= =======================================================
diff --git a/Documentation/hwmon/peci-dimmtemp.rst b/Documentation/hwmon/peci-dimmtemp.rst
new file mode 100644
index 000000000000..e562aed620de
--- /dev/null
+++ b/Documentation/hwmon/peci-dimmtemp.rst
@@ -0,0 +1,57 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver peci-dimmtemp
+===========================
+
+Supported chips:
+ One of Intel server CPUs listed below which is connected to a PECI bus.
+ * Intel Xeon E5/E7 v3 server processors
+ Intel Xeon E5-14xx v3 family
+ Intel Xeon E5-24xx v3 family
+ Intel Xeon E5-16xx v3 family
+ Intel Xeon E5-26xx v3 family
+ Intel Xeon E5-46xx v3 family
+ Intel Xeon E7-48xx v3 family
+ Intel Xeon E7-88xx v3 family
+ * Intel Xeon E5/E7 v4 server processors
+ Intel Xeon E5-16xx v4 family
+ Intel Xeon E5-26xx v4 family
+ Intel Xeon E5-46xx v4 family
+ Intel Xeon E7-48xx v4 family
+ Intel Xeon E7-88xx v4 family
+ * Intel Xeon Scalable server processors
+ Intel Xeon D family
+ Intel Xeon Bronze family
+ Intel Xeon Silver family
+ Intel Xeon Gold family
+ Intel Xeon Platinum family
+
+ Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author: Jae Hyun Yoo <[email protected]>
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides
+Temperature sensor on DIMM readings that are accessible via the processor PECI interface.
+
+All temperature values are given in millidegree Celsius and will be measurable
+only when the target CPU is powered on.
+
+Sysfs interface
+-------------------
+
+======================= =======================================================
+
+temp[N]_label Provides string "DIMM CI", where C is DIMM channel and
+ I is DIMM index of the populated DIMM.
+temp[N]_input Provides current temperature of the populated DIMM.
+temp[N]_max Provides thermal control temperature of the DIMM.
+temp[N]_crit Provides shutdown temperature of the DIMM.
+
+======================= =======================================================
+
+Note:
+ DIMM temperature attributes will appear when the client CPU's BIOS
+ completes memory training and testing.
diff --git a/MAINTAINERS b/MAINTAINERS
index 0094370be6c4..0f7216644bd5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14925,6 +14925,8 @@ M: Iwona Winiarska <[email protected]>
R: Jae Hyun Yoo <[email protected]>
L: [email protected]
S: Supported
+F: Documentation/hwmon/peci-cputemp.rst
+F: Documentation/hwmon/peci-dimmtemp.rst
F: drivers/hwmon/peci/

PECI SUBSYSTEM
--
2.31.1


2021-11-16 01:35:15

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 10/13] hwmon: peci: Add cputemp driver

Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
readings of the processor package and processor cores that are
accessible via the PECI interface.

The main use case for the driver (and PECI interface) is out-of-band
management, where we're able to obtain the DTS readings from an external
entity connected with PECI, e.g. BMC on server platforms.

Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
MAINTAINERS | 7 +
drivers/hwmon/Kconfig | 2 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/peci/Kconfig | 18 ++
drivers/hwmon/peci/Makefile | 5 +
drivers/hwmon/peci/common.h | 58 ++++
drivers/hwmon/peci/cputemp.c | 592 +++++++++++++++++++++++++++++++++++
7 files changed, 683 insertions(+)
create mode 100644 drivers/hwmon/peci/Kconfig
create mode 100644 drivers/hwmon/peci/Makefile
create mode 100644 drivers/hwmon/peci/common.h
create mode 100644 drivers/hwmon/peci/cputemp.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d751590f32e3..0094370be6c4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14920,6 +14920,13 @@ L: [email protected]
S: Maintained
F: drivers/platform/x86/peaq-wmi.c

+PECI HARDWARE MONITORING DRIVERS
+M: Iwona Winiarska <[email protected]>
+R: Jae Hyun Yoo <[email protected]>
+L: [email protected]
+S: Supported
+F: drivers/hwmon/peci/
+
PECI SUBSYSTEM
M: Iwona Winiarska <[email protected]>
R: Jae Hyun Yoo <[email protected]>
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 64bd3dfba2c4..4858f7cfa81e 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1528,6 +1528,8 @@ config SENSORS_PCF8591
These devices are hard to detect and rarely found on mainstream
hardware. If unsure, say N.

+source "drivers/hwmon/peci/Kconfig"
+
source "drivers/hwmon/pmbus/Kconfig"

config SENSORS_PWM_FAN
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index baee6a8d4dd1..2c0e210a28b8 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -204,6 +204,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o

obj-$(CONFIG_SENSORS_OCC) += occ/
+obj-$(CONFIG_SENSORS_PECI) += peci/
obj-$(CONFIG_PMBUS) += pmbus/

ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
new file mode 100644
index 000000000000..e10eed68d70a
--- /dev/null
+++ b/drivers/hwmon/peci/Kconfig
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config SENSORS_PECI_CPUTEMP
+ tristate "PECI CPU temperature monitoring client"
+ depends on PECI
+ select SENSORS_PECI
+ select PECI_CPU
+ help
+ If you say yes here you get support for the generic Intel PECI
+ cputemp driver which provides Digital Thermal Sensor (DTS) thermal
+ readings of the CPU package and CPU cores that are accessible via
+ the processor PECI interface.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-cputemp.
+
+config SENSORS_PECI
+ tristate
diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
new file mode 100644
index 000000000000..e8a0ada5ab1f
--- /dev/null
+++ b/drivers/hwmon/peci/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+peci-cputemp-y := cputemp.o
+
+obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
new file mode 100644
index 000000000000..734506b0eca2
--- /dev/null
+++ b/drivers/hwmon/peci/common.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2021 Intel Corporation */
+
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+#ifndef __PECI_HWMON_COMMON_H
+#define __PECI_HWMON_COMMON_H
+
+#define PECI_HWMON_UPDATE_INTERVAL HZ
+
+/**
+ * struct peci_sensor_state - PECI state information
+ * @valid: flag to indicate the sensor value is valid
+ * @last_updated: time of the last update in jiffies
+ * @lock: mutex to protect sensor access
+ */
+struct peci_sensor_state {
+ bool valid;
+ unsigned long last_updated;
+ struct mutex lock; /* protect sensor access */
+};
+
+/**
+ * struct peci_sensor_data - PECI sensor information
+ * @value: sensor value in milli units
+ * @state: sensor update state
+ */
+
+struct peci_sensor_data {
+ s32 value;
+ struct peci_sensor_state state;
+};
+
+/**
+ * peci_sensor_need_update() - check whether sensor update is needed or not
+ * @sensor: pointer to sensor data struct
+ *
+ * Return: true if update is needed, false if not.
+ */
+
+static inline bool peci_sensor_need_update(struct peci_sensor_state *state)
+{
+ return !state->valid ||
+ time_after(jiffies, state->last_updated + PECI_HWMON_UPDATE_INTERVAL);
+}
+
+/**
+ * peci_sensor_mark_updated() - mark the sensor is updated
+ * @sensor: pointer to sensor data struct
+ */
+static inline void peci_sensor_mark_updated(struct peci_sensor_state *state)
+{
+ state->valid = true;
+ state->last_updated = jiffies;
+}
+
+#endif /* __PECI_HWMON_COMMON_H */
diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
new file mode 100644
index 000000000000..d264d6c2f437
--- /dev/null
+++ b/drivers/hwmon/peci/cputemp.c
@@ -0,0 +1,592 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/hwmon.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/units.h>
+
+#include "common.h"
+
+#define CORE_NUMS_MAX 64
+
+#define BASE_CHANNEL_NUMS 5
+#define CPUTEMP_CHANNEL_NUMS (BASE_CHANNEL_NUMS + CORE_NUMS_MAX)
+
+#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
+#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
+#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
+
+#define DTS_MARGIN_MASK GENMASK(15, 0)
+#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
+
+struct resolved_cores_reg {
+ u8 bus;
+ u8 dev;
+ u8 func;
+ u8 offset;
+};
+
+struct cpu_info {
+ struct resolved_cores_reg *reg;
+ u8 min_peci_revision;
+ s32 (*thermal_margin_to_millidegree)(s32 val);
+};
+
+struct peci_temp_target {
+ s32 tcontrol;
+ s32 tthrottle;
+ s32 tjmax;
+ struct peci_sensor_state state;
+};
+
+enum peci_temp_target_type {
+ tcontrol_type,
+ tthrottle_type,
+ tjmax_type,
+ crit_hyst_type,
+};
+
+struct peci_cputemp {
+ struct peci_device *peci_dev;
+ struct device *dev;
+ const char *name;
+ const struct cpu_info *gen_info;
+ struct {
+ struct peci_temp_target target;
+ struct peci_sensor_data die;
+ struct peci_sensor_data dts;
+ struct peci_sensor_data core[CORE_NUMS_MAX];
+ } temp;
+ const char **coretemp_label;
+ DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
+};
+
+enum cputemp_channels {
+ channel_die,
+ channel_dts,
+ channel_tcontrol,
+ channel_tthrottle,
+ channel_tjmax,
+ channel_core,
+};
+
+static const char * const cputemp_label[BASE_CHANNEL_NUMS] = {
+ "Die",
+ "DTS",
+ "Tcontrol",
+ "Tthrottle",
+ "Tjmax",
+};
+
+static int update_temp_target(struct peci_cputemp *priv)
+{
+ s32 tthrottle_offset, tcontrol_margin;
+ u32 pcs;
+ int ret;
+
+ if (!peci_sensor_need_update(&priv->temp.target.state))
+ return 0;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
+ if (ret)
+ return ret;
+
+ priv->temp.target.tjmax =
+ FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
+
+ tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
+ tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
+ priv->temp.target.tcontrol = priv->temp.target.tjmax - tcontrol_margin;
+
+ tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
+ priv->temp.target.tthrottle = priv->temp.target.tjmax - tthrottle_offset;
+
+ peci_sensor_mark_updated(&priv->temp.target.state);
+
+ return 0;
+}
+
+static int get_temp_target(struct peci_cputemp *priv, enum peci_temp_target_type type, long *val)
+{
+ int ret;
+
+ mutex_lock(&priv->temp.target.state.lock);
+
+ ret = update_temp_target(priv);
+ if (ret)
+ goto unlock;
+
+ switch (type) {
+ case tcontrol_type:
+ *val = priv->temp.target.tcontrol;
+ break;
+ case tthrottle_type:
+ *val = priv->temp.target.tthrottle;
+ break;
+ case tjmax_type:
+ *val = priv->temp.target.tjmax;
+ break;
+ case crit_hyst_type:
+ *val = priv->temp.target.tjmax - priv->temp.target.tcontrol;
+ break;
+ default:
+ ret = -EOPNOTSUPP;
+ break;
+ }
+unlock:
+ mutex_unlock(&priv->temp.target.state.lock);
+
+ return ret;
+}
+
+/*
+ * Error codes:
+ * 0x8000: General sensor error
+ * 0x8001: Reserved
+ * 0x8002: Underflow on reading value
+ * 0x8003-0x81ff: Reserved
+ */
+static bool dts_valid(s32 val)
+{
+ return val < 0x8000 || val > 0x81ff;
+}
+
+/*
+ * Processors return a value of DTS reading in S10.6 fixed point format
+ * (16 bits: 10-bit signed magnitude, 6-bit fraction).
+ */
+static s32 dts_ten_dot_six_to_millidegree(s32 val)
+{
+ return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / 64;
+}
+
+/*
+ * For older processors, thermal margin reading is returned in S8.8 fixed
+ * point format (16 bits: 8-bit signed magnitude, 8-bit fraction).
+ */
+static s32 dts_eight_dot_eight_to_millidegree(s32 val)
+{
+ return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / 256;
+}
+
+static int get_die_temp(struct peci_cputemp *priv, long *val)
+{
+ int ret = 0;
+ long tjmax;
+ s16 temp;
+
+ mutex_lock(&priv->temp.die.state.lock);
+ if (!peci_sensor_need_update(&priv->temp.die.state))
+ goto skip_update;
+
+ ret = peci_temp_read(priv->peci_dev, &temp);
+ if (ret)
+ goto err_unlock;
+
+ if (!dts_valid(temp)) {
+ ret = -EIO;
+ goto err_unlock;
+ }
+
+ ret = get_temp_target(priv, tjmax_type, &tjmax);
+ if (ret)
+ goto err_unlock;
+
+ priv->temp.die.value = (s32)tjmax + dts_ten_dot_six_to_millidegree(temp);
+
+ peci_sensor_mark_updated(&priv->temp.die.state);
+
+skip_update:
+ *val = priv->temp.die.value;
+err_unlock:
+ mutex_unlock(&priv->temp.die.state.lock);
+ return ret;
+}
+
+static int get_dts(struct peci_cputemp *priv, long *val)
+{
+ int ret = 0;
+ s32 thermal_margin;
+ long tcontrol;
+ u32 pcs;
+
+ mutex_lock(&priv->temp.dts.state.lock);
+ if (!peci_sensor_need_update(&priv->temp.dts.state))
+ goto skip_update;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
+ if (ret)
+ goto err_unlock;
+
+ thermal_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
+ if (!dts_valid(thermal_margin)) {
+ ret = -EIO;
+ goto err_unlock;
+ }
+
+ ret = get_temp_target(priv, tcontrol_type, &tcontrol);
+ if (ret)
+ goto err_unlock;
+
+ /* Note that the tcontrol should be available before calling it */
+ priv->temp.dts.value =
+ (s32)tcontrol - priv->gen_info->thermal_margin_to_millidegree(thermal_margin);
+
+ peci_sensor_mark_updated(&priv->temp.dts.state);
+
+skip_update:
+ *val = priv->temp.dts.value;
+err_unlock:
+ mutex_unlock(&priv->temp.dts.state.lock);
+ return ret;
+}
+
+static int get_core_temp(struct peci_cputemp *priv, int core_index, long *val)
+{
+ int ret = 0;
+ s32 core_dts_margin;
+ long tjmax;
+ u32 pcs;
+
+ mutex_lock(&priv->temp.core[core_index].state.lock);
+ if (!peci_sensor_need_update(&priv->temp.core[core_index].state))
+ goto skip_update;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index, &pcs);
+ if (ret)
+ goto err_unlock;
+
+ core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
+ if (!dts_valid(core_dts_margin)) {
+ ret = -EIO;
+ goto err_unlock;
+ }
+
+ ret = get_temp_target(priv, tjmax_type, &tjmax);
+ if (ret)
+ goto err_unlock;
+
+ /* Note that the tjmax should be available before calling it */
+ priv->temp.core[core_index].value =
+ (s32)tjmax + dts_ten_dot_six_to_millidegree(core_dts_margin);
+
+ peci_sensor_mark_updated(&priv->temp.core[core_index].state);
+
+skip_update:
+ *val = priv->temp.core[core_index].value;
+err_unlock:
+ mutex_unlock(&priv->temp.core[core_index].state.lock);
+ return ret;
+}
+
+static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, const char **str)
+{
+ struct peci_cputemp *priv = dev_get_drvdata(dev);
+
+ if (attr != hwmon_temp_label)
+ return -EOPNOTSUPP;
+
+ *str = channel < channel_core ?
+ cputemp_label[channel] : priv->coretemp_label[channel - channel_core];
+
+ return 0;
+}
+
+static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
+{
+ struct peci_cputemp *priv = dev_get_drvdata(dev);
+
+ switch (attr) {
+ case hwmon_temp_input:
+ switch (channel) {
+ case channel_die:
+ return get_die_temp(priv, val);
+ case channel_dts:
+ return get_dts(priv, val);
+ case channel_tcontrol:
+ return get_temp_target(priv, tcontrol_type, val);
+ case channel_tthrottle:
+ return get_temp_target(priv, tthrottle_type, val);
+ case channel_tjmax:
+ return get_temp_target(priv, tjmax_type, val);
+ default:
+ return get_core_temp(priv, channel - channel_core, val);
+ }
+ break;
+ case hwmon_temp_max:
+ return get_temp_target(priv, tcontrol_type, val);
+ case hwmon_temp_crit:
+ return get_temp_target(priv, tjmax_type, val);
+ case hwmon_temp_crit_hyst:
+ return get_temp_target(priv, crit_hyst_type, val);
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types type,
+ u32 attr, int channel)
+{
+ const struct peci_cputemp *priv = data;
+
+ if (channel > CPUTEMP_CHANNEL_NUMS)
+ return 0;
+
+ if (channel < channel_core)
+ return 0444;
+
+ if (test_bit(channel - channel_core, priv->core_mask))
+ return 0444;
+
+ return 0;
+}
+
+static int init_core_mask(struct peci_cputemp *priv)
+{
+ struct peci_device *peci_dev = priv->peci_dev;
+ struct resolved_cores_reg *reg = priv->gen_info->reg;
+ u64 core_mask;
+ u32 data;
+ int ret;
+
+ /* Get the RESOLVED_CORES register value */
+ switch (peci_dev->info.model) {
+ case INTEL_FAM6_ICELAKE_X:
+ case INTEL_FAM6_ICELAKE_D:
+ ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
+ reg->func, reg->offset + 4, &data);
+ if (ret)
+ return ret;
+
+ core_mask = (u64)data << 32;
+
+ ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
+ reg->func, reg->offset, &data);
+ if (ret)
+ return ret;
+
+ core_mask |= data;
+
+ break;
+ default:
+ ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
+ reg->func, reg->offset, &data);
+ if (ret)
+ return ret;
+
+ core_mask = data;
+
+ break;
+ }
+
+ if (!core_mask)
+ return -EIO;
+
+ bitmap_from_u64(priv->core_mask, core_mask);
+
+ return 0;
+}
+
+static int create_temp_label(struct peci_cputemp *priv)
+{
+ unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
+ int i;
+
+ priv->coretemp_label = devm_kzalloc(priv->dev, core_max * sizeof(char *), GFP_KERNEL);
+ if (!priv->coretemp_label)
+ return -ENOMEM;
+
+ for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
+ priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
+ if (!priv->coretemp_label[i])
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void check_resolved_cores(struct peci_cputemp *priv)
+{
+ /*
+ * Failure to resolve cores is non-critical, we're still able to
+ * provide other sensor data.
+ */
+
+ if (init_core_mask(priv))
+ return;
+
+ if (create_temp_label(priv))
+ bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
+}
+
+static void sensor_init(struct peci_cputemp *priv)
+{
+ int i;
+
+ mutex_init(&priv->temp.target.state.lock);
+ mutex_init(&priv->temp.die.state.lock);
+ mutex_init(&priv->temp.dts.state.lock);
+
+ for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX)
+ mutex_init(&priv->temp.core[i].state.lock);
+}
+
+static const struct hwmon_ops peci_cputemp_ops = {
+ .is_visible = cputemp_is_visible,
+ .read_string = cputemp_read_string,
+ .read = cputemp_read,
+};
+
+static const u32 peci_cputemp_temp_channel_config[] = {
+ /* Die temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
+ /* DTS margin */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
+ /* Tcontrol temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
+ /* Tthrottle temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT,
+ /* Tjmax temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT,
+ /* Core temperature - for all core channels */
+ [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL | HWMON_T_INPUT,
+ 0
+};
+
+static const struct hwmon_channel_info peci_cputemp_temp_channel = {
+ .type = hwmon_temp,
+ .config = peci_cputemp_temp_channel_config,
+};
+
+static const struct hwmon_channel_info *peci_cputemp_info[] = {
+ &peci_cputemp_temp_channel,
+ NULL
+};
+
+static const struct hwmon_chip_info peci_cputemp_chip_info = {
+ .ops = &peci_cputemp_ops,
+ .info = peci_cputemp_info,
+};
+
+static int peci_cputemp_probe(struct auxiliary_device *adev,
+ const struct auxiliary_device_id *id)
+{
+ struct device *dev = &adev->dev;
+ struct peci_device *peci_dev = to_peci_device(dev->parent);
+ struct peci_cputemp *priv;
+ struct device *hwmon_dev;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
+ peci_dev->info.socket_id);
+ if (!priv->name)
+ return -ENOMEM;
+
+ priv->dev = dev;
+ priv->peci_dev = peci_dev;
+ priv->gen_info = (const struct cpu_info *)id->driver_data;
+
+ /*
+ * This is just a sanity check. Since we're using commands that are
+ * guaranteed to be supported on a given platform, we should never see
+ * revision lower than expected.
+ */
+ if (peci_dev->info.peci_revision < priv->gen_info->min_peci_revision)
+ dev_warn(priv->dev,
+ "Unexpected PECI revision %#x, some features may be unavailable\n",
+ peci_dev->info.peci_revision);
+
+ check_resolved_cores(priv);
+
+ sensor_init(priv);
+
+ hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
+ priv, &peci_cputemp_chip_info, NULL);
+
+ return PTR_ERR_OR_ZERO(hwmon_dev);
+}
+
+/*
+ * RESOLVED_CORES PCI configuration register may have different location on
+ * different platforms.
+ */
+static struct resolved_cores_reg resolved_cores_reg_hsx = {
+ .bus = 1,
+ .dev = 30,
+ .func = 3,
+ .offset = 0xb4,
+};
+
+static struct resolved_cores_reg resolved_cores_reg_icx = {
+ .bus = 14,
+ .dev = 30,
+ .func = 3,
+ .offset = 0xd0,
+};
+
+static const struct cpu_info cpu_hsx = {
+ .reg = &resolved_cores_reg_hsx,
+ .min_peci_revision = 0x33,
+ .thermal_margin_to_millidegree = &dts_eight_dot_eight_to_millidegree,
+};
+
+static const struct cpu_info cpu_icx = {
+ .reg = &resolved_cores_reg_icx,
+ .min_peci_revision = 0x40,
+ .thermal_margin_to_millidegree = &dts_ten_dot_six_to_millidegree,
+};
+
+static const struct auxiliary_device_id peci_cputemp_ids[] = {
+ {
+ .name = "peci_cpu.cputemp.hsx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.bdx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.bdxd",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.skx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.icx",
+ .driver_data = (kernel_ulong_t)&cpu_icx,
+ },
+ {
+ .name = "peci_cpu.cputemp.icxd",
+ .driver_data = (kernel_ulong_t)&cpu_icx,
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
+
+static struct auxiliary_driver peci_cputemp_driver = {
+ .probe = peci_cputemp_probe,
+ .id_table = peci_cputemp_ids,
+};
+
+module_auxiliary_driver(peci_cputemp_driver);
+
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI cputemp driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI_CPU);
--
2.31.1


2021-11-16 01:39:52

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 05/13] peci: Add peci-aspeed controller driver

From: Jae Hyun Yoo <[email protected]>

ASPEED AST24xx/AST25xx/AST26xx SoCs support the PECI electrical
interface (a.k.a PECI wire) that provides a communication channel with
Intel processors.
This driver allows BMC to discover devices connected to it and
communicate with them using PECI protocol.

Signed-off-by: Jae Hyun Yoo <[email protected]>
Co-developed-by: Iwona Winiarska <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
MAINTAINERS | 9 +
drivers/peci/Kconfig | 6 +
drivers/peci/Makefile | 3 +
drivers/peci/controller/Kconfig | 17 +
drivers/peci/controller/Makefile | 3 +
drivers/peci/controller/peci-aspeed.c | 429 ++++++++++++++++++++++++++
6 files changed, 467 insertions(+)
create mode 100644 drivers/peci/controller/Kconfig
create mode 100644 drivers/peci/controller/Makefile
create mode 100644 drivers/peci/controller/peci-aspeed.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 11a6c0869010..a4da3bba09ba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2945,6 +2945,15 @@ S: Maintained
F: Documentation/devicetree/bindings/net/asix,ax88796c.yaml
F: drivers/net/ethernet/asix/ax88796c_*

+ASPEED PECI CONTROLLER
+M: Iwona Winiarska <[email protected]>
+M: Jae Hyun Yoo <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+L: [email protected] (moderated for non-subscribers)
+S: Supported
+F: Documentation/devicetree/bindings/peci/peci-aspeed.yaml
+F: drivers/peci/controller/peci-aspeed.c
+
ASPEED PINCTRL DRIVERS
M: Andrew Jeffery <[email protected]>
L: [email protected] (moderated for non-subscribers)
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
index 71a4ad81225a..99279df97a78 100644
--- a/drivers/peci/Kconfig
+++ b/drivers/peci/Kconfig
@@ -13,3 +13,9 @@ menuconfig PECI

This support is also available as a module. If so, the module
will be called peci.
+
+if PECI
+
+source "drivers/peci/controller/Kconfig"
+
+endif # PECI
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index e789a354e842..926d8df15cbd 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -3,3 +3,6 @@
# Core functionality
peci-y := core.o
obj-$(CONFIG_PECI) += peci.o
+
+# Hardware specific bus drivers
+obj-y += controller/
diff --git a/drivers/peci/controller/Kconfig b/drivers/peci/controller/Kconfig
new file mode 100644
index 000000000000..3008b1b81049
--- /dev/null
+++ b/drivers/peci/controller/Kconfig
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config PECI_ASPEED
+ tristate "ASPEED PECI support"
+ depends on ARCH_ASPEED || COMPILE_TEST
+ depends on OF
+ depends on HAS_IOMEM
+ help
+ This option enables PECI controller driver for ASPEED AST2400,
+ AST2500 and AST2600 SoCs. It allows BMC to discover devices
+ connected to it, and communicate with them using PECI protocol.
+
+ Say Y here if your system runs on ASPEED SoC and you are using it
+ as BMC for Intel platform.
+
+ This driver can also be built as a module. If so, the module will
+ be called peci-aspeed.
diff --git a/drivers/peci/controller/Makefile b/drivers/peci/controller/Makefile
new file mode 100644
index 000000000000..022c28ef1bf0
--- /dev/null
+++ b/drivers/peci/controller/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_PECI_ASPEED) += peci-aspeed.o
diff --git a/drivers/peci/controller/peci-aspeed.c b/drivers/peci/controller/peci-aspeed.c
new file mode 100644
index 000000000000..e1ba14fc0d0c
--- /dev/null
+++ b/drivers/peci/controller/peci-aspeed.c
@@ -0,0 +1,429 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2012-2017 ASPEED Technology Inc.
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/peci.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+
+/* ASPEED PECI Registers */
+/* Control Register */
+#define ASPEED_PECI_CTRL 0x00
+#define ASPEED_PECI_CTRL_SAMPLING_MASK GENMASK(19, 16)
+#define ASPEED_PECI_CTRL_RD_MODE_MASK GENMASK(13, 12)
+#define ASPEED_PECI_CTRL_RD_MODE_DBG BIT(13)
+#define ASPEED_PECI_CTRL_RD_MODE_COUNT BIT(12)
+#define ASPEED_PECI_CTRL_CLK_SOURCE BIT(11)
+#define ASPEED_PECI_CTRL_CLK_DIV_MASK GENMASK(10, 8)
+#define ASPEED_PECI_CTRL_INVERT_OUT BIT(7)
+#define ASPEED_PECI_CTRL_INVERT_IN BIT(6)
+#define ASPEED_PECI_CTRL_BUS_CONTENTION_EN BIT(5)
+#define ASPEED_PECI_CTRL_PECI_EN BIT(4)
+#define ASPEED_PECI_CTRL_PECI_CLK_EN BIT(0)
+
+/* Timing Negotiation Register */
+#define ASPEED_PECI_TIMING_NEGOTIATION 0x04
+#define ASPEED_PECI_T_NEGO_MSG_MASK GENMASK(15, 8)
+#define ASPEED_PECI_T_NEGO_ADDR_MASK GENMASK(7, 0)
+
+/* Command Register */
+#define ASPEED_PECI_CMD 0x08
+#define ASPEED_PECI_CMD_PIN_MONITORING BIT(31)
+#define ASPEED_PECI_CMD_STS_MASK GENMASK(27, 24)
+#define ASPEED_PECI_CMD_STS_ADDR_T_NEGO 0x3
+#define ASPEED_PECI_CMD_IDLE_MASK \
+ (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MONITORING)
+#define ASPEED_PECI_CMD_FIRE BIT(0)
+
+/* Read/Write Length Register */
+#define ASPEED_PECI_RW_LENGTH 0x0c
+#define ASPEED_PECI_AW_FCS_EN BIT(31)
+#define ASPEED_PECI_RD_LEN_MASK GENMASK(23, 16)
+#define ASPEED_PECI_WR_LEN_MASK GENMASK(15, 8)
+#define ASPEED_PECI_TARGET_ADDR_MASK GENMASK(7, 0)
+
+/* Expected FCS Data Register */
+#define ASPEED_PECI_EXPECTED_FCS 0x10
+#define ASPEED_PECI_EXPECTED_RD_FCS_MASK GENMASK(23, 16)
+#define ASPEED_PECI_EXPECTED_AW_FCS_AUTO_MASK GENMASK(15, 8)
+#define ASPEED_PECI_EXPECTED_WR_FCS_MASK GENMASK(7, 0)
+
+/* Captured FCS Data Register */
+#define ASPEED_PECI_CAPTURED_FCS 0x14
+#define ASPEED_PECI_CAPTURED_RD_FCS_MASK GENMASK(23, 16)
+#define ASPEED_PECI_CAPTURED_WR_FCS_MASK GENMASK(7, 0)
+
+/* Interrupt Register */
+#define ASPEED_PECI_INT_CTRL 0x18
+#define ASPEED_PECI_TIMING_NEGO_SEL_MASK GENMASK(31, 30)
+#define ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO 0
+#define ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO 1
+#define ASPEED_PECI_MESSAGE_NEGO 2
+#define ASPEED_PECI_INT_MASK GENMASK(4, 0)
+#define ASPEED_PECI_INT_BUS_TIMEOUT BIT(4)
+#define ASPEED_PECI_INT_BUS_CONTENTION BIT(3)
+#define ASPEED_PECI_INT_WR_FCS_BAD BIT(2)
+#define ASPEED_PECI_INT_WR_FCS_ABORT BIT(1)
+#define ASPEED_PECI_INT_CMD_DONE BIT(0)
+
+/* Interrupt Status Register */
+#define ASPEED_PECI_INT_STS 0x1c
+#define ASPEED_PECI_INT_TIMING_RESULT_MASK GENMASK(29, 16)
+ /* bits[4..0]: Same bit fields in the 'Interrupt Register' */
+
+/* Rx/Tx Data Buffer Registers */
+#define ASPEED_PECI_WR_DATA0 0x20
+#define ASPEED_PECI_WR_DATA1 0x24
+#define ASPEED_PECI_WR_DATA2 0x28
+#define ASPEED_PECI_WR_DATA3 0x2c
+#define ASPEED_PECI_RD_DATA0 0x30
+#define ASPEED_PECI_RD_DATA1 0x34
+#define ASPEED_PECI_RD_DATA2 0x38
+#define ASPEED_PECI_RD_DATA3 0x3c
+#define ASPEED_PECI_WR_DATA4 0x40
+#define ASPEED_PECI_WR_DATA5 0x44
+#define ASPEED_PECI_WR_DATA6 0x48
+#define ASPEED_PECI_WR_DATA7 0x4c
+#define ASPEED_PECI_RD_DATA4 0x50
+#define ASPEED_PECI_RD_DATA5 0x54
+#define ASPEED_PECI_RD_DATA6 0x58
+#define ASPEED_PECI_RD_DATA7 0x5c
+#define ASPEED_PECI_DATA_BUF_SIZE_MAX 32
+
+/* Timing Negotiation */
+#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT 8
+#define ASPEED_PECI_RD_SAMPLING_POINT_MAX (BIT(4) - 1)
+#define ASPEED_PECI_CLK_DIV_DEFAULT 0
+#define ASPEED_PECI_CLK_DIV_MAX (BIT(3) - 1)
+#define ASPEED_PECI_MSG_TIMING_DEFAULT 1
+#define ASPEED_PECI_MSG_TIMING_MAX (BIT(8) - 1)
+#define ASPEED_PECI_ADDR_TIMING_DEFAULT 1
+#define ASPEED_PECI_ADDR_TIMING_MAX (BIT(8) - 1)
+
+/* Timeout */
+#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US (50 * USEC_PER_MSEC)
+#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US (10 * USEC_PER_MSEC)
+#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT (1000)
+#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX (1000)
+
+struct aspeed_peci {
+ struct peci_controller *controller;
+ struct device *dev;
+ void __iomem *base;
+ struct clk *clk;
+ struct reset_control *rst;
+ int irq;
+ spinlock_t lock; /* to sync completion status handling */
+ struct completion xfer_complete;
+ u32 status;
+ u32 cmd_timeout_ms;
+ u32 msg_timing;
+ u32 addr_timing;
+ u32 rd_sampling_point;
+ u32 clk_div;
+};
+
+static void aspeed_peci_init_regs(struct aspeed_peci *priv)
+{
+ u32 val;
+
+ val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, ASPEED_PECI_CLK_DIV_DEFAULT);
+ val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
+ writel(val, priv->base + ASPEED_PECI_CTRL);
+ /*
+ * Timing negotiation period setting.
+ * The unit of the programmed value is 4 times of PECI clock period.
+ */
+ val = FIELD_PREP(ASPEED_PECI_T_NEGO_MSG_MASK, priv->msg_timing);
+ val |= FIELD_PREP(ASPEED_PECI_T_NEGO_ADDR_MASK, priv->addr_timing);
+ writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);
+
+ /* Clear interrupts */
+ val = readl(priv->base + ASPEED_PECI_INT_STS) | ASPEED_PECI_INT_MASK;
+ writel(val, priv->base + ASPEED_PECI_INT_STS);
+
+ /* Set timing negotiation mode and enable interrupts */
+ val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK, ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);
+ val |= ASPEED_PECI_INT_MASK;
+ writel(val, priv->base + ASPEED_PECI_INT_CTRL);
+
+ val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv->rd_sampling_point);
+ val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);
+ val |= ASPEED_PECI_CTRL_PECI_EN;
+ val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
+ writel(val, priv->base + ASPEED_PECI_CTRL);
+}
+
+static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)
+{
+ u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);
+
+ if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) == ASPEED_PECI_CMD_STS_ADDR_T_NEGO)
+ aspeed_peci_init_regs(priv);
+
+ return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,
+ cmd_sts,
+ !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),
+ ASPEED_PECI_IDLE_CHECK_INTERVAL_US,
+ ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);
+}
+
+static int aspeed_peci_xfer(struct peci_controller *controller,
+ u8 addr, struct peci_request *req)
+{
+ struct aspeed_peci *priv = dev_get_drvdata(controller->dev.parent);
+ unsigned long timeout = msecs_to_jiffies(priv->cmd_timeout_ms);
+ u32 peci_head;
+ int ret;
+
+ if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||
+ req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)
+ return -EINVAL;
+
+ /* Check command sts and bus idle state */
+ ret = aspeed_peci_check_idle(priv);
+ if (ret)
+ return ret; /* -ETIMEDOUT */
+
+ spin_lock_irq(&priv->lock);
+ reinit_completion(&priv->xfer_complete);
+
+ peci_head = FIELD_PREP(ASPEED_PECI_TARGET_ADDR_MASK, addr) |
+ FIELD_PREP(ASPEED_PECI_WR_LEN_MASK, req->tx.len) |
+ FIELD_PREP(ASPEED_PECI_RD_LEN_MASK, req->rx.len);
+
+ writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);
+
+ memcpy_toio(priv->base + ASPEED_PECI_WR_DATA0, req->tx.buf, min_t(u8, req->tx.len, 16));
+ if (req->tx.len > 16)
+ memcpy_toio(priv->base + ASPEED_PECI_WR_DATA4, req->tx.buf + 16,
+ req->tx.len - 16);
+
+#if IS_ENABLED(CONFIG_DYNAMIC_DEBUG)
+ dev_dbg(priv->dev, "HEAD : %#08x\n", peci_head);
+ print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req->tx.len);
+#endif
+ priv->status = 0;
+ writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);
+ spin_unlock_irq(&priv->lock);
+
+ ret = wait_for_completion_interruptible_timeout(&priv->xfer_complete, timeout);
+ if (ret < 0)
+ return ret;
+
+ if (ret == 0) {
+ dev_dbg(priv->dev, "Timeout waiting for a response!\n");
+ return -ETIMEDOUT;
+ }
+
+ spin_lock_irq(&priv->lock);
+
+ writel(0, priv->base + ASPEED_PECI_CMD);
+
+ if (priv->status != ASPEED_PECI_INT_CMD_DONE) {
+ spin_unlock_irq(&priv->lock);
+ dev_dbg(priv->dev, "No valid response, status: %#02x\n", priv->status);
+ return -EIO;
+ }
+
+ spin_unlock_irq(&priv->lock);
+
+ memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_RD_DATA0, min_t(u8, req->rx.len, 16));
+ if (req->rx.len > 16)
+ memcpy_fromio(req->rx.buf + 16, priv->base + ASPEED_PECI_RD_DATA4,
+ req->rx.len - 16);
+
+#if IS_ENABLED(CONFIG_DYNAMIC_DEBUG)
+ print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req->rx.len);
+#endif
+ return 0;
+}
+
+static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
+{
+ struct aspeed_peci *priv = arg;
+ u32 status;
+
+ spin_lock(&priv->lock);
+ status = readl(priv->base + ASPEED_PECI_INT_STS);
+ writel(status, priv->base + ASPEED_PECI_INT_STS);
+ priv->status |= (status & ASPEED_PECI_INT_MASK);
+
+ /*
+ * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE bit
+ * set even in an error case.
+ */
+ if (status & ASPEED_PECI_INT_CMD_DONE)
+ complete(&priv->xfer_complete);
+
+ spin_unlock(&priv->lock);
+
+ return IRQ_HANDLED;
+}
+
+static void aspeed_peci_property_sanitize(struct device *dev, const char *propname,
+ u32 min, u32 max, u32 default_val, u32 *propval)
+{
+ u32 val;
+ int ret;
+
+ ret = device_property_read_u32(dev, propname, &val);
+ if (ret) {
+ val = default_val;
+ } else if (val > max || val < min) {
+ dev_warn(dev, "Invalid %s: %u, falling back to: %u\n",
+ propname, val, default_val);
+
+ val = default_val;
+ }
+
+ *propval = val;
+}
+
+static void aspeed_peci_property_setup(struct aspeed_peci *priv)
+{
+ aspeed_peci_property_sanitize(priv->dev, "aspeed,clock-divider",
+ 0, ASPEED_PECI_CLK_DIV_MAX,
+ ASPEED_PECI_CLK_DIV_DEFAULT, &priv->clk_div);
+ aspeed_peci_property_sanitize(priv->dev, "aspeed,msg-timing",
+ 0, ASPEED_PECI_MSG_TIMING_MAX,
+ ASPEED_PECI_MSG_TIMING_DEFAULT, &priv->msg_timing);
+ aspeed_peci_property_sanitize(priv->dev, "aspeed,addr-timing",
+ 0, ASPEED_PECI_ADDR_TIMING_MAX,
+ ASPEED_PECI_ADDR_TIMING_DEFAULT, &priv->addr_timing);
+ aspeed_peci_property_sanitize(priv->dev, "aspeed,rd-sampling-point",
+ 0, ASPEED_PECI_RD_SAMPLING_POINT_MAX,
+ ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT,
+ &priv->rd_sampling_point);
+ aspeed_peci_property_sanitize(priv->dev, "cmd-timeout-ms",
+ 1, ASPEED_PECI_CMD_TIMEOUT_MS_MAX,
+ ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT, &priv->cmd_timeout_ms);
+}
+
+static struct peci_controller_ops aspeed_ops = {
+ .xfer = aspeed_peci_xfer,
+};
+
+static void aspeed_peci_reset_control_release(void *data)
+{
+ reset_control_assert(data);
+}
+
+static int devm_aspeed_peci_reset_control_deassert(struct device *dev, struct reset_control *rst)
+{
+ int ret;
+
+ ret = reset_control_deassert(rst);
+ if (ret)
+ return ret;
+
+ return devm_add_action_or_reset(dev, aspeed_peci_reset_control_release, rst);
+}
+
+static void aspeed_peci_clk_release(void *data)
+{
+ clk_disable_unprepare(data);
+}
+
+static int devm_aspeed_peci_clk_enable(struct device *dev, struct clk *clk)
+{
+ int ret;
+
+ ret = clk_prepare_enable(clk);
+ if (ret)
+ return ret;
+
+ return devm_add_action_or_reset(dev, aspeed_peci_clk_release, clk);
+}
+
+static int aspeed_peci_probe(struct platform_device *pdev)
+{
+ struct peci_controller *controller;
+ struct aspeed_peci *priv;
+ int ret;
+
+ priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->dev = &pdev->dev;
+ dev_set_drvdata(priv->dev, priv);
+
+ priv->base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(priv->base))
+ return PTR_ERR(priv->base);
+
+ priv->irq = platform_get_irq(pdev, 0);
+ if (!priv->irq)
+ return priv->irq;
+
+ ret = devm_request_irq(&pdev->dev, priv->irq, aspeed_peci_irq_handler,
+ 0, "peci-aspeed", priv);
+ if (ret)
+ return ret;
+
+ init_completion(&priv->xfer_complete);
+ spin_lock_init(&priv->lock);
+
+ priv->rst = devm_reset_control_get(&pdev->dev, NULL);
+ if (IS_ERR(priv->rst))
+ return dev_err_probe(priv->dev, PTR_ERR(priv->rst),
+ "failed to get reset control\n");
+
+ ret = devm_aspeed_peci_reset_control_deassert(priv->dev, priv->rst);
+ if (ret)
+ return dev_err_probe(priv->dev, ret, "cannot deassert reset control\n");
+
+ priv->clk = devm_clk_get(priv->dev, NULL);
+ if (IS_ERR(priv->clk))
+ return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "failed to get clk\n");
+
+ ret = devm_aspeed_peci_clk_enable(priv->dev, priv->clk);
+ if (ret)
+ return dev_err_probe(priv->dev, ret, "failed to enable clock\n");
+
+ aspeed_peci_property_setup(priv);
+
+ aspeed_peci_init_regs(priv);
+
+ controller = devm_peci_controller_add(priv->dev, &aspeed_ops);
+ if (IS_ERR(controller))
+ return dev_err_probe(priv->dev, PTR_ERR(controller),
+ "failed to add aspeed peci controller\n");
+
+ priv->controller = controller;
+
+ return 0;
+}
+
+static const struct of_device_id aspeed_peci_of_table[] = {
+ { .compatible = "aspeed,ast2400-peci", },
+ { .compatible = "aspeed,ast2500-peci", },
+ { .compatible = "aspeed,ast2600-peci", },
+ { }
+};
+MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
+
+static struct platform_driver aspeed_peci_driver = {
+ .probe = aspeed_peci_probe,
+ .driver = {
+ .name = "peci-aspeed",
+ .of_match_table = aspeed_peci_of_table,
+ },
+};
+module_platform_driver(aspeed_peci_driver);
+
+MODULE_AUTHOR("Ryan Chen <[email protected]>");
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_DESCRIPTION("ASPEED PECI driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI);
--
2.31.1


2021-11-16 01:44:26

by Winiarska, Iwona

[permalink] [raw]
Subject: [PATCH v3 01/13] dt-bindings: Add generic bindings for PECI

Add device tree bindings for the PECI controller.

Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
---
.../bindings/peci/peci-controller.yaml | 33 +++++++++++++++++++
1 file changed, 33 insertions(+)
create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml

diff --git a/Documentation/devicetree/bindings/peci/peci-controller.yaml b/Documentation/devicetree/bindings/peci/peci-controller.yaml
new file mode 100644
index 000000000000..bbc3d3f3a929
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-controller.yaml
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/peci/peci-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Generic Device Tree Bindings for PECI
+
+maintainers:
+ - Iwona Winiarska <[email protected]>
+
+description:
+ PECI (Platform Environment Control Interface) is an interface that provides a
+ communication channel from Intel processors and chipset components to external
+ monitoring or control devices.
+
+properties:
+ $nodename:
+ pattern: "^peci-controller(@.*)?$"
+
+ cmd-timeout-ms:
+ description:
+ Command timeout in units of ms.
+
+additionalProperties: true
+
+examples:
+ - |
+ peci-controller@1e78b000 {
+ reg = <0x1e78b000 0x100>;
+ cmd-timeout-ms = <500>;
+ };
+...
--
2.31.1


2021-11-16 04:02:15

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v3 10/13] hwmon: peci: Add cputemp driver

On 11/15/21 10:25 AM, Iwona Winiarska wrote:
> Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> readings of the processor package and processor cores that are
> accessible via the PECI interface.
>
> The main use case for the driver (and PECI interface) is out-of-band
> management, where we're able to obtain the DTS readings from an external
> entity connected with PECI, e.g. BMC on server platforms.
>
> Co-developed-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Iwona Winiarska <[email protected]>
> Reviewed-by: Pierre-Louis Bossart <[email protected]>
> ---
> MAINTAINERS | 7 +
> drivers/hwmon/Kconfig | 2 +
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/peci/Kconfig | 18 ++
> drivers/hwmon/peci/Makefile | 5 +
> drivers/hwmon/peci/common.h | 58 ++++
> drivers/hwmon/peci/cputemp.c | 592 +++++++++++++++++++++++++++++++++++
> 7 files changed, 683 insertions(+)
> create mode 100644 drivers/hwmon/peci/Kconfig
> create mode 100644 drivers/hwmon/peci/Makefile
> create mode 100644 drivers/hwmon/peci/common.h
> create mode 100644 drivers/hwmon/peci/cputemp.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d751590f32e3..0094370be6c4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14920,6 +14920,13 @@ L: [email protected]
> S: Maintained
> F: drivers/platform/x86/peaq-wmi.c
>
> +PECI HARDWARE MONITORING DRIVERS
> +M: Iwona Winiarska <[email protected]>
> +R: Jae Hyun Yoo <[email protected]>
> +L: [email protected]
> +S: Supported
> +F: drivers/hwmon/peci/
> +
> PECI SUBSYSTEM
> M: Iwona Winiarska <[email protected]>
> R: Jae Hyun Yoo <[email protected]>
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 64bd3dfba2c4..4858f7cfa81e 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1528,6 +1528,8 @@ config SENSORS_PCF8591
> These devices are hard to detect and rarely found on mainstream
> hardware. If unsure, say N.
>
> +source "drivers/hwmon/peci/Kconfig"
> +
> source "drivers/hwmon/pmbus/Kconfig"
>
> config SENSORS_PWM_FAN
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index baee6a8d4dd1..2c0e210a28b8 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -204,6 +204,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
> obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o
>
> obj-$(CONFIG_SENSORS_OCC) += occ/
> +obj-$(CONFIG_SENSORS_PECI) += peci/
> obj-$(CONFIG_PMBUS) += pmbus/
>
> ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
> diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> new file mode 100644
> index 000000000000..e10eed68d70a
> --- /dev/null
> +++ b/drivers/hwmon/peci/Kconfig
> @@ -0,0 +1,18 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config SENSORS_PECI_CPUTEMP
> + tristate "PECI CPU temperature monitoring client"
> + depends on PECI
> + select SENSORS_PECI
> + select PECI_CPU
> + help
> + If you say yes here you get support for the generic Intel PECI
> + cputemp driver which provides Digital Thermal Sensor (DTS) thermal
> + readings of the CPU package and CPU cores that are accessible via
> + the processor PECI interface.
> +
> + This driver can also be built as a module. If so, the module
> + will be called peci-cputemp.
> +
> +config SENSORS_PECI
> + tristate
> diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> new file mode 100644
> index 000000000000..e8a0ada5ab1f
> --- /dev/null
> +++ b/drivers/hwmon/peci/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +peci-cputemp-y := cputemp.o
> +
> +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
> diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
> new file mode 100644
> index 000000000000..734506b0eca2
> --- /dev/null
> +++ b/drivers/hwmon/peci/common.h
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (c) 2021 Intel Corporation */
> +
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +
> +#ifndef __PECI_HWMON_COMMON_H
> +#define __PECI_HWMON_COMMON_H
> +
> +#define PECI_HWMON_UPDATE_INTERVAL HZ
> +
> +/**
> + * struct peci_sensor_state - PECI state information
> + * @valid: flag to indicate the sensor value is valid
> + * @last_updated: time of the last update in jiffies
> + * @lock: mutex to protect sensor access
> + */
> +struct peci_sensor_state {
> + bool valid;
> + unsigned long last_updated;
> + struct mutex lock; /* protect sensor access */
> +};
> +
> +/**
> + * struct peci_sensor_data - PECI sensor information
> + * @value: sensor value in milli units
> + * @state: sensor update state
> + */
> +
> +struct peci_sensor_data {
> + s32 value;
> + struct peci_sensor_state state;
> +};
> +
> +/**
> + * peci_sensor_need_update() - check whether sensor update is needed or not
> + * @sensor: pointer to sensor data struct
> + *
> + * Return: true if update is needed, false if not.
> + */
> +
> +static inline bool peci_sensor_need_update(struct peci_sensor_state *state)
> +{
> + return !state->valid ||
> + time_after(jiffies, state->last_updated + PECI_HWMON_UPDATE_INTERVAL);
> +}
> +
> +/**
> + * peci_sensor_mark_updated() - mark the sensor is updated
> + * @sensor: pointer to sensor data struct
> + */
> +static inline void peci_sensor_mark_updated(struct peci_sensor_state *state)
> +{
> + state->valid = true;
> + state->last_updated = jiffies;
> +}
> +
> +#endif /* __PECI_HWMON_COMMON_H */
> diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
> new file mode 100644
> index 000000000000..d264d6c2f437
> --- /dev/null
> +++ b/drivers/hwmon/peci/cputemp.c
> @@ -0,0 +1,592 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) 2018-2021 Intel Corporation
> +
> +#include <linux/auxiliary_bus.h>
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +#include <linux/hwmon.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/peci.h>
> +#include <linux/peci-cpu.h>
> +#include <linux/units.h>
> +
> +#include "common.h"
> +
> +#define CORE_NUMS_MAX 64
> +
> +#define BASE_CHANNEL_NUMS 5
> +#define CPUTEMP_CHANNEL_NUMS (BASE_CHANNEL_NUMS + CORE_NUMS_MAX)
> +
> +#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
> +#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
> +#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
> +
> +#define DTS_MARGIN_MASK GENMASK(15, 0)
> +#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
> +
> +struct resolved_cores_reg {
> + u8 bus;
> + u8 dev;
> + u8 func;
> + u8 offset;
> +};
> +
> +struct cpu_info {
> + struct resolved_cores_reg *reg;
> + u8 min_peci_revision;
> + s32 (*thermal_margin_to_millidegree)(s32 val);
> +};
> +
> +struct peci_temp_target {
> + s32 tcontrol;
> + s32 tthrottle;
> + s32 tjmax;
> + struct peci_sensor_state state;
> +};
> +
> +enum peci_temp_target_type {
> + tcontrol_type,
> + tthrottle_type,
> + tjmax_type,
> + crit_hyst_type,
> +};
> +
> +struct peci_cputemp {
> + struct peci_device *peci_dev;
> + struct device *dev;
> + const char *name;
> + const struct cpu_info *gen_info;
> + struct {
> + struct peci_temp_target target;
> + struct peci_sensor_data die;
> + struct peci_sensor_data dts;
> + struct peci_sensor_data core[CORE_NUMS_MAX];
> + } temp;
> + const char **coretemp_label;
> + DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
> +};
> +
> +enum cputemp_channels {
> + channel_die,
> + channel_dts,
> + channel_tcontrol,
> + channel_tthrottle,
> + channel_tjmax,
> + channel_core,
> +};
> +
> +static const char * const cputemp_label[BASE_CHANNEL_NUMS] = {
> + "Die",
> + "DTS",
> + "Tcontrol",
> + "Tthrottle",
> + "Tjmax",
> +};
> +
> +static int update_temp_target(struct peci_cputemp *priv)
> +{
> + s32 tthrottle_offset, tcontrol_margin;
> + u32 pcs;
> + int ret;
> +
> + if (!peci_sensor_need_update(&priv->temp.target.state))
> + return 0;
> +
> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
> + if (ret)
> + return ret;
> +
> + priv->temp.target.tjmax =
> + FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> +
> + tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
> + tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
> + priv->temp.target.tcontrol = priv->temp.target.tjmax - tcontrol_margin;
> +
> + tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> + priv->temp.target.tthrottle = priv->temp.target.tjmax - tthrottle_offset;
> +
> + peci_sensor_mark_updated(&priv->temp.target.state);
> +
> + return 0;
> +}
> +
> +static int get_temp_target(struct peci_cputemp *priv, enum peci_temp_target_type type, long *val)

Why is this a long * ? It only uses variables which are declared as s32.

> +{
> + int ret;
> +
> + mutex_lock(&priv->temp.target.state.lock);
> +
> + ret = update_temp_target(priv);
> + if (ret)
> + goto unlock;
> +
> + switch (type) {
> + case tcontrol_type:
> + *val = priv->temp.target.tcontrol;
> + break;
> + case tthrottle_type:
> + *val = priv->temp.target.tthrottle;
> + break;
> + case tjmax_type:
> + *val = priv->temp.target.tjmax;
> + break;
> + case crit_hyst_type:
> + *val = priv->temp.target.tjmax - priv->temp.target.tcontrol;
> + break;
> + default:
> + ret = -EOPNOTSUPP;
> + break;
> + }
> +unlock:
> + mutex_unlock(&priv->temp.target.state.lock);
> +
> + return ret;
> +}
> +
> +/*
> + * Error codes:
> + * 0x8000: General sensor error
> + * 0x8001: Reserved
> + * 0x8002: Underflow on reading value
> + * 0x8003-0x81ff: Reserved
> + */
> +static bool dts_valid(s32 val)
> +{
> + return val < 0x8000 || val > 0x81ff;
> +}
> +
> +/*
> + * Processors return a value of DTS reading in S10.6 fixed point format
> + * (16 bits: 10-bit signed magnitude, 6-bit fraction).
> + */
> +static s32 dts_ten_dot_six_to_millidegree(s32 val)
> +{
> + return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / 64;
> +}
> +
> +/*
> + * For older processors, thermal margin reading is returned in S8.8 fixed
> + * point format (16 bits: 8-bit signed magnitude, 8-bit fraction).
> + */
> +static s32 dts_eight_dot_eight_to_millidegree(s32 val)
> +{
> + return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / 256;
> +}
> +
> +static int get_die_temp(struct peci_cputemp *priv, long *val)
> +{
> + int ret = 0;
> + long tjmax;
> + s16 temp;
> +
> + mutex_lock(&priv->temp.die.state.lock);
> + if (!peci_sensor_need_update(&priv->temp.die.state))
> + goto skip_update;
> +
> + ret = peci_temp_read(priv->peci_dev, &temp);
> + if (ret)
> + goto err_unlock;
> +
> + if (!dts_valid(temp)) {
> + ret = -EIO;
> + goto err_unlock;
> + }
> +

I am getting lost with all the variable types. temp is s16,
passed to dts_valid() as s32 and thus presumably sign extended.
Since temp is s16, that means that a positive value of 0x8000
or larger isn't really possible, and dts_valid() will never return
false. With that in mind, you might as well drop the check.

> + ret = get_temp_target(priv, tjmax_type, &tjmax);
> + if (ret)
> + goto err_unlock;
> +
> + priv->temp.die.value = (s32)tjmax + dts_ten_dot_six_to_millidegree(temp);
> +
> + peci_sensor_mark_updated(&priv->temp.die.state);
> +
> +skip_update:
> + *val = priv->temp.die.value;
> +err_unlock:
> + mutex_unlock(&priv->temp.die.state.lock);
> + return ret;
> +}
> +
> +static int get_dts(struct peci_cputemp *priv, long *val)
> +{
> + int ret = 0;
> + s32 thermal_margin;
> + long tcontrol;
> + u32 pcs;
> +
> + mutex_lock(&priv->temp.dts.state.lock);
> + if (!peci_sensor_need_update(&priv->temp.dts.state))
> + goto skip_update;
> +
> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
> + if (ret)
> + goto err_unlock;
> +
> + thermal_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
> + if (!dts_valid(thermal_margin)) { > + ret = -EIO;
> + goto err_unlock;
> + }
> +
> + ret = get_temp_target(priv, tcontrol_type, &tcontrol);
> + if (ret)
> + goto err_unlock;
> +
> + /* Note that the tcontrol should be available before calling it */
> + priv->temp.dts.value =
> + (s32)tcontrol - priv->gen_info->thermal_margin_to_millidegree(thermal_margin);

tcontrol is stored as s32, converted to long, only to be typecasted
back to s32. Any reason for doing that ?

> +
> + peci_sensor_mark_updated(&priv->temp.dts.state);
> +
> +skip_update:
> + *val = priv->temp.dts.value;
> +err_unlock:
> + mutex_unlock(&priv->temp.dts.state.lock);
> + return ret;
> +}
> +
> +static int get_core_temp(struct peci_cputemp *priv, int core_index, long *val)
> +{
> + int ret = 0;
> + s32 core_dts_margin;
> + long tjmax;
> + u32 pcs;
> +
> + mutex_lock(&priv->temp.core[core_index].state.lock);
> + if (!peci_sensor_need_update(&priv->temp.core[core_index].state))
> + goto skip_update;
> +
> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index, &pcs);
> + if (ret)
> + goto err_unlock;
> +
> + core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
> + if (!dts_valid(core_dts_margin)) {
> + ret = -EIO;
> + goto err_unlock;
> + }
> +
> + ret = get_temp_target(priv, tjmax_type, &tjmax);
> + if (ret)
> + goto err_unlock;
> +
> + /* Note that the tjmax should be available before calling it */
> + priv->temp.core[core_index].value =
> + (s32)tjmax + dts_ten_dot_six_to_millidegree(core_dts_margin);
> +
> + peci_sensor_mark_updated(&priv->temp.core[core_index].state);
> +
> +skip_update:
> + *val = priv->temp.core[core_index].value;
> +err_unlock:
> + mutex_unlock(&priv->temp.core[core_index].state.lock);
> + return ret;
> +}
> +
> +static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types type,
> + u32 attr, int channel, const char **str)
> +{
> + struct peci_cputemp *priv = dev_get_drvdata(dev);
> +
> + if (attr != hwmon_temp_label)
> + return -EOPNOTSUPP;
> +
> + *str = channel < channel_core ?
> + cputemp_label[channel] : priv->coretemp_label[channel - channel_core];
> +
> + return 0;
> +}
> +
> +static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
> + u32 attr, int channel, long *val)
> +{
> + struct peci_cputemp *priv = dev_get_drvdata(dev);
> +
> + switch (attr) {
> + case hwmon_temp_input:
> + switch (channel) {
> + case channel_die:
> + return get_die_temp(priv, val);
> + case channel_dts:
> + return get_dts(priv, val);
> + case channel_tcontrol:
> + return get_temp_target(priv, tcontrol_type, val);
> + case channel_tthrottle:
> + return get_temp_target(priv, tthrottle_type, val);
> + case channel_tjmax:
> + return get_temp_target(priv, tjmax_type, val);
> + default:
> + return get_core_temp(priv, channel - channel_core, val);
> + }
> + break;
> + case hwmon_temp_max:
> + return get_temp_target(priv, tcontrol_type, val);
> + case hwmon_temp_crit:
> + return get_temp_target(priv, tjmax_type, val);
> + case hwmon_temp_crit_hyst:
> + return get_temp_target(priv, crit_hyst_type, val);
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}
> +
> +static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types type,
> + u32 attr, int channel)
> +{
> + const struct peci_cputemp *priv = data;
> +
> + if (channel > CPUTEMP_CHANNEL_NUMS)
> + return 0;
> +
> + if (channel < channel_core)
> + return 0444;
> +
> + if (test_bit(channel - channel_core, priv->core_mask))
> + return 0444;
> +
> + return 0;
> +}
> +
> +static int init_core_mask(struct peci_cputemp *priv)
> +{
> + struct peci_device *peci_dev = priv->peci_dev;
> + struct resolved_cores_reg *reg = priv->gen_info->reg;
> + u64 core_mask;
> + u32 data;
> + int ret;
> +
> + /* Get the RESOLVED_CORES register value */
> + switch (peci_dev->info.model) {
> + case INTEL_FAM6_ICELAKE_X:
> + case INTEL_FAM6_ICELAKE_D:
> + ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> + reg->func, reg->offset + 4, &data);
> + if (ret)
> + return ret;
> +
> + core_mask = (u64)data << 32;
> +
> + ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> + reg->func, reg->offset, &data);
> + if (ret)
> + return ret;
> +
> + core_mask |= data;
> +
> + break;
> + default:
> + ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
> + reg->func, reg->offset, &data);
> + if (ret)
> + return ret;
> +
> + core_mask = data;
> +
> + break;
> + }
> +
> + if (!core_mask)
> + return -EIO;
> +
> + bitmap_from_u64(priv->core_mask, core_mask);
> +
> + return 0;
> +}
> +
> +static int create_temp_label(struct peci_cputemp *priv)
> +{
> + unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
> + int i;
> +
> + priv->coretemp_label = devm_kzalloc(priv->dev, core_max * sizeof(char *), GFP_KERNEL);
> + if (!priv->coretemp_label)
> + return -ENOMEM;
> +
> + for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
> + priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
> + if (!priv->coretemp_label[i])
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static void check_resolved_cores(struct peci_cputemp *priv)
> +{
> + /*
> + * Failure to resolve cores is non-critical, we're still able to
> + * provide other sensor data.
> + */
> +
> + if (init_core_mask(priv))
> + return;
> +
> + if (create_temp_label(priv))
> + bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
> +}
> +
> +static void sensor_init(struct peci_cputemp *priv)
> +{
> + int i;
> +
> + mutex_init(&priv->temp.target.state.lock);
> + mutex_init(&priv->temp.die.state.lock);
> + mutex_init(&priv->temp.dts.state.lock);
> +
> + for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX)
> + mutex_init(&priv->temp.core[i].state.lock);
> +}
> +
> +static const struct hwmon_ops peci_cputemp_ops = {
> + .is_visible = cputemp_is_visible,
> + .read_string = cputemp_read_string,
> + .read = cputemp_read,
> +};
> +
> +static const u32 peci_cputemp_temp_channel_config[] = {
> + /* Die temperature */
> + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
> + /* DTS margin */
> + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
> + /* Tcontrol temperature */
> + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
> + /* Tthrottle temperature */
> + HWMON_T_LABEL | HWMON_T_INPUT,
> + /* Tjmax temperature */
> + HWMON_T_LABEL | HWMON_T_INPUT,
> + /* Core temperature - for all core channels */
> + [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL | HWMON_T_INPUT,
> + 0
> +};
> +
> +static const struct hwmon_channel_info peci_cputemp_temp_channel = {
> + .type = hwmon_temp,
> + .config = peci_cputemp_temp_channel_config,
> +};
> +
> +static const struct hwmon_channel_info *peci_cputemp_info[] = {
> + &peci_cputemp_temp_channel,
> + NULL
> +};
> +
> +static const struct hwmon_chip_info peci_cputemp_chip_info = {
> + .ops = &peci_cputemp_ops,
> + .info = peci_cputemp_info,
> +};
> +
> +static int peci_cputemp_probe(struct auxiliary_device *adev,
> + const struct auxiliary_device_id *id)
> +{
> + struct device *dev = &adev->dev;
> + struct peci_device *peci_dev = to_peci_device(dev->parent);
> + struct peci_cputemp *priv;
> + struct device *hwmon_dev;
> +
> + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
> + peci_dev->info.socket_id);
> + if (!priv->name)
> + return -ENOMEM;
> +
> + priv->dev = dev;
> + priv->peci_dev = peci_dev;
> + priv->gen_info = (const struct cpu_info *)id->driver_data;
> +
> + /*
> + * This is just a sanity check. Since we're using commands that are
> + * guaranteed to be supported on a given platform, we should never see
> + * revision lower than expected.
> + */
> + if (peci_dev->info.peci_revision < priv->gen_info->min_peci_revision)
> + dev_warn(priv->dev,
> + "Unexpected PECI revision %#x, some features may be unavailable\n",
> + peci_dev->info.peci_revision);
> +
> + check_resolved_cores(priv);
> +
> + sensor_init(priv);
> +
> + hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
> + priv, &peci_cputemp_chip_info, NULL);
> +
> + return PTR_ERR_OR_ZERO(hwmon_dev);
> +}
> +
> +/*
> + * RESOLVED_CORES PCI configuration register may have different location on
> + * different platforms.
> + */
> +static struct resolved_cores_reg resolved_cores_reg_hsx = {
> + .bus = 1,
> + .dev = 30,
> + .func = 3,
> + .offset = 0xb4,
> +};
> +
> +static struct resolved_cores_reg resolved_cores_reg_icx = {
> + .bus = 14,
> + .dev = 30,
> + .func = 3,
> + .offset = 0xd0,
> +};
> +
> +static const struct cpu_info cpu_hsx = {
> + .reg = &resolved_cores_reg_hsx,
> + .min_peci_revision = 0x33,
> + .thermal_margin_to_millidegree = &dts_eight_dot_eight_to_millidegree,
> +};
> +
> +static const struct cpu_info cpu_icx = {
> + .reg = &resolved_cores_reg_icx,
> + .min_peci_revision = 0x40,
> + .thermal_margin_to_millidegree = &dts_ten_dot_six_to_millidegree,
> +};
> +
> +static const struct auxiliary_device_id peci_cputemp_ids[] = {
> + {
> + .name = "peci_cpu.cputemp.hsx",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
> + {
> + .name = "peci_cpu.cputemp.bdx",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
> + {
> + .name = "peci_cpu.cputemp.bdxd",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
> + {
> + .name = "peci_cpu.cputemp.skx",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
> + {
> + .name = "peci_cpu.cputemp.icx",
> + .driver_data = (kernel_ulong_t)&cpu_icx,
> + },
> + {
> + .name = "peci_cpu.cputemp.icxd",
> + .driver_data = (kernel_ulong_t)&cpu_icx,
> + },
> + { }
> +};
> +MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
> +
> +static struct auxiliary_driver peci_cputemp_driver = {
> + .probe = peci_cputemp_probe,
> + .id_table = peci_cputemp_ids,
> +};
> +
> +module_auxiliary_driver(peci_cputemp_driver);
> +
> +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
> +MODULE_DESCRIPTION("PECI cputemp driver");
> +MODULE_LICENSE("GPL");
> +MODULE_IMPORT_NS(PECI_CPU);
>


2021-11-17 04:21:00

by Zev Weiss

[permalink] [raw]
Subject: Re: [PATCH v3 00/13] Introduce PECI subsystem

On Mon, Nov 15, 2021 at 10:25:39AM PST, Iwona Winiarska wrote:
>Hi,
>
>This is a third round of patches introducing PECI subsystem.
>Sorry for the delay between v2 and v3.
>

Hi Iwona,

I've done some testing of these patches on my AST2500/E-2778G OpenBMC
platform -- I had to do a small bit of hacking to add support for
INTEL_FAM6_KABYLAKE, but with that in place the newly-added code for the
8.8 format seems to work as it should. Thanks!

In poking at it a bit further I encountered some sub-optimal behavior
w.r.t. to host power state transitions and timeouts though --
essentially, if I ever hit a timeout in aspeed_peci_xfer() (for example
on a read of a hwmon tempX_input file after an unexpected host
shutdown), it seems to get stuck in a state where even if the host comes
back online, all attempted PECI transfers continue just timing out.
(Rebooting the BMC seems to resolve the problem.) This also happens if
I remove the peci client device via the 'remove' sysfs file, shut down
the host, and then do a rescan via sysfs while the host is off (i.e.
another operation that times out).

Let me know if there's any other info that would be helpful for
debugging.


Thanks,
Zev

2021-11-17 22:20:28

by Winiarska, Iwona

[permalink] [raw]
Subject: Re: [PATCH v3 10/13] hwmon: peci: Add cputemp driver

On Mon, 2021-11-15 at 16:52 -0800, Guenter Roeck wrote:
> On 11/15/21 10:25 AM, Iwona Winiarska wrote:
> > Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> > readings of the processor package and processor cores that are
> > accessible via the PECI interface.
> >
> > The main use case for the driver (and PECI interface) is out-of-band
> > management, where we're able to obtain the DTS readings from an external
> > entity connected with PECI, e.g. BMC on server platforms.
> >
> > Co-developed-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> >   MAINTAINERS                  |   7 +
> >   drivers/hwmon/Kconfig        |   2 +
> >   drivers/hwmon/Makefile       |   1 +
> >   drivers/hwmon/peci/Kconfig   |  18 ++
> >   drivers/hwmon/peci/Makefile  |   5 +
> >   drivers/hwmon/peci/common.h  |  58 ++++
> >   drivers/hwmon/peci/cputemp.c | 592 +++++++++++++++++++++++++++++++++++
> >   7 files changed, 683 insertions(+)
> >   create mode 100644 drivers/hwmon/peci/Kconfig
> >   create mode 100644 drivers/hwmon/peci/Makefile
> >   create mode 100644 drivers/hwmon/peci/common.h
> >   create mode 100644 drivers/hwmon/peci/cputemp.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index d751590f32e3..0094370be6c4 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14920,6 +14920,13 @@ L:     [email protected]
> >   S:    Maintained
> >   F:    drivers/platform/x86/peaq-wmi.c
> >  
> > +PECI HARDWARE MONITORING DRIVERS
> > +M:     Iwona Winiarska <[email protected]>
> > +R:     Jae Hyun Yoo <[email protected]>
> > +L:     [email protected]
> > +S:     Supported
> > +F:     drivers/hwmon/peci/
> > +
> >   PECI SUBSYSTEM
> >   M:    Iwona Winiarska <[email protected]>
> >   R:    Jae Hyun Yoo <[email protected]>
> > diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> > index 64bd3dfba2c4..4858f7cfa81e 100644
> > --- a/drivers/hwmon/Kconfig
> > +++ b/drivers/hwmon/Kconfig
> > @@ -1528,6 +1528,8 @@ config SENSORS_PCF8591
> >           These devices are hard to detect and rarely found on mainstream
> >           hardware. If unsure, say N.
> >  
> > +source "drivers/hwmon/peci/Kconfig"
> > +
> >   source "drivers/hwmon/pmbus/Kconfig"
> >  
> >   config SENSORS_PWM_FAN
> > diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> > index baee6a8d4dd1..2c0e210a28b8 100644
> > --- a/drivers/hwmon/Makefile
> > +++ b/drivers/hwmon/Makefile
> > @@ -204,6 +204,7 @@ obj-$(CONFIG_SENSORS_WM8350)        += wm8350-hwmon.o
> >   obj-$(CONFIG_SENSORS_XGENE)   += xgene-hwmon.o
> >  
> >   obj-$(CONFIG_SENSORS_OCC)     += occ/
> > +obj-$(CONFIG_SENSORS_PECI)     += peci/
> >   obj-$(CONFIG_PMBUS)           += pmbus/
> >  
> >   ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
> > diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> > new file mode 100644
> > index 000000000000..e10eed68d70a
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/Kconfig
> > @@ -0,0 +1,18 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config SENSORS_PECI_CPUTEMP
> > +       tristate "PECI CPU temperature monitoring client"
> > +       depends on PECI
> > +       select SENSORS_PECI
> > +       select PECI_CPU
> > +       help
> > +         If you say yes here you get support for the generic Intel PECI
> > +         cputemp driver which provides Digital Thermal Sensor (DTS) thermal
> > +         readings of the CPU package and CPU cores that are accessible via
> > +         the processor PECI interface.
> > +
> > +         This driver can also be built as a module. If so, the module
> > +         will be called peci-cputemp.
> > +
> > +config SENSORS_PECI
> > +       tristate
> > diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> > new file mode 100644
> > index 000000000000..e8a0ada5ab1f
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +peci-cputemp-y := cputemp.o
> > +
> > +obj-$(CONFIG_SENSORS_PECI_CPUTEMP)     += peci-cputemp.o
> > diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
> > new file mode 100644
> > index 000000000000..734506b0eca2
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/common.h
> > @@ -0,0 +1,58 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (c) 2021 Intel Corporation */
> > +
> > +#include <linux/mutex.h>
> > +#include <linux/types.h>
> > +
> > +#ifndef __PECI_HWMON_COMMON_H
> > +#define __PECI_HWMON_COMMON_H
> > +
> > +#define PECI_HWMON_UPDATE_INTERVAL     HZ
> > +
> > +/**
> > + * struct peci_sensor_state - PECI state information
> > + * @valid: flag to indicate the sensor value is valid
> > + * @last_updated: time of the last update in jiffies
> > + * @lock: mutex to protect sensor access
> > + */
> > +struct peci_sensor_state {
> > +       bool valid;
> > +       unsigned long last_updated;
> > +       struct mutex lock; /* protect sensor access */
> > +};
> > +
> > +/**
> > + * struct peci_sensor_data - PECI sensor information
> > + * @value: sensor value in milli units
> > + * @state: sensor update state
> > + */
> > +
> > +struct peci_sensor_data {
> > +       s32 value;
> > +       struct peci_sensor_state state;
> > +};
> > +
> > +/**
> > + * peci_sensor_need_update() - check whether sensor update is needed or not
> > + * @sensor: pointer to sensor data struct
> > + *
> > + * Return: true if update is needed, false if not.
> > + */
> > +
> > +static inline bool peci_sensor_need_update(struct peci_sensor_state *state)
> > +{
> > +       return !state->valid ||
> > +              time_after(jiffies, state->last_updated +
> > PECI_HWMON_UPDATE_INTERVAL);
> > +}
> > +
> > +/**
> > + * peci_sensor_mark_updated() - mark the sensor is updated
> > + * @sensor: pointer to sensor data struct
> > + */
> > +static inline void peci_sensor_mark_updated(struct peci_sensor_state
> > *state)
> > +{
> > +       state->valid = true;
> > +       state->last_updated = jiffies;
> > +}
> > +
> > +#endif /* __PECI_HWMON_COMMON_H */
> > diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
> > new file mode 100644
> > index 000000000000..d264d6c2f437
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/cputemp.c
> > @@ -0,0 +1,592 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +#include <linux/hwmon.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/peci.h>
> > +#include <linux/peci-cpu.h>
> > +#include <linux/units.h>
> > +
> > +#include "common.h"
> > +
> > +#define CORE_NUMS_MAX          64
> > +
> > +#define BASE_CHANNEL_NUMS      5
> > +#define CPUTEMP_CHANNEL_NUMS   (BASE_CHANNEL_NUMS + CORE_NUMS_MAX)
> > +
> > +#define TEMP_TARGET_FAN_TEMP_MASK      GENMASK(15, 8)
> > +#define TEMP_TARGET_REF_TEMP_MASK      GENMASK(23, 16)
> > +#define TEMP_TARGET_TJ_OFFSET_MASK     GENMASK(29, 24)
> > +
> > +#define DTS_MARGIN_MASK                GENMASK(15, 0)
> > +#define PCS_MODULE_TEMP_MASK   GENMASK(15, 0)
> > +
> > +struct resolved_cores_reg {
> > +       u8 bus;
> > +       u8 dev;
> > +       u8 func;
> > +       u8 offset;
> > +};
> > +
> > +struct cpu_info {
> > +       struct resolved_cores_reg *reg;
> > +       u8 min_peci_revision;
> > +       s32 (*thermal_margin_to_millidegree)(s32 val);
> > +};
> > +
> > +struct peci_temp_target {
> > +       s32 tcontrol;
> > +       s32 tthrottle;
> > +       s32 tjmax;
> > +       struct peci_sensor_state state;
> > +};
> > +
> > +enum peci_temp_target_type {
> > +       tcontrol_type,
> > +       tthrottle_type,
> > +       tjmax_type,
> > +       crit_hyst_type,
> > +};
> > +
> > +struct peci_cputemp {
> > +       struct peci_device *peci_dev;
> > +       struct device *dev;
> > +       const char *name;
> > +       const struct cpu_info *gen_info;
> > +       struct {
> > +               struct peci_temp_target target;
> > +               struct peci_sensor_data die;
> > +               struct peci_sensor_data dts;
> > +               struct peci_sensor_data core[CORE_NUMS_MAX];
> > +       } temp;
> > +       const char **coretemp_label;
> > +       DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
> > +};
> > +
> > +enum cputemp_channels {
> > +       channel_die,
> > +       channel_dts,
> > +       channel_tcontrol,
> > +       channel_tthrottle,
> > +       channel_tjmax,
> > +       channel_core,
> > +};
> > +
> > +static const char * const cputemp_label[BASE_CHANNEL_NUMS] = {
> > +       "Die",
> > +       "DTS",
> > +       "Tcontrol",
> > +       "Tthrottle",
> > +       "Tjmax",
> > +};
> > +
> > +static int update_temp_target(struct peci_cputemp *priv)
> > +{
> > +       s32 tthrottle_offset, tcontrol_margin;
> > +       u32 pcs;
> > +       int ret;
> > +
> > +       if (!peci_sensor_need_update(&priv->temp.target.state))
> > +               return 0;
> > +
> > +       ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
> > +       if (ret)
> > +               return ret;
> > +
> > +       priv->temp.target.tjmax =
> > +               FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) *
> > MILLIDEGREE_PER_DEGREE;
> > +
> > +       tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
> > +       tcontrol_margin = sign_extend32(tcontrol_margin, 7) *
> > MILLIDEGREE_PER_DEGREE;
> > +       priv->temp.target.tcontrol = priv->temp.target.tjmax -
> > tcontrol_margin;
> > +
> > +       tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) *
> > MILLIDEGREE_PER_DEGREE;
> > +       priv->temp.target.tthrottle = priv->temp.target.tjmax -
> > tthrottle_offset;
> > +
> > +       peci_sensor_mark_updated(&priv->temp.target.state);
> > +
> > +       return 0;
> > +}
> > +
> > +static int get_temp_target(struct peci_cputemp *priv, enum
> > peci_temp_target_type type, long *val)
>
> Why is this a long * ? It only uses variables which are declared as s32.

It's going to be used by cputemp_read, which implements hwmon_ops.read (which
uses long *val).

>
> > +{
> > +       int ret;
> > +
> > +       mutex_lock(&priv->temp.target.state.lock);
> > +
> > +       ret = update_temp_target(priv);
> > +       if (ret)
> > +               goto unlock;
> > +
> > +       switch (type) {
> > +       case tcontrol_type:
> > +               *val = priv->temp.target.tcontrol;
> > +               break;
> > +       case tthrottle_type:
> > +               *val = priv->temp.target.tthrottle;
> > +               break;
> > +       case tjmax_type:
> > +               *val = priv->temp.target.tjmax;
> > +               break;
> > +       case crit_hyst_type:
> > +               *val = priv->temp.target.tjmax - priv->temp.target.tcontrol;
> > +               break;
> > +       default:
> > +               ret = -EOPNOTSUPP;
> > +               break;
> > +       }
> > +unlock:
> > +       mutex_unlock(&priv->temp.target.state.lock);
> > +
> > +       return ret;
> > +}
> > +
> > +/*
> > + * Error codes:
> > + *   0x8000: General sensor error
> > + *   0x8001: Reserved
> > + *   0x8002: Underflow on reading value
> > + *   0x8003-0x81ff: Reserved
> > + */
> > +static bool dts_valid(s32 val)
> > +{
> > +       return val < 0x8000 || val > 0x81ff;
> > +}
> > +
> > +/*
> > + * Processors return a value of DTS reading in S10.6 fixed point format
> > + * (16 bits: 10-bit signed magnitude, 6-bit fraction).
> > + */
> > +static s32 dts_ten_dot_six_to_millidegree(s32 val)
> > +{
> > +       return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / 64;
> > +}
> > +
> > +/*
> > + * For older processors, thermal margin reading is returned in S8.8 fixed
> > + * point format (16 bits: 8-bit signed magnitude, 8-bit fraction).
> > + */
> > +static s32 dts_eight_dot_eight_to_millidegree(s32 val)
> > +{
> > +       return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / 256;
> > +}
> > +
> > +static int get_die_temp(struct peci_cputemp *priv, long *val)
> > +{
> > +       int ret = 0;
> > +       long tjmax;
> > +       s16 temp;
> > +
> > +       mutex_lock(&priv->temp.die.state.lock);
> > +       if (!peci_sensor_need_update(&priv->temp.die.state))
> > +               goto skip_update;
> > +
> > +       ret = peci_temp_read(priv->peci_dev, &temp);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       if (!dts_valid(temp)) {
> > +               ret = -EIO;
> > +               goto err_unlock;
> > +       }
> > +
>
> I am getting lost with all the variable types. temp is s16,
> passed to dts_valid() as s32 and thus presumably sign extended.
> Since temp is s16, that means that a positive value of 0x8000
> or larger isn't really possible, and dts_valid() will never return
> false. With that in mind, you might as well drop the check.

It is a bug - dts_valid() should operate on unsigned value.
I'll fix it to take u16 instead of s32.
The idea then would be to treat value obtained from peci_temp_read (or any other
function that pulls values in "DTS" S10.6 or S8.8 format) as "raw" u16 and only
convert it to proper signed value after using respective X_to_milidegree
function.

>
> > +       ret = get_temp_target(priv, tjmax_type, &tjmax);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       priv->temp.die.value = (s32)tjmax +
> > dts_ten_dot_six_to_millidegree(temp);
> > +
> > +       peci_sensor_mark_updated(&priv->temp.die.state);
> > +
> > +skip_update:
> > +       *val = priv->temp.die.value;
> > +err_unlock:
> > +       mutex_unlock(&priv->temp.die.state.lock);
> > +       return ret;
> > +}
> > +
> > +static int get_dts(struct peci_cputemp *priv, long *val)
> > +{
> > +       int ret = 0;
> > +       s32 thermal_margin;
> > +       long tcontrol;
> > +       u32 pcs;
> > +
> > +       mutex_lock(&priv->temp.dts.state.lock);
> > +       if (!peci_sensor_need_update(&priv->temp.dts.state))
> > +               goto skip_update;
> > +
> > +       ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0,
> > &pcs);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       thermal_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
> > +       if (!dts_valid(thermal_margin)) { > +           ret = -EIO;
> > +               goto err_unlock;
> > +       }
> > +
> > +       ret = get_temp_target(priv, tcontrol_type, &tcontrol);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       /* Note that the tcontrol should be available before calling it */
> > +       priv->temp.dts.value =
> > +               (s32)tcontrol - priv->gen_info-
> > >thermal_margin_to_millidegree(thermal_margin);
>
> tcontrol is stored as s32, converted to long, only to be typecasted
> back to s32. Any reason for doing that ?

tcontrol is passed to get_temp_target(), which expects long.

Thanks
-Iwona

>
> > +
> > +       peci_sensor_mark_updated(&priv->temp.dts.state);
> > +
> > +skip_update:
> > +       *val = priv->temp.dts.value;
> > +err_unlock:
> > +       mutex_unlock(&priv->temp.dts.state.lock);
> > +       return ret;
> > +}
> > +
> > +static int get_core_temp(struct peci_cputemp *priv, int core_index, long
> > *val)
> > +{
> > +       int ret = 0;
> > +       s32 core_dts_margin;
> > +       long tjmax;
> > +       u32 pcs;
> > +
> > +       mutex_lock(&priv->temp.core[core_index].state.lock);
> > +       if (!peci_sensor_need_update(&priv->temp.core[core_index].state))
> > +               goto skip_update;
> > +
> > +       ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP,
> > core_index, &pcs);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
> > +       if (!dts_valid(core_dts_margin)) {
> > +               ret = -EIO;
> > +               goto err_unlock;
> > +       }
> > +
> > +       ret = get_temp_target(priv, tjmax_type, &tjmax);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       /* Note that the tjmax should be available before calling it */
> > +       priv->temp.core[core_index].value =
> > +               (s32)tjmax +
> > dts_ten_dot_six_to_millidegree(core_dts_margin);
> > +
> > +       peci_sensor_mark_updated(&priv->temp.core[core_index].state);
> > +
> > +skip_update:
> > +       *val = priv->temp.core[core_index].value;
> > +err_unlock:
> > +       mutex_unlock(&priv->temp.core[core_index].state.lock);
> > +       return ret;
> > +}
> > +
> > +static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types
> > type,
> > +                              u32 attr, int channel, const char **str)
> > +{
> > +       struct peci_cputemp *priv = dev_get_drvdata(dev);
> > +
> > +       if (attr != hwmon_temp_label)
> > +               return -EOPNOTSUPP;
> > +
> > +       *str = channel < channel_core ?
> > +               cputemp_label[channel] : priv->coretemp_label[channel -
> > channel_core];
> > +
> > +       return 0;
> > +}
> > +
> > +static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
> > +                       u32 attr, int channel, long *val)
> > +{
> > +       struct peci_cputemp *priv = dev_get_drvdata(dev);
> > +
> > +       switch (attr) {
> > +       case hwmon_temp_input:
> > +               switch (channel) {
> > +               case channel_die:
> > +                       return get_die_temp(priv, val);
> > +               case channel_dts:
> > +                       return get_dts(priv, val);
> > +               case channel_tcontrol:
> > +                       return get_temp_target(priv, tcontrol_type, val);
> > +               case channel_tthrottle:
> > +                       return get_temp_target(priv, tthrottle_type, val);
> > +               case channel_tjmax:
> > +                       return get_temp_target(priv, tjmax_type, val);
> > +               default:
> > +                       return get_core_temp(priv, channel - channel_core,
> > val);
> > +               }
> > +               break;
> > +       case hwmon_temp_max:
> > +               return get_temp_target(priv, tcontrol_type, val);
> > +       case hwmon_temp_crit:
> > +               return get_temp_target(priv, tjmax_type, val);
> > +       case hwmon_temp_crit_hyst:
> > +               return get_temp_target(priv, crit_hyst_type, val);
> > +       default:
> > +               return -EOPNOTSUPP;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types
> > type,
> > +                                 u32 attr, int channel)
> > +{
> > +       const struct peci_cputemp *priv = data;
> > +
> > +       if (channel > CPUTEMP_CHANNEL_NUMS)
> > +               return 0;
> > +
> > +       if (channel < channel_core)
> > +               return 0444;
> > +
> > +       if (test_bit(channel - channel_core, priv->core_mask))
> > +               return 0444;
> > +
> > +       return 0;
> > +}
> > +
> > +static int init_core_mask(struct peci_cputemp *priv)
> > +{
> > +       struct peci_device *peci_dev = priv->peci_dev;
> > +       struct resolved_cores_reg *reg = priv->gen_info->reg;
> > +       u64 core_mask;
> > +       u32 data;
> > +       int ret;
> > +
> > +       /* Get the RESOLVED_CORES register value */
> > +       switch (peci_dev->info.model) {
> > +       case INTEL_FAM6_ICELAKE_X:
> > +       case INTEL_FAM6_ICELAKE_D:
> > +               ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg-
> > >dev,
> > +                                            reg->func, reg->offset + 4,
> > &data);
> > +               if (ret)
> > +                       return ret;
> > +
> > +               core_mask = (u64)data << 32;
> > +
> > +               ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg-
> > >dev,
> > +                                            reg->func, reg->offset, &data);
> > +               if (ret)
> > +                       return ret;
> > +
> > +               core_mask |= data;
> > +
> > +               break;
> > +       default:
> > +               ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
> > +                                         reg->func, reg->offset, &data);
> > +               if (ret)
> > +                       return ret;
> > +
> > +               core_mask = data;
> > +
> > +               break;
> > +       }
> > +
> > +       if (!core_mask)
> > +               return -EIO;
> > +
> > +       bitmap_from_u64(priv->core_mask, core_mask);
> > +
> > +       return 0;
> > +}
> > +
> > +static int create_temp_label(struct peci_cputemp *priv)
> > +{
> > +       unsigned long core_max = find_last_bit(priv->core_mask,
> > CORE_NUMS_MAX);
> > +       int i;
> > +
> > +       priv->coretemp_label = devm_kzalloc(priv->dev, core_max *
> > sizeof(char *), GFP_KERNEL);
> > +       if (!priv->coretemp_label)
> > +               return -ENOMEM;
> > +
> > +       for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
> > +               priv->coretemp_label[i] = devm_kasprintf(priv->dev,
> > GFP_KERNEL, "Core %d", i);
> > +               if (!priv->coretemp_label[i])
> > +                       return -ENOMEM;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static void check_resolved_cores(struct peci_cputemp *priv)
> > +{
> > +       /*
> > +        * Failure to resolve cores is non-critical, we're still able to
> > +        * provide other sensor data.
> > +        */
> > +
> > +       if (init_core_mask(priv))
> > +               return;
> > +
> > +       if (create_temp_label(priv))
> > +               bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
> > +}
> > +
> > +static void sensor_init(struct peci_cputemp *priv)
> > +{
> > +       int i;
> > +
> > +       mutex_init(&priv->temp.target.state.lock);
> > +       mutex_init(&priv->temp.die.state.lock);
> > +       mutex_init(&priv->temp.dts.state.lock);
> > +
> > +       for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX)
> > +               mutex_init(&priv->temp.core[i].state.lock);
> > +}
> > +
> > +static const struct hwmon_ops peci_cputemp_ops = {
> > +       .is_visible = cputemp_is_visible,
> > +       .read_string = cputemp_read_string,
> > +       .read = cputemp_read,
> > +};
> > +
> > +static const u32 peci_cputemp_temp_channel_config[] = {
> > +       /* Die temperature */
> > +       HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> > HWMON_T_CRIT_HYST,
> > +       /* DTS margin */
> > +       HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> > HWMON_T_CRIT_HYST,
> > +       /* Tcontrol temperature */
> > +       HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
> > +       /* Tthrottle temperature */
> > +       HWMON_T_LABEL | HWMON_T_INPUT,
> > +       /* Tjmax temperature */
> > +       HWMON_T_LABEL | HWMON_T_INPUT,
> > +       /* Core temperature - for all core channels */
> > +       [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL |
> > HWMON_T_INPUT,
> > +       0
> > +};
> > +
> > +static const struct hwmon_channel_info peci_cputemp_temp_channel = {
> > +       .type = hwmon_temp,
> > +       .config = peci_cputemp_temp_channel_config,
> > +};
> > +
> > +static const struct hwmon_channel_info *peci_cputemp_info[] = {
> > +       &peci_cputemp_temp_channel,
> > +       NULL
> > +};
> > +
> > +static const struct hwmon_chip_info peci_cputemp_chip_info = {
> > +       .ops = &peci_cputemp_ops,
> > +       .info = peci_cputemp_info,
> > +};
> > +
> > +static int peci_cputemp_probe(struct auxiliary_device *adev,
> > +                             const struct auxiliary_device_id *id)
> > +{
> > +       struct device *dev = &adev->dev;
> > +       struct peci_device *peci_dev = to_peci_device(dev->parent);
> > +       struct peci_cputemp *priv;
> > +       struct device *hwmon_dev;
> > +
> > +       priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> > +       if (!priv)
> > +               return -ENOMEM;
> > +
> > +       priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
> > +                                   peci_dev->info.socket_id);
> > +       if (!priv->name)
> > +               return -ENOMEM;
> > +
> > +       priv->dev = dev;
> > +       priv->peci_dev = peci_dev;
> > +       priv->gen_info = (const struct cpu_info *)id->driver_data;
> > +
> > +       /*
> > +        * This is just a sanity check. Since we're using commands that are
> > +        * guaranteed to be supported on a given platform, we should never
> > see
> > +        * revision lower than expected.
> > +        */
> > +       if (peci_dev->info.peci_revision < priv->gen_info-
> > >min_peci_revision)
> > +               dev_warn(priv->dev,
> > +                        "Unexpected PECI revision %#x, some features may be
> > unavailable\n",
> > +                        peci_dev->info.peci_revision);
> > +
> > +       check_resolved_cores(priv);
> > +
> > +       sensor_init(priv);
> > +
> > +       hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv-
> > >name,
> > +                                                        priv,
> > &peci_cputemp_chip_info, NULL);
> > +
> > +       return PTR_ERR_OR_ZERO(hwmon_dev);
> > +}
> > +
> > +/*
> > + * RESOLVED_CORES PCI configuration register may have different location on
> > + * different platforms.
> > + */
> > +static struct resolved_cores_reg resolved_cores_reg_hsx = {
> > +       .bus = 1,
> > +       .dev = 30,
> > +       .func = 3,
> > +       .offset = 0xb4,
> > +};
> > +
> > +static struct resolved_cores_reg resolved_cores_reg_icx = {
> > +       .bus = 14,
> > +       .dev = 30,
> > +       .func = 3,
> > +       .offset = 0xd0,
> > +};
> > +
> > +static const struct cpu_info cpu_hsx = {
> > +       .reg            = &resolved_cores_reg_hsx,
> > +       .min_peci_revision = 0x33,
> > +       .thermal_margin_to_millidegree =
> > &dts_eight_dot_eight_to_millidegree,
> > +};
> > +
> > +static const struct cpu_info cpu_icx = {
> > +       .reg            = &resolved_cores_reg_icx,
> > +       .min_peci_revision = 0x40,
> > +       .thermal_margin_to_millidegree = &dts_ten_dot_six_to_millidegree,
> > +};
> > +
> > +static const struct auxiliary_device_id peci_cputemp_ids[] = {
> > +       {
> > +               .name = "peci_cpu.cputemp.hsx",
> > +               .driver_data = (kernel_ulong_t)&cpu_hsx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.bdx",
> > +               .driver_data = (kernel_ulong_t)&cpu_hsx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.bdxd",
> > +               .driver_data = (kernel_ulong_t)&cpu_hsx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.skx",
> > +               .driver_data = (kernel_ulong_t)&cpu_hsx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.icx",
> > +               .driver_data = (kernel_ulong_t)&cpu_icx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.icxd",
> > +               .driver_data = (kernel_ulong_t)&cpu_icx,
> > +       },
> > +       { }
> > +};
> > +MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
> > +
> > +static struct auxiliary_driver peci_cputemp_driver = {
> > +       .probe          = peci_cputemp_probe,
> > +       .id_table       = peci_cputemp_ids,
> > +};
> > +
> > +module_auxiliary_driver(peci_cputemp_driver);
> > +
> > +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> > +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
> > +MODULE_DESCRIPTION("PECI cputemp driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_IMPORT_NS(PECI_CPU);
> >
>

2021-11-17 23:26:35

by Winiarska, Iwona

[permalink] [raw]
Subject: Re: [PATCH v3 00/13] Introduce PECI subsystem

On Wed, 2021-11-17 at 03:56 +0000, Zev Weiss wrote:
> On Mon, Nov 15, 2021 at 10:25:39AM PST, Iwona Winiarska wrote:
> > Hi,
> >
> > This is a third round of patches introducing PECI subsystem.
> > Sorry for the delay between v2 and v3.
> >
>
> Hi Iwona,
>
> I've done some testing of these patches on my AST2500/E-2778G OpenBMC
> platform -- I had to do a small bit of hacking to add support for
> INTEL_FAM6_KABYLAKE, but with that in place the newly-added code for the
> 8.8 format seems to work as it should.  Thanks!

Thanks for the report and testing :)

>
> In poking at it a bit further I encountered some sub-optimal behavior
> w.r.t. to host power state transitions and timeouts though --
> essentially, if I ever hit a timeout in aspeed_peci_xfer() (for example
> on a read of a hwmon tempX_input file after an unexpected host
> shutdown), it seems to get stuck in a state where even if the host comes
> back online, all attempted PECI transfers continue just timing out.
> (Rebooting the BMC seems to resolve the problem.)  This also happens if
> I remove the peci client device via the 'remove' sysfs file, shut down
> the host, and then do a rescan via sysfs while the host is off (i.e.
> another operation that times out).
>
> Let me know if there's any other info that would be helpful for
> debugging.

That's unexpected. I do have an idea what might have caused that. Let me fix it
in v4.

Thanks
-Iwona

>
>
> Thanks,
> Zev

2021-11-18 21:51:15

by Winiarska, Iwona

[permalink] [raw]
Subject: Re: [PATCH v3 00/13] Introduce PECI subsystem

On Thu, 2021-11-18 at 14:19 +0200, Tomer Maimon wrote:
> Hi Iwona,
>
> My name is Tomer I working as a SW engineer in Nuvoton BMC project.
>
> First, thanks for upstreaming the PECI driver!
>
> Nuvoton (NPCM) PECI driver was in the PECI patchset that has been handheld by
> Jae.
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/
>
> Could you add Nuvoton (NPCM) PECI driver to your patch set next time you will
> send upstream patches to Linux vanilla?

Some (relatively small) changes are going to be needed to adapt that driver to
changes that happened in PECI core.
I want to keep this series as small as possible, but once it gets merged,
Nuvoton driver can be added in a separate series.

> If you agree, we will check your patchset in Nuvoton systems in a few days and
> send you NPCM OECI driver and documentation.
>

I don't have any hardware to test it on so help will definitely be welcome :)

Thanks
-Iwona

> Thanks,
>
> Tomer
>
> On Mon, 15 Nov 2021 at 20:28, Iwona Winiarska <[email protected]> wrote:
> > Hi,
> >
> > This is a third round of patches introducing PECI subsystem.
> > Sorry for the delay between v2 and v3.
> >
> > The Platform Environment Control Interface (PECI) is a communication
> > interface between Intel processors and management controllers (e.g.
> > Baseboard Management Controller, BMC).
> >
> > This series adds a PECI subsystem and introduces drivers which run in
> > the Linux instance on the management controller (not the main Intel
> > processor) and is intended to be used by the OpenBMC [1], a Linux
> > distribution for BMC devices.
> > The information exposed over PECI (like processor and DIMM
> > temperature) refers to the Intel processor and can be consumed by
> > daemons running on the BMC to, for example, display the processor
> > temperature in its web interface.
> >
> > The PECI bus is collection of code that provides interface support
> > between PECI devices (that actually represent processors) and PECI
> > controllers (such as the "peci-aspeed" controller) that allow to
> > access physical PECI interface. PECI devices are bound to PECI
> > drivers that provides access to PECI services. This series introduces
> > a generic "peci-cpu" driver that exposes hardware monitoring "cputemp"
> > and "dimmtemp" using the auxiliary bus.
> >
> > Exposing "raw" PECI to userspace, either to write userspace drivers or
> > for debug/testing purpose was left out of this series to encourage
> > writing kernel drivers instead, but may be pursued in the future.
> >
> > Introducing PECI to upstream Linux was already attempted before [2].
> > Since it's been over a year since last revision, and the series
> > changed quite a bit in the meantime, I've decided to start from v1.
> >
> > I would also like to give credit to everyone who helped me with
> > different aspects of preliminary review:
> > - Pierre-Louis Bossart,
> > - Tony Luck,
> > - Andy Shevchenko,
> > - Dave Hansen.
> >
> > [1] https://github.com/openbmc/openbmc
> > [2]
> > https://lore.kernel.org/openbmc/[email protected]/
> >
> > Changes v2 -> v3:
> >
> > * Dropped x86/cpu patches (Boris)
> > * Dropped pr_fmt() for PECI module (Dan)
> > * Fixed releasing peci controller device flow (Dan)
> > * Improved peci-aspeed commit-msg and Kconfig help (Dan)
> > * Fixed aspeed_peci_xfer() to use the proper spin_lock function (Dan)
> > * Wrapped print_hex_dump_bytes() in CONFIG_DYNAMIC_DEBUG (Dan)
> > * Removed debug status logs from aspeed_peci_irq_handler() (Dan)
> > * Renamed functions using devres to start with "devm" (Dan)
> > * Changed request to be allocated on stack in peci_detect (Dan)
> > * Removed redundant WARN_ON on invalid PECI addr (Dan)
> > * Changed peci_device_create() to use device_initialize() + device_add()
> > pattern (Dan)
> > * Fixed peci_device_destroy() to use kill_device() avoiding double-free (Dan)
> > * Renamed functions that perform xfer using "peci_xfer_*" prefix (Dan)
> > * Renamed peci_request_data_dib(temp) -> peci_request_dib(temp)_read (Dan)
> > * Fixed thermal margin readings for older Intel processors (Zev)
> > * Misc hwmon simplifications (Guenter)
> > * Used BIT_PER_TYPE to verify macro value constrains (Guenter)
> > * Improved WARN_ON message to print chan_rank_max and idx_dimm_max (Guenter)
> > * Improved dimmtemp to not reattempt probe if no dimms are populated
> >
> > Changes v1 -> v2:
> >
> > Biggest changes when it comes to diffstat are locking in HWMON
> > (I decided to clean things up a bit while adding it), switching to
> > devres usage in more places and exposing sysfs interface in separate patch.
> >
> > * Moved extending X86 ARCHITECTURE MAINTAINERS earlier in series (Dan)
> > * Removed "default n" for GENERIC_LIB_X86 (Dan)
> > * Added vendor prefix for peci-aspeed specific properties (Rob)
> > * Refactored PECI to use devres consistently (Dan)
> > * Added missing sysfs documentation and excluded adding peci-sysfs to
> >   separate patch (Dan)
> > * Used module_init() instead of subsys_init() for peci module initialization
> > (Dan)
> > * Removed redundant struct peci_device member (Dan)
> > * Improved PECI Kconfig help (Randy/Dan)
> > * Fixed/removed log messages (Dan, Guenter)
> > * Refactored peci-cputemp and peci-dimmtemp and added missing locks (Guenter)
> > * Removed unused dev_set_drvdata() in peci-cputemp and peci-dimmtemp (Guenter)
> > * Fixed used types, names, fixed broken and added additional comments
> >   to peci-hwmon (Guenter, Zev)
> > * Refactored peci-dimmtemp to not return -ETIMEDOUT (Guenter)
> > * Added sanity check for min_peci_revision in peci-hwmon drivers (Zev)
> > * Added assert for DIMM_NUMS_MAX and additional warning in peci-dimmtemp (Zev)
> > * Fixed macro names in peci-aspeed (Zev)
> > * Refactored peci-aspeed sanitizing properties to a single helper function
> > (Zev)
> > * Fixed peci_cpu_device_ids definition for Broadwell Xeon D (David)
> > * Refactor peci_request to use a single allocation (Zev)
> > * Used min_t() to improve code readability (Zev)
> > * Added macro for PECI_RDENDPTCFG_MMIO_WR_LEN_BASE and fixed adev type
> >   array name to more descriptive (Zev)
> > * Fixed peci-hwmon commit-msg and documentation (Zev)
> >
> > Thanks
> > -Iwona
> >
> > Iwona Winiarska (11):
> >   dt-bindings: Add generic bindings for PECI
> >   dt-bindings: Add bindings for peci-aspeed
> >   ARM: dts: aspeed: Add PECI controller nodes
> >   peci: Add core infrastructure
> >   peci: Add device detection
> >   peci: Add sysfs interface for PECI bus
> >   peci: Add support for PECI device drivers
> >   peci: Add peci-cpu driver
> >   hwmon: peci: Add cputemp driver
> >   hwmon: peci: Add dimmtemp driver
> >   docs: Add PECI documentation
> >
> > Jae Hyun Yoo (2):
> >   peci: Add peci-aspeed controller driver
> >   docs: hwmon: Document PECI drivers
> >
> >  Documentation/ABI/testing/sysfs-bus-peci      |  16 +
> >  .../devicetree/bindings/peci/peci-aspeed.yaml | 109 +++
> >  .../bindings/peci/peci-controller.yaml        |  33 +
> >  Documentation/hwmon/index.rst                 |   2 +
> >  Documentation/hwmon/peci-cputemp.rst          |  90 +++
> >  Documentation/hwmon/peci-dimmtemp.rst         |  57 ++
> >  Documentation/index.rst                       |   1 +
> >  Documentation/peci/index.rst                  |  16 +
> >  Documentation/peci/peci.rst                   |  51 ++
> >  MAINTAINERS                                   |  29 +
> >  arch/arm/boot/dts/aspeed-g4.dtsi              |  14 +
> >  arch/arm/boot/dts/aspeed-g5.dtsi              |  14 +
> >  arch/arm/boot/dts/aspeed-g6.dtsi              |  14 +
> >  drivers/Kconfig                               |   3 +
> >  drivers/Makefile                              |   1 +
> >  drivers/hwmon/Kconfig                         |   2 +
> >  drivers/hwmon/Makefile                        |   1 +
> >  drivers/hwmon/peci/Kconfig                    |  31 +
> >  drivers/hwmon/peci/Makefile                   |   7 +
> >  drivers/hwmon/peci/common.h                   |  58 ++
> >  drivers/hwmon/peci/cputemp.c                  | 592 ++++++++++++++++
> >  drivers/hwmon/peci/dimmtemp.c                 | 630 ++++++++++++++++++
> >  drivers/peci/Kconfig                          |  36 +
> >  drivers/peci/Makefile                         |  10 +
> >  drivers/peci/controller/Kconfig               |  17 +
> >  drivers/peci/controller/Makefile              |   3 +
> >  drivers/peci/controller/peci-aspeed.c         | 429 ++++++++++++
> >  drivers/peci/core.c                           | 236 +++++++
> >  drivers/peci/cpu.c                            | 343 ++++++++++
> >  drivers/peci/device.c                         | 249 +++++++
> >  drivers/peci/internal.h                       | 136 ++++
> >  drivers/peci/request.c                        | 482 ++++++++++++++
> >  drivers/peci/sysfs.c                          |  82 +++
> >  include/linux/peci-cpu.h                      |  40 ++
> >  include/linux/peci.h                          | 110 +++
> >  35 files changed, 3944 insertions(+)
> >  create mode 100644 Documentation/ABI/testing/sysfs-bus-peci
> >  create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> >  create mode 100644 Documentation/devicetree/bindings/peci/peci-
> > controller.yaml
> >  create mode 100644 Documentation/hwmon/peci-cputemp.rst
> >  create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> >  create mode 100644 Documentation/peci/index.rst
> >  create mode 100644 Documentation/peci/peci.rst
> >  create mode 100644 drivers/hwmon/peci/Kconfig
> >  create mode 100644 drivers/hwmon/peci/Makefile
> >  create mode 100644 drivers/hwmon/peci/common.h
> >  create mode 100644 drivers/hwmon/peci/cputemp.c
> >  create mode 100644 drivers/hwmon/peci/dimmtemp.c
> >  create mode 100644 drivers/peci/Kconfig
> >  create mode 100644 drivers/peci/Makefile
> >  create mode 100644 drivers/peci/controller/Kconfig
> >  create mode 100644 drivers/peci/controller/Makefile
> >  create mode 100644 drivers/peci/controller/peci-aspeed.c
> >  create mode 100644 drivers/peci/core.c
> >  create mode 100644 drivers/peci/cpu.c
> >  create mode 100644 drivers/peci/device.c
> >  create mode 100644 drivers/peci/internal.h
> >  create mode 100644 drivers/peci/request.c
> >  create mode 100644 drivers/peci/sysfs.c
> >  create mode 100644 include/linux/peci-cpu.h
> >  create mode 100644 include/linux/peci.h
> >