2024-03-30 11:24:44

by Maximilian Luz

[permalink] [raw]
Subject: [PATCH 0/3] Add thermal sensor driver for Surface Aggregator

This series adds support for reading thermal sensors connected via the
Surface Aggregatgor Module (the embedded controller found on all modern
Microsoft Surface devices).

The EC can have up to 16 thermal sensors connected via a single
sub-device, each providing temperature readings and a label string.

This has been developed together with Ivor Wanders. For more details,
see the following links:

- https://github.com/linux-surface/surface-aggregator-module/issues/59
- https://github.com/linux-surface/surface-aggregator-module/pull/68

Maximilian Luz (3):
hwmon: Add thermal sensor driver for Surface Aggregator Module
hwmon: surface_temp: Add support for sensor names
platform/surface: aggregator_registry: Add support for thermal sensors
on the Surface Pro 9

MAINTAINERS | 6 +
drivers/hwmon/Kconfig | 10 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/surface_temp.c | 243 ++++++++++++++++++
.../surface/surface_aggregator_registry.c | 7 +
5 files changed, 267 insertions(+)
create mode 100644 drivers/hwmon/surface_temp.c

--
2.44.0



2024-03-30 11:24:57

by Maximilian Luz

[permalink] [raw]
Subject: [PATCH 1/3] hwmon: Add thermal sensor driver for Surface Aggregator Module

Some of the newer Microsoft Surface devices (such as the Surface Book
3 and Pro 9) have thermal sensors connected via the Surface Aggregator
Module (the embedded controller on those devices). Add a basic driver
to read out the temperature values of those sensors.

Link: https://github.com/linux-surface/surface-aggregator-module/issues/59
Signed-off-by: Maximilian Luz <[email protected]>
---
MAINTAINERS | 6 ++
drivers/hwmon/Kconfig | 10 +++
drivers/hwmon/Makefile | 1 +
drivers/hwmon/surface_temp.c | 165 +++++++++++++++++++++++++++++++++++
4 files changed, 182 insertions(+)
create mode 100644 drivers/hwmon/surface_temp.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d36c19c1bf811..bc5bc418ed479 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14738,6 +14738,12 @@ S: Maintained
F: Documentation/hwmon/surface_fan.rst
F: drivers/hwmon/surface_fan.c

+MICROSOFT SURFACE SENSOR THERMAL DRIVER
+M: Maximilian Luz <[email protected]>
+L: [email protected]
+S: Maintained
+F: drivers/hwmon/surface_temp.c
+
MICROSOFT SURFACE GPE LID SUPPORT DRIVER
M: Maximilian Luz <[email protected]>
L: [email protected]
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 83945397b6eb1..338ef73c96a3a 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -2070,6 +2070,16 @@ config SENSORS_SURFACE_FAN

Select M or Y here, if you want to be able to read the fan's speed.

+config SENSORS_SURFACE_TEMP
+ tristate "Microsoft Surface Thermal Sensor Driver"
+ depends on SURFACE_AGGREGATOR
+ help
+ Driver for monitoring thermal sensors connected via the Surface
+ Aggregator Module (embedded controller) on Microsoft Surface devices.
+
+ This driver can also be built as a module. If so, the module
+ will be called surface_temp.
+
config SENSORS_ADC128D818
tristate "Texas Instruments ADC128D818"
depends on I2C
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 5c31808f6378d..de8bc99719e63 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -208,6 +208,7 @@ obj-$(CONFIG_SENSORS_SMSC47M192)+= smsc47m192.o
obj-$(CONFIG_SENSORS_SPARX5) += sparx5-temp.o
obj-$(CONFIG_SENSORS_STTS751) += stts751.o
obj-$(CONFIG_SENSORS_SURFACE_FAN)+= surface_fan.o
+obj-$(CONFIG_SENSORS_SURFACE_TEMP)+= surface_temp.o
obj-$(CONFIG_SENSORS_SY7636A) += sy7636a-hwmon.o
obj-$(CONFIG_SENSORS_AMC6821) += amc6821.o
obj-$(CONFIG_SENSORS_TC74) += tc74.o
diff --git a/drivers/hwmon/surface_temp.c b/drivers/hwmon/surface_temp.c
new file mode 100644
index 0000000000000..48c3e826713f6
--- /dev/null
+++ b/drivers/hwmon/surface_temp.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Thermal sensor subsystem driver for Surface System Aggregator Module (SSAM).
+ *
+ * Copyright (C) 2022-2023 Maximilian Luz <[email protected]>
+ */
+
+#include <linux/bitops.h>
+#include <linux/hwmon.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include <linux/surface_aggregator/controller.h>
+#include <linux/surface_aggregator/device.h>
+
+
+/* -- SAM interface. -------------------------------------------------------- */
+
+SSAM_DEFINE_SYNC_REQUEST_CL_R(__ssam_tmp_get_available_sensors, __le16, {
+ .target_category = SSAM_SSH_TC_TMP,
+ .command_id = 0x04,
+});
+
+SSAM_DEFINE_SYNC_REQUEST_MD_R(__ssam_tmp_get_temperature, __le16, {
+ .target_category = SSAM_SSH_TC_TMP,
+ .command_id = 0x01,
+});
+
+static int ssam_tmp_get_available_sensors(struct ssam_device *sdev, s16 *sensors)
+{
+ __le16 sensors_le;
+ int status;
+
+ status = __ssam_tmp_get_available_sensors(sdev, &sensors_le);
+ if (status)
+ return status;
+
+ *sensors = le16_to_cpu(sensors_le);
+ return 0;
+}
+
+static int ssam_tmp_get_temperature(struct ssam_device *sdev, u8 iid, long *temperature)
+{
+ __le16 temp_le;
+ int status;
+
+ status = __ssam_tmp_get_temperature(sdev->ctrl, sdev->uid.target, iid, &temp_le);
+ if (status)
+ return status;
+
+ /* Convert 1/10 °K to 1/1000 °C */
+ *temperature = (le16_to_cpu(temp_le) - 2731) * 100L;
+ return 0;
+}
+
+
+/* -- Driver.---------------------------------------------------------------- */
+
+struct ssam_temp {
+ struct ssam_device *sdev;
+ s16 sensors;
+};
+
+static umode_t ssam_temp_hwmon_is_visible(const void *data,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel)
+{
+ const struct ssam_temp *ssam_temp = data;
+
+ if (!(ssam_temp->sensors & BIT(channel)))
+ return 0;
+
+ return 0444;
+}
+
+static int ssam_temp_hwmon_read(struct device *dev,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel, long *value)
+{
+ const struct ssam_temp *ssam_temp = dev_get_drvdata(dev);
+
+ return ssam_tmp_get_temperature(ssam_temp->sdev, channel + 1, value);
+}
+
+static const struct hwmon_channel_info * const ssam_temp_hwmon_info[] = {
+ HWMON_CHANNEL_INFO(chip,
+ HWMON_C_REGISTER_TZ),
+ /* We have at most 16 thermal sensor channels. */
+ HWMON_CHANNEL_INFO(temp,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT,
+ HWMON_T_INPUT),
+ NULL
+};
+
+static const struct hwmon_ops ssam_temp_hwmon_ops = {
+ .is_visible = ssam_temp_hwmon_is_visible,
+ .read = ssam_temp_hwmon_read,
+};
+
+static const struct hwmon_chip_info ssam_temp_hwmon_chip_info = {
+ .ops = &ssam_temp_hwmon_ops,
+ .info = ssam_temp_hwmon_info,
+};
+
+static int ssam_temp_probe(struct ssam_device *sdev)
+{
+ struct ssam_temp *ssam_temp;
+ struct device *hwmon_dev;
+ s16 sensors;
+ int status;
+
+ status = ssam_tmp_get_available_sensors(sdev, &sensors);
+ if (status)
+ return status;
+
+ ssam_temp = devm_kzalloc(&sdev->dev, sizeof(*ssam_temp), GFP_KERNEL);
+ if (!ssam_temp)
+ return -ENOMEM;
+
+ ssam_temp->sdev = sdev;
+ ssam_temp->sensors = sensors;
+
+ hwmon_dev = devm_hwmon_device_register_with_info(&sdev->dev,
+ "surface_thermal", ssam_temp, &ssam_temp_hwmon_chip_info,
+ NULL);
+ if (IS_ERR(hwmon_dev))
+ return PTR_ERR(hwmon_dev);
+
+ return 0;
+}
+
+static const struct ssam_device_id ssam_temp_match[] = {
+ { SSAM_SDEV(TMP, SAM, 0x00, 0x02) },
+ { },
+};
+MODULE_DEVICE_TABLE(ssam, ssam_temp_match);
+
+static struct ssam_device_driver ssam_temp = {
+ .probe = ssam_temp_probe,
+ .match_table = ssam_temp_match,
+ .driver = {
+ .name = "surface_temp",
+ .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+ },
+};
+module_ssam_device_driver(ssam_temp);
+
+MODULE_AUTHOR("Maximilian Luz <[email protected]>");
+MODULE_DESCRIPTION("Thermal sensor subsystem driver for Surface System Aggregator Module");
+MODULE_LICENSE("GPL");
--
2.44.0


2024-03-30 11:25:08

by Maximilian Luz

[permalink] [raw]
Subject: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

From: Ivor Wanders <[email protected]>

The thermal subsystem of the Surface Aggregator Module allows us to
query the names of the respective thermal sensors. Forward those to
userspace.

Signed-off-by: Ivor Wanders <[email protected]>
Co-developed-by: Maximilian Luz <[email protected]>
Signed-off-by: Maximilian Luz <[email protected]>
---
drivers/hwmon/surface_temp.c | 112 +++++++++++++++++++++++++++++------
1 file changed, 95 insertions(+), 17 deletions(-)

diff --git a/drivers/hwmon/surface_temp.c b/drivers/hwmon/surface_temp.c
index 48c3e826713f6..7a2e1f638336c 100644
--- a/drivers/hwmon/surface_temp.c
+++ b/drivers/hwmon/surface_temp.c
@@ -17,6 +17,27 @@

/* -- SAM interface. -------------------------------------------------------- */

+/*
+ * Available sensors are indicated by a 16-bit bitfield, where a 1 marks the
+ * presence of a sensor. So we have at most 16 possible sensors/channels.
+ */
+#define SSAM_TMP_SENSOR_MAX_COUNT 16
+
+/*
+ * All names observed so far are 6 characters long, but there's only
+ * zeros after the name, so perhaps they can be longer. This number reflects
+ * the maximum zero-padded space observed in the returned buffer.
+ */
+#define SSAM_TMP_SENSOR_NAME_LENGTH 18
+
+struct ssam_tmp_get_name_rsp {
+ __le16 unknown1;
+ char unknown2;
+ char name[SSAM_TMP_SENSOR_NAME_LENGTH];
+} __packed;
+
+static_assert(sizeof(struct ssam_tmp_get_name_rsp) == 21);
+
SSAM_DEFINE_SYNC_REQUEST_CL_R(__ssam_tmp_get_available_sensors, __le16, {
.target_category = SSAM_SSH_TC_TMP,
.command_id = 0x04,
@@ -27,6 +48,11 @@ SSAM_DEFINE_SYNC_REQUEST_MD_R(__ssam_tmp_get_temperature, __le16, {
.command_id = 0x01,
});

+SSAM_DEFINE_SYNC_REQUEST_MD_R(__ssam_tmp_get_name, struct ssam_tmp_get_name_rsp, {
+ .target_category = SSAM_SSH_TC_TMP,
+ .command_id = 0x0e,
+});
+
static int ssam_tmp_get_available_sensors(struct ssam_device *sdev, s16 *sensors)
{
__le16 sensors_le;
@@ -54,12 +80,37 @@ static int ssam_tmp_get_temperature(struct ssam_device *sdev, u8 iid, long *temp
return 0;
}

+static int ssam_tmp_get_name(struct ssam_device *sdev, u8 iid, char *buf, size_t buf_len)
+{
+ struct ssam_tmp_get_name_rsp name_rsp;
+ int status;
+
+ status = __ssam_tmp_get_name(sdev->ctrl, sdev->uid.target, iid, &name_rsp);
+ if (status)
+ return status;
+
+ /*
+ * This should not fail unless the name in the returned struct is not
+ * null-terminated or someone changed something in the struct
+ * definitions above, since our buffer and struct have the same
+ * capacity by design. So if this fails blow this up with a warning.
+ * Since the more likely cause is that the returned string isn't
+ * null-terminated, we might have received garbage (as opposed to just
+ * an incomplete string), so also fail the function.
+ */
+ status = strscpy(buf, name_rsp.name, buf_len);
+ WARN_ON(status < 0);
+
+ return status < 0 ? status : 0;
+}
+

/* -- Driver.---------------------------------------------------------------- */

struct ssam_temp {
struct ssam_device *sdev;
s16 sensors;
+ char names[SSAM_TMP_SENSOR_MAX_COUNT][SSAM_TMP_SENSOR_NAME_LENGTH];
};

static umode_t ssam_temp_hwmon_is_visible(const void *data,
@@ -83,33 +134,47 @@ static int ssam_temp_hwmon_read(struct device *dev,
return ssam_tmp_get_temperature(ssam_temp->sdev, channel + 1, value);
}

+static int ssam_temp_hwmon_read_string(struct device *dev,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel, const char **str)
+{
+ const struct ssam_temp *ssam_temp = dev_get_drvdata(dev);
+
+ *str = ssam_temp->names[channel];
+ return 0;
+}
+
static const struct hwmon_channel_info * const ssam_temp_hwmon_info[] = {
HWMON_CHANNEL_INFO(chip,
HWMON_C_REGISTER_TZ),
- /* We have at most 16 thermal sensor channels. */
+ /*
+ * We have at most SSAM_TMP_SENSOR_MAX_COUNT = 16 thermal sensor
+ * channels.
+ */
HWMON_CHANNEL_INFO(temp,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT,
- HWMON_T_INPUT),
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL),
NULL
};

static const struct hwmon_ops ssam_temp_hwmon_ops = {
.is_visible = ssam_temp_hwmon_is_visible,
.read = ssam_temp_hwmon_read,
+ .read_string = ssam_temp_hwmon_read_string,
};

static const struct hwmon_chip_info ssam_temp_hwmon_chip_info = {
@@ -122,6 +187,7 @@ static int ssam_temp_probe(struct ssam_device *sdev)
struct ssam_temp *ssam_temp;
struct device *hwmon_dev;
s16 sensors;
+ int channel;
int status;

status = ssam_tmp_get_available_sensors(sdev, &sensors);
@@ -135,6 +201,18 @@ static int ssam_temp_probe(struct ssam_device *sdev)
ssam_temp->sdev = sdev;
ssam_temp->sensors = sensors;

+ /* Retrieve the name for each available sensor. */
+ for (channel = 0; channel < SSAM_TMP_SENSOR_MAX_COUNT; channel++) {
+ if (!(sensors & BIT(channel)))
+ continue;
+
+ status = ssam_tmp_get_name(sdev, channel + 1,
+ ssam_temp->names[channel],
+ SSAM_TMP_SENSOR_NAME_LENGTH);
+ if (status)
+ return status;
+ }
+
hwmon_dev = devm_hwmon_device_register_with_info(&sdev->dev,
"surface_thermal", ssam_temp, &ssam_temp_hwmon_chip_info,
NULL);
--
2.44.0


2024-03-30 11:25:18

by Maximilian Luz

[permalink] [raw]
Subject: [PATCH 3/3] platform/surface: aggregator_registry: Add support for thermal sensors on the Surface Pro 9

The Surface Pro 9 has thermal sensors connected via the Surface
Aggregator Module. Add a device node to support those.

Signed-off-by: Maximilian Luz <[email protected]>
---
drivers/platform/surface/surface_aggregator_registry.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/platform/surface/surface_aggregator_registry.c b/drivers/platform/surface/surface_aggregator_registry.c
index 035d6b4105cd6..c38203c00a705 100644
--- a/drivers/platform/surface/surface_aggregator_registry.c
+++ b/drivers/platform/surface/surface_aggregator_registry.c
@@ -74,6 +74,12 @@ static const struct software_node ssam_node_tmp_pprof = {
.parent = &ssam_node_root,
};

+/* Thermal sensors. */
+static const struct software_node ssam_node_tmp_sensors = {
+ .name = "ssam:01:03:01:00:02",
+ .parent = &ssam_node_root,
+};
+
/* Fan speed function. */
static const struct software_node ssam_node_fan_speed = {
.name = "ssam:01:05:01:01:01",
@@ -311,6 +317,7 @@ static const struct software_node *ssam_node_group_sp9[] = {
&ssam_node_bat_ac,
&ssam_node_bat_main,
&ssam_node_tmp_pprof,
+ &ssam_node_tmp_sensors,
&ssam_node_fan_speed,
&ssam_node_pos_tablet_switch,
&ssam_node_hid_kip_keyboard,
--
2.44.0


2024-03-30 11:58:40

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 1/3] hwmon: Add thermal sensor driver for Surface Aggregator Module

On 3/30/24 04:24, Maximilian Luz wrote:
> Some of the newer Microsoft Surface devices (such as the Surface Book
> 3 and Pro 9) have thermal sensors connected via the Surface Aggregator
> Module (the embedded controller on those devices). Add a basic driver
> to read out the temperature values of those sensors.
>
> Link: https://github.com/linux-surface/surface-aggregator-module/issues/59
> Signed-off-by: Maximilian Luz <[email protected]>

I very much dislike the idea of having multiple drivers for hardware
monitoring on the same system. Please explain in detail why this and
the fan driver for the same system need separate drivers.

I'll also very much want to know if we will see submissions for separate
voltage, current, power, energy, humidity, and/or other hardware monitoring
entities for the same system later on.

Guenter


2024-03-30 12:59:14

by Maximilian Luz

[permalink] [raw]
Subject: Re: [PATCH 1/3] hwmon: Add thermal sensor driver for Surface Aggregator Module

On 3/30/24 12:58, Guenter Roeck wrote:
> On 3/30/24 04:24, Maximilian Luz wrote:
>> Some of the newer Microsoft Surface devices (such as the Surface Book
>> 3 and Pro 9) have thermal sensors connected via the Surface Aggregator
>> Module (the embedded controller on those devices). Add a basic driver
>> to read out the temperature values of those sensors.
>>
>> Link: https://github.com/linux-surface/surface-aggregator-module/issues/59
>> Signed-off-by: Maximilian Luz <[email protected]>
>
> I very much dislike the idea of having multiple drivers for hardware
> monitoring on the same system. Please explain in detail why this and
> the fan driver for the same system need separate drivers.
>
> I'll also very much want to know if we will see submissions for separate
> voltage, current, power, energy, humidity, and/or other hardware monitoring
> entities for the same system later on.

The Surface Aggregator EC is not really a single device, but rather a
collection of subsystems. For example, there's one for the battery, one
for thermal sensors, and a separate one for the fan. Not all subsystems
are present on all devices with that EC, so we have modeled them as
separate subdevices of the parent EC controller. This makes it quite a
bit easier to maintain. Especially, since I haven't found any reliable
way to auto-detect the active/supported subsystems.

For example: The Surface Book 3 has thermal sensors that can be accessed
via this driver as well as a fan. As far as I know, however, the fan
subsystem has been introduced later and the Surface Book 3 doesn't
support that yet. So there's (as far as I know) no fan-monitoring
support. Trying to access that will fail with a timeout. For that reason
(but not specifically for that device), we have introduced the split
into subystem devices, which are maunally registered in
surface_aggregator_registry.c based on what we know the device actually
supports.

Further, the devices created for these subsystems also act as a binding
mechanism to the subsystem, speficying the subsystem ID/category used
for making requests to it. For example, this driver probes for

SSAM_SDEV(TMP, SAM, 0x00, 0x02)

meaning it looks for a device of the TMP subsystem on the SAM target ID
(which can be seen as a bus number) with instance 0 and function 2. This
(in particular subsystem ID and target ID) are directly used when making
requests to the EC. So if we find out down the line that temperature
sensors can be accessed on target ID KIP in addition to SAM, it's as easy
as adding a new device match to the driver.

I believe that this would be more difficult if the driver is merged
together: Doing so would require us to figure out what's present and
what we can or should access inside of the driver (instead of via the
already established registry). So it would either require us to do a
certain amount of hardcoding and if/else branches or we would have to
introduce a bunch of device properties and a meta-device just to bundle
all monitoring-related subsystems together, and again use a bunch of
if/else branches... And in both cases, the direct subsystem binding
originally intended in the device structure of the Surface Aggregator
bus goes out of the window.

So, in my opinion at least, having separate smaller but specific drivers
and leaving that "logic" to the registry driver makes this much more
maintainable.

Regarding other monitoring entities: In short, I don't know. I am not
aware of any other (current/power/...) monitoring capabilities of the EC
right now, but I can't guarantee that MS won't introduce a new subsystem
for those down the line (or that there already is one and we just don't
know about it). At which point, it will quite likely only be supported
on some (probably newer) devices again.

Best regards,
Max

2024-04-08 14:15:08

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

Hi,

On 3/30/24 12:24 PM, Maximilian Luz wrote:
> From: Ivor Wanders <[email protected]>
>
> The thermal subsystem of the Surface Aggregator Module allows us to
> query the names of the respective thermal sensors. Forward those to
> userspace.
>
> Signed-off-by: Ivor Wanders <[email protected]>
> Co-developed-by: Maximilian Luz <[email protected]>
> Signed-off-by: Maximilian Luz <[email protected]>

Thanks, patch looks good to me:

Reviewed-by: Hans de Goede <[email protected]>

Regards,

Hans



> ---
> drivers/hwmon/surface_temp.c | 112 +++++++++++++++++++++++++++++------
> 1 file changed, 95 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/hwmon/surface_temp.c b/drivers/hwmon/surface_temp.c
> index 48c3e826713f6..7a2e1f638336c 100644
> --- a/drivers/hwmon/surface_temp.c
> +++ b/drivers/hwmon/surface_temp.c
> @@ -17,6 +17,27 @@
>
> /* -- SAM interface. -------------------------------------------------------- */
>
> +/*
> + * Available sensors are indicated by a 16-bit bitfield, where a 1 marks the
> + * presence of a sensor. So we have at most 16 possible sensors/channels.
> + */
> +#define SSAM_TMP_SENSOR_MAX_COUNT 16
> +
> +/*
> + * All names observed so far are 6 characters long, but there's only
> + * zeros after the name, so perhaps they can be longer. This number reflects
> + * the maximum zero-padded space observed in the returned buffer.
> + */
> +#define SSAM_TMP_SENSOR_NAME_LENGTH 18
> +
> +struct ssam_tmp_get_name_rsp {
> + __le16 unknown1;
> + char unknown2;
> + char name[SSAM_TMP_SENSOR_NAME_LENGTH];
> +} __packed;
> +
> +static_assert(sizeof(struct ssam_tmp_get_name_rsp) == 21);
> +
> SSAM_DEFINE_SYNC_REQUEST_CL_R(__ssam_tmp_get_available_sensors, __le16, {
> .target_category = SSAM_SSH_TC_TMP,
> .command_id = 0x04,
> @@ -27,6 +48,11 @@ SSAM_DEFINE_SYNC_REQUEST_MD_R(__ssam_tmp_get_temperature, __le16, {
> .command_id = 0x01,
> });
>
> +SSAM_DEFINE_SYNC_REQUEST_MD_R(__ssam_tmp_get_name, struct ssam_tmp_get_name_rsp, {
> + .target_category = SSAM_SSH_TC_TMP,
> + .command_id = 0x0e,
> +});
> +
> static int ssam_tmp_get_available_sensors(struct ssam_device *sdev, s16 *sensors)
> {
> __le16 sensors_le;
> @@ -54,12 +80,37 @@ static int ssam_tmp_get_temperature(struct ssam_device *sdev, u8 iid, long *temp
> return 0;
> }
>
> +static int ssam_tmp_get_name(struct ssam_device *sdev, u8 iid, char *buf, size_t buf_len)
> +{
> + struct ssam_tmp_get_name_rsp name_rsp;
> + int status;
> +
> + status = __ssam_tmp_get_name(sdev->ctrl, sdev->uid.target, iid, &name_rsp);
> + if (status)
> + return status;
> +
> + /*
> + * This should not fail unless the name in the returned struct is not
> + * null-terminated or someone changed something in the struct
> + * definitions above, since our buffer and struct have the same
> + * capacity by design. So if this fails blow this up with a warning.
> + * Since the more likely cause is that the returned string isn't
> + * null-terminated, we might have received garbage (as opposed to just
> + * an incomplete string), so also fail the function.
> + */
> + status = strscpy(buf, name_rsp.name, buf_len);
> + WARN_ON(status < 0);
> +
> + return status < 0 ? status : 0;
> +}
> +
>
> /* -- Driver.---------------------------------------------------------------- */
>
> struct ssam_temp {
> struct ssam_device *sdev;
> s16 sensors;
> + char names[SSAM_TMP_SENSOR_MAX_COUNT][SSAM_TMP_SENSOR_NAME_LENGTH];
> };
>
> static umode_t ssam_temp_hwmon_is_visible(const void *data,
> @@ -83,33 +134,47 @@ static int ssam_temp_hwmon_read(struct device *dev,
> return ssam_tmp_get_temperature(ssam_temp->sdev, channel + 1, value);
> }
>
> +static int ssam_temp_hwmon_read_string(struct device *dev,
> + enum hwmon_sensor_types type,
> + u32 attr, int channel, const char **str)
> +{
> + const struct ssam_temp *ssam_temp = dev_get_drvdata(dev);
> +
> + *str = ssam_temp->names[channel];
> + return 0;
> +}
> +
> static const struct hwmon_channel_info * const ssam_temp_hwmon_info[] = {
> HWMON_CHANNEL_INFO(chip,
> HWMON_C_REGISTER_TZ),
> - /* We have at most 16 thermal sensor channels. */
> + /*
> + * We have at most SSAM_TMP_SENSOR_MAX_COUNT = 16 thermal sensor
> + * channels.
> + */
> HWMON_CHANNEL_INFO(temp,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT),
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL),
> NULL
> };
>
> static const struct hwmon_ops ssam_temp_hwmon_ops = {
> .is_visible = ssam_temp_hwmon_is_visible,
> .read = ssam_temp_hwmon_read,
> + .read_string = ssam_temp_hwmon_read_string,
> };
>
> static const struct hwmon_chip_info ssam_temp_hwmon_chip_info = {
> @@ -122,6 +187,7 @@ static int ssam_temp_probe(struct ssam_device *sdev)
> struct ssam_temp *ssam_temp;
> struct device *hwmon_dev;
> s16 sensors;
> + int channel;
> int status;
>
> status = ssam_tmp_get_available_sensors(sdev, &sensors);
> @@ -135,6 +201,18 @@ static int ssam_temp_probe(struct ssam_device *sdev)
> ssam_temp->sdev = sdev;
> ssam_temp->sensors = sensors;
>
> + /* Retrieve the name for each available sensor. */
> + for (channel = 0; channel < SSAM_TMP_SENSOR_MAX_COUNT; channel++) {
> + if (!(sensors & BIT(channel)))
> + continue;
> +
> + status = ssam_tmp_get_name(sdev, channel + 1,
> + ssam_temp->names[channel],
> + SSAM_TMP_SENSOR_NAME_LENGTH);
> + if (status)
> + return status;
> + }
> +
> hwmon_dev = devm_hwmon_device_register_with_info(&sdev->dev,
> "surface_thermal", ssam_temp, &ssam_temp_hwmon_chip_info,
> NULL);


2024-04-08 14:17:44

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 1/3] hwmon: Add thermal sensor driver for Surface Aggregator Module

Hi,

On 3/30/24 1:58 PM, Maximilian Luz wrote:
> On 3/30/24 12:58, Guenter Roeck wrote:
>> On 3/30/24 04:24, Maximilian Luz wrote:
>>> Some of the newer Microsoft Surface devices (such as the Surface Book
>>> 3 and Pro 9) have thermal sensors connected via the Surface Aggregator
>>> Module (the embedded controller on those devices). Add a basic driver
>>> to read out the temperature values of those sensors.
>>>
>>> Link: https://github.com/linux-surface/surface-aggregator-module/issues/59
>>> Signed-off-by: Maximilian Luz <[email protected]>
>>
>> I very much dislike the idea of having multiple drivers for hardware
>> monitoring on the same system. Please explain in detail why this and
>> the fan driver for the same system need separate drivers.
>>
>> I'll also very much want to know if we will see submissions for separate
>> voltage, current, power, energy, humidity, and/or other hardware monitoring
>> entities for the same system later on.
>
> The Surface Aggregator EC is not really a single device, but rather a
> collection of subsystems. For example, there's one for the battery, one
> for thermal sensors, and a separate one for the fan. Not all subsystems
> are present on all devices with that EC, so we have modeled them as
> separate subdevices of the parent EC controller. This makes it quite a
> bit easier to maintain. Especially, since I haven't found any reliable
> way to auto-detect the active/supported subsystems.
>
> For example: The Surface Book 3 has thermal sensors that can be accessed
> via this driver as well as a fan. As far as I know, however, the fan
> subsystem has been introduced later and the Surface Book 3 doesn't
> support that yet. So there's (as far as I know) no fan-monitoring
> support. Trying to access that will fail with a timeout. For that reason
> (but not specifically for that device), we have introduced the split
> into subystem devices, which are maunally registered in
> surface_aggregator_registry.c based on what we know the device actually
> supports.
>
> Further, the devices created for these subsystems also act as a binding
> mechanism to the subsystem, speficying the subsystem ID/category used
> for making requests to it. For example, this driver probes for
>
>     SSAM_SDEV(TMP, SAM, 0x00, 0x02)
>
> meaning it looks for a device of the TMP subsystem on the SAM target ID
> (which can be seen as a bus number) with instance 0 and function 2. This
> (in particular subsystem ID and target ID) are directly used when making
> requests to the EC. So if we find out down the line that temperature
> sensors can be accessed on target ID KIP in addition to SAM, it's as easy
> as adding a new device match to the driver.

<snip>

Right this is all working as designed, it is just that Microsoft has
gone a pretty custom route with the Surface devices.

Guenter another way of looking at this is if there were 2 ACPI devices
describing the fan vs the temperature monitoring capabilities that too
would result in 2 drivers even though the underlying ACPI AML code
might end up talking to the same monitoring-IC in the end.

I'll go and merge patch 3/3 of this series. I'll leave merging
1/3 and 2/3 up to the hwmon subsystem of course.

Regards,

Hans



2024-04-08 14:18:48

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 1/3] hwmon: Add thermal sensor driver for Surface Aggregator Module

Hi,

On 3/30/24 12:24 PM, Maximilian Luz wrote:
> Some of the newer Microsoft Surface devices (such as the Surface Book
> 3 and Pro 9) have thermal sensors connected via the Surface Aggregator
> Module (the embedded controller on those devices). Add a basic driver
> to read out the temperature values of those sensors.
>
> Link: https://github.com/linux-surface/surface-aggregator-module/issues/59
> Signed-off-by: Maximilian Luz <[email protected]>

Thanks, patch looks good to me:

Reviewed-by: Hans de Goede <[email protected]>

Regards,

Hans



> ---
> MAINTAINERS | 6 ++
> drivers/hwmon/Kconfig | 10 +++
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/surface_temp.c | 165 +++++++++++++++++++++++++++++++++++
> 4 files changed, 182 insertions(+)
> create mode 100644 drivers/hwmon/surface_temp.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d36c19c1bf811..bc5bc418ed479 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14738,6 +14738,12 @@ S: Maintained
> F: Documentation/hwmon/surface_fan.rst
> F: drivers/hwmon/surface_fan.c
>
> +MICROSOFT SURFACE SENSOR THERMAL DRIVER
> +M: Maximilian Luz <[email protected]>
> +L: [email protected]
> +S: Maintained
> +F: drivers/hwmon/surface_temp.c
> +
> MICROSOFT SURFACE GPE LID SUPPORT DRIVER
> M: Maximilian Luz <[email protected]>
> L: [email protected]
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 83945397b6eb1..338ef73c96a3a 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -2070,6 +2070,16 @@ config SENSORS_SURFACE_FAN
>
> Select M or Y here, if you want to be able to read the fan's speed.
>
> +config SENSORS_SURFACE_TEMP
> + tristate "Microsoft Surface Thermal Sensor Driver"
> + depends on SURFACE_AGGREGATOR
> + help
> + Driver for monitoring thermal sensors connected via the Surface
> + Aggregator Module (embedded controller) on Microsoft Surface devices.
> +
> + This driver can also be built as a module. If so, the module
> + will be called surface_temp.
> +
> config SENSORS_ADC128D818
> tristate "Texas Instruments ADC128D818"
> depends on I2C
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index 5c31808f6378d..de8bc99719e63 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -208,6 +208,7 @@ obj-$(CONFIG_SENSORS_SMSC47M192)+= smsc47m192.o
> obj-$(CONFIG_SENSORS_SPARX5) += sparx5-temp.o
> obj-$(CONFIG_SENSORS_STTS751) += stts751.o
> obj-$(CONFIG_SENSORS_SURFACE_FAN)+= surface_fan.o
> +obj-$(CONFIG_SENSORS_SURFACE_TEMP)+= surface_temp.o
> obj-$(CONFIG_SENSORS_SY7636A) += sy7636a-hwmon.o
> obj-$(CONFIG_SENSORS_AMC6821) += amc6821.o
> obj-$(CONFIG_SENSORS_TC74) += tc74.o
> diff --git a/drivers/hwmon/surface_temp.c b/drivers/hwmon/surface_temp.c
> new file mode 100644
> index 0000000000000..48c3e826713f6
> --- /dev/null
> +++ b/drivers/hwmon/surface_temp.c
> @@ -0,0 +1,165 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Thermal sensor subsystem driver for Surface System Aggregator Module (SSAM).
> + *
> + * Copyright (C) 2022-2023 Maximilian Luz <[email protected]>
> + */
> +
> +#include <linux/bitops.h>
> +#include <linux/hwmon.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +
> +#include <linux/surface_aggregator/controller.h>
> +#include <linux/surface_aggregator/device.h>
> +
> +
> +/* -- SAM interface. -------------------------------------------------------- */
> +
> +SSAM_DEFINE_SYNC_REQUEST_CL_R(__ssam_tmp_get_available_sensors, __le16, {
> + .target_category = SSAM_SSH_TC_TMP,
> + .command_id = 0x04,
> +});
> +
> +SSAM_DEFINE_SYNC_REQUEST_MD_R(__ssam_tmp_get_temperature, __le16, {
> + .target_category = SSAM_SSH_TC_TMP,
> + .command_id = 0x01,
> +});
> +
> +static int ssam_tmp_get_available_sensors(struct ssam_device *sdev, s16 *sensors)
> +{
> + __le16 sensors_le;
> + int status;
> +
> + status = __ssam_tmp_get_available_sensors(sdev, &sensors_le);
> + if (status)
> + return status;
> +
> + *sensors = le16_to_cpu(sensors_le);
> + return 0;
> +}
> +
> +static int ssam_tmp_get_temperature(struct ssam_device *sdev, u8 iid, long *temperature)
> +{
> + __le16 temp_le;
> + int status;
> +
> + status = __ssam_tmp_get_temperature(sdev->ctrl, sdev->uid.target, iid, &temp_le);
> + if (status)
> + return status;
> +
> + /* Convert 1/10 °K to 1/1000 °C */
> + *temperature = (le16_to_cpu(temp_le) - 2731) * 100L;
> + return 0;
> +}
> +
> +
> +/* -- Driver.---------------------------------------------------------------- */
> +
> +struct ssam_temp {
> + struct ssam_device *sdev;
> + s16 sensors;
> +};
> +
> +static umode_t ssam_temp_hwmon_is_visible(const void *data,
> + enum hwmon_sensor_types type,
> + u32 attr, int channel)
> +{
> + const struct ssam_temp *ssam_temp = data;
> +
> + if (!(ssam_temp->sensors & BIT(channel)))
> + return 0;
> +
> + return 0444;
> +}
> +
> +static int ssam_temp_hwmon_read(struct device *dev,
> + enum hwmon_sensor_types type,
> + u32 attr, int channel, long *value)
> +{
> + const struct ssam_temp *ssam_temp = dev_get_drvdata(dev);
> +
> + return ssam_tmp_get_temperature(ssam_temp->sdev, channel + 1, value);
> +}
> +
> +static const struct hwmon_channel_info * const ssam_temp_hwmon_info[] = {
> + HWMON_CHANNEL_INFO(chip,
> + HWMON_C_REGISTER_TZ),
> + /* We have at most 16 thermal sensor channels. */
> + HWMON_CHANNEL_INFO(temp,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT,
> + HWMON_T_INPUT),
> + NULL
> +};
> +
> +static const struct hwmon_ops ssam_temp_hwmon_ops = {
> + .is_visible = ssam_temp_hwmon_is_visible,
> + .read = ssam_temp_hwmon_read,
> +};
> +
> +static const struct hwmon_chip_info ssam_temp_hwmon_chip_info = {
> + .ops = &ssam_temp_hwmon_ops,
> + .info = ssam_temp_hwmon_info,
> +};
> +
> +static int ssam_temp_probe(struct ssam_device *sdev)
> +{
> + struct ssam_temp *ssam_temp;
> + struct device *hwmon_dev;
> + s16 sensors;
> + int status;
> +
> + status = ssam_tmp_get_available_sensors(sdev, &sensors);
> + if (status)
> + return status;
> +
> + ssam_temp = devm_kzalloc(&sdev->dev, sizeof(*ssam_temp), GFP_KERNEL);
> + if (!ssam_temp)
> + return -ENOMEM;
> +
> + ssam_temp->sdev = sdev;
> + ssam_temp->sensors = sensors;
> +
> + hwmon_dev = devm_hwmon_device_register_with_info(&sdev->dev,
> + "surface_thermal", ssam_temp, &ssam_temp_hwmon_chip_info,
> + NULL);
> + if (IS_ERR(hwmon_dev))
> + return PTR_ERR(hwmon_dev);
> +
> + return 0;
> +}
> +
> +static const struct ssam_device_id ssam_temp_match[] = {
> + { SSAM_SDEV(TMP, SAM, 0x00, 0x02) },
> + { },
> +};
> +MODULE_DEVICE_TABLE(ssam, ssam_temp_match);
> +
> +static struct ssam_device_driver ssam_temp = {
> + .probe = ssam_temp_probe,
> + .match_table = ssam_temp_match,
> + .driver = {
> + .name = "surface_temp",
> + .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> + },
> +};
> +module_ssam_device_driver(ssam_temp);
> +
> +MODULE_AUTHOR("Maximilian Luz <[email protected]>");
> +MODULE_DESCRIPTION("Thermal sensor subsystem driver for Surface System Aggregator Module");
> +MODULE_LICENSE("GPL");


2024-04-08 15:23:25

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 3/3] platform/surface: aggregator_registry: Add support for thermal sensors on the Surface Pro 9

Hi,

On 3/30/24 12:24 PM, Maximilian Luz wrote:
> The Surface Pro 9 has thermal sensors connected via the Surface
> Aggregator Module. Add a device node to support those.
>
> Signed-off-by: Maximilian Luz <[email protected]>

Thank you for your patch, I've applied this patch to my review-hans
branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=review-hans

Note I had to apply this manually do to a conflict with:
3427c443a6dc platform/surface: platform_profile: add fan profile switching

which I merged into pdx86/for-next after this series was send out.

Please double check I resolved the conflict correct.

Once I've run some tests on this branch the patches there will be
added to the platform-drivers-x86/for-next branch and eventually
will be included in the pdx86 pull-request to Linus for the next
merge-window.

Regards,

Hans


> ---
> drivers/platform/surface/surface_aggregator_registry.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/platform/surface/surface_aggregator_registry.c b/drivers/platform/surface/surface_aggregator_registry.c
> index 035d6b4105cd6..c38203c00a705 100644
> --- a/drivers/platform/surface/surface_aggregator_registry.c
> +++ b/drivers/platform/surface/surface_aggregator_registry.c
> @@ -74,6 +74,12 @@ static const struct software_node ssam_node_tmp_pprof = {
> .parent = &ssam_node_root,
> };
>
> +/* Thermal sensors. */
> +static const struct software_node ssam_node_tmp_sensors = {
> + .name = "ssam:01:03:01:00:02",
> + .parent = &ssam_node_root,
> +};
> +
> /* Fan speed function. */
> static const struct software_node ssam_node_fan_speed = {
> .name = "ssam:01:05:01:01:01",
> @@ -311,6 +317,7 @@ static const struct software_node *ssam_node_group_sp9[] = {
> &ssam_node_bat_ac,
> &ssam_node_bat_main,
> &ssam_node_tmp_pprof,
> + &ssam_node_tmp_sensors,
> &ssam_node_fan_speed,
> &ssam_node_pos_tablet_switch,
> &ssam_node_hid_kip_keyboard,


2024-04-16 13:15:24

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 1/3] hwmon: Add thermal sensor driver for Surface Aggregator Module

On Sat, Mar 30, 2024 at 12:24:00PM +0100, Maximilian Luz wrote:
> Some of the newer Microsoft Surface devices (such as the Surface Book
> 3 and Pro 9) have thermal sensors connected via the Surface Aggregator
> Module (the embedded controller on those devices). Add a basic driver
> to read out the temperature values of those sensors.
>
> Link: https://github.com/linux-surface/surface-aggregator-module/issues/59
> Signed-off-by: Maximilian Luz <[email protected]>
> Reviewed-by: Hans de Goede <[email protected]>

CHECK: Please don't use multiple blank lines
#189: FILE: drivers/hwmon/surface_temp.c:17:
+
+

CHECK: Please don't use multiple blank lines
#229: FILE: drivers/hwmon/surface_temp.c:57:
+
+

CHECK: Alignment should match open parenthesis
#311: FILE: drivers/hwmon/surface_temp.c:139:
+ hwmon_dev = devm_hwmon_device_register_with_info(&sdev->dev,
+ "surface_thermal", ssam_temp, &ssam_temp_hwmon_chip_info,

Please fix.

Guenter

2024-04-16 13:27:35

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 1/3] hwmon: Add thermal sensor driver for Surface Aggregator Module

On Sat, Mar 30, 2024 at 12:24:00PM +0100, Maximilian Luz wrote:
> Some of the newer Microsoft Surface devices (such as the Surface Book
> 3 and Pro 9) have thermal sensors connected via the Surface Aggregator
> Module (the embedded controller on those devices). Add a basic driver
> to read out the temperature values of those sensors.
>
> Link: https://github.com/linux-surface/surface-aggregator-module/issues/59
> Signed-off-by: Maximilian Luz <[email protected]>
> Reviewed-by: Hans de Goede <[email protected]>
> ---
[ ... ]
> + hwmon_dev = devm_hwmon_device_register_with_info(&sdev->dev,
> + "surface_thermal", ssam_temp, &ssam_temp_hwmon_chip_info,
> + NULL);
> + if (IS_ERR(hwmon_dev))
> + return PTR_ERR(hwmon_dev);
> +
> + return 0;

return PTR_ERR_OR_ZERO(hwmon_dev);

2024-04-16 13:35:09

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

On Sat, Mar 30, 2024 at 12:24:01PM +0100, Maximilian Luz wrote:
> From: Ivor Wanders <[email protected]>
>
> The thermal subsystem of the Surface Aggregator Module allows us to
> query the names of the respective thermal sensors. Forward those to
> userspace.
>
> Signed-off-by: Ivor Wanders <[email protected]>
> Co-developed-by: Maximilian Luz <[email protected]>
> Signed-off-by: Maximilian Luz <[email protected]>
> Reviewed-by: Hans de Goede <[email protected]>
> ---
> drivers/hwmon/surface_temp.c | 112 +++++++++++++++++++++++++++++------
> 1 file changed, 95 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/hwmon/surface_temp.c b/drivers/hwmon/surface_temp.c
> index 48c3e826713f6..7a2e1f638336c 100644
> --- a/drivers/hwmon/surface_temp.c
> +++ b/drivers/hwmon/surface_temp.c
> @@ -17,6 +17,27 @@
>
> /* -- SAM interface. -------------------------------------------------------- */
>
> +/*
> + * Available sensors are indicated by a 16-bit bitfield, where a 1 marks the
> + * presence of a sensor. So we have at most 16 possible sensors/channels.
> + */
> +#define SSAM_TMP_SENSOR_MAX_COUNT 16

#define<space>DEFINITION<tab>value

> +
> +/*
> + * All names observed so far are 6 characters long, but there's only
> + * zeros after the name, so perhaps they can be longer. This number reflects
> + * the maximum zero-padded space observed in the returned buffer.
> + */
> +#define SSAM_TMP_SENSOR_NAME_LENGTH 18
> +
> +struct ssam_tmp_get_name_rsp {
> + __le16 unknown1;
> + char unknown2;
> + char name[SSAM_TMP_SENSOR_NAME_LENGTH];
> +} __packed;
> +
> +static_assert(sizeof(struct ssam_tmp_get_name_rsp) == 21);
> +
> SSAM_DEFINE_SYNC_REQUEST_CL_R(__ssam_tmp_get_available_sensors, __le16, {
> .target_category = SSAM_SSH_TC_TMP,
> .command_id = 0x04,
> @@ -27,6 +48,11 @@ SSAM_DEFINE_SYNC_REQUEST_MD_R(__ssam_tmp_get_temperature, __le16, {
> .command_id = 0x01,
> });
>
> +SSAM_DEFINE_SYNC_REQUEST_MD_R(__ssam_tmp_get_name, struct ssam_tmp_get_name_rsp, {
> + .target_category = SSAM_SSH_TC_TMP,
> + .command_id = 0x0e,
> +});
> +
> static int ssam_tmp_get_available_sensors(struct ssam_device *sdev, s16 *sensors)
> {
> __le16 sensors_le;
> @@ -54,12 +80,37 @@ static int ssam_tmp_get_temperature(struct ssam_device *sdev, u8 iid, long *temp
> return 0;
> }
>
> +static int ssam_tmp_get_name(struct ssam_device *sdev, u8 iid, char *buf, size_t buf_len)
> +{
> + struct ssam_tmp_get_name_rsp name_rsp;
> + int status;
> +
> + status = __ssam_tmp_get_name(sdev->ctrl, sdev->uid.target, iid, &name_rsp);
> + if (status)
> + return status;
> +
> + /*
> + * This should not fail unless the name in the returned struct is not
> + * null-terminated or someone changed something in the struct
> + * definitions above, since our buffer and struct have the same
> + * capacity by design. So if this fails blow this up with a warning.
> + * Since the more likely cause is that the returned string isn't
> + * null-terminated, we might have received garbage (as opposed to just
> + * an incomplete string), so also fail the function.
> + */
> + status = strscpy(buf, name_rsp.name, buf_len);
> + WARN_ON(status < 0);

Not acceptable. From include/asm-generic/bug.h:

* Do not use these macros when checking for invalid external inputs
* (e.g. invalid system call arguments, or invalid data coming from
* network/devices), and on transient conditions like ENOMEM or EAGAIN.
* These macros should be used for recoverable kernel issues only.

> +
> + return status < 0 ? status : 0;
> +}
> +
>
> /* -- Driver.---------------------------------------------------------------- */
>
> struct ssam_temp {
> struct ssam_device *sdev;
> s16 sensors;
> + char names[SSAM_TMP_SENSOR_MAX_COUNT][SSAM_TMP_SENSOR_NAME_LENGTH];
> };
>
> static umode_t ssam_temp_hwmon_is_visible(const void *data,
> @@ -83,33 +134,47 @@ static int ssam_temp_hwmon_read(struct device *dev,
> return ssam_tmp_get_temperature(ssam_temp->sdev, channel + 1, value);
> }
>
> +static int ssam_temp_hwmon_read_string(struct device *dev,
> + enum hwmon_sensor_types type,
> + u32 attr, int channel, const char **str)
> +{
> + const struct ssam_temp *ssam_temp = dev_get_drvdata(dev);
> +
> + *str = ssam_temp->names[channel];
> + return 0;
> +}
> +
> static const struct hwmon_channel_info * const ssam_temp_hwmon_info[] = {
> HWMON_CHANNEL_INFO(chip,
> HWMON_C_REGISTER_TZ),
> - /* We have at most 16 thermal sensor channels. */
> + /*
> + * We have at most SSAM_TMP_SENSOR_MAX_COUNT = 16 thermal sensor
> + * channels.
> + */

Pointless comment. Already explained above, and perfect example showing
why it has no value separating this and the previous patch.

> HWMON_CHANNEL_INFO(temp,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT,
> - HWMON_T_INPUT),
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL,
> + HWMON_T_INPUT | HWMON_T_LABEL),

Another example. Why have me review the previous patch
just to change the code here ?

[ ... ]

> + /* Retrieve the name for each available sensor. */
> + for (channel = 0; channel < SSAM_TMP_SENSOR_MAX_COUNT; channel++) {
> + if (!(sensors & BIT(channel)))
> + continue;
> +
> + status = ssam_tmp_get_name(sdev, channel + 1,
> + ssam_temp->names[channel],
> + SSAM_TMP_SENSOR_NAME_LENGTH);
> + if (status)
> + return status;

Your call to fail probe in this case just because it can not find
a sensor name. I personally find that quite aggressive.

Guenter

2024-04-16 18:13:16

by Maximilian Luz

[permalink] [raw]
Subject: Re: [PATCH 1/3] hwmon: Add thermal sensor driver for Surface Aggregator Module

On 4/16/24 3:27 PM, Guenter Roeck wrote:
> On Sat, Mar 30, 2024 at 12:24:00PM +0100, Maximilian Luz wrote:
>> Some of the newer Microsoft Surface devices (such as the Surface Book
>> 3 and Pro 9) have thermal sensors connected via the Surface Aggregator
>> Module (the embedded controller on those devices). Add a basic driver
>> to read out the temperature values of those sensors.
>>
>> Link: https://github.com/linux-surface/surface-aggregator-module/issues/59
>> Signed-off-by: Maximilian Luz <[email protected]>
>> Reviewed-by: Hans de Goede <[email protected]>
>> ---
> [ ... ]
>> + hwmon_dev = devm_hwmon_device_register_with_info(&sdev->dev,
>> + "surface_thermal", ssam_temp, &ssam_temp_hwmon_chip_info,
>> + NULL);
>> + if (IS_ERR(hwmon_dev))
>> + return PTR_ERR(hwmon_dev);
>> +
>> + return 0;
>
> return PTR_ERR_OR_ZERO(hwmon_dev);

ACK. Will fix this and the blank lines.

Thanks,
Max

2024-04-16 19:00:21

by Maximilian Luz

[permalink] [raw]
Subject: Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

On 4/16/24 3:30 PM, Guenter Roeck wrote:
> On Sat, Mar 30, 2024 at 12:24:01PM +0100, Maximilian Luz wrote:

[...]

>> +static int ssam_tmp_get_name(struct ssam_device *sdev, u8 iid, char *buf, size_t buf_len)
>> +{
>> + struct ssam_tmp_get_name_rsp name_rsp;
>> + int status;
>> +
>> + status = __ssam_tmp_get_name(sdev->ctrl, sdev->uid.target, iid, &name_rsp);
>> + if (status)
>> + return status;
>> +
>> + /*
>> + * This should not fail unless the name in the returned struct is not
>> + * null-terminated or someone changed something in the struct
>> + * definitions above, since our buffer and struct have the same
>> + * capacity by design. So if this fails blow this up with a warning.
>> + * Since the more likely cause is that the returned string isn't
>> + * null-terminated, we might have received garbage (as opposed to just
>> + * an incomplete string), so also fail the function.
>> + */
>> + status = strscpy(buf, name_rsp.name, buf_len);
>> + WARN_ON(status < 0);
>
> Not acceptable. From include/asm-generic/bug.h:
>
> * Do not use these macros when checking for invalid external inputs
> * (e.g. invalid system call arguments, or invalid data coming from
> * network/devices), and on transient conditions like ENOMEM or EAGAIN.
> * These macros should be used for recoverable kernel issues only.
>

Hmm, I always interpreted that as "do not use for checking user-defined
input", which this is not.

The reason I added/requested it here was to check for "bugs" in how we
think the interface behaves (and our definitions related to it) as the
interface was reverse-engineered. Generally, when this fails I expect
that we made some mistake in our code (or the things we assume about the
interface), which likely causes us to interpret the received data as
"garbage" (and not the EC sending corrupted data, which it is generally
not due to CRC checking and validation in the SAM driver). Hence, I
personally would prefer if this blows up in a big warning with a trace
attached to it, so that an end-user can easily report this to us and
that we can appropriately deal with it. As opposed to some one-line
error message that will likely get overlooked or not taken as seriously.

If you still insist, I could change that to a dev_err() message. Or
maybe make the comment a bit clearer.

>> +
>> + return status < 0 ? status : 0;
>> +}
>> +
>>
>> /* -- Driver.---------------------------------------------------------------- */
>>
>> struct ssam_temp {
>> struct ssam_device *sdev;
>> s16 sensors;
>> + char names[SSAM_TMP_SENSOR_MAX_COUNT][SSAM_TMP_SENSOR_NAME_LENGTH];
>> };
>>
>> static umode_t ssam_temp_hwmon_is_visible(const void *data,
>> @@ -83,33 +134,47 @@ static int ssam_temp_hwmon_read(struct device *dev,
>> return ssam_tmp_get_temperature(ssam_temp->sdev, channel + 1, value);
>> }
>>
>> +static int ssam_temp_hwmon_read_string(struct device *dev,
>> + enum hwmon_sensor_types type,
>> + u32 attr, int channel, const char **str)
>> +{
>> + const struct ssam_temp *ssam_temp = dev_get_drvdata(dev);
>> +
>> + *str = ssam_temp->names[channel];
>> + return 0;
>> +}
>> +
>> static const struct hwmon_channel_info * const ssam_temp_hwmon_info[] = {
>> HWMON_CHANNEL_INFO(chip,
>> HWMON_C_REGISTER_TZ),
>> - /* We have at most 16 thermal sensor channels. */
>> + /*
>> + * We have at most SSAM_TMP_SENSOR_MAX_COUNT = 16 thermal sensor
>> + * channels.
>> + */
>
> Pointless comment. Already explained above, and perfect example showing
> why it has no value separating this and the previous patch.

I can remove the comment.

The reason for it being two separate patches is to retain proper
attribution. I am sorry that this has created more work for you.

In short, Ivor reverse-engineered the interface calls to get the sensor
names and wrote the vast majority of this patch (I only changed some
smaller things and gave advice, hence the co-developed-by). Therefore I
wanted to give proper attribution to Ivor for his work and not squash it
into a single patch.

>> HWMON_CHANNEL_INFO(temp,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT,
>> - HWMON_T_INPUT),
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL,
>> + HWMON_T_INPUT | HWMON_T_LABEL),
>
> Another example. Why have me review the previous patch
> just to change the code here ?

See above.

>
> [ ... ]
>
>> + /* Retrieve the name for each available sensor. */
>> + for (channel = 0; channel < SSAM_TMP_SENSOR_MAX_COUNT; channel++) {
>> + if (!(sensors & BIT(channel)))
>> + continue;
>> +
>> + status = ssam_tmp_get_name(sdev, channel + 1,
>> + ssam_temp->names[channel],
>> + SSAM_TMP_SENSOR_NAME_LENGTH);
>> + if (status)
>> + return status;
>
> Your call to fail probe in this case just because it can not find
> a sensor name. I personally find that quite aggressive.

We generally do not expect this to fail. And I think if this fails, it
should either be something wrong with the EC communication (in a way
that breaks other things like reading temperature values as well) or in
our assumptions of how the interface works. So similar to above, I
personally would prefer this to fail in a more noisy way, so that people
are more likely to tell us about the failure.

As an example: We expect sensor names and the interface for it to be
present. However, maybe some device (either one we couldn't test or some
future one) does not implement the interface call. Usually, an
unsupported call results in a timeout error. So if we would just ignore
that, the driver would load slow (as for each name it would wait for the
timeout), but users might not notice as the temperature sensors can
still be accessed normally after that. I do see that as an easily
detectable bug. Letting it continue to load IMHO just creates a more
subtle bug.

As above, if you prefer I can change that. Just let me know.

Thank you for your review.

Best regards,
Max

2024-04-16 21:08:36

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

On Tue, Apr 16, 2024 at 09:00:05PM +0200, Maximilian Luz wrote:
> On 4/16/24 3:30 PM, Guenter Roeck wrote:
> > On Sat, Mar 30, 2024 at 12:24:01PM +0100, Maximilian Luz wrote:
>
> [...]
>
> > > +static int ssam_tmp_get_name(struct ssam_device *sdev, u8 iid, char *buf, size_t buf_len)
> > > +{
> > > + struct ssam_tmp_get_name_rsp name_rsp;
> > > + int status;
> > > +
> > > + status = __ssam_tmp_get_name(sdev->ctrl, sdev->uid.target, iid, &name_rsp);
> > > + if (status)
> > > + return status;
> > > +
> > > + /*
> > > + * This should not fail unless the name in the returned struct is not
> > > + * null-terminated or someone changed something in the struct
> > > + * definitions above, since our buffer and struct have the same
> > > + * capacity by design. So if this fails blow this up with a warning.
> > > + * Since the more likely cause is that the returned string isn't
> > > + * null-terminated, we might have received garbage (as opposed to just
> > > + * an incomplete string), so also fail the function.
> > > + */
> > > + status = strscpy(buf, name_rsp.name, buf_len);
> > > + WARN_ON(status < 0);
> >
> > Not acceptable. From include/asm-generic/bug.h:
> >
> > * Do not use these macros when checking for invalid external inputs
> > * (e.g. invalid system call arguments, or invalid data coming from
> > * network/devices), and on transient conditions like ENOMEM or EAGAIN.
> > * These macros should be used for recoverable kernel issues only.
> >
>
> Hmm, I always interpreted that as "do not use for checking user-defined
> input", which this is not.
>

"invalid data coming from network/devices" is not user-defined input.

> The reason I added/requested it here was to check for "bugs" in how we
> think the interface behaves (and our definitions related to it) as the
> interface was reverse-engineered. Generally, when this fails I expect
> that we made some mistake in our code (or the things we assume about the
> interface), which likely causes us to interpret the received data as
> "garbage" (and not the EC sending corrupted data, which it is generally
> not due to CRC checking and validation in the SAM driver). Hence, I
> personally would prefer if this blows up in a big warning with a trace
> attached to it, so that an end-user can easily report this to us and
> that we can appropriately deal with it. As opposed to some one-line
> error message that will likely get overlooked or not taken as seriously.
>

I have heard the "This backtrace is absolutely essential" argument before,
including the "will be fixed" part. Chromebooks report more than 500,000
warning backtraces _every single day_. None of them is getting fixed.

> If you still insist, I could change that to a dev_err() message. Or
> maybe make the comment a bit clearer.

dev_err() would be acceptable. WARN() or WARN_ON() are no-go.

Guenter

2024-04-16 21:38:58

by Ivor Wanders

[permalink] [raw]
Subject: Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

> Ivor reverse-engineered the interface calls to get the sensor
> names and wrote the vast majority of this patch (I only changed some
> smaller things and gave advice, hence the co-developed-by). Therefore I
> wanted to give proper attribution to Ivor for his work and not squash it
> into a single patch.

By all means squash them in the next patch revision to make things simpler
to review, I don't think it's a large enough piece of work to worry too
much about attribution if it makes review more difficult.

~Ivor

2024-05-02 20:04:29

by Maximilian Luz

[permalink] [raw]
Subject: Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

On 4/16/24 11:34 PM, Ivor Wanders wrote:
>> Ivor reverse-engineered the interface calls to get the sensor
>> names and wrote the vast majority of this patch (I only changed some
>> smaller things and gave advice, hence the co-developed-by). Therefore I
>> wanted to give proper attribution to Ivor for his work and not squash it
>> into a single patch.
>
> By all means squash them in the next patch revision to make things simpler
> to review, I don't think it's a large enough piece of work to worry too
> much about attribution if it makes review more difficult.

Alright, I will do that then.

Thanks Ivor!

Best regards,
Max

2024-05-02 20:06:14

by Maximilian Luz

[permalink] [raw]
Subject: Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

Hi,

On 4/16/24 11:08 PM, Guenter Roeck wrote:
> On Tue, Apr 16, 2024 at 09:00:05PM +0200, Maximilian Luz wrote:
>> On 4/16/24 3:30 PM, Guenter Roeck wrote:
>>> On Sat, Mar 30, 2024 at 12:24:01PM +0100, Maximilian Luz wrote:
>>
>> [...]
>>
>>>> +static int ssam_tmp_get_name(struct ssam_device *sdev, u8 iid, char *buf, size_t buf_len)
>>>> +{
>>>> + struct ssam_tmp_get_name_rsp name_rsp;
>>>> + int status;
>>>> +
>>>> + status = __ssam_tmp_get_name(sdev->ctrl, sdev->uid.target, iid, &name_rsp);
>>>> + if (status)
>>>> + return status;
>>>> +
>>>> + /*
>>>> + * This should not fail unless the name in the returned struct is not
>>>> + * null-terminated or someone changed something in the struct
>>>> + * definitions above, since our buffer and struct have the same
>>>> + * capacity by design. So if this fails blow this up with a warning.
>>>> + * Since the more likely cause is that the returned string isn't
>>>> + * null-terminated, we might have received garbage (as opposed to just
>>>> + * an incomplete string), so also fail the function.
>>>> + */
>>>> + status = strscpy(buf, name_rsp.name, buf_len);
>>>> + WARN_ON(status < 0);
>>>
>>> Not acceptable. From include/asm-generic/bug.h:
>>>
>>> * Do not use these macros when checking for invalid external inputs
>>> * (e.g. invalid system call arguments, or invalid data coming from
>>> * network/devices), and on transient conditions like ENOMEM or EAGAIN.
>>> * These macros should be used for recoverable kernel issues only.
>>>
>>
>> Hmm, I always interpreted that as "do not use for checking user-defined
>> input", which this is not.
>>
>
> "invalid data coming from network/devices" is not user-defined input.
>
>> The reason I added/requested it here was to check for "bugs" in how we
>> think the interface behaves (and our definitions related to it) as the
>> interface was reverse-engineered. Generally, when this fails I expect
>> that we made some mistake in our code (or the things we assume about the
>> interface), which likely causes us to interpret the received data as
>> "garbage" (and not the EC sending corrupted data, which it is generally
>> not due to CRC checking and validation in the SAM driver). Hence, I
>> personally would prefer if this blows up in a big warning with a trace
>> attached to it, so that an end-user can easily report this to us and
>> that we can appropriately deal with it. As opposed to some one-line
>> error message that will likely get overlooked or not taken as seriously.
>>
>
> I have heard the "This backtrace is absolutely essential" argument before,
> including the "will be fixed" part. Chromebooks report more than 500,000
> warning backtraces _every single day_. None of them is getting fixed.
>
>> If you still insist, I could change that to a dev_err() message. Or
>> maybe make the comment a bit clearer.
>
> dev_err() would be acceptable. WARN() or WARN_ON() are no-go.

Sorry for the delayed response. I will change this to a dev_err() then
and try to re-spin the patches this weekend.

Best regards,
Max