2019-05-28 20:10:37

by Richard Gong

[permalink] [raw]
Subject: [PATCHv4 2/4] firmware: add Intel Stratix10 remote system update driver

From: Richard Gong <[email protected]>

The Intel Remote System Update (RSU) driver exposes interfaces access
through the Intel Service Layer to user space via sysfs interface.
The RSU interfaces report and control some of the optional RSU features
on Intel Stratix 10 SoC.

The RSU feature provides a way for customers to update the boot
configuration of a Intel Stratix 10 SoC device with significantly reduced
risk of corrupting the bitstream storage and bricking the system.

Signed-off-by: Richard Gong <[email protected]>
Reviewed-by: Alan Tull <[email protected]>
---
v2: s/SysFS/sysfs, s/scnprintf/sprintf,
s/attr_group/ATTRIBUTE_GROUPS, use devm_device_add_groups() and
devm_device_remove_groups()
added check the return value from rsu_send_msg()
removed a pr_info in stratix10_rsu_probe()
removed compatible = "intel,stratix10-rsu"
v3: no change
v4: removed devm_device_add_groups() & devm_device_remove_groups(),
utilized groups at struct device_driver
---
drivers/firmware/Kconfig | 18 ++
drivers/firmware/Makefile | 1 +
drivers/firmware/stratix10-rsu.c | 403 +++++++++++++++++++++++++++++++++++++++
3 files changed, 422 insertions(+)
create mode 100644 drivers/firmware/stratix10-rsu.c

diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 11fda9e..0b69dc7 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -214,6 +214,24 @@ config INTEL_STRATIX10_SERVICE

Say Y here if you want Stratix10 service layer support.

+config INTEL_STRATIX10_RSU
+ tristate "Intel Stratix10 Remote System Update"
+ depends on INTEL_STRATIX10_SERVICE
+ help
+ The Intel Remote System Update (RSU) driver exposes interfaces
+ access through the Intel Service Layer to user space via sysfs
+ device attribute nodes. The RSU interfaces report/control some of
+ the optional RSU features of the Stratix 10 SoC FPGA.
+
+ The RSU provides a way for customers to update the boot
+ configuration of a Stratix 10 SoC device with significantly reduced
+ risk of corrupting the bitstream storage and bricking the system.
+
+ Enable RSU support if you are using an Intel SoC FPGA with the RSU
+ feature enabled and you want Linux user space control.
+
+ Say Y here if you want Intel RSU support.
+
config QCOM_SCM
bool
depends on ARM || ARM64
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 3fa0b34..d04d5fc 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_EDD) += edd.o
obj-$(CONFIG_EFI_PCDP) += pcdp.o
obj-$(CONFIG_DMIID) += dmi-id.o
obj-$(CONFIG_INTEL_STRATIX10_SERVICE) += stratix10-svc.o
+obj-$(CONFIG_INTEL_STRATIX10_RSU) += stratix10-rsu.o
obj-$(CONFIG_ISCSI_IBFT_FIND) += iscsi_ibft_find.o
obj-$(CONFIG_ISCSI_IBFT) += iscsi_ibft.o
obj-$(CONFIG_FIRMWARE_MEMMAP) += memmap.o
diff --git a/drivers/firmware/stratix10-rsu.c b/drivers/firmware/stratix10-rsu.c
new file mode 100644
index 0000000..7bc7aaf
--- /dev/null
+++ b/drivers/firmware/stratix10-rsu.c
@@ -0,0 +1,403 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2019, Intel Corporation
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/bitfield.h>
+#include <linux/completion.h>
+#include <linux/kobject.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/firmware/intel/stratix10-svc-client.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+
+#define RSU_STATE_MASK GENMASK_ULL(31, 0)
+#define RSU_VERSION_MASK GENMASK_ULL(63, 32)
+#define RSU_ERROR_LOCATION_MASK GENMASK_ULL(31, 0)
+#define RSU_ERROR_DETAIL_MASK GENMASK_ULL(63, 32)
+
+#define RSU_TIMEOUT (msecs_to_jiffies(SVC_RSU_REQUEST_TIMEOUT_MS))
+
+/**
+ * struct stratix10_rsu_priv - rsu data structure
+ * @chan: pointer to the allocated service channel
+ * @client: active service client
+ * @completion: state for callback completion
+ * @lock: a mutex to protect callback completion state
+ * @status.current_image: address of image currently running in flash
+ * @status.fail_image: address of failed image in flash
+ * @status.version: the version number of RSU firmware
+ * @status.state: the state of RSU system
+ * @status.error_details: error code
+ * @status.error_location: the error offset inside the image that failed
+ */
+struct stratix10_rsu_priv {
+ struct stratix10_svc_chan *chan;
+ struct stratix10_svc_client client;
+ struct completion completion;
+ struct mutex lock;
+ struct {
+ unsigned long current_image;
+ unsigned long fail_image;
+ unsigned int version;
+ unsigned int state;
+ unsigned int error_details;
+ unsigned int error_location;
+ } status;
+};
+
+/**
+ * rsu_status_callback() - Status callback from Intel service layer
+ * @client: pointer to service client
+ * @data: pointer to callback data structure
+ *
+ * Callback from Intel service layer for RSU status request. Status is
+ * only updated after a system reboot, so a get updated status call is
+ * made during driver probe.
+ */
+static void rsu_status_callback(struct stratix10_svc_client *client,
+ struct stratix10_svc_cb_data *data)
+{
+ struct stratix10_rsu_priv *priv = client->priv;
+ struct arm_smccc_res *res = (struct arm_smccc_res *)data->kaddr1;
+
+ if (data->status == BIT(SVC_STATUS_RSU_OK)) {
+ priv->status.version = FIELD_GET(RSU_VERSION_MASK,
+ res->a2);
+ priv->status.state = FIELD_GET(RSU_STATE_MASK, res->a2);
+ priv->status.fail_image = res->a1;
+ priv->status.current_image = res->a0;
+ priv->status.error_location =
+ FIELD_GET(RSU_ERROR_LOCATION_MASK, res->a3);
+ priv->status.error_details =
+ FIELD_GET(RSU_ERROR_DETAIL_MASK, res->a3);
+ } else {
+ dev_err(client->dev, "COMMAND_RSU_STATUS returned 0x%lX\n",
+ res->a0);
+ priv->status.version = 0;
+ priv->status.state = 0;
+ priv->status.fail_image = 0;
+ priv->status.current_image = 0;
+ priv->status.error_location = 0;
+ priv->status.error_details = 0;
+ }
+
+ complete(&priv->completion);
+}
+
+/**
+ * rsu_command_callback() - Update callback from Intel service layer
+ * @client: pointer to client
+ * @data: pointer to callback data structure
+ *
+ * Callback from Intel service layer for RSU commands.
+ */
+static void rsu_command_callback(struct stratix10_svc_client *client,
+ struct stratix10_svc_cb_data *data)
+{
+ struct stratix10_rsu_priv *priv = client->priv;
+
+ if (data->status != BIT(SVC_STATUS_RSU_OK))
+ dev_err(client->dev, "RSU returned status is %i\n",
+ data->status);
+ complete(&priv->completion);
+}
+
+/**
+ * rsu_send_msg() - send a message to Intel service layer
+ * @priv: pointer to rsu private data
+ * @command: RSU status or update command
+ * @arg: the request argument, the bitstream address or notify status
+ * @callback: function pointer for the callback (status or update)
+ *
+ * Start an Intel service layer transaction to perform the SMC call that
+ * is necessary to get RSU boot log or set the address of bitstream to
+ * boot after reboot.
+ *
+ * Returns 0 on success or -ETIMEDOUT on error.
+ */
+static int rsu_send_msg(struct stratix10_rsu_priv *priv,
+ enum stratix10_svc_command_code command,
+ unsigned long arg,
+ void (*callback)(struct stratix10_svc_client *client,
+ struct stratix10_svc_cb_data *data))
+{
+ struct stratix10_svc_client_msg msg;
+ int ret;
+
+ mutex_lock(&priv->lock);
+ reinit_completion(&priv->completion);
+ priv->client.receive_cb = callback;
+
+ msg.command = command;
+ if (arg)
+ msg.arg[0] = arg;
+
+ ret = stratix10_svc_send(priv->chan, &msg);
+ if (ret < 0)
+ goto status_done;
+
+ ret = wait_for_completion_interruptible_timeout(&priv->completion,
+ RSU_TIMEOUT);
+ if (!ret) {
+ dev_err(priv->client.dev,
+ "timeout waiting for COMMAND_RSU_STATUS\n");
+ ret = -ETIMEDOUT;
+ goto status_done;
+ } else if (ret < 0) {
+ dev_err(priv->client.dev,
+ "error %d waiting for COMMAND_RSU_STATUS\n", ret);
+ goto status_done;
+ } else {
+ ret = 0;
+ }
+
+status_done:
+ stratix10_svc_done(priv->chan);
+ mutex_unlock(&priv->lock);
+ return ret;
+}
+
+/*
+ * This driver exposes some optional features of the Intel Stratix 10 SoC FPGA.
+ * The sysfs interfaces exposed here are FPGA Remote System Update (RSU)
+ * related. They allow user space software to query the configuration system
+ * status and to request optional reboot behavior specific to Intel FPGAs.
+ */
+
+static ssize_t current_image_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+
+ if (!priv)
+ return -ENODEV;
+
+ return sprintf(buf, "%ld", priv->status.current_image);
+}
+
+static ssize_t fail_image_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+
+ if (!priv)
+ return -ENODEV;
+
+ return sprintf(buf, "%ld", priv->status.fail_image);
+}
+
+static ssize_t version_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+
+ if (!priv)
+ return -ENODEV;
+
+ return sprintf(buf, "%d", priv->status.version);
+}
+
+static ssize_t state_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+
+ if (!priv)
+ return -ENODEV;
+
+ return sprintf(buf, "%d", priv->status.state);
+}
+
+static ssize_t error_location_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+
+ if (!priv)
+ return -ENODEV;
+
+ return sprintf(buf, "%d", priv->status.error_location);
+}
+
+static ssize_t error_details_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+
+ if (!priv)
+ return -ENODEV;
+
+ return sprintf(buf, "%d", priv->status.error_details);
+}
+
+static ssize_t reboot_image_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+ unsigned long address;
+ int ret;
+
+ if (priv == 0)
+ return -ENODEV;
+
+ ret = kstrtoul(buf, 0, &address);
+ if (ret)
+ return ret;
+
+ ret = rsu_send_msg(priv, COMMAND_RSU_UPDATE,
+ address, rsu_command_callback);
+ if (ret) {
+ dev_err(dev, "Error, RSU update returned %i\n", ret);
+ return ret;
+ }
+
+ return count;
+}
+
+static ssize_t notify_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+ unsigned long status;
+ int ret;
+
+ if (priv == 0)
+ return -ENODEV;
+
+ ret = kstrtoul(buf, 0, &status);
+ if (ret)
+ return ret;
+
+ ret = rsu_send_msg(priv, COMMAND_RSU_NOTIFY,
+ status, rsu_command_callback);
+ if (ret) {
+ dev_err(dev, "Error, RSU notify returned %i\n", ret);
+ return ret;
+ }
+
+ return count;
+}
+
+static ssize_t watchdog_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct stratix10_rsu_priv *priv = dev_get_drvdata(dev);
+ unsigned long wd_action;
+ int ret;
+
+ if (priv == 0)
+ return -ENODEV;
+
+ ret = kstrtoul(buf, 0, &wd_action);
+ if (ret)
+ return ret;
+
+ ret = rsu_send_msg(priv, COMMAND_RSU_WD,
+ wd_action, rsu_command_callback);
+ if (ret) {
+ dev_err(dev, "Error, WD timer expiration returned %i\n", ret);
+ return ret;
+ }
+
+ return count;
+}
+
+static DEVICE_ATTR_RO(current_image);
+static DEVICE_ATTR_RO(fail_image);
+static DEVICE_ATTR_RO(state);
+static DEVICE_ATTR_RO(version);
+static DEVICE_ATTR_RO(error_location);
+static DEVICE_ATTR_RO(error_details);
+static DEVICE_ATTR_WO(reboot_image);
+static DEVICE_ATTR_WO(notify);
+static DEVICE_ATTR_WO(watchdog);
+
+static struct attribute *rsu_attrs[] = {
+ &dev_attr_current_image.attr,
+ &dev_attr_fail_image.attr,
+ &dev_attr_state.attr,
+ &dev_attr_version.attr,
+ &dev_attr_error_location.attr,
+ &dev_attr_error_details.attr,
+ &dev_attr_reboot_image.attr,
+ &dev_attr_notify.attr,
+ &dev_attr_watchdog.attr,
+ NULL
+};
+
+ATTRIBUTE_GROUPS(rsu);
+
+static int stratix10_rsu_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct stratix10_rsu_priv *priv;
+ int ret;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->client.dev = dev;
+ priv->client.receive_cb = NULL;
+ priv->client.priv = priv;
+ priv->status.current_image = 0;
+ priv->status.fail_image = 0;
+ priv->status.error_location = 0;
+ priv->status.error_details = 0;
+ priv->status.version = 0;
+ priv->status.state = 0;
+
+ mutex_init(&priv->lock);
+ priv->chan = stratix10_svc_request_channel_byname(&priv->client,
+ SVC_CLIENT_RSU);
+ if (IS_ERR(priv->chan)) {
+ dev_err(dev, "couldn't get service channel %s\n",
+ SVC_CLIENT_RSU);
+ return PTR_ERR(priv->chan);
+ }
+
+ init_completion(&priv->completion);
+ platform_set_drvdata(pdev, priv);
+
+ /* status is only updated after reboot */
+ ret = rsu_send_msg(priv, COMMAND_RSU_STATUS,
+ 0, rsu_status_callback);
+ if (ret) {
+ dev_err(dev, "Error, getting RSU status %i\n", ret);
+ stratix10_svc_free_channel(priv->chan);
+ }
+
+ return ret;
+}
+
+static int stratix10_rsu_remove(struct platform_device *pdev)
+{
+ struct stratix10_rsu_priv *priv = platform_get_drvdata(pdev);
+
+ stratix10_svc_free_channel(priv->chan);
+ return 0;
+}
+
+static struct platform_driver stratix10_rsu_driver = {
+ .probe = stratix10_rsu_probe,
+ .remove = stratix10_rsu_remove,
+ .driver = {
+ .name = "stratix10-rsu",
+ .groups = rsu_groups,
+ },
+};
+
+module_platform_driver(stratix10_rsu_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Intel Remote System Update Driver");
+MODULE_AUTHOR("Richard Gong <[email protected]>");
--
2.7.4


2019-05-28 23:17:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCHv4 2/4] firmware: add Intel Stratix10 remote system update driver

On Tue, May 28, 2019 at 03:20:31PM -0500, [email protected] wrote:
> From: Richard Gong <[email protected]>
>
> The Intel Remote System Update (RSU) driver exposes interfaces access
> through the Intel Service Layer to user space via sysfs interface.
> The RSU interfaces report and control some of the optional RSU features
> on Intel Stratix 10 SoC.
>
> The RSU feature provides a way for customers to update the boot
> configuration of a Intel Stratix 10 SoC device with significantly reduced
> risk of corrupting the bitstream storage and bricking the system.
>
> Signed-off-by: Richard Gong <[email protected]>
> Reviewed-by: Alan Tull <[email protected]>

Is Alan reviewing all of these new versions before you post them
publicly? If so, great, if not, don't add tags to new versions when you
change things around...

thanks,

greg k-h

2019-05-28 23:25:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCHv4 2/4] firmware: add Intel Stratix10 remote system update driver

On Tue, May 28, 2019 at 03:20:31PM -0500, [email protected] wrote:
> +/**
> + * rsu_send_msg() - send a message to Intel service layer
> + * @priv: pointer to rsu private data
> + * @command: RSU status or update command
> + * @arg: the request argument, the bitstream address or notify status
> + * @callback: function pointer for the callback (status or update)
> + *
> + * Start an Intel service layer transaction to perform the SMC call that
> + * is necessary to get RSU boot log or set the address of bitstream to
> + * boot after reboot.
> + *
> + * Returns 0 on success or -ETIMEDOUT on error.
> + */
> +static int rsu_send_msg(struct stratix10_rsu_priv *priv,
> + enum stratix10_svc_command_code command,
> + unsigned long arg,
> + void (*callback)(struct stratix10_svc_client *client,
> + struct stratix10_svc_cb_data *data))
> +{
> + struct stratix10_svc_client_msg msg;
> + int ret;
> +
> + mutex_lock(&priv->lock);
> + reinit_completion(&priv->completion);
> + priv->client.receive_cb = callback;
> +
> + msg.command = command;
> + if (arg)
> + msg.arg[0] = arg;
> +
> + ret = stratix10_svc_send(priv->chan, &msg);

meta-question, can you send messages that are on the stack and not in
DMA-able memory? Or should this be a dynamicly created variable so you
know it can work properly with DMA?

And how big is that structure, will it mess with stack sizes?

thanks,

greg k-h

2019-05-28 23:26:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCHv4 2/4] firmware: add Intel Stratix10 remote system update driver

On Tue, May 28, 2019 at 03:20:31PM -0500, [email protected] wrote:
> +static int rsu_send_msg(struct stratix10_rsu_priv *priv,
> + enum stratix10_svc_command_code command,
> + unsigned long arg,
> + void (*callback)(struct stratix10_svc_client *client,
> + struct stratix10_svc_cb_data *data))

Odd indentation for arg, and then callback.

Why isn't callback a typedef to make this simpler to use?

thanks,

greg k-h

2019-05-29 14:05:10

by Richard Gong

[permalink] [raw]
Subject: Re: [PATCHv4 2/4] firmware: add Intel Stratix10 remote system update driver

Hi Greg,


On 5/28/19 6:15 PM, Greg KH wrote:
> On Tue, May 28, 2019 at 03:20:31PM -0500, [email protected] wrote:
>> From: Richard Gong <[email protected]>
>>
>> The Intel Remote System Update (RSU) driver exposes interfaces access
>> through the Intel Service Layer to user space via sysfs interface.
>> The RSU interfaces report and control some of the optional RSU features
>> on Intel Stratix 10 SoC.
>>
>> The RSU feature provides a way for customers to update the boot
>> configuration of a Intel Stratix 10 SoC device with significantly reduced
>> risk of corrupting the bitstream storage and bricking the system.
>>
>> Signed-off-by: Richard Gong <[email protected]>
>> Reviewed-by: Alan Tull <[email protected]>
>
> Is Alan reviewing all of these new versions before you post them
> publicly? If so, great, if not, don't add tags to new versions when you
> change things around...
Yes, Alan reviewed all of these new versions before I submitted for
upstream.

>
> thanks,
>
> greg k-h
>

Regards,
Richard

2019-05-29 14:46:05

by Richard Gong

[permalink] [raw]
Subject: Re: [PATCHv4 2/4] firmware: add Intel Stratix10 remote system update driver


Hi Greg,

On 5/28/19 6:22 PM, Greg KH wrote:
> On Tue, May 28, 2019 at 03:20:31PM -0500, [email protected] wrote:
>> +/**
>> + * rsu_send_msg() - send a message to Intel service layer
>> + * @priv: pointer to rsu private data
>> + * @command: RSU status or update command
>> + * @arg: the request argument, the bitstream address or notify status
>> + * @callback: function pointer for the callback (status or update)
>> + *
>> + * Start an Intel service layer transaction to perform the SMC call that
>> + * is necessary to get RSU boot log or set the address of bitstream to
>> + * boot after reboot.
>> + *
>> + * Returns 0 on success or -ETIMEDOUT on error.
>> + */
>> +static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>> + enum stratix10_svc_command_code command,
>> + unsigned long arg,
>> + void (*callback)(struct stratix10_svc_client *client,
>> + struct stratix10_svc_cb_data *data))
>> +{
>> + struct stratix10_svc_client_msg msg;
>> + int ret;
>> +
>> + mutex_lock(&priv->lock);
>> + reinit_completion(&priv->completion);
>> + priv->client.receive_cb = callback;
>> +
>> + msg.command = command;
>> + if (arg)
>> + msg.arg[0] = arg;
>> +
>> + ret = stratix10_svc_send(priv->chan, &msg);
>
> meta-question, can you send messages that are on the stack and not in
> DMA-able memory? Or should this be a dynamicly created variable so you
> know it can work properly with DMA?
>
> And how big is that structure, will it mess with stack sizes?
>

stratix10_svc_send() is a function from Intel Stratix10 service layer
driver, which is used by service clients (RSU and FPGA manager drivers
as now) to add a message to the service layer driver's queue for being
sent to the secure world via SMC call.

It is not DMA related, we send messages via FIFO API.

The size of FIFO is sizeof(struct stratix10_svc_data) *
SVC_NUM_DATA_IN_FIFO, SVC_NUM_DATA_IN_FIFO is defined as 32.

fifo_size = sizeof(struct stratix10_svc_data) * SVC_NUM_DATA_IN_FIFO;
ret = kfifo_alloc(&controller->svc_fifo, fifo_size, GFP_KERNEL);
if (ret) {
dev_err(dev, "failed to allocate FIFO\n");
return ret;
}
spin_lock_init(&controller->svc_fifo_lock);

It will not mess with stack sizes.

> thanks,
>
> greg k-h
>

Regards,
Richard

2019-05-29 15:03:20

by Richard Gong

[permalink] [raw]
Subject: Re: [PATCHv4 2/4] firmware: add Intel Stratix10 remote system update driver


Hi Greg,

On 5/28/19 6:24 PM, Greg KH wrote:
> On Tue, May 28, 2019 at 03:20:31PM -0500, [email protected] wrote:
>> +static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>> + enum stratix10_svc_command_code command,
>> + unsigned long arg,
>> + void (*callback)(struct stratix10_svc_client *client,
>> + struct stratix10_svc_cb_data *data))
>
> Odd indentation for arg, and then callback.
>
> Why isn't callback a typedef to make this simpler to use?
>

I will make correction in the next submission.

> thanks,
>
> greg k-h
>

Regards,
Richard

2019-06-03 15:46:35

by Richard Gong

[permalink] [raw]
Subject: A potential broken at platform driver?


Hi Greg,

Following your suggestion, I replaced devm_device_add_groups() with
.group = rus_groups in my version #4 submission. But I found out that
RSU driver outputs the garbage data if I use .group = rsu_groups.

To make RSU driver work properly, I have to revert the change at version
#4 and use devm_device_add_groups() again. Sorry, I didn't catch this
problem early.

I did some debug & below are captured log, you can see priv pointer get
messed at current_image_show(). I am not sure if something related to
platform driver work properly. I attach my debug patch in this mail.

1. Using .groups = rsu_groups,

[ 1.191115] *** rsu_status_callback:
[ 1.194782] res->a1=2000000
[ 1.197588] res->a1=0
[ 1.199865] res->a2=0
[ 1.202150] res->a3=0
[ 1.204433] priv=0xffff80007aa28e80
[ 1.207933] version=0, state=0, current_image=2000000, fail_image=0,
error_location=0, error_details=0
[ 1.217249] *** stratix10_rsu_probe: priv=0xffff80007aa28e80
root@stratix10:/sys/bus/platform/drivers/stratix10-rsu# cat current_image
[ 38.849341] *** current_image_show: priv=0xffff80007aa28d00
... output garbage data
priv pointer got changed

2. Using devm_device_add_groups

[ 1.191196] *** rsu_status_callback:
[ 1.194864] res->a1=2000000
[ 1.197660] res->a1=0
[ 1.199928] res->a2=0
[ 1.202204] res->a3=0
[ 1.204479] priv=0xffff80007a427e80
[ 1.207968] version=0, state=0, current_image=2000000, fail_image=0,
error_location=0, error_details=0
[ 1.217322] *** stratix10_rsu_probe: priv=0xffff80007a427e80
root@stratix10:/sys/devices/platform/stratix10-rsu.0# cat current_image
[ 39.032648] *** current_image_show: priv=0xffff80007a427e80
0x2000000
… ... output all correct data and correct priv pointer

I checked kernel sources and observed that .groups = xx_groups are
widely used in
device/misdevice/device_type/device_driver/bus_driver/pci_driver etc,
but not in platform driver.

A few platform drivers which does utilize groups,
1. driver/s390/char/sclp.c does use .group = xx_groups, but it use the
global variables for data exchanges between functions.
2. driver/firmware/arm_scpi.c doesn't use .group = xx_groups, instead it
use devm_device_add_groups().

Regards,
Richard



On 5/29/19 9:55 AM, Richard Gong wrote:
>
> Hi Greg,
>
> On 5/28/19 6:22 PM, Greg KH wrote:
>> On Tue, May 28, 2019 at 03:20:31PM -0500, [email protected]
>> wrote:
>>> +/**
>>> + * rsu_send_msg() - send a message to Intel service layer
>>> + * @priv: pointer to rsu private data
>>> + * @command: RSU status or update command
>>> + * @arg: the request argument, the bitstream address or notify status
>>> + * @callback: function pointer for the callback (status or update)
>>> + *
>>> + * Start an Intel service layer transaction to perform the SMC call
>>> that
>>> + * is necessary to get RSU boot log or set the address of bitstream to
>>> + * boot after reboot.
>>> + *
>>> + * Returns 0 on success or -ETIMEDOUT on error.
>>> + */
>>> +static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>>> +            enum stratix10_svc_command_code command,
>>> +    unsigned long arg,
>>> +    void (*callback)(struct stratix10_svc_client *client,
>>> +             struct stratix10_svc_cb_data *data))
>>> +{
>>> +    struct stratix10_svc_client_msg msg;
>>> +    int ret;
>>> +
>>> +    mutex_lock(&priv->lock);
>>> +    reinit_completion(&priv->completion);
>>> +    priv->client.receive_cb = callback;
>>> +
>>> +    msg.command = command;
>>> +    if (arg)
>>> +        msg.arg[0] = arg;
>>> +
>>> +    ret = stratix10_svc_send(priv->chan, &msg);
>>
>> meta-question, can you send messages that are on the stack and not in
>> DMA-able memory?  Or should this be a dynamicly created variable so you
>> know it can work properly with DMA?
>>
>> And how big is that structure, will it mess with stack sizes?
>>
>
> stratix10_svc_send() is a function from Intel Stratix10 service layer
> driver, which is used by service clients (RSU and FPGA manager drivers
> as now) to add a message to the service layer driver's queue for being
> sent to the secure world via SMC call.
>
> It is not DMA related, we send messages via FIFO API.
>
> The size of FIFO is sizeof(struct stratix10_svc_data) *
> SVC_NUM_DATA_IN_FIFO, SVC_NUM_DATA_IN_FIFO is defined as 32.
>
> fifo_size = sizeof(struct stratix10_svc_data) *     SVC_NUM_DATA_IN_FIFO;
> ret = kfifo_alloc(&controller->svc_fifo, fifo_size, GFP_KERNEL);
> if (ret) {
>     dev_err(dev, "failed to allocate FIFO\n");
>         return ret;
> }
> spin_lock_init(&controller->svc_fifo_lock);
>
> It will not mess with stack sizes.
>
>> thanks,
>>
>> greg k-h
>>
>
> Regards,
> Richard


Attachments:
0001-add-just-for-debug.patch (3.56 kB)

2019-06-03 18:04:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: A potential broken at platform driver?

On Mon, Jun 03, 2019 at 10:57:18AM -0500, Richard Gong wrote:
>
> Hi Greg,
>
> Following your suggestion, I replaced devm_device_add_groups() with .group =
> rus_groups in my version #4 submission. But I found out that RSU driver
> outputs the garbage data if I use .group = rsu_groups.

What is "garbage"?

> To make RSU driver work properly, I have to revert the change at version #4
> and use devm_device_add_groups() again. Sorry, I didn't catch this problem
> early.
>
> I did some debug & below are captured log, you can see priv pointer get
> messed at current_image_show(). I am not sure if something related to
> platform driver work properly. I attach my debug patch in this mail.
>
> 1. Using .groups = rsu_groups,
>
> [ 1.191115] *** rsu_status_callback:
> [ 1.194782] res->a1=2000000
> [ 1.197588] res->a1=0
> [ 1.199865] res->a2=0
> [ 1.202150] res->a3=0
> [ 1.204433] priv=0xffff80007aa28e80
> [ 1.207933] version=0, state=0, current_image=2000000, fail_image=0,
> error_location=0, error_details=0
> [ 1.217249] *** stratix10_rsu_probe: priv=0xffff80007aa28e80
> root@stratix10:/sys/bus/platform/drivers/stratix10-rsu# cat current_image
> [ 38.849341] *** current_image_show: priv=0xffff80007aa28d00
> ... output garbage data
> priv pointer got changed

I don't understand this, sorry. Are you sure you are actually using the
correct pointer to your device?

> @@ -394,7 +432,7 @@ static struct platform_driver stratix10_rsu_driver = {
> .remove = stratix10_rsu_remove,
> .driver = {
> .name = "stratix10-rsu",
> - .groups = rsu_groups,
> +// .groups = rsu_groups,

Are you sure this is the correct pointer? I think that might be
pointing to the driver's attributes, not the device's attributes.

If platform drivers do not have a way to register groups properly, then
that really needs to be fixed, as trying to register it by yourself as
you are doing, is ripe for racing with userspace.

thanks,

greg k-h

2019-06-03 23:12:56

by Richard Gong

[permalink] [raw]
Subject: Re: A potential broken at platform driver?

Hi Greg,

On 6/3/19 1:02 PM, Greg KH wrote:
> On Mon, Jun 03, 2019 at 10:57:18AM -0500, Richard Gong wrote:
>>
>> Hi Greg,
>>
>> Following your suggestion, I replaced devm_device_add_groups() with .group =
>> rus_groups in my version #4 submission. But I found out that RSU driver
>> outputs the garbage data if I use .group = rsu_groups.
>
> What is "garbage"?
I mean incorrect status info.

>
>> To make RSU driver work properly, I have to revert the change at version #4
>> and use devm_device_add_groups() again. Sorry, I didn't catch this problem
>> early.
>>
>> I did some debug & below are captured log, you can see priv pointer get
>> messed at current_image_show(). I am not sure if something related to
>> platform driver work properly. I attach my debug patch in this mail.
>>
>> 1. Using .groups = rsu_groups,
>>
>> [ 1.191115] *** rsu_status_callback:
>> [ 1.194782] res->a1=2000000
>> [ 1.197588] res->a1=0
>> [ 1.199865] res->a2=0
>> [ 1.202150] res->a3=0
>> [ 1.204433] priv=0xffff80007aa28e80
>> [ 1.207933] version=0, state=0, current_image=2000000, fail_image=0,
>> error_location=0, error_details=0
>> [ 1.217249] *** stratix10_rsu_probe: priv=0xffff80007aa28e80
>> root@stratix10:/sys/bus/platform/drivers/stratix10-rsu# cat current_image
>> [ 38.849341] *** current_image_show: priv=0xffff80007aa28d00
>> ... output garbage data
>> priv pointer got changed
>
> I don't understand this, sorry. Are you sure you are actually using the
> correct pointer to your device?
>
I think so.

The dev pointer at current_image_show() should points to RSU device, but
it seems point to driver_private if I use .group = rsU_groups. As a
result I can't get the priv pointer properly at current_image_show().

[ 1.190993] *** rsu_status_callback:
[ 1.194669] dev=0xffff80007b409410
[ 1.198083] priv=0xffff80007a4d4e80
[ 1.201582] version=0, state=0, current_image=0x2000000,
fail_image=0x0, error_location=0x0, error_details=0
[ 1.211416] *** stratix10_rsu_probe: priv=0xffff80007a4d4e80
[ 1.217063] *** stratix10_rsu_probe: dev=0xffff80007b409410

root@stratix10:/sys/bus/platform/drivers/stratix10-rsu# cat current_image
[ 72.101277] *** current_image_show: dev=stratix10_rsu_driver
[ 72.136205] *** current_image_show: priv=0xffff80007a4d4d00

If I use devm_device_add_groups(), the dev pointer does point to RSU device,

[ 1.191456] *** rsu_status_callback:
[ 1.195124] priv=0xffff80007a429280
[ 1.198615] version=0, state=0, current_image=0x2000000,
fail_image=0x0, error_location=0x0, error_details=0
[ 1.208458] *** stratix10_rsu_probe: priv=0xffff80007a429280
[ 1.214105] *** stratix10_rsu_probe: dev=0xffff80007b409410

root@stratix10:/sys/devices/platform/stratix10-rsu.0# cat current_image
[ 31.484131] *** current_image_show: dev=0xffff80007b409410
[ 31.489651] *** current_image_show: priv=0xffff80007a429280


>> @@ -394,7 +432,7 @@ static struct platform_driver stratix10_rsu_driver = {
>> .remove = stratix10_rsu_remove,
>> .driver = {
>> .name = "stratix10-rsu",
>> - .groups = rsu_groups,
>> +// .groups = rsu_groups,
>
> Are you sure this is the correct pointer? I think that might be
> pointing to the driver's attributes, not the device's attributes.
>
> If platform drivers do not have a way to register groups properly, then
> that really needs to be fixed, as trying to register it by yourself as
> you are doing, is ripe for racing with userspace.
>
I agree we shouldn't call devm_device_add_groups() directly.

RSU status is only updated after power on or reboot, RSU driver get
status info at probe() & save them to the private pointer priv via
platform_set_drvdata().

static struct platform_driver stratix10_rsu_driver = {
.probe = stratix10_rsu_probe,
.remove = stratix10_rsu_remove,
.driver = {
.name = "stratix10-rsu",
.groups = rsu_groups,
},
};

The problem is that I don't have a way to properly retrieve the priv
pointer at xx_show() functions. Global variable is a work around, but I
don't think it is a good approach.

Any suggestion?

Regards,
Richard

> thanks,
>
> greg k-h
>

2019-06-04 10:35:39

by Romain Izard

[permalink] [raw]
Subject: Re: A potential broken at platform driver?

On Mon, Jun 03, 2019 at 08:02:55PM +0200, Greg KH wrote:
> > @@ -394,7 +432,7 @@ static struct platform_driver stratix10_rsu_driver = {
> > .remove = stratix10_rsu_remove,
> > .driver = {
> > .name = "stratix10-rsu",
> > - .groups = rsu_groups,
> > +// .groups = rsu_groups,
>
> Are you sure this is the correct pointer? I think that might be
> pointing to the driver's attributes, not the device's attributes.
>
> If platform drivers do not have a way to register groups properly, then
> that really needs to be fixed, as trying to register it by yourself as
> you are doing, is ripe for racing with userspace.

This is a very common issue with platform drivers, and it seems to me that
it is not possible to add device attributes when binding a device to a
driver without entering the race condition.

My understanding is the following one:

The root cause is that the device has already been created and reported
to the userspace with a KOBJ_ADD uevent before the device and the driver
are bound together. On receiving this event, userspace will react, and
it will try to read the device's attributes. In parallel the kernel will
try to find a matching driver. If a driver is found, the kernel will
call the probe function from the driver with the device as a parameter,
and if successful a KOBJ_BIND uevent will be sent to userspace, but this
is a recent addition.

Unfortunately, not all created devices will be bound to a driver, and the
existing udev code relies on KOBJ_ADD uevents rather than KOBJ_BIND uevents.
If new per-device attributes have been added to the device during the
binding stage userspace may or may not see them, depending on when userspace
tries to read the device's attributes.

I have this possible workaround, but I do not know if it is a good solution:

When binding the device and the driver together, create a new device as a
child to the current device, and fill its "groups" member to point to the
per-device attributes' group. As the device will be created with all the
attributes, it will not be affected by the race issues. The functions
handling the attributes will need to be modified to use the parents of their
"device" parameter, instead of the device itself. Additionnaly, the sysfs
location of the attributes will be different, as the child device will show
up in the sysfs path. But for a newly introduced device this will not be
a problem.

Is this a good compromise ?

Best regards,
--
Romain Izard

2019-06-04 14:29:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: A potential broken at platform driver?

On Tue, Jun 04, 2019 at 12:33:03PM +0200, Romain Izard wrote:
> On Mon, Jun 03, 2019 at 08:02:55PM +0200, Greg KH wrote:
> > > @@ -394,7 +432,7 @@ static struct platform_driver stratix10_rsu_driver = {
> > > .remove = stratix10_rsu_remove,
> > > .driver = {
> > > .name = "stratix10-rsu",
> > > - .groups = rsu_groups,
> > > +// .groups = rsu_groups,
> >
> > Are you sure this is the correct pointer? I think that might be
> > pointing to the driver's attributes, not the device's attributes.
> >
> > If platform drivers do not have a way to register groups properly, then
> > that really needs to be fixed, as trying to register it by yourself as
> > you are doing, is ripe for racing with userspace.
>
> This is a very common issue with platform drivers, and it seems to me that
> it is not possible to add device attributes when binding a device to a
> driver without entering the race condition.
>
> My understanding is the following one:
>
> The root cause is that the device has already been created and reported
> to the userspace with a KOBJ_ADD uevent before the device and the driver
> are bound together. On receiving this event, userspace will react, and
> it will try to read the device's attributes. In parallel the kernel will
> try to find a matching driver. If a driver is found, the kernel will
> call the probe function from the driver with the device as a parameter,
> and if successful a KOBJ_BIND uevent will be sent to userspace, but this
> is a recent addition.
>
> Unfortunately, not all created devices will be bound to a driver, and the
> existing udev code relies on KOBJ_ADD uevents rather than KOBJ_BIND uevents.
> If new per-device attributes have been added to the device during the
> binding stage userspace may or may not see them, depending on when userspace
> tries to read the device's attributes.
>
> I have this possible workaround, but I do not know if it is a good solution:
>
> When binding the device and the driver together, create a new device as a
> child to the current device, and fill its "groups" member to point to the
> per-device attributes' group. As the device will be created with all the
> attributes, it will not be affected by the race issues. The functions
> handling the attributes will need to be modified to use the parents of their
> "device" parameter, instead of the device itself. Additionnaly, the sysfs
> location of the attributes will be different, as the child device will show
> up in the sysfs path. But for a newly introduced device this will not be
> a problem.
>
> Is this a good compromise ?

Not really. You just want the attributes on the platform device itself.

Given the horrible hack that platform devices are today, what's one more
hack!

Here's a patch below of what should probably be done here. Richard, can
you change your code to use the new dev_groups pointer in the struct
platform_driver and this patch and let me know if that works or not?

Note, I've only compiled this code, not tested it...

thanks,

greg k-h

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 4d1729853d1a..3dd4b73a9b30 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -598,6 +598,7 @@ struct platform_device *platform_device_register_full(
}
EXPORT_SYMBOL_GPL(platform_device_register_full);

+static int platform_drv_remove(struct device *_dev);
static int platform_drv_probe(struct device *_dev)
{
struct platform_driver *drv = to_platform_driver(_dev->driver);
@@ -614,8 +615,18 @@ static int platform_drv_probe(struct device *_dev)

if (drv->probe) {
ret = drv->probe(dev);
- if (ret)
+ if (ret) {
dev_pm_domain_detach(_dev, true);
+ goto out;
+ }
+ }
+ if (drv->dev_groups) {
+ ret = device_add_groups(_dev, drv->dev_groups);
+ if (ret) {
+ platform_drv_remove(_dev);
+ return ret;
+ }
+ kobject_uevent(&_dev->kobj, KOBJ_CHANGE);
}

out:
@@ -640,6 +651,8 @@ static int platform_drv_remove(struct device *_dev)

if (drv->remove)
ret = drv->remove(dev);
+ if (drv->dev_groups)
+ device_remove_groups(_dev, drv->dev_groups);
dev_pm_domain_detach(_dev, true);

return ret;
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index cc464850b71e..027f1e1d7af8 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -190,6 +190,7 @@ struct platform_driver {
int (*resume)(struct platform_device *);
struct device_driver driver;
const struct platform_device_id *id_table;
+ const struct attribute_group **dev_groups;
bool prevent_deferred_probe;
};

2019-06-04 14:30:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: A potential broken at platform driver?

On Mon, Jun 03, 2019 at 06:08:37PM -0500, Richard Gong wrote:
> Hi Greg,
>
> On 6/3/19 1:02 PM, Greg KH wrote:
> > On Mon, Jun 03, 2019 at 10:57:18AM -0500, Richard Gong wrote:
> > >
> > > Hi Greg,
> > >
> > > Following your suggestion, I replaced devm_device_add_groups() with .group =
> > > rus_groups in my version #4 submission. But I found out that RSU driver
> > > outputs the garbage data if I use .group = rsu_groups.
> >
> > What is "garbage"?
> I mean incorrect status info.
>
> >
> > > To make RSU driver work properly, I have to revert the change at version #4
> > > and use devm_device_add_groups() again. Sorry, I didn't catch this problem
> > > early.
> > >
> > > I did some debug & below are captured log, you can see priv pointer get
> > > messed at current_image_show(). I am not sure if something related to
> > > platform driver work properly. I attach my debug patch in this mail.
> > >
> > > 1. Using .groups = rsu_groups,
> > >
> > > [ 1.191115] *** rsu_status_callback:
> > > [ 1.194782] res->a1=2000000
> > > [ 1.197588] res->a1=0
> > > [ 1.199865] res->a2=0
> > > [ 1.202150] res->a3=0
> > > [ 1.204433] priv=0xffff80007aa28e80
> > > [ 1.207933] version=0, state=0, current_image=2000000, fail_image=0,
> > > error_location=0, error_details=0
> > > [ 1.217249] *** stratix10_rsu_probe: priv=0xffff80007aa28e80
> > > root@stratix10:/sys/bus/platform/drivers/stratix10-rsu# cat current_image
> > > [ 38.849341] *** current_image_show: priv=0xffff80007aa28d00
> > > ... output garbage data
> > > priv pointer got changed
> >
> > I don't understand this, sorry. Are you sure you are actually using the
> > correct pointer to your device?
> >
> I think so.
>
> The dev pointer at current_image_show() should points to RSU device, but it
> seems point to driver_private if I use .group = rsU_groups. As a result I
> can't get the priv pointer properly at current_image_show().

It points to the driver kobject, not the device kobject. So that's the
issue here. See the patch that I just posted for a potential fix for
this.

thanks,

greg k-h

2019-06-04 16:02:21

by Richard Gong

[permalink] [raw]
Subject: Re: A potential broken at platform driver?


Hi Greg,

On 6/4/19 9:28 AM, Greg KH wrote:
> On Tue, Jun 04, 2019 at 12:33:03PM +0200, Romain Izard wrote:
>> On Mon, Jun 03, 2019 at 08:02:55PM +0200, Greg KH wrote:
>>>> @@ -394,7 +432,7 @@ static struct platform_driver stratix10_rsu_driver = {
>>>> .remove = stratix10_rsu_remove,
>>>> .driver = {
>>>> .name = "stratix10-rsu",
>>>> - .groups = rsu_groups,
>>>> +// .groups = rsu_groups,
>>>
>>> Are you sure this is the correct pointer? I think that might be
>>> pointing to the driver's attributes, not the device's attributes.
>>>
>>> If platform drivers do not have a way to register groups properly, then
>>> that really needs to be fixed, as trying to register it by yourself as
>>> you are doing, is ripe for racing with userspace.
>>
>> This is a very common issue with platform drivers, and it seems to me that
>> it is not possible to add device attributes when binding a device to a
>> driver without entering the race condition.
>>
>> My understanding is the following one:
>>
>> The root cause is that the device has already been created and reported
>> to the userspace with a KOBJ_ADD uevent before the device and the driver
>> are bound together. On receiving this event, userspace will react, and
>> it will try to read the device's attributes. In parallel the kernel will
>> try to find a matching driver. If a driver is found, the kernel will
>> call the probe function from the driver with the device as a parameter,
>> and if successful a KOBJ_BIND uevent will be sent to userspace, but this
>> is a recent addition.
>>
>> Unfortunately, not all created devices will be bound to a driver, and the
>> existing udev code relies on KOBJ_ADD uevents rather than KOBJ_BIND uevents.
>> If new per-device attributes have been added to the device during the
>> binding stage userspace may or may not see them, depending on when userspace
>> tries to read the device's attributes.
>>
>> I have this possible workaround, but I do not know if it is a good solution:
>>
>> When binding the device and the driver together, create a new device as a
>> child to the current device, and fill its "groups" member to point to the
>> per-device attributes' group. As the device will be created with all the
>> attributes, it will not be affected by the race issues. The functions
>> handling the attributes will need to be modified to use the parents of their
>> "device" parameter, instead of the device itself. Additionnaly, the sysfs
>> location of the attributes will be different, as the child device will show
>> up in the sysfs path. But for a newly introduced device this will not be
>> a problem.
>>
>> Is this a good compromise ?
>
> Not really. You just want the attributes on the platform device itself.
>
> Given the horrible hack that platform devices are today, what's one more
> hack!
>
> Here's a patch below of what should probably be done here. Richard, can
> you change your code to use the new dev_groups pointer in the struct
> platform_driver and this patch and let me know if that works or not?
>
> Note, I've only compiled this code, not tested it...
>

Your patch works.

Many thanks for your help!

Regards,
Richard

> thanks,
>
> greg k-h
>
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 4d1729853d1a..3dd4b73a9b30 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -598,6 +598,7 @@ struct platform_device *platform_device_register_full(
> }
> EXPORT_SYMBOL_GPL(platform_device_register_full);
>
> +static int platform_drv_remove(struct device *_dev);
> static int platform_drv_probe(struct device *_dev)
> {
> struct platform_driver *drv = to_platform_driver(_dev->driver);
> @@ -614,8 +615,18 @@ static int platform_drv_probe(struct device *_dev)
>
> if (drv->probe) {
> ret = drv->probe(dev);
> - if (ret)
> + if (ret) {
> dev_pm_domain_detach(_dev, true);
> + goto out;
> + }
> + }
> + if (drv->dev_groups) {
> + ret = device_add_groups(_dev, drv->dev_groups);
> + if (ret) {
> + platform_drv_remove(_dev);
> + return ret;
> + }
> + kobject_uevent(&_dev->kobj, KOBJ_CHANGE);
> }
>
> out:
> @@ -640,6 +651,8 @@ static int platform_drv_remove(struct device *_dev)
>
> if (drv->remove)
> ret = drv->remove(dev);
> + if (drv->dev_groups)
> + device_remove_groups(_dev, drv->dev_groups);
> dev_pm_domain_detach(_dev, true);
>
> return ret;
> diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
> index cc464850b71e..027f1e1d7af8 100644
> --- a/include/linux/platform_device.h
> +++ b/include/linux/platform_device.h
> @@ -190,6 +190,7 @@ struct platform_driver {
> int (*resume)(struct platform_device *);
> struct device_driver driver;
> const struct platform_device_id *id_table;
> + const struct attribute_group **dev_groups;
> bool prevent_deferred_probe;
> };
>
>

2019-06-04 17:06:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: A potential broken at platform driver?

On Tue, Jun 04, 2019 at 11:13:02AM -0500, Richard Gong wrote:
>
> Hi Greg,
>
> On 6/4/19 9:28 AM, Greg KH wrote:
> > On Tue, Jun 04, 2019 at 12:33:03PM +0200, Romain Izard wrote:
> > > On Mon, Jun 03, 2019 at 08:02:55PM +0200, Greg KH wrote:
> > > > > @@ -394,7 +432,7 @@ static struct platform_driver stratix10_rsu_driver = {
> > > > > .remove = stratix10_rsu_remove,
> > > > > .driver = {
> > > > > .name = "stratix10-rsu",
> > > > > - .groups = rsu_groups,
> > > > > +// .groups = rsu_groups,
> > > >
> > > > Are you sure this is the correct pointer? I think that might be
> > > > pointing to the driver's attributes, not the device's attributes.
> > > >
> > > > If platform drivers do not have a way to register groups properly, then
> > > > that really needs to be fixed, as trying to register it by yourself as
> > > > you are doing, is ripe for racing with userspace.
> > > This is a very common issue with platform drivers, and it seems to me that
> > > it is not possible to add device attributes when binding a device to a
> > > driver without entering the race condition.
> > >
> > > My understanding is the following one:
> > >
> > > The root cause is that the device has already been created and reported
> > > to the userspace with a KOBJ_ADD uevent before the device and the driver
> > > are bound together. On receiving this event, userspace will react, and
> > > it will try to read the device's attributes. In parallel the kernel will
> > > try to find a matching driver. If a driver is found, the kernel will
> > > call the probe function from the driver with the device as a parameter,
> > > and if successful a KOBJ_BIND uevent will be sent to userspace, but this
> > > is a recent addition.
> > >
> > > Unfortunately, not all created devices will be bound to a driver, and the
> > > existing udev code relies on KOBJ_ADD uevents rather than KOBJ_BIND uevents.
> > > If new per-device attributes have been added to the device during the
> > > binding stage userspace may or may not see them, depending on when userspace
> > > tries to read the device's attributes.
> > >
> > > I have this possible workaround, but I do not know if it is a good solution:
> > >
> > > When binding the device and the driver together, create a new device as a
> > > child to the current device, and fill its "groups" member to point to the
> > > per-device attributes' group. As the device will be created with all the
> > > attributes, it will not be affected by the race issues. The functions
> > > handling the attributes will need to be modified to use the parents of their
> > > "device" parameter, instead of the device itself. Additionnaly, the sysfs
> > > location of the attributes will be different, as the child device will show
> > > up in the sysfs path. But for a newly introduced device this will not be
> > > a problem.
> > >
> > > Is this a good compromise ?
> >
> > Not really. You just want the attributes on the platform device itself.
> >
> > Given the horrible hack that platform devices are today, what's one more
> > hack!
> >
> > Here's a patch below of what should probably be done here. Richard, can
> > you change your code to use the new dev_groups pointer in the struct
> > platform_driver and this patch and let me know if that works or not?
> >
> > Note, I've only compiled this code, not tested it...
> >
>
> Your patch works.
>
> Many thanks for your help!

Nice!

I guess I need to turn it into a real patch now. Let me do that tonight
and see if I can convert some existing drivers to use it as well...

thanks,

greg k-h

2019-06-11 14:08:08

by Richard Gong

[permalink] [raw]
Subject: Re: A potential broken at platform driver?


Hi Greg,

On 6/4/19 12:03 PM, Greg KH wrote:
> On Tue, Jun 04, 2019 at 11:13:02AM -0500, Richard Gong wrote:
>>
>> Hi Greg,
>>
>> On 6/4/19 9:28 AM, Greg KH wrote:
>>> On Tue, Jun 04, 2019 at 12:33:03PM +0200, Romain Izard wrote:
>>>> On Mon, Jun 03, 2019 at 08:02:55PM +0200, Greg KH wrote:
>>>>>> @@ -394,7 +432,7 @@ static struct platform_driver stratix10_rsu_driver = {
>>>>>> .remove = stratix10_rsu_remove,
>>>>>> .driver = {
>>>>>> .name = "stratix10-rsu",
>>>>>> - .groups = rsu_groups,
>>>>>> +// .groups = rsu_groups,
>>>>>
>>>>> Are you sure this is the correct pointer? I think that might be
>>>>> pointing to the driver's attributes, not the device's attributes.
>>>>>
>>>>> If platform drivers do not have a way to register groups properly, then
>>>>> that really needs to be fixed, as trying to register it by yourself as
>>>>> you are doing, is ripe for racing with userspace.
>>>> This is a very common issue with platform drivers, and it seems to me that
>>>> it is not possible to add device attributes when binding a device to a
>>>> driver without entering the race condition.
>>>>
>>>> My understanding is the following one:
>>>>
>>>> The root cause is that the device has already been created and reported
>>>> to the userspace with a KOBJ_ADD uevent before the device and the driver
>>>> are bound together. On receiving this event, userspace will react, and
>>>> it will try to read the device's attributes. In parallel the kernel will
>>>> try to find a matching driver. If a driver is found, the kernel will
>>>> call the probe function from the driver with the device as a parameter,
>>>> and if successful a KOBJ_BIND uevent will be sent to userspace, but this
>>>> is a recent addition.
>>>>
>>>> Unfortunately, not all created devices will be bound to a driver, and the
>>>> existing udev code relies on KOBJ_ADD uevents rather than KOBJ_BIND uevents.
>>>> If new per-device attributes have been added to the device during the
>>>> binding stage userspace may or may not see them, depending on when userspace
>>>> tries to read the device's attributes.
>>>>
>>>> I have this possible workaround, but I do not know if it is a good solution:
>>>>
>>>> When binding the device and the driver together, create a new device as a
>>>> child to the current device, and fill its "groups" member to point to the
>>>> per-device attributes' group. As the device will be created with all the
>>>> attributes, it will not be affected by the race issues. The functions
>>>> handling the attributes will need to be modified to use the parents of their
>>>> "device" parameter, instead of the device itself. Additionnaly, the sysfs
>>>> location of the attributes will be different, as the child device will show
>>>> up in the sysfs path. But for a newly introduced device this will not be
>>>> a problem.
>>>>
>>>> Is this a good compromise ?
>>>
>>> Not really. You just want the attributes on the platform device itself.
>>>
>>> Given the horrible hack that platform devices are today, what's one more
>>> hack!
>>>
>>> Here's a patch below of what should probably be done here. Richard, can
>>> you change your code to use the new dev_groups pointer in the struct
>>> platform_driver and this patch and let me know if that works or not?
>>>
>>> Note, I've only compiled this code, not tested it...
>>>
>>
>> Your patch works.
>>
>> Many thanks for your help!
>
> Nice!
>
> I guess I need to turn it into a real patch now. Let me do that tonight
> and see if I can convert some existing drivers to use it as well...
>

Sorry for asking.

I haven't seen your patch, did you release that?

Regards,
Richard

> thanks,
>
> greg k-h
>

2019-06-11 16:42:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: A potential broken at platform driver?

On Tue, Jun 11, 2019 at 09:10:05AM -0500, Richard Gong wrote:
>
> Hi Greg,
>
> On 6/4/19 12:03 PM, Greg KH wrote:
> > On Tue, Jun 04, 2019 at 11:13:02AM -0500, Richard Gong wrote:
> > >
> > > Hi Greg,
> > >
> > > On 6/4/19 9:28 AM, Greg KH wrote:
> > > > On Tue, Jun 04, 2019 at 12:33:03PM +0200, Romain Izard wrote:
> > > > > On Mon, Jun 03, 2019 at 08:02:55PM +0200, Greg KH wrote:
> > > > > > > @@ -394,7 +432,7 @@ static struct platform_driver stratix10_rsu_driver = {
> > > > > > > .remove = stratix10_rsu_remove,
> > > > > > > .driver = {
> > > > > > > .name = "stratix10-rsu",
> > > > > > > - .groups = rsu_groups,
> > > > > > > +// .groups = rsu_groups,
> > > > > >
> > > > > > Are you sure this is the correct pointer? I think that might be
> > > > > > pointing to the driver's attributes, not the device's attributes.
> > > > > >
> > > > > > If platform drivers do not have a way to register groups properly, then
> > > > > > that really needs to be fixed, as trying to register it by yourself as
> > > > > > you are doing, is ripe for racing with userspace.
> > > > > This is a very common issue with platform drivers, and it seems to me that
> > > > > it is not possible to add device attributes when binding a device to a
> > > > > driver without entering the race condition.
> > > > >
> > > > > My understanding is the following one:
> > > > >
> > > > > The root cause is that the device has already been created and reported
> > > > > to the userspace with a KOBJ_ADD uevent before the device and the driver
> > > > > are bound together. On receiving this event, userspace will react, and
> > > > > it will try to read the device's attributes. In parallel the kernel will
> > > > > try to find a matching driver. If a driver is found, the kernel will
> > > > > call the probe function from the driver with the device as a parameter,
> > > > > and if successful a KOBJ_BIND uevent will be sent to userspace, but this
> > > > > is a recent addition.
> > > > >
> > > > > Unfortunately, not all created devices will be bound to a driver, and the
> > > > > existing udev code relies on KOBJ_ADD uevents rather than KOBJ_BIND uevents.
> > > > > If new per-device attributes have been added to the device during the
> > > > > binding stage userspace may or may not see them, depending on when userspace
> > > > > tries to read the device's attributes.
> > > > >
> > > > > I have this possible workaround, but I do not know if it is a good solution:
> > > > >
> > > > > When binding the device and the driver together, create a new device as a
> > > > > child to the current device, and fill its "groups" member to point to the
> > > > > per-device attributes' group. As the device will be created with all the
> > > > > attributes, it will not be affected by the race issues. The functions
> > > > > handling the attributes will need to be modified to use the parents of their
> > > > > "device" parameter, instead of the device itself. Additionnaly, the sysfs
> > > > > location of the attributes will be different, as the child device will show
> > > > > up in the sysfs path. But for a newly introduced device this will not be
> > > > > a problem.
> > > > >
> > > > > Is this a good compromise ?
> > > >
> > > > Not really. You just want the attributes on the platform device itself.
> > > >
> > > > Given the horrible hack that platform devices are today, what's one more
> > > > hack!
> > > >
> > > > Here's a patch below of what should probably be done here. Richard, can
> > > > you change your code to use the new dev_groups pointer in the struct
> > > > platform_driver and this patch and let me know if that works or not?
> > > >
> > > > Note, I've only compiled this code, not tested it...
> > > >
> > >
> > > Your patch works.
> > >
> > > Many thanks for your help!
> >
> > Nice!
> >
> > I guess I need to turn it into a real patch now. Let me do that tonight
> > and see if I can convert some existing drivers to use it as well...
> >
>
> Sorry for asking.
>
> I haven't seen your patch, did you release that?
>

I didn't post it yet, sorry. I started on cleaning up the whole kernel
tree, to show users of the new groups, and then got side-tracked. The
code is in a public branch, I'll clean it up this week and send it off,
hopefully I'll have time over the next few days...

thanks,

greg k-h