2023-01-20 03:13:33

by Lizhi Hou

[permalink] [raw]
Subject: [PATCH V7 0/3] Generate device tree node for pci devices

This patch series introduces OF overlay support for PCI devices which
primarily addresses two use cases. First, it provides a data driven method
to describe hardware peripherals that are present in a PCI endpoint and
hence can be accessed by the PCI host. Second, it allows reuse of a OF
compatible driver -- often used in SoC platforms -- in a PCI host based
system.

There are 2 series devices rely on this patch:

1) Xilinx Alveo Accelerator cards (FPGA based device)
2) Microchip LAN9662 Ethernet Controller

Please see: https://lore.kernel.org/lkml/[email protected]/

Normally, the PCI core discovers PCI devices and their BARs using the
PCI enumeration process. However, the process does not provide a way to
discover the hardware peripherals that are present in a PCI device, and
which can be accessed through the PCI BARs. Also, the enumeration process
does not provide a way to associate MSI-X vectors of a PCI device with the
hardware peripherals that are present in the device. PCI device drivers
often use header files to describe the hardware peripherals and their
resources as there is no standard data driven way to do so. This patch
series proposes to use flattened device tree blob to describe the
peripherals in a data driven way. Based on previous discussion, using
device tree overlay is the best way to unflatten the blob and populate
platform devices. To use device tree overlay, there are three obvious
problems that need to be resolved.

First, we need to create a base tree for non-DT system such as x86_64. A
patch series has been submitted for this:
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

Second, a device tree node corresponding to the PCI endpoint is required
for overlaying the flattened device tree blob for that PCI endpoint.
Because PCI is a self-discoverable bus, a device tree node is usually not
created for PCI devices. This series adds support to generate a device
tree node for a PCI device which advertises itself using PCI quirks
infrastructure.

Third, we need to generate device tree nodes for PCI bridges since a child
PCI endpoint may choose to have a device tree node created.

This patch series is made up of three patches.

The first patch is adding OF interface to create or destroy OF node
dynamically.

The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
When the option is turned on, the kernel will generate device tree nodes
for all PCI bridges unconditionally. The patch also shows how to use the
PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
tree node for a device. Specifically, the patch generates a device tree
node for Xilinx Alveo U50 PCIe accelerator device. The generated device
tree nodes do not have any property.

The third patch adds basic properties ('reg', 'compatible' and
'device_type') to the dynamically generated device tree nodes. More
properties can be added in the future.

Here is the example of device tree nodes generated within the ARM64 QEMU.
# lspci -t
-[0000:00]-+-00.0
+-01.0-[01]--
+-01.1-[02]----00.0
+-01.2-[03]----00.0
+-01.3-[04]----00.0
+-01.4-[05]----00.0
+-01.5-[06]--
+-01.6-[07]--
+-01.7-[08]--
+-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
| \-00.1
+-02.1-[0c]--
\-03.0-[0d-0e]----00.0-[0e]----01.0

# tree /sys/firmware/devicetree/base/pcie\@10000000
/sys/firmware/devicetree/base/pcie@10000000
|-- #address-cells
|-- #interrupt-cells
|-- #size-cells
|-- bus-range
|-- compatible
|-- device_type
|-- dma-coherent
|-- interrupt-map
|-- interrupt-map-mask
|-- linux,pci-domain
|-- msi-parent
|-- name
|-- pci@1,0
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@1,1
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@1,2
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@1,3
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@1,4
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@1,5
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@1,6
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@1,7
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@2,0
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- pci@0,0
| | |-- #address-cells
| | |-- #size-cells
| | |-- compatible
| | |-- device_type
| | |-- pci@0,0
| | | |-- #address-cells
| | | |-- #size-cells
| | | |-- compatible
| | | |-- dev@0,0
| | | | |-- compatible
| | | | `-- reg
| | | |-- dev@0,1
| | | | |-- compatible
| | | | `-- reg
| | | |-- device_type
| | | |-- ranges
| | | `-- reg
| | |-- ranges
| | `-- reg
| |-- ranges
| `-- reg
|-- pci@2,1
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- ranges
| `-- reg
|-- pci@3,0
| |-- #address-cells
| |-- #size-cells
| |-- compatible
| |-- device_type
| |-- pci@0,0
| | |-- #address-cells
| | |-- #size-cells
| | |-- compatible
| | |-- device_type
| | |-- ranges
| | `-- reg
| |-- ranges
| `-- reg
|-- ranges
`-- reg

Changes since v6:
- Removed single line wrapper functions
- Added Signed-off-by Clément Léger <[email protected]>

Changes since v5:
- Fixed code review comments
- Fixed incorrect 'ranges' and 'reg' properties and verified address
translation.

Changes since RFC v4:
- Fixed code review comments

Changes since RFC v3:
- Split the Xilinx Alveo U50 PCI quirk to a separate patch
- Minor changes in commit description and code comment

Changes since RFC v2:
- Merged patch 3 with patch 2
- Added OF interfaces of_changeset_add_prop_* and use them to create
properties.
- Added '#address-cells', '#size-cells' and 'ranges' properties.

Changes since RFC v1:
- Added one patch to create basic properties.
- To move DT related code out of PCI subsystem, replaced of_node_alloc()
with of_create_node()/of_destroy_node()

Lizhi Hou (3):
of: dynamic: Add interfaces for creating device node dynamically
PCI: Create device tree node for selected devices
PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50

drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
drivers/pci/Kconfig | 12 ++
drivers/pci/Makefile | 1 +
drivers/pci/bus.c | 2 +
drivers/pci/msi/irqdomain.c | 6 +-
drivers/pci/of.c | 71 ++++++++++++
drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
drivers/pci/pci-driver.c | 3 +-
drivers/pci/pci.h | 19 ++++
drivers/pci/quirks.c | 11 ++
drivers/pci/remove.c | 1 +
include/linux/of.h | 24 ++++
12 files changed, 556 insertions(+), 3 deletions(-)
create mode 100644 drivers/pci/of_property.c

--
2.34.1


2023-01-20 03:14:13

by Lizhi Hou

[permalink] [raw]
Subject: [PATCH V7 3/3] PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50

The Xilinx Alveo U50 PCI card exposes multiple hardware peripherals on
its PCI BAR. The card firmware provides a flattened device tree to
describe the hardware peripherals on its BARs. This allows U50 driver to
load the flattened device tree and generate the device tree node for
hardware peripherals underneath.

To generate device tree node for U50 card, added PCI quirks to call
of_pci_make_dev_node() for U50.

Signed-off-by: Lizhi Hou <[email protected]>
Signed-off-by: Sonal Santan <[email protected]>
Signed-off-by: Max Zhen <[email protected]>
Reviewed-by: Brian Xu <[email protected]>
Signed-off-by: Clément Léger <[email protected]>
---
drivers/pci/quirks.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 285acc4aaccc..f184cf51b800 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5992,3 +5992,14 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a2d, dpc_log_size);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a2f, dpc_log_size);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a31, dpc_log_size);
#endif
+
+/*
+ * For PCI device which have multiple downstream devices, its driver may use
+ * a flattened device tree to describe the downstream devices.
+ * To overlay the flattened device tree, the PCI device and all its ancestor
+ * devices need to have device tree nodes on system base device tree. Thus,
+ * before driver probing, it might need to add a device tree node as the final
+ * fixup.
+ */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_XILINX, 0x5020, of_pci_make_dev_node);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_XILINX, 0x5021, of_pci_make_dev_node);
--
2.34.1

2023-01-20 03:14:13

by Lizhi Hou

[permalink] [raw]
Subject: [PATCH V7 1/3] of: dynamic: Add interfaces for creating device node dynamically

of_create_node() creates device node dynamically. The parent device node
and full name are required for creating the node. It optionally creates
an OF changeset and attaches the newly created node to the changeset. The
device node pointer and the changeset pointer can be used to add
properties to the device node and apply the node to the base tree.

of_destroy_node() frees the device node created by of_create_node(). If
an OF changeset was also created for this node, it will destroy the
changeset before freeing the device node.

Expand of_changeset APIs to handle specific types of properties.
of_changeset_add_prop_string()
of_changeset_add_prop_string_array()
of_changeset_add_prop_u32_array()

Signed-off-by: Lizhi Hou <[email protected]>
Signed-off-by: Sonal Santan <[email protected]>
Signed-off-by: Max Zhen <[email protected]>
Reviewed-by: Brian Xu <[email protected]>
Signed-off-by: Clément Léger <[email protected]>
---
drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++++++++++++
include/linux/of.h | 24 ++++++
2 files changed, 221 insertions(+)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index cd3821a6444f..4e211a1d039f 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -461,6 +461,71 @@ struct device_node *__of_node_dup(const struct device_node *np,
return NULL;
}

+/**
+ * of_create_node - Dynamically create a device node
+ *
+ * @parent: Pointer to parent device node
+ * @full_name: Node full name
+ * @cset: Pointer to returning changeset
+ *
+ * Return: Pointer to the created device node or NULL in case of an error.
+ */
+struct device_node *of_create_node(struct device_node *parent,
+ const char *full_name,
+ struct of_changeset **cset)
+{
+ struct of_changeset *ocs;
+ struct device_node *np;
+ int ret;
+
+ np = __of_node_dup(NULL, full_name);
+ if (!np)
+ return NULL;
+ np->parent = parent;
+
+ if (!cset)
+ return np;
+
+ ocs = kmalloc(sizeof(*ocs), GFP_KERNEL);
+ if (!ocs) {
+ of_node_put(np);
+ return NULL;
+ }
+
+ of_changeset_init(ocs);
+ ret = of_changeset_attach_node(ocs, np);
+ if (ret) {
+ of_changeset_destroy(ocs);
+ of_node_put(np);
+ kfree(ocs);
+ return NULL;
+ }
+
+ np->data = ocs;
+ *cset = ocs;
+
+ return np;
+}
+EXPORT_SYMBOL(of_create_node);
+
+/**
+ * of_destroy_node - Destroy a dynamically created device node
+ *
+ * @np: Pointer to dynamically created device node
+ *
+ */
+void of_destroy_node(struct device_node *np)
+{
+ struct of_changeset *ocs;
+
+ if (np->data) {
+ ocs = (struct of_changeset *)np->data;
+ of_changeset_destroy(ocs);
+ }
+ of_node_put(np);
+}
+EXPORT_SYMBOL(of_destroy_node);
+
static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
{
if (ce->action == OF_RECONFIG_ATTACH_NODE &&
@@ -934,3 +999,135 @@ int of_changeset_action(struct of_changeset *ocs, unsigned long action,
return 0;
}
EXPORT_SYMBOL_GPL(of_changeset_action);
+
+static int of_changeset_add_prop_helper(struct of_changeset *ocs,
+ struct device_node *np,
+ const struct property *pp)
+{
+ struct property *new_pp;
+ int ret;
+
+ new_pp = __of_prop_dup(pp, GFP_KERNEL);
+ if (!new_pp)
+ return -ENOMEM;
+
+ ret = of_changeset_add_property(ocs, np, new_pp);
+ if (ret) {
+ kfree(new_pp->name);
+ kfree(new_pp->value);
+ kfree(new_pp);
+ }
+
+ return ret;
+}
+
+/**
+ * of_changeset_add_prop_string - Add a string property to a changeset
+ *
+ * @ocs: changeset pointer
+ * @np: device node pointer
+ * @prop_name: name of the property to be added
+ * @str: pointer to null terminated string
+ *
+ * Create a string property and add it to a changeset.
+ *
+ * Return: 0 on success, a negative error value in case of an error.
+ */
+int of_changeset_add_prop_string(struct of_changeset *ocs,
+ struct device_node *np,
+ const char *prop_name, const char *str)
+{
+ struct property prop;
+
+ prop.name = (char *)prop_name;
+ prop.length = strlen(str) + 1;
+ prop.value = (void *)str;
+
+ return of_changeset_add_prop_helper(ocs, np, &prop);
+}
+EXPORT_SYMBOL_GPL(of_changeset_add_prop_string);
+
+/**
+ * of_changeset_add_prop_string_array - Add a string list property to
+ * a changeset
+ *
+ * @ocs: changeset pointer
+ * @np: device node pointer
+ * @prop_name: name of the property to be added
+ * @str_array: pointer to an array of null terminated strings
+ * @sz: number of string array elements
+ *
+ * Create a string list property and add it to a changeset.
+ *
+ * Return: 0 on success, a negative error value in case of an error.
+ */
+int of_changeset_add_prop_string_array(struct of_changeset *ocs,
+ struct device_node *np,
+ const char *prop_name,
+ const char **str_array, size_t sz)
+{
+ struct property prop;
+ int i, ret;
+ char *vp;
+
+ prop.name = (char *)prop_name;
+
+ prop.length = 0;
+ for (i = 0; i < sz; i++)
+ prop.length += strlen(str_array[i]) + 1;
+
+ prop.value = kmalloc(prop.length, GFP_KERNEL);
+ if (!prop.value)
+ return -ENOMEM;
+
+ vp = prop.value;
+ for (i = 0; i < sz; i++) {
+ vp += snprintf(vp, (char *)prop.value + prop.length - vp, "%s",
+ str_array[i]) + 1;
+ }
+ ret = of_changeset_add_prop_helper(ocs, np, &prop);
+ kfree(prop.value);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(of_changeset_add_prop_string_array);
+
+/**
+ * of_changeset_add_prop_u32_array - Add a property of 32 bit integers
+ * property to a changeset
+ *
+ * @ocs: changeset pointer
+ * @np: device node pointer
+ * @prop_name: name of the property to be added
+ * @array: pointer to an array of 32 bit integers
+ * @sz: number of array elements
+ *
+ * Create a property of 32 bit integers and add it to a changeset.
+ *
+ * Return: 0 on success, a negative error value in case of an error.
+ */
+int of_changeset_add_prop_u32_array(struct of_changeset *ocs,
+ struct device_node *np,
+ const char *prop_name,
+ const u32 *array, size_t sz)
+{
+ struct property prop;
+ __be32 *val;
+ int i, ret;
+
+ val = kcalloc(sz, sizeof(__be32), GFP_KERNEL);
+ if (!val)
+ return -ENOMEM;
+
+ for (i = 0; i < sz; i++)
+ val[i] = cpu_to_be32(array[i]);
+ prop.name = (char *)prop_name;
+ prop.length = sizeof(u32) * sz;
+ prop.value = (void *)val;
+
+ ret = of_changeset_add_prop_helper(ocs, np, &prop);
+ kfree(val);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(of_changeset_add_prop_u32_array);
diff --git a/include/linux/of.h b/include/linux/of.h
index 8b9f94386dc3..ebd276d4c4aa 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -1505,6 +1505,30 @@ static inline int of_changeset_update_property(struct of_changeset *ocs,
{
return of_changeset_action(ocs, OF_RECONFIG_UPDATE_PROPERTY, np, prop);
}
+
+struct device_node *of_create_node(struct device_node *parent,
+ const char *full_name,
+ struct of_changeset **cset);
+void of_destroy_node(struct device_node *np);
+int of_changeset_add_prop_string(struct of_changeset *ocs,
+ struct device_node *np,
+ const char *prop_name, const char *str);
+int of_changeset_add_prop_string_array(struct of_changeset *ocs,
+ struct device_node *np,
+ const char *prop_name,
+ const char **str_array, size_t sz);
+int of_changeset_add_prop_u32_array(struct of_changeset *ocs,
+ struct device_node *np,
+ const char *prop_name,
+ const u32 *array, size_t sz);
+static inline int of_changeset_add_prop_u32(struct of_changeset *ocs,
+ struct device_node *np,
+ const char *prop_name,
+ const u32 val)
+{
+ return of_changeset_add_prop_u32_array(ocs, np, prop_name, &val, 1);
+}
+
#else /* CONFIG_OF_DYNAMIC */
static inline int of_reconfig_notifier_register(struct notifier_block *nb)
{
--
2.34.1

2023-01-23 04:32:20

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Hi Rob, Lizhi,

On 1/19/23 21:02, Lizhi Hou wrote:
> This patch series introduces OF overlay support for PCI devices which
> primarily addresses two use cases. First, it provides a data driven method
> to describe hardware peripherals that are present in a PCI endpoint and
> hence can be accessed by the PCI host. Second, it allows reuse of a OF
> compatible driver -- often used in SoC platforms -- in a PCI host based
> system.

I had hoped to review this series by today, but have not yet due to working
on some new unittest features. I hope to get to this series Monday.

-Frank

>
> There are 2 series devices rely on this patch:
>
> 1) Xilinx Alveo Accelerator cards (FPGA based device)
> 2) Microchip LAN9662 Ethernet Controller
>
> Please see: https://lore.kernel.org/lkml/[email protected]/
>
> Normally, the PCI core discovers PCI devices and their BARs using the
> PCI enumeration process. However, the process does not provide a way to
> discover the hardware peripherals that are present in a PCI device, and
> which can be accessed through the PCI BARs. Also, the enumeration process
> does not provide a way to associate MSI-X vectors of a PCI device with the
> hardware peripherals that are present in the device. PCI device drivers
> often use header files to describe the hardware peripherals and their
> resources as there is no standard data driven way to do so. This patch
> series proposes to use flattened device tree blob to describe the
> peripherals in a data driven way. Based on previous discussion, using
> device tree overlay is the best way to unflatten the blob and populate
> platform devices. To use device tree overlay, there are three obvious
> problems that need to be resolved.
>
> First, we need to create a base tree for non-DT system such as x86_64. A
> patch series has been submitted for this:
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
>
> Second, a device tree node corresponding to the PCI endpoint is required
> for overlaying the flattened device tree blob for that PCI endpoint.
> Because PCI is a self-discoverable bus, a device tree node is usually not
> created for PCI devices. This series adds support to generate a device
> tree node for a PCI device which advertises itself using PCI quirks
> infrastructure.
>
> Third, we need to generate device tree nodes for PCI bridges since a child
> PCI endpoint may choose to have a device tree node created.
>
> This patch series is made up of three patches.
>
> The first patch is adding OF interface to create or destroy OF node
> dynamically.
>
> The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
> When the option is turned on, the kernel will generate device tree nodes
> for all PCI bridges unconditionally. The patch also shows how to use the
> PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
> tree node for a device. Specifically, the patch generates a device tree
> node for Xilinx Alveo U50 PCIe accelerator device. The generated device
> tree nodes do not have any property.
>
> The third patch adds basic properties ('reg', 'compatible' and
> 'device_type') to the dynamically generated device tree nodes. More
> properties can be added in the future.
>
> Here is the example of device tree nodes generated within the ARM64 QEMU.
> # lspci -t
> -[0000:00]-+-00.0
> +-01.0-[01]--
> +-01.1-[02]----00.0
> +-01.2-[03]----00.0
> +-01.3-[04]----00.0
> +-01.4-[05]----00.0
> +-01.5-[06]--
> +-01.6-[07]--
> +-01.7-[08]--
> +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
> | \-00.1
> +-02.1-[0c]--
> \-03.0-[0d-0e]----00.0-[0e]----01.0
>
> # tree /sys/firmware/devicetree/base/pcie\@10000000
> /sys/firmware/devicetree/base/pcie@10000000
> |-- #address-cells
> |-- #interrupt-cells
> |-- #size-cells
> |-- bus-range
> |-- compatible
> |-- device_type
> |-- dma-coherent
> |-- interrupt-map
> |-- interrupt-map-mask
> |-- linux,pci-domain
> |-- msi-parent
> |-- name
> |-- pci@1,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,1
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,2
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,3
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,4
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,5
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,6
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,7
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@2,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- pci@0,0
> | | |-- #address-cells
> | | |-- #size-cells
> | | |-- compatible
> | | |-- device_type
> | | |-- pci@0,0
> | | | |-- #address-cells
> | | | |-- #size-cells
> | | | |-- compatible
> | | | |-- dev@0,0
> | | | | |-- compatible
> | | | | `-- reg
> | | | |-- dev@0,1
> | | | | |-- compatible
> | | | | `-- reg
> | | | |-- device_type
> | | | |-- ranges
> | | | `-- reg
> | | |-- ranges
> | | `-- reg
> | |-- ranges
> | `-- reg
> |-- pci@2,1
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@3,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- pci@0,0
> | | |-- #address-cells
> | | |-- #size-cells
> | | |-- compatible
> | | |-- device_type
> | | |-- ranges
> | | `-- reg
> | |-- ranges
> | `-- reg
> |-- ranges
> `-- reg
>
> Changes since v6:
> - Removed single line wrapper functions
> - Added Signed-off-by Clément Léger <[email protected]>
>
> Changes since v5:
> - Fixed code review comments
> - Fixed incorrect 'ranges' and 'reg' properties and verified address
> translation.
>
> Changes since RFC v4:
> - Fixed code review comments
>
> Changes since RFC v3:
> - Split the Xilinx Alveo U50 PCI quirk to a separate patch
> - Minor changes in commit description and code comment
>
> Changes since RFC v2:
> - Merged patch 3 with patch 2
> - Added OF interfaces of_changeset_add_prop_* and use them to create
> properties.
> - Added '#address-cells', '#size-cells' and 'ranges' properties.
>
> Changes since RFC v1:
> - Added one patch to create basic properties.
> - To move DT related code out of PCI subsystem, replaced of_node_alloc()
> with of_create_node()/of_destroy_node()
>
> Lizhi Hou (3):
> of: dynamic: Add interfaces for creating device node dynamically
> PCI: Create device tree node for selected devices
> PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
>
> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
> drivers/pci/Kconfig | 12 ++
> drivers/pci/Makefile | 1 +
> drivers/pci/bus.c | 2 +
> drivers/pci/msi/irqdomain.c | 6 +-
> drivers/pci/of.c | 71 ++++++++++++
> drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
> drivers/pci/pci-driver.c | 3 +-
> drivers/pci/pci.h | 19 ++++
> drivers/pci/quirks.c | 11 ++
> drivers/pci/remove.c | 1 +
> include/linux/of.h | 24 ++++
> 12 files changed, 556 insertions(+), 3 deletions(-)
> create mode 100644 drivers/pci/of_property.c
>


2023-01-23 23:40:47

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 1/22/23 22:32, Frank Rowand wrote:
> Hi Rob, Lizhi,
>
> On 1/19/23 21:02, Lizhi Hou wrote:
>> This patch series introduces OF overlay support for PCI devices which
>> primarily addresses two use cases. First, it provides a data driven method
>> to describe hardware peripherals that are present in a PCI endpoint and
>> hence can be accessed by the PCI host. Second, it allows reuse of a OF
>> compatible driver -- often used in SoC platforms -- in a PCI host based
>> system.
>
> I had hoped to review this series by today, but have not yet due to working
> on some new unittest features. I hope to get to this series Monday.

I have skimmed through the series, and dug more deeply into portions of
the patches, but have not yet fully examined the full series.

I will be on vacation for a week or so, and will resume the detailed
reading of the series.

One overall comment at this point, is that this series is somewhat
duplicating portions of the (incomplete and fragile) overlay functionality
but not fully. One trivial example is no locking to prevent interference
between tree modification by overlay code that could be occurring
asynchronously at the same time this new code is modifying the tree.

-Frank

>
> -Frank
>
>>
>> There are 2 series devices rely on this patch:
>>
>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>> 2) Microchip LAN9662 Ethernet Controller
>>
>> Please see: https://lore.kernel.org/lkml/[email protected]/
>>
>> Normally, the PCI core discovers PCI devices and their BARs using the
>> PCI enumeration process. However, the process does not provide a way to
>> discover the hardware peripherals that are present in a PCI device, and
>> which can be accessed through the PCI BARs. Also, the enumeration process
>> does not provide a way to associate MSI-X vectors of a PCI device with the
>> hardware peripherals that are present in the device. PCI device drivers
>> often use header files to describe the hardware peripherals and their
>> resources as there is no standard data driven way to do so. This patch
>> series proposes to use flattened device tree blob to describe the
>> peripherals in a data driven way. Based on previous discussion, using
>> device tree overlay is the best way to unflatten the blob and populate
>> platform devices. To use device tree overlay, there are three obvious
>> problems that need to be resolved.
>>
>> First, we need to create a base tree for non-DT system such as x86_64. A
>> patch series has been submitted for this:
>> https://lore.kernel.org/lkml/[email protected]/
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Second, a device tree node corresponding to the PCI endpoint is required
>> for overlaying the flattened device tree blob for that PCI endpoint.
>> Because PCI is a self-discoverable bus, a device tree node is usually not
>> created for PCI devices. This series adds support to generate a device
>> tree node for a PCI device which advertises itself using PCI quirks
>> infrastructure.
>>
>> Third, we need to generate device tree nodes for PCI bridges since a child
>> PCI endpoint may choose to have a device tree node created.
>>
>> This patch series is made up of three patches.
>>
>> The first patch is adding OF interface to create or destroy OF node
>> dynamically.
>>
>> The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
>> When the option is turned on, the kernel will generate device tree nodes
>> for all PCI bridges unconditionally. The patch also shows how to use the
>> PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
>> tree node for a device. Specifically, the patch generates a device tree
>> node for Xilinx Alveo U50 PCIe accelerator device. The generated device
>> tree nodes do not have any property.
>>
>> The third patch adds basic properties ('reg', 'compatible' and
>> 'device_type') to the dynamically generated device tree nodes. More
>> properties can be added in the future.
>>
>> Here is the example of device tree nodes generated within the ARM64 QEMU.
>> # lspci -t
>> -[0000:00]-+-00.0
>> +-01.0-[01]--
>> +-01.1-[02]----00.0
>> +-01.2-[03]----00.0
>> +-01.3-[04]----00.0
>> +-01.4-[05]----00.0
>> +-01.5-[06]--
>> +-01.6-[07]--
>> +-01.7-[08]--
>> +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
>> | \-00.1
>> +-02.1-[0c]--
>> \-03.0-[0d-0e]----00.0-[0e]----01.0
>>
>> # tree /sys/firmware/devicetree/base/pcie\@10000000
>> /sys/firmware/devicetree/base/pcie@10000000
>> |-- #address-cells
>> |-- #interrupt-cells
>> |-- #size-cells
>> |-- bus-range
>> |-- compatible
>> |-- device_type
>> |-- dma-coherent
>> |-- interrupt-map
>> |-- interrupt-map-mask
>> |-- linux,pci-domain
>> |-- msi-parent
>> |-- name
>> |-- pci@1,0
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,1
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,2
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,3
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,4
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,5
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,6
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,7
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@2,0
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- pci@0,0
>> | | |-- #address-cells
>> | | |-- #size-cells
>> | | |-- compatible
>> | | |-- device_type
>> | | |-- pci@0,0
>> | | | |-- #address-cells
>> | | | |-- #size-cells
>> | | | |-- compatible
>> | | | |-- dev@0,0
>> | | | | |-- compatible
>> | | | | `-- reg
>> | | | |-- dev@0,1
>> | | | | |-- compatible
>> | | | | `-- reg
>> | | | |-- device_type
>> | | | |-- ranges
>> | | | `-- reg
>> | | |-- ranges
>> | | `-- reg
>> | |-- ranges
>> | `-- reg
>> |-- pci@2,1
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@3,0
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- pci@0,0
>> | | |-- #address-cells
>> | | |-- #size-cells
>> | | |-- compatible
>> | | |-- device_type
>> | | |-- ranges
>> | | `-- reg
>> | |-- ranges
>> | `-- reg
>> |-- ranges
>> `-- reg
>>
>> Changes since v6:
>> - Removed single line wrapper functions
>> - Added Signed-off-by Clément Léger <[email protected]>
>>
>> Changes since v5:
>> - Fixed code review comments
>> - Fixed incorrect 'ranges' and 'reg' properties and verified address
>> translation.
>>
>> Changes since RFC v4:
>> - Fixed code review comments
>>
>> Changes since RFC v3:
>> - Split the Xilinx Alveo U50 PCI quirk to a separate patch
>> - Minor changes in commit description and code comment
>>
>> Changes since RFC v2:
>> - Merged patch 3 with patch 2
>> - Added OF interfaces of_changeset_add_prop_* and use them to create
>> properties.
>> - Added '#address-cells', '#size-cells' and 'ranges' properties.
>>
>> Changes since RFC v1:
>> - Added one patch to create basic properties.
>> - To move DT related code out of PCI subsystem, replaced of_node_alloc()
>> with of_create_node()/of_destroy_node()
>>
>> Lizhi Hou (3):
>> of: dynamic: Add interfaces for creating device node dynamically
>> PCI: Create device tree node for selected devices
>> PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
>>
>> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
>> drivers/pci/Kconfig | 12 ++
>> drivers/pci/Makefile | 1 +
>> drivers/pci/bus.c | 2 +
>> drivers/pci/msi/irqdomain.c | 6 +-
>> drivers/pci/of.c | 71 ++++++++++++
>> drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
>> drivers/pci/pci-driver.c | 3 +-
>> drivers/pci/pci.h | 19 ++++
>> drivers/pci/quirks.c | 11 ++
>> drivers/pci/remove.c | 1 +
>> include/linux/of.h | 24 ++++
>> 12 files changed, 556 insertions(+), 3 deletions(-)
>> create mode 100644 drivers/pci/of_property.c
>>
>


2023-02-10 09:37:32

by Christian Gmeiner

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Hi all

> This patch series introduces OF overlay support for PCI devices which
> primarily addresses two use cases. First, it provides a data driven method
> to describe hardware peripherals that are present in a PCI endpoint and
> hence can be accessed by the PCI host. Second, it allows reuse of a OF
> compatible driver -- often used in SoC platforms -- in a PCI host based
> system.
>

I have tested this patch series on a FPGA connected via PCIe. I would love
to see this patch series hit mainline soon.

Tested-by: Christian Gmeiner <[email protected]>

--
greets
--
Christian Gmeiner, MSc

https://christian-gmeiner.info/privacypolicy

2023-02-22 15:26:49

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Hi Lizhi,

On 1/23/23 17:40, Frank Rowand wrote:
> On 1/22/23 22:32, Frank Rowand wrote:
>> Hi Rob, Lizhi,
>>
>> On 1/19/23 21:02, Lizhi Hou wrote:
>>> This patch series introduces OF overlay support for PCI devices which
>>> primarily addresses two use cases. First, it provides a data driven method
>>> to describe hardware peripherals that are present in a PCI endpoint and
>>> hence can be accessed by the PCI host. Second, it allows reuse of a OF
>>> compatible driver -- often used in SoC platforms -- in a PCI host based
>>> system.
>>
>> I had hoped to review this series by today, but have not yet due to working
>> on some new unittest features. I hope to get to this series Monday.
>
> I have skimmed through the series, and dug more deeply into portions of
> the patches, but have not yet fully examined the full series.
>
> I will be on vacation for a week or so, and will resume the detailed
> reading of the series.

I am back from vacation, and I have completed the other devicetree
related tasks that were also at the top of my devicetree todo list.

I will resume the detailed reading of the series as the item that is
now at the top of my devicetree todo list. But I consider any review
comments coming out of this as trivial compared to the issue raised in
the following paragraph:

>
> One overall comment at this point, is that this series is somewhat
> duplicating portions of the (incomplete and fragile) overlay functionality
> but not fully. One trivial example is no locking to prevent interference
> between tree modification by overlay code that could be occurring
> asynchronously at the same time this new code is modifying the tree.

Since there was no reply to the above paragraph, I am guessing that what
I wrote was not clear enough. As far as I am concerned, this issue is
critical. This patch series creates a body of code that is _more fragile_
and _more incomplete_ than the existing overlay code. I have been very
clear over many years that the overlay code is not usable in its current
implementation.

Further, the body of code in this patch series will interact with the
overlay code in a manner that makes the overlay code even more fragile
and not usable.

-Frank

>
> -Frank
>
>>
>> -Frank
>>
>>>
>>> There are 2 series devices rely on this patch:
>>>
>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>> 2) Microchip LAN9662 Ethernet Controller
>>>
>>> Please see: https://lore.kernel.org/lkml/[email protected]/
>>>
>>> Normally, the PCI core discovers PCI devices and their BARs using the
>>> PCI enumeration process. However, the process does not provide a way to
>>> discover the hardware peripherals that are present in a PCI device, and
>>> which can be accessed through the PCI BARs. Also, the enumeration process
>>> does not provide a way to associate MSI-X vectors of a PCI device with the
>>> hardware peripherals that are present in the device. PCI device drivers
>>> often use header files to describe the hardware peripherals and their
>>> resources as there is no standard data driven way to do so. This patch
>>> series proposes to use flattened device tree blob to describe the
>>> peripherals in a data driven way. Based on previous discussion, using
>>> device tree overlay is the best way to unflatten the blob and populate
>>> platform devices. To use device tree overlay, there are three obvious
>>> problems that need to be resolved.
>>>
>>> First, we need to create a base tree for non-DT system such as x86_64. A
>>> patch series has been submitted for this:
>>> https://lore.kernel.org/lkml/[email protected]/
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> Second, a device tree node corresponding to the PCI endpoint is required
>>> for overlaying the flattened device tree blob for that PCI endpoint.
>>> Because PCI is a self-discoverable bus, a device tree node is usually not
>>> created for PCI devices. This series adds support to generate a device
>>> tree node for a PCI device which advertises itself using PCI quirks
>>> infrastructure.
>>>
>>> Third, we need to generate device tree nodes for PCI bridges since a child
>>> PCI endpoint may choose to have a device tree node created.
>>>
>>> This patch series is made up of three patches.
>>>
>>> The first patch is adding OF interface to create or destroy OF node
>>> dynamically.
>>>
>>> The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
>>> When the option is turned on, the kernel will generate device tree nodes
>>> for all PCI bridges unconditionally. The patch also shows how to use the
>>> PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
>>> tree node for a device. Specifically, the patch generates a device tree
>>> node for Xilinx Alveo U50 PCIe accelerator device. The generated device
>>> tree nodes do not have any property.
>>>
>>> The third patch adds basic properties ('reg', 'compatible' and
>>> 'device_type') to the dynamically generated device tree nodes. More
>>> properties can be added in the future.
>>>
>>> Here is the example of device tree nodes generated within the ARM64 QEMU.
>>> # lspci -t
>>> -[0000:00]-+-00.0
>>> +-01.0-[01]--
>>> +-01.1-[02]----00.0
>>> +-01.2-[03]----00.0
>>> +-01.3-[04]----00.0
>>> +-01.4-[05]----00.0
>>> +-01.5-[06]--
>>> +-01.6-[07]--
>>> +-01.7-[08]--
>>> +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
>>> | \-00.1
>>> +-02.1-[0c]--
>>> \-03.0-[0d-0e]----00.0-[0e]----01.0
>>>
>>> # tree /sys/firmware/devicetree/base/pcie\@10000000
>>> /sys/firmware/devicetree/base/pcie@10000000
>>> |-- #address-cells
>>> |-- #interrupt-cells
>>> |-- #size-cells
>>> |-- bus-range
>>> |-- compatible
>>> |-- device_type
>>> |-- dma-coherent
>>> |-- interrupt-map
>>> |-- interrupt-map-mask
>>> |-- linux,pci-domain
>>> |-- msi-parent
>>> |-- name
>>> |-- pci@1,0
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,1
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,2
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,3
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,4
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,5
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,6
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,7
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@2,0
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- pci@0,0
>>> | | |-- #address-cells
>>> | | |-- #size-cells
>>> | | |-- compatible
>>> | | |-- device_type
>>> | | |-- pci@0,0
>>> | | | |-- #address-cells
>>> | | | |-- #size-cells
>>> | | | |-- compatible
>>> | | | |-- dev@0,0
>>> | | | | |-- compatible
>>> | | | | `-- reg
>>> | | | |-- dev@0,1
>>> | | | | |-- compatible
>>> | | | | `-- reg
>>> | | | |-- device_type
>>> | | | |-- ranges
>>> | | | `-- reg
>>> | | |-- ranges
>>> | | `-- reg
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@2,1
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@3,0
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- pci@0,0
>>> | | |-- #address-cells
>>> | | |-- #size-cells
>>> | | |-- compatible
>>> | | |-- device_type
>>> | | |-- ranges
>>> | | `-- reg
>>> | |-- ranges
>>> | `-- reg
>>> |-- ranges
>>> `-- reg
>>>
>>> Changes since v6:
>>> - Removed single line wrapper functions
>>> - Added Signed-off-by Clément Léger <[email protected]>
>>>
>>> Changes since v5:
>>> - Fixed code review comments
>>> - Fixed incorrect 'ranges' and 'reg' properties and verified address
>>> translation.
>>>
>>> Changes since RFC v4:
>>> - Fixed code review comments
>>>
>>> Changes since RFC v3:
>>> - Split the Xilinx Alveo U50 PCI quirk to a separate patch
>>> - Minor changes in commit description and code comment
>>>
>>> Changes since RFC v2:
>>> - Merged patch 3 with patch 2
>>> - Added OF interfaces of_changeset_add_prop_* and use them to create
>>> properties.
>>> - Added '#address-cells', '#size-cells' and 'ranges' properties.
>>>
>>> Changes since RFC v1:
>>> - Added one patch to create basic properties.
>>> - To move DT related code out of PCI subsystem, replaced of_node_alloc()
>>> with of_create_node()/of_destroy_node()
>>>
>>> Lizhi Hou (3):
>>> of: dynamic: Add interfaces for creating device node dynamically
>>> PCI: Create device tree node for selected devices
>>> PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
>>>
>>> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
>>> drivers/pci/Kconfig | 12 ++
>>> drivers/pci/Makefile | 1 +
>>> drivers/pci/bus.c | 2 +
>>> drivers/pci/msi/irqdomain.c | 6 +-
>>> drivers/pci/of.c | 71 ++++++++++++
>>> drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
>>> drivers/pci/pci-driver.c | 3 +-
>>> drivers/pci/pci.h | 19 ++++
>>> drivers/pci/quirks.c | 11 ++
>>> drivers/pci/remove.c | 1 +
>>> include/linux/of.h | 24 ++++
>>> 12 files changed, 556 insertions(+), 3 deletions(-)
>>> create mode 100644 drivers/pci/of_property.c
>>>
>>
>


2023-02-23 00:37:49

by Lizhi Hou

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Hi Frank,

On 2/22/23 07:26, Frank Rowand wrote:
> Hi Lizhi,
>
> On 1/23/23 17:40, Frank Rowand wrote:
>> On 1/22/23 22:32, Frank Rowand wrote:
>>> Hi Rob, Lizhi,
>>>
>>> On 1/19/23 21:02, Lizhi Hou wrote:
>>>> This patch series introduces OF overlay support for PCI devices which
>>>> primarily addresses two use cases. First, it provides a data driven method
>>>> to describe hardware peripherals that are present in a PCI endpoint and
>>>> hence can be accessed by the PCI host. Second, it allows reuse of a OF
>>>> compatible driver -- often used in SoC platforms -- in a PCI host based
>>>> system.
>>> I had hoped to review this series by today, but have not yet due to working
>>> on some new unittest features. I hope to get to this series Monday.
>> I have skimmed through the series, and dug more deeply into portions of
>> the patches, but have not yet fully examined the full series.
>>
>> I will be on vacation for a week or so, and will resume the detailed
>> reading of the series.
> I am back from vacation, and I have completed the other devicetree
> related tasks that were also at the top of my devicetree todo list.
>
> I will resume the detailed reading of the series as the item that is
> now at the top of my devicetree todo list. But I consider any review
> comments coming out of this as trivial compared to the issue raised in
> the following paragraph:
>
>> One overall comment at this point, is that this series is somewhat
>> duplicating portions of the (incomplete and fragile) overlay functionality
>> but not fully. One trivial example is no locking to prevent interference
>> between tree modification by overlay code that could be occurring
>> asynchronously at the same time this new code is modifying the tree.
> Since there was no reply to the above paragraph, I am guessing that what
> I wrote was not clear enough. As far as I am concerned, this issue is
> critical. This patch series creates a body of code that is _more fragile_
> and _more incomplete_ than the existing overlay code. I have been very
> clear over many years that the overlay code is not usable in its current
> implementation.
>
> Further, the body of code in this patch series will interact with the
> overlay code in a manner that makes the overlay code even more fragile
> and not usable.
>
> -Frank

I did not fully understand the concern here. And I thought you will add
more after your vacation. :)

This patch calls of_changeset_apply() to add new nodes to base tree. And
inside of_changeset_apply(),

global lock of_mutex is used to protect the change.  And this function
is also used by hotplug-cpu.c, i2c-demux-pinctrl.c, pnv_php.c.

Do you mean of_changeset_apply() is fragile and should not be used?


The scope of this patch is just create a device tree node for pci device
and add the node to the base tree during pci enumeration.

Could you give me an example code for the race condition you are
worrying about?


Thanks,

Lizhi


>
>> -Frank
>>
>>> -Frank
>>>
>>>> There are 2 series devices rely on this patch:
>>>>
>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>>> 2) Microchip LAN9662 Ethernet Controller
>>>>
>>>> Please see: https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> Normally, the PCI core discovers PCI devices and their BARs using the
>>>> PCI enumeration process. However, the process does not provide a way to
>>>> discover the hardware peripherals that are present in a PCI device, and
>>>> which can be accessed through the PCI BARs. Also, the enumeration process
>>>> does not provide a way to associate MSI-X vectors of a PCI device with the
>>>> hardware peripherals that are present in the device. PCI device drivers
>>>> often use header files to describe the hardware peripherals and their
>>>> resources as there is no standard data driven way to do so. This patch
>>>> series proposes to use flattened device tree blob to describe the
>>>> peripherals in a data driven way. Based on previous discussion, using
>>>> device tree overlay is the best way to unflatten the blob and populate
>>>> platform devices. To use device tree overlay, there are three obvious
>>>> problems that need to be resolved.
>>>>
>>>> First, we need to create a base tree for non-DT system such as x86_64. A
>>>> patch series has been submitted for this:
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> Second, a device tree node corresponding to the PCI endpoint is required
>>>> for overlaying the flattened device tree blob for that PCI endpoint.
>>>> Because PCI is a self-discoverable bus, a device tree node is usually not
>>>> created for PCI devices. This series adds support to generate a device
>>>> tree node for a PCI device which advertises itself using PCI quirks
>>>> infrastructure.
>>>>
>>>> Third, we need to generate device tree nodes for PCI bridges since a child
>>>> PCI endpoint may choose to have a device tree node created.
>>>>
>>>> This patch series is made up of three patches.
>>>>
>>>> The first patch is adding OF interface to create or destroy OF node
>>>> dynamically.
>>>>
>>>> The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
>>>> When the option is turned on, the kernel will generate device tree nodes
>>>> for all PCI bridges unconditionally. The patch also shows how to use the
>>>> PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
>>>> tree node for a device. Specifically, the patch generates a device tree
>>>> node for Xilinx Alveo U50 PCIe accelerator device. The generated device
>>>> tree nodes do not have any property.
>>>>
>>>> The third patch adds basic properties ('reg', 'compatible' and
>>>> 'device_type') to the dynamically generated device tree nodes. More
>>>> properties can be added in the future.
>>>>
>>>> Here is the example of device tree nodes generated within the ARM64 QEMU.
>>>> # lspci -t
>>>> -[0000:00]-+-00.0
>>>> +-01.0-[01]--
>>>> +-01.1-[02]----00.0
>>>> +-01.2-[03]----00.0
>>>> +-01.3-[04]----00.0
>>>> +-01.4-[05]----00.0
>>>> +-01.5-[06]--
>>>> +-01.6-[07]--
>>>> +-01.7-[08]--
>>>> +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
>>>> | \-00.1
>>>> +-02.1-[0c]--
>>>> \-03.0-[0d-0e]----00.0-[0e]----01.0
>>>>
>>>> # tree /sys/firmware/devicetree/base/pcie\@10000000
>>>> /sys/firmware/devicetree/base/pcie@10000000
>>>> |-- #address-cells
>>>> |-- #interrupt-cells
>>>> |-- #size-cells
>>>> |-- bus-range
>>>> |-- compatible
>>>> |-- device_type
>>>> |-- dma-coherent
>>>> |-- interrupt-map
>>>> |-- interrupt-map-mask
>>>> |-- linux,pci-domain
>>>> |-- msi-parent
>>>> |-- name
>>>> |-- pci@1,0
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@1,1
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@1,2
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@1,3
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@1,4
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@1,5
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@1,6
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@1,7
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@2,0
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- pci@0,0
>>>> | | |-- #address-cells
>>>> | | |-- #size-cells
>>>> | | |-- compatible
>>>> | | |-- device_type
>>>> | | |-- pci@0,0
>>>> | | | |-- #address-cells
>>>> | | | |-- #size-cells
>>>> | | | |-- compatible
>>>> | | | |-- dev@0,0
>>>> | | | | |-- compatible
>>>> | | | | `-- reg
>>>> | | | |-- dev@0,1
>>>> | | | | |-- compatible
>>>> | | | | `-- reg
>>>> | | | |-- device_type
>>>> | | | |-- ranges
>>>> | | | `-- reg
>>>> | | |-- ranges
>>>> | | `-- reg
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@2,1
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- pci@3,0
>>>> | |-- #address-cells
>>>> | |-- #size-cells
>>>> | |-- compatible
>>>> | |-- device_type
>>>> | |-- pci@0,0
>>>> | | |-- #address-cells
>>>> | | |-- #size-cells
>>>> | | |-- compatible
>>>> | | |-- device_type
>>>> | | |-- ranges
>>>> | | `-- reg
>>>> | |-- ranges
>>>> | `-- reg
>>>> |-- ranges
>>>> `-- reg
>>>>
>>>> Changes since v6:
>>>> - Removed single line wrapper functions
>>>> - Added Signed-off-by Clément Léger <[email protected]>
>>>>
>>>> Changes since v5:
>>>> - Fixed code review comments
>>>> - Fixed incorrect 'ranges' and 'reg' properties and verified address
>>>> translation.
>>>>
>>>> Changes since RFC v4:
>>>> - Fixed code review comments
>>>>
>>>> Changes since RFC v3:
>>>> - Split the Xilinx Alveo U50 PCI quirk to a separate patch
>>>> - Minor changes in commit description and code comment
>>>>
>>>> Changes since RFC v2:
>>>> - Merged patch 3 with patch 2
>>>> - Added OF interfaces of_changeset_add_prop_* and use them to create
>>>> properties.
>>>> - Added '#address-cells', '#size-cells' and 'ranges' properties.
>>>>
>>>> Changes since RFC v1:
>>>> - Added one patch to create basic properties.
>>>> - To move DT related code out of PCI subsystem, replaced of_node_alloc()
>>>> with of_create_node()/of_destroy_node()
>>>>
>>>> Lizhi Hou (3):
>>>> of: dynamic: Add interfaces for creating device node dynamically
>>>> PCI: Create device tree node for selected devices
>>>> PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
>>>>
>>>> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
>>>> drivers/pci/Kconfig | 12 ++
>>>> drivers/pci/Makefile | 1 +
>>>> drivers/pci/bus.c | 2 +
>>>> drivers/pci/msi/irqdomain.c | 6 +-
>>>> drivers/pci/of.c | 71 ++++++++++++
>>>> drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
>>>> drivers/pci/pci-driver.c | 3 +-
>>>> drivers/pci/pci.h | 19 ++++
>>>> drivers/pci/quirks.c | 11 ++
>>>> drivers/pci/remove.c | 1 +
>>>> include/linux/of.h | 24 ++++
>>>> 12 files changed, 556 insertions(+), 3 deletions(-)
>>>> create mode 100644 drivers/pci/of_property.c
>>>>

2023-02-26 22:39:06

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Hi Clément, Hi Lizhi,

On 1/19/23 21:02, Lizhi Hou wrote:
> This patch series introduces OF overlay support for PCI devices which
> primarily addresses two use cases. First, it provides a data driven method
> to describe hardware peripherals that are present in a PCI endpoint and
> hence can be accessed by the PCI host. Second, it allows reuse of a OF
> compatible driver -- often used in SoC platforms -- in a PCI host based
> system.
>
> There are 2 series devices rely on this patch:
>
> 1) Xilinx Alveo Accelerator cards (FPGA based device)
> 2) Microchip LAN9662 Ethernet Controller
>

Digging back through some history:

> Please see: https://lore.kernel.org/lkml/[email protected]/

(I am selectively pulling two fragments, see the above link for the
full email.)

Includes the following:

A driver using this support was added and can be seen at [3]. This
driver embeds a builtin overlay and applies it to the live tree using
of_overlay_fdt_apply_to_node(). An interrupt driver is also included and

and

This series was tested on a x86 kernel using CONFIG_OF under a virtual
machine using PCI passthrough.

Link: [1] https://lore.kernel.org/lkml/[email protected]/t/
Link: [2] https://lore.kernel.org/lkml/[email protected]/T/
Link: [3] https://github.com/clementleger/linux/tree/lan966x/of_support

Following link 3 to see how the driver implemented the concept, I arrived
at a git tree, with the commit be42efa "mfd: lan966x: add pci driver",
and have been looking at the code there.

Clément, is this still the best example of a driver implementation that
would use the framework proposed in the "[PATCH V7 0/3] Generate device
tree node for pci devices" patch series? And this is the driver for the
device listed as item 2 above "2) Microchip LAN9662 Ethernet Controller"?

-Frank

>
> Normally, the PCI core discovers PCI devices and their BARs using the
> PCI enumeration process. However, the process does not provide a way to
> discover the hardware peripherals that are present in a PCI device, and
> which can be accessed through the PCI BARs. Also, the enumeration process
> does not provide a way to associate MSI-X vectors of a PCI device with the
> hardware peripherals that are present in the device. PCI device drivers
> often use header files to describe the hardware peripherals and their
> resources as there is no standard data driven way to do so. This patch
> series proposes to use flattened device tree blob to describe the
> peripherals in a data driven way. Based on previous discussion, using
> device tree overlay is the best way to unflatten the blob and populate
> platform devices. To use device tree overlay, there are three obvious
> problems that need to be resolved.
>
> First, we need to create a base tree for non-DT system such as x86_64. A
> patch series has been submitted for this:
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
>
> Second, a device tree node corresponding to the PCI endpoint is required
> for overlaying the flattened device tree blob for that PCI endpoint.
> Because PCI is a self-discoverable bus, a device tree node is usually not
> created for PCI devices. This series adds support to generate a device
> tree node for a PCI device which advertises itself using PCI quirks
> infrastructure.
>
> Third, we need to generate device tree nodes for PCI bridges since a child
> PCI endpoint may choose to have a device tree node created.
>
> This patch series is made up of three patches.

< snip >


2023-02-27 06:51:38

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 1/19/23 21:02, Lizhi Hou wrote:
> This patch series introduces OF overlay support for PCI devices which
> primarily addresses two use cases. First, it provides a data driven method
> to describe hardware peripherals that are present in a PCI endpoint and
> hence can be accessed by the PCI host. Second, it allows reuse of a OF
> compatible driver -- often used in SoC platforms -- in a PCI host based
> system.
>
> There are 2 series devices rely on this patch:
>
> 1) Xilinx Alveo Accelerator cards (FPGA based device)
> 2) Microchip LAN9662 Ethernet Controller
>
> Please see: https://lore.kernel.org/lkml/[email protected]/
>


> Normally, the PCI core discovers PCI devices and their BARs using the
> PCI enumeration process. However, the process does not provide a way to
> discover the hardware peripherals that are present in a PCI device, and
> which can be accessed through the PCI BARs. Also, the enumeration process

I'm confused. The PCI Configuration Header Registers should describe the
hardware on the PCI card.

Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
themselves, so I would like to analyze that case separately), does the
second device, "Microchip LAN9662 Ethernet Controller" properly implement
the PCI Configuration Header Registers? What additional information is
needed that is not provided in those registers?

-Frank

> does not provide a way to associate MSI-X vectors of a PCI device with the
> hardware peripherals that are present in the device. PCI device drivers
> often use header files to describe the hardware peripherals and their
> resources as there is no standard data driven way to do so. This patch
> series proposes to use flattened device tree blob to describe the
> peripherals in a data driven way. Based on previous discussion, using
> device tree overlay is the best way to unflatten the blob and populate
> platform devices. To use device tree overlay, there are three obvious
> problems that need to be resolved.
>
> First, we need to create a base tree for non-DT system such as x86_64. A
> patch series has been submitted for this:
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
>
> Second, a device tree node corresponding to the PCI endpoint is required
> for overlaying the flattened device tree blob for that PCI endpoint.
> Because PCI is a self-discoverable bus, a device tree node is usually not
> created for PCI devices. This series adds support to generate a device
> tree node for a PCI device which advertises itself using PCI quirks
> infrastructure.
>
> Third, we need to generate device tree nodes for PCI bridges since a child
> PCI endpoint may choose to have a device tree node created.
>
> This patch series is made up of three patches.
>
> The first patch is adding OF interface to create or destroy OF node
> dynamically.
>
> The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
> When the option is turned on, the kernel will generate device tree nodes
> for all PCI bridges unconditionally. The patch also shows how to use the
> PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
> tree node for a device. Specifically, the patch generates a device tree
> node for Xilinx Alveo U50 PCIe accelerator device. The generated device
> tree nodes do not have any property.
>
> The third patch adds basic properties ('reg', 'compatible' and
> 'device_type') to the dynamically generated device tree nodes. More
> properties can be added in the future.
>
> Here is the example of device tree nodes generated within the ARM64 QEMU.
> # lspci -t
> -[0000:00]-+-00.0
> +-01.0-[01]--
> +-01.1-[02]----00.0
> +-01.2-[03]----00.0
> +-01.3-[04]----00.0
> +-01.4-[05]----00.0
> +-01.5-[06]--
> +-01.6-[07]--
> +-01.7-[08]--
> +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
> | \-00.1
> +-02.1-[0c]--
> \-03.0-[0d-0e]----00.0-[0e]----01.0
>
> # tree /sys/firmware/devicetree/base/pcie\@10000000
> /sys/firmware/devicetree/base/pcie@10000000
> |-- #address-cells
> |-- #interrupt-cells
> |-- #size-cells
> |-- bus-range
> |-- compatible
> |-- device_type
> |-- dma-coherent
> |-- interrupt-map
> |-- interrupt-map-mask
> |-- linux,pci-domain
> |-- msi-parent
> |-- name
> |-- pci@1,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,1
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,2
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,3
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,4
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,5
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,6
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,7
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@2,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- pci@0,0
> | | |-- #address-cells
> | | |-- #size-cells
> | | |-- compatible
> | | |-- device_type
> | | |-- pci@0,0
> | | | |-- #address-cells
> | | | |-- #size-cells
> | | | |-- compatible
> | | | |-- dev@0,0
> | | | | |-- compatible
> | | | | `-- reg
> | | | |-- dev@0,1
> | | | | |-- compatible
> | | | | `-- reg
> | | | |-- device_type
> | | | |-- ranges
> | | | `-- reg
> | | |-- ranges
> | | `-- reg
> | |-- ranges
> | `-- reg
> |-- pci@2,1
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@3,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- pci@0,0
> | | |-- #address-cells
> | | |-- #size-cells
> | | |-- compatible
> | | |-- device_type
> | | |-- ranges
> | | `-- reg
> | |-- ranges
> | `-- reg
> |-- ranges
> `-- reg
>
> Changes since v6:
> - Removed single line wrapper functions
> - Added Signed-off-by Clément Léger <[email protected]>
>
> Changes since v5:
> - Fixed code review comments
> - Fixed incorrect 'ranges' and 'reg' properties and verified address
> translation.
>
> Changes since RFC v4:
> - Fixed code review comments
>
> Changes since RFC v3:
> - Split the Xilinx Alveo U50 PCI quirk to a separate patch
> - Minor changes in commit description and code comment
>
> Changes since RFC v2:
> - Merged patch 3 with patch 2
> - Added OF interfaces of_changeset_add_prop_* and use them to create
> properties.
> - Added '#address-cells', '#size-cells' and 'ranges' properties.
>
> Changes since RFC v1:
> - Added one patch to create basic properties.
> - To move DT related code out of PCI subsystem, replaced of_node_alloc()
> with of_create_node()/of_destroy_node()
>
> Lizhi Hou (3):
> of: dynamic: Add interfaces for creating device node dynamically
> PCI: Create device tree node for selected devices
> PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
>
> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
> drivers/pci/Kconfig | 12 ++
> drivers/pci/Makefile | 1 +
> drivers/pci/bus.c | 2 +
> drivers/pci/msi/irqdomain.c | 6 +-
> drivers/pci/of.c | 71 ++++++++++++
> drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
> drivers/pci/pci-driver.c | 3 +-
> drivers/pci/pci.h | 19 ++++
> drivers/pci/quirks.c | 11 ++
> drivers/pci/remove.c | 1 +
> include/linux/of.h | 24 ++++
> 12 files changed, 556 insertions(+), 3 deletions(-)
> create mode 100644 drivers/pci/of_property.c
>


2023-02-27 10:20:29

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Le Sun, 26 Feb 2023 16:38:58 -0600,
Frank Rowand <[email protected]> a écrit :

> Hi Clément, Hi Lizhi,
>
> On 1/19/23 21:02, Lizhi Hou wrote:
> > This patch series introduces OF overlay support for PCI devices which
> > primarily addresses two use cases. First, it provides a data driven method
> > to describe hardware peripherals that are present in a PCI endpoint and
> > hence can be accessed by the PCI host. Second, it allows reuse of a OF
> > compatible driver -- often used in SoC platforms -- in a PCI host based
> > system.
> >
> > There are 2 series devices rely on this patch:
> >
> > 1) Xilinx Alveo Accelerator cards (FPGA based device)
> > 2) Microchip LAN9662 Ethernet Controller
> >
>
> Digging back through some history:
>
> > Please see: https://lore.kernel.org/lkml/[email protected]/
>
> (I am selectively pulling two fragments, see the above link for the
> full email.)
>
> Includes the following:
>
> A driver using this support was added and can be seen at [3]. This
> driver embeds a builtin overlay and applies it to the live tree using
> of_overlay_fdt_apply_to_node(). An interrupt driver is also included and
>
> and
>
> This series was tested on a x86 kernel using CONFIG_OF under a virtual
> machine using PCI passthrough.
>
> Link: [1] https://lore.kernel.org/lkml/[email protected]/t/
> Link: [2] https://lore.kernel.org/lkml/[email protected]/T/
> Link: [3] https://github.com/clementleger/linux/tree/lan966x/of_support
>
> Following link 3 to see how the driver implemented the concept, I arrived
> at a git tree, with the commit be42efa "mfd: lan966x: add pci driver",
> and have been looking at the code there.
>
> Clément, is this still the best example of a driver implementation that
> would use the framework proposed in the "[PATCH V7 0/3] Generate device
> tree node for pci devices" patch series? And this is the driver for the
> device listed as item 2 above "2) Microchip LAN9662 Ethernet Controller"?

Hi Frank,

The driver has slightly evolved to be based on Lizhi Patches and the
interrupt driver was reworked to be a standard platform driver. I'll
clean that up and push a new branch based on this work.

This driver is indeed the driver for the LAN9662 Ethernet Controller
which allows using the 2 SFPs ports and 2 RJ45 ports successfully (which
involves multiple subsystem and drivers).

While doing this work, I found multiple of_noderefcount issues which I
fixed and that are currently being reviewed. I won't be surprised if
there are other lying around in various part of the kernel. Just saying
so you know there is actually effort to make that more robust.

Clément

>
> -Frank
>
> >
> > Normally, the PCI core discovers PCI devices and their BARs using the
> > PCI enumeration process. However, the process does not provide a way to
> > discover the hardware peripherals that are present in a PCI device, and
> > which can be accessed through the PCI BARs. Also, the enumeration process
> > does not provide a way to associate MSI-X vectors of a PCI device with the
> > hardware peripherals that are present in the device. PCI device drivers
> > often use header files to describe the hardware peripherals and their
> > resources as there is no standard data driven way to do so. This patch
> > series proposes to use flattened device tree blob to describe the
> > peripherals in a data driven way. Based on previous discussion, using
> > device tree overlay is the best way to unflatten the blob and populate
> > platform devices. To use device tree overlay, there are three obvious
> > problems that need to be resolved.
> >
> > First, we need to create a base tree for non-DT system such as x86_64. A
> > patch series has been submitted for this:
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Second, a device tree node corresponding to the PCI endpoint is required
> > for overlaying the flattened device tree blob for that PCI endpoint.
> > Because PCI is a self-discoverable bus, a device tree node is usually not
> > created for PCI devices. This series adds support to generate a device
> > tree node for a PCI device which advertises itself using PCI quirks
> > infrastructure.
> >
> > Third, we need to generate device tree nodes for PCI bridges since a child
> > PCI endpoint may choose to have a device tree node created.
> >
> > This patch series is made up of three patches.
>
> < snip >
>



--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2023-02-27 10:29:21

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Le Mon, 27 Feb 2023 00:51:29 -0600,
Frank Rowand <[email protected]> a écrit :

> On 1/19/23 21:02, Lizhi Hou wrote:
> > This patch series introduces OF overlay support for PCI devices which
> > primarily addresses two use cases. First, it provides a data driven method
> > to describe hardware peripherals that are present in a PCI endpoint and
> > hence can be accessed by the PCI host. Second, it allows reuse of a OF
> > compatible driver -- often used in SoC platforms -- in a PCI host based
> > system.
> >
> > There are 2 series devices rely on this patch:
> >
> > 1) Xilinx Alveo Accelerator cards (FPGA based device)
> > 2) Microchip LAN9662 Ethernet Controller
> >
> > Please see: https://lore.kernel.org/lkml/[email protected]/
> >
>
>
> > Normally, the PCI core discovers PCI devices and their BARs using the
> > PCI enumeration process. However, the process does not provide a way to
> > discover the hardware peripherals that are present in a PCI device, and
> > which can be accessed through the PCI BARs. Also, the enumeration process
>
> I'm confused. The PCI Configuration Header Registers should describe the
> hardware on the PCI card.
>
> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
> themselves, so I would like to analyze that case separately), does the
> second device, "Microchip LAN9662 Ethernet Controller" properly implement
> the PCI Configuration Header Registers? What additional information is
> needed that is not provided in those registers?

Hi Frank,

I guess Lizhi wanted to say that it does not provide a way to describe
all the "platform" devices that are exposed by this PCI device. Which
is of course the whole point of the work we are doing right now. But
all the BARs are correctly described by the LAN9662 PCI card.

Clément

>
> -Frank
>
> > does not provide a way to associate MSI-X vectors of a PCI device with the
> > hardware peripherals that are present in the device. PCI device drivers
> > often use header files to describe the hardware peripherals and their
> > resources as there is no standard data driven way to do so. This patch
> > series proposes to use flattened device tree blob to describe the
> > peripherals in a data driven way. Based on previous discussion, using
> > device tree overlay is the best way to unflatten the blob and populate
> > platform devices. To use device tree overlay, there are three obvious
> > problems that need to be resolved.
> >
> > First, we need to create a base tree for non-DT system such as x86_64. A
> > patch series has been submitted for this:
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Second, a device tree node corresponding to the PCI endpoint is required
> > for overlaying the flattened device tree blob for that PCI endpoint.
> > Because PCI is a self-discoverable bus, a device tree node is usually not
> > created for PCI devices. This series adds support to generate a device
> > tree node for a PCI device which advertises itself using PCI quirks
> > infrastructure.
> >
> > Third, we need to generate device tree nodes for PCI bridges since a child
> > PCI endpoint may choose to have a device tree node created.
> >
> > This patch series is made up of three patches.
> >
> > The first patch is adding OF interface to create or destroy OF node
> > dynamically.
> >
> > The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
> > When the option is turned on, the kernel will generate device tree nodes
> > for all PCI bridges unconditionally. The patch also shows how to use the
> > PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
> > tree node for a device. Specifically, the patch generates a device tree
> > node for Xilinx Alveo U50 PCIe accelerator device. The generated device
> > tree nodes do not have any property.
> >
> > The third patch adds basic properties ('reg', 'compatible' and
> > 'device_type') to the dynamically generated device tree nodes. More
> > properties can be added in the future.
> >
> > Here is the example of device tree nodes generated within the ARM64 QEMU.
> > # lspci -t
> > -[0000:00]-+-00.0
> > +-01.0-[01]--
> > +-01.1-[02]----00.0
> > +-01.2-[03]----00.0
> > +-01.3-[04]----00.0
> > +-01.4-[05]----00.0
> > +-01.5-[06]--
> > +-01.6-[07]--
> > +-01.7-[08]--
> > +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
> > | \-00.1
> > +-02.1-[0c]--
> > \-03.0-[0d-0e]----00.0-[0e]----01.0
> >
> > # tree /sys/firmware/devicetree/base/pcie\@10000000
> > /sys/firmware/devicetree/base/pcie@10000000
> > |-- #address-cells
> > |-- #interrupt-cells
> > |-- #size-cells
> > |-- bus-range
> > |-- compatible
> > |-- device_type
> > |-- dma-coherent
> > |-- interrupt-map
> > |-- interrupt-map-mask
> > |-- linux,pci-domain
> > |-- msi-parent
> > |-- name
> > |-- pci@1,0
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@1,1
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@1,2
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@1,3
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@1,4
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@1,5
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@1,6
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@1,7
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@2,0
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- pci@0,0
> > | | |-- #address-cells
> > | | |-- #size-cells
> > | | |-- compatible
> > | | |-- device_type
> > | | |-- pci@0,0
> > | | | |-- #address-cells
> > | | | |-- #size-cells
> > | | | |-- compatible
> > | | | |-- dev@0,0
> > | | | | |-- compatible
> > | | | | `-- reg
> > | | | |-- dev@0,1
> > | | | | |-- compatible
> > | | | | `-- reg
> > | | | |-- device_type
> > | | | |-- ranges
> > | | | `-- reg
> > | | |-- ranges
> > | | `-- reg
> > | |-- ranges
> > | `-- reg
> > |-- pci@2,1
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- ranges
> > | `-- reg
> > |-- pci@3,0
> > | |-- #address-cells
> > | |-- #size-cells
> > | |-- compatible
> > | |-- device_type
> > | |-- pci@0,0
> > | | |-- #address-cells
> > | | |-- #size-cells
> > | | |-- compatible
> > | | |-- device_type
> > | | |-- ranges
> > | | `-- reg
> > | |-- ranges
> > | `-- reg
> > |-- ranges
> > `-- reg
> >
> > Changes since v6:
> > - Removed single line wrapper functions
> > - Added Signed-off-by Clément Léger <[email protected]>
> >
> > Changes since v5:
> > - Fixed code review comments
> > - Fixed incorrect 'ranges' and 'reg' properties and verified address
> > translation.
> >
> > Changes since RFC v4:
> > - Fixed code review comments
> >
> > Changes since RFC v3:
> > - Split the Xilinx Alveo U50 PCI quirk to a separate patch
> > - Minor changes in commit description and code comment
> >
> > Changes since RFC v2:
> > - Merged patch 3 with patch 2
> > - Added OF interfaces of_changeset_add_prop_* and use them to create
> > properties.
> > - Added '#address-cells', '#size-cells' and 'ranges' properties.
> >
> > Changes since RFC v1:
> > - Added one patch to create basic properties.
> > - To move DT related code out of PCI subsystem, replaced of_node_alloc()
> > with of_create_node()/of_destroy_node()
> >
> > Lizhi Hou (3):
> > of: dynamic: Add interfaces for creating device node dynamically
> > PCI: Create device tree node for selected devices
> > PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
> >
> > drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
> > drivers/pci/Kconfig | 12 ++
> > drivers/pci/Makefile | 1 +
> > drivers/pci/bus.c | 2 +
> > drivers/pci/msi/irqdomain.c | 6 +-
> > drivers/pci/of.c | 71 ++++++++++++
> > drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
> > drivers/pci/pci-driver.c | 3 +-
> > drivers/pci/pci.h | 19 ++++
> > drivers/pci/quirks.c | 11 ++
> > drivers/pci/remove.c | 1 +
> > include/linux/of.h | 24 ++++
> > 12 files changed, 556 insertions(+), 3 deletions(-)
> > create mode 100644 drivers/pci/of_property.c
> >
>



--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2023-03-03 23:42:29

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 2/27/23 04:31, Clément Léger wrote:
> Le Mon, 27 Feb 2023 00:51:29 -0600,
> Frank Rowand <[email protected]> a écrit :
>
>> On 1/19/23 21:02, Lizhi Hou wrote:
>>> This patch series introduces OF overlay support for PCI devices which
>>> primarily addresses two use cases. First, it provides a data driven method
>>> to describe hardware peripherals that are present in a PCI endpoint and
>>> hence can be accessed by the PCI host. Second, it allows reuse of a OF
>>> compatible driver -- often used in SoC platforms -- in a PCI host based
>>> system.
>>>
>>> There are 2 series devices rely on this patch:
>>>
>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>> 2) Microchip LAN9662 Ethernet Controller
>>>
>>> Please see: https://lore.kernel.org/lkml/[email protected]/
>>>
>>
>>
>>> Normally, the PCI core discovers PCI devices and their BARs using the
>>> PCI enumeration process. However, the process does not provide a way to
>>> discover the hardware peripherals that are present in a PCI device, and
>>> which can be accessed through the PCI BARs. Also, the enumeration process
>>
>> I'm confused. The PCI Configuration Header Registers should describe the
>> hardware on the PCI card.
>>
>> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
>> themselves, so I would like to analyze that case separately), does the
>> second device, "Microchip LAN9662 Ethernet Controller" properly implement
>> the PCI Configuration Header Registers? What additional information is
>> needed that is not provided in those registers?
>
> Hi Frank,
>
> I guess Lizhi wanted to say that it does not provide a way to describe
> all the "platform" devices that are exposed by this PCI device. Which
> is of course the whole point of the work we are doing right now. But
> all the BARs are correctly described by the LAN9662 PCI card.
>
> Clément

I remain confused.

[RFC 00/10] add support for fwnode in i2c mux system and sfp
https://lore.kernel.org/lkml/[email protected]/t/

references a PCIe driver:
[2] https://github.com/clementleger/linux/blob/fwnode_support/drivers/mfd/lan966x_pci_mfd.c

So there is a PCIe driver that works.

However, the RFC patch series was proposing adding fwnode support to the driver. My first
surface reading (just part of that one email, not the entire series or the replies yet),
notes:

... However, when
plugged in a PCIe slot (on a x86), there is no device-tree support and
the peripherals that are present must be described in some other way.

I am assuming that the peripherals are what you mentioned above as '"platform"
devices'. This is where my current confusion lies. Are the "platform"
devices accessed via the PCI bus or is there some other electrical connection
between the host system and the PCIe card?

If the "platform" devices are accessed via the PCI bus, then I would expect them
to be described by PCI configuration header registers. Are the PCI configuration
registers to describe the "platform" devices not present?

I'll read through the fwnode RFC thread to add to see what happened to the proposal.

-Frank

>
>>
>> -Frank
>>
>>> does not provide a way to associate MSI-X vectors of a PCI device with the
>>> hardware peripherals that are present in the device. PCI device drivers
>>> often use header files to describe the hardware peripherals and their
>>> resources as there is no standard data driven way to do so. This patch
>>> series proposes to use flattened device tree blob to describe the
>>> peripherals in a data driven way. Based on previous discussion, using
>>> device tree overlay is the best way to unflatten the blob and populate
>>> platform devices. To use device tree overlay, there are three obvious
>>> problems that need to be resolved.
>>>
>>> First, we need to create a base tree for non-DT system such as x86_64. A
>>> patch series has been submitted for this:
>>> https://lore.kernel.org/lkml/[email protected]/
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> Second, a device tree node corresponding to the PCI endpoint is required
>>> for overlaying the flattened device tree blob for that PCI endpoint.
>>> Because PCI is a self-discoverable bus, a device tree node is usually not
>>> created for PCI devices. This series adds support to generate a device
>>> tree node for a PCI device which advertises itself using PCI quirks
>>> infrastructure.
>>>
>>> Third, we need to generate device tree nodes for PCI bridges since a child
>>> PCI endpoint may choose to have a device tree node created.
>>>
>>> This patch series is made up of three patches.
>>>
>>> The first patch is adding OF interface to create or destroy OF node
>>> dynamically.
>>>
>>> The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
>>> When the option is turned on, the kernel will generate device tree nodes
>>> for all PCI bridges unconditionally. The patch also shows how to use the
>>> PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
>>> tree node for a device. Specifically, the patch generates a device tree
>>> node for Xilinx Alveo U50 PCIe accelerator device. The generated device
>>> tree nodes do not have any property.
>>>
>>> The third patch adds basic properties ('reg', 'compatible' and
>>> 'device_type') to the dynamically generated device tree nodes. More
>>> properties can be added in the future.
>>>
>>> Here is the example of device tree nodes generated within the ARM64 QEMU.
>>> # lspci -t
>>> -[0000:00]-+-00.0
>>> +-01.0-[01]--
>>> +-01.1-[02]----00.0
>>> +-01.2-[03]----00.0
>>> +-01.3-[04]----00.0
>>> +-01.4-[05]----00.0
>>> +-01.5-[06]--
>>> +-01.6-[07]--
>>> +-01.7-[08]--
>>> +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
>>> | \-00.1
>>> +-02.1-[0c]--
>>> \-03.0-[0d-0e]----00.0-[0e]----01.0
>>>
>>> # tree /sys/firmware/devicetree/base/pcie\@10000000
>>> /sys/firmware/devicetree/base/pcie@10000000
>>> |-- #address-cells
>>> |-- #interrupt-cells
>>> |-- #size-cells
>>> |-- bus-range
>>> |-- compatible
>>> |-- device_type
>>> |-- dma-coherent
>>> |-- interrupt-map
>>> |-- interrupt-map-mask
>>> |-- linux,pci-domain
>>> |-- msi-parent
>>> |-- name
>>> |-- pci@1,0
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,1
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,2
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,3
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,4
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,5
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,6
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@1,7
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@2,0
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- pci@0,0
>>> | | |-- #address-cells
>>> | | |-- #size-cells
>>> | | |-- compatible
>>> | | |-- device_type
>>> | | |-- pci@0,0
>>> | | | |-- #address-cells
>>> | | | |-- #size-cells
>>> | | | |-- compatible
>>> | | | |-- dev@0,0
>>> | | | | |-- compatible
>>> | | | | `-- reg
>>> | | | |-- dev@0,1
>>> | | | | |-- compatible
>>> | | | | `-- reg
>>> | | | |-- device_type
>>> | | | |-- ranges
>>> | | | `-- reg
>>> | | |-- ranges
>>> | | `-- reg
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@2,1
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- ranges
>>> | `-- reg
>>> |-- pci@3,0
>>> | |-- #address-cells
>>> | |-- #size-cells
>>> | |-- compatible
>>> | |-- device_type
>>> | |-- pci@0,0
>>> | | |-- #address-cells
>>> | | |-- #size-cells
>>> | | |-- compatible
>>> | | |-- device_type
>>> | | |-- ranges
>>> | | `-- reg
>>> | |-- ranges
>>> | `-- reg
>>> |-- ranges
>>> `-- reg
>>>
>>> Changes since v6:
>>> - Removed single line wrapper functions
>>> - Added Signed-off-by Clément Léger <[email protected]>
>>>
>>> Changes since v5:
>>> - Fixed code review comments
>>> - Fixed incorrect 'ranges' and 'reg' properties and verified address
>>> translation.
>>>
>>> Changes since RFC v4:
>>> - Fixed code review comments
>>>
>>> Changes since RFC v3:
>>> - Split the Xilinx Alveo U50 PCI quirk to a separate patch
>>> - Minor changes in commit description and code comment
>>>
>>> Changes since RFC v2:
>>> - Merged patch 3 with patch 2
>>> - Added OF interfaces of_changeset_add_prop_* and use them to create
>>> properties.
>>> - Added '#address-cells', '#size-cells' and 'ranges' properties.
>>>
>>> Changes since RFC v1:
>>> - Added one patch to create basic properties.
>>> - To move DT related code out of PCI subsystem, replaced of_node_alloc()
>>> with of_create_node()/of_destroy_node()
>>>
>>> Lizhi Hou (3):
>>> of: dynamic: Add interfaces for creating device node dynamically
>>> PCI: Create device tree node for selected devices
>>> PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
>>>
>>> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
>>> drivers/pci/Kconfig | 12 ++
>>> drivers/pci/Makefile | 1 +
>>> drivers/pci/bus.c | 2 +
>>> drivers/pci/msi/irqdomain.c | 6 +-
>>> drivers/pci/of.c | 71 ++++++++++++
>>> drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
>>> drivers/pci/pci-driver.c | 3 +-
>>> drivers/pci/pci.h | 19 ++++
>>> drivers/pci/quirks.c | 11 ++
>>> drivers/pci/remove.c | 1 +
>>> include/linux/of.h | 24 ++++
>>> 12 files changed, 556 insertions(+), 3 deletions(-)
>>> create mode 100644 drivers/pci/of_property.c
>>>
>>
>
>
>


2023-03-06 08:33:47

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Le 2023-03-04 00:42, Frank Rowand a écrit :
> On 2/27/23 04:31, Clément Léger wrote:
>> Le Mon, 27 Feb 2023 00:51:29 -0600,
>> Frank Rowand <[email protected]> a écrit :
>>
>>> On 1/19/23 21:02, Lizhi Hou wrote:
>>>> This patch series introduces OF overlay support for PCI devices
>>>> which
>>>> primarily addresses two use cases. First, it provides a data driven
>>>> method
>>>> to describe hardware peripherals that are present in a PCI endpoint
>>>> and
>>>> hence can be accessed by the PCI host. Second, it allows reuse of a
>>>> OF
>>>> compatible driver -- often used in SoC platforms -- in a PCI host
>>>> based
>>>> system.
>>>>
>>>> There are 2 series devices rely on this patch:
>>>>
>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>>> 2) Microchip LAN9662 Ethernet Controller
>>>>
>>>> Please see:
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>
>>>
>>>> Normally, the PCI core discovers PCI devices and their BARs using
>>>> the
>>>> PCI enumeration process. However, the process does not provide a way
>>>> to
>>>> discover the hardware peripherals that are present in a PCI device,
>>>> and
>>>> which can be accessed through the PCI BARs. Also, the enumeration
>>>> process
>>>
>>> I'm confused. The PCI Configuration Header Registers should describe
>>> the
>>> hardware on the PCI card.
>>>
>>> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
>>> themselves, so I would like to analyze that case separately), does
>>> the
>>> second device, "Microchip LAN9662 Ethernet Controller" properly
>>> implement
>>> the PCI Configuration Header Registers? What additional information
>>> is
>>> needed that is not provided in those registers?
>>
>> Hi Frank,
>>
>> I guess Lizhi wanted to say that it does not provide a way to describe
>> all the "platform" devices that are exposed by this PCI device. Which
>> is of course the whole point of the work we are doing right now. But
>> all the BARs are correctly described by the LAN9662 PCI card.
>>
>> Clément
>
> I remain confused.
>
> [RFC 00/10] add support for fwnode in i2c mux system and sfp
> https://lore.kernel.org/lkml/[email protected]/t/
>
> references a PCIe driver:
> [2]
> https://github.com/clementleger/linux/blob/fwnode_support/drivers/mfd/lan966x_pci_mfd.c
>
> So there is a PCIe driver that works.
>
> However, the RFC patch series was proposing adding fwnode support to
> the driver. My first
> surface reading (just part of that one email, not the entire series or
> the replies yet),
> notes:
>
> ... However, when
> plugged in a PCIe slot (on a x86), there is no device-tree support
> and
> the peripherals that are present must be described in some other way.
>
> I am assuming that the peripherals are what you mentioned above as
> '"platform"
> devices'. This is where my current confusion lies. Are the "platform"
> devices accessed via the PCI bus or is there some other electrical
> connection
> between the host system and the PCIe card?

Hi Frank,

The platform devices exposed by this PCIe card are available via some
BAR using PCI memory mapped areas, so it's totally standard PCI stuff.

>
> If the "platform" devices are accessed via the PCI bus, then I would
> expect them
> to be described by PCI configuration header registers. Are the PCI
> configuration
> registers to describe the "platform" devices not present?

I'm not sure to understand what you mean here. PCI configuration headers
only provides some basic registers allowing to identify the PCI device
(vendor/product) and some memory areas that are exposed (BAR). They do
not provides the "list" of peripherals that are exposed by the devices,
only some BARs that can be mapped and that allows to access.

In the case of the lan9662 cnetwork controller, BAR 0 and 1 exposes
multiples devices that are located at some subranges of this BAR. For
instance (not accurate), we have the I2C controller located at BAR 0
+ offset 0X1000, then the flexcom controller exposed in BAR 0 at offset
0x20000, etc. This list of peripheral is not exposed at all by the PCI
configuration headers (since it is not the purpose of course). All of
these peripherals have already existing platform drivers which can then
be reused thanks to the PCI device-tree overlay series.

>
> I'll read through the fwnode RFC thread to add to see what happened to
> the proposal.

You can probably read the cover letter which described the use case in
details. However, don't spend too much time reading the patchset, we
discarded them for many good reason (way too much modifications in
subsystems, no standardization of software node bindings, etc).

Clément


2023-03-06 21:24:34

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 3/6/23 02:35, [email protected] wrote:
> Le 2023-03-04 00:42, Frank Rowand a écrit :
>> On 2/27/23 04:31, Clément Léger wrote:
>>> Le Mon, 27 Feb 2023 00:51:29 -0600,
>>> Frank Rowand <[email protected]> a écrit :
>>>
>>>> On 1/19/23 21:02, Lizhi Hou wrote:
>>>>> This patch series introduces OF overlay support for PCI devices
>>>>> which
>>>>> primarily addresses two use cases. First, it provides a data driven
>>>>> method
>>>>> to describe hardware peripherals that are present in a PCI endpoint
>>>>> and
>>>>> hence can be accessed by the PCI host. Second, it allows reuse of a
>>>>> OF
>>>>> compatible driver -- often used in SoC platforms -- in a PCI host
>>>>> based
>>>>> system.
>>>>>
>>>>> There are 2 series devices rely on this patch:
>>>>>
>>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>>>> 2) Microchip LAN9662 Ethernet Controller
>>>>>
>>>>> Please see:
>>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>>
>>>>
>>>>
>>>>> Normally, the PCI core discovers PCI devices and their BARs using
>>>>> the
>>>>> PCI enumeration process. However, the process does not provide a way
>>>>> to
>>>>> discover the hardware peripherals that are present in a PCI device,
>>>>> and
>>>>> which can be accessed through the PCI BARs. Also, the enumeration
>>>>> process
>>>>
>>>> I'm confused. The PCI Configuration Header Registers should describe
>>>> the
>>>> hardware on the PCI card.
>>>>
>>>> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
>>>> themselves, so I would like to analyze that case separately), does
>>>> the
>>>> second device, "Microchip LAN9662 Ethernet Controller" properly
>>>> implement
>>>> the PCI Configuration Header Registers? What additional information
>>>> is
>>>> needed that is not provided in those registers?
>>>
>>> Hi Frank,
>>>
>>> I guess Lizhi wanted to say that it does not provide a way to describe
>>> all the "platform" devices that are exposed by this PCI device. Which
>>> is of course the whole point of the work we are doing right now. But
>>> all the BARs are correctly described by the LAN9662 PCI card.
>>>
>>> Clément
>>
>> I remain confused.
>>
>> [RFC 00/10] add support for fwnode in i2c mux system and sfp
>> https://lore.kernel.org/lkml/[email protected]/t/
>>
>> references a PCIe driver:
>> [2]
>> https://github.com/clementleger/linux/blob/fwnode_support/drivers/mfd/lan966x_pci_mfd.c
>>
>> So there is a PCIe driver that works.
>>
>> However, the RFC patch series was proposing adding fwnode support to
>> the driver. My first
>> surface reading (just part of that one email, not the entire series or
>> the replies yet),
>> notes:
>>
>> ... However, when
>> plugged in a PCIe slot (on a x86), there is no device-tree support
>> and
>> the peripherals that are present must be described in some other way.
>>
>> I am assuming that the peripherals are what you mentioned above as
>> '"platform"
>> devices'. This is where my current confusion lies. Are the "platform"
>> devices accessed via the PCI bus or is there some other electrical
>> connection
>> between the host system and the PCIe card?
>
> Hi Frank,
>
> The platform devices exposed by this PCIe card are available via some
> BAR using PCI memory mapped areas, so it's totally standard PCI stuff.
>
>>
>> If the "platform" devices are accessed via the PCI bus, then I would
>> expect them
>> to be described by PCI configuration header registers. Are the PCI
>> configuration
>> registers to describe the "platform" devices not present?
>
> I'm not sure to understand what you mean here. PCI configuration headers
> only provides some basic registers allowing to identify the PCI device
> (vendor/product) and some memory areas that are exposed (BAR). They do
> not provides the "list" of peripherals that are exposed by the devices,
> only some BARs that can be mapped and that allows to access.

Yes, "identify the PCI device (vendor/product) and some memory areas".
The driver for the (vendor/product) 'knows' what peripherals are exposed
by the device and where within the BAR to find the registers for each
of the devices.

A normal PCI driver would contain this information. If I understand the
proposal of this patch series, of_pci_make_dev_node() adds a node to
the devicetree, when invoked via a PCI quirk for certain specific
vendor/product cards. This node must exist for the flattened device
tree (FDT) overlay for the card to be loaded. The driver for the card
will get the overlay FDT from the card and load it into the kernel.
The driver will use the information that then exists in the devicetree
describing the card, instead of using information from the PCI configuration
headers from the card.

The intent is to be able to re-use devicetree based drivers instead of
having the driver be a native PCI driver.

This goes against historical Linux practice. The idea of taking a driver
from another environment (eg Windows, HP Unix, Sun Unix, IBM Unix, etc)
and adding a shim layer to translate between Linux and the other
environment has been rejected. Ironically, in this case, the other
environment is Linux (more specifically the Linux OF implementation).
Even thought the other environment is Linux, this is still adding a
shim layer to translate between that other environment and the native
Linux PCI environment for which the driver would normally be written.

In other words, this is not acceptable. Normal alternatives would be
something like (1) add the PCI awareness to the existing drivers,
(2) split the devicetree aware and PCI aware portions of the driver
to common code that would be invoked from separate devicetree and PCI
drivers, (3) write entirely separate devicetree and PCI drivers, or
(4) some other creative solution.

Am I mis-interpretting or misunderstanding anything crucial here?

-Frank

>
> In the case of the lan9662 cnetwork controller, BAR 0 and 1 exposes
> multiples devices that are located at some subranges of this BAR. For
> instance (not accurate), we have the I2C controller located at BAR 0
> + offset 0X1000, then the flexcom controller exposed in BAR 0 at offset
> 0x20000, etc. This list of peripheral is not exposed at all by the PCI
> configuration headers (since it is not the purpose of course). All of
> these peripherals have already existing platform drivers which can then
> be reused thanks to the PCI device-tree overlay series.
>
>>
>> I'll read through the fwnode RFC thread to add to see what happened to
>> the proposal.
>
> You can probably read the cover letter which described the use case in
> details. However, don't spend too much time reading the patchset, we
> discarded them for many good reason (way too much modifications in
> subsystems, no standardization of software node bindings, etc).
>
> Clément
>


2023-03-07 00:53:01

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
>
> On 3/6/23 02:35, [email protected] wrote:
> > Le 2023-03-04 00:42, Frank Rowand a écrit :
> >> On 2/27/23 04:31, Clément Léger wrote:
> >>> Le Mon, 27 Feb 2023 00:51:29 -0600,
> >>> Frank Rowand <[email protected]> a écrit :
> >>>
> >>>> On 1/19/23 21:02, Lizhi Hou wrote:
> >>>>> This patch series introduces OF overlay support for PCI devices
> >>>>> which
> >>>>> primarily addresses two use cases. First, it provides a data driven
> >>>>> method
> >>>>> to describe hardware peripherals that are present in a PCI endpoint
> >>>>> and
> >>>>> hence can be accessed by the PCI host. Second, it allows reuse of a
> >>>>> OF
> >>>>> compatible driver -- often used in SoC platforms -- in a PCI host
> >>>>> based
> >>>>> system.
> >>>>>
> >>>>> There are 2 series devices rely on this patch:
> >>>>>
> >>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
> >>>>> 2) Microchip LAN9662 Ethernet Controller
> >>>>>
> >>>>> Please see:
> >>>>> https://lore.kernel.org/lkml/[email protected]/
> >>>>>
> >>>>
> >>>>
> >>>>> Normally, the PCI core discovers PCI devices and their BARs using
> >>>>> the
> >>>>> PCI enumeration process. However, the process does not provide a way
> >>>>> to
> >>>>> discover the hardware peripherals that are present in a PCI device,
> >>>>> and
> >>>>> which can be accessed through the PCI BARs. Also, the enumeration
> >>>>> process
> >>>>
> >>>> I'm confused. The PCI Configuration Header Registers should describe
> >>>> the
> >>>> hardware on the PCI card.
> >>>>
> >>>> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
> >>>> themselves, so I would like to analyze that case separately), does
> >>>> the
> >>>> second device, "Microchip LAN9662 Ethernet Controller" properly
> >>>> implement
> >>>> the PCI Configuration Header Registers? What additional information
> >>>> is
> >>>> needed that is not provided in those registers?
> >>>
> >>> Hi Frank,
> >>>
> >>> I guess Lizhi wanted to say that it does not provide a way to describe
> >>> all the "platform" devices that are exposed by this PCI device. Which
> >>> is of course the whole point of the work we are doing right now. But
> >>> all the BARs are correctly described by the LAN9662 PCI card.
> >>>
> >>> Clément
> >>
> >> I remain confused.
> >>
> >> [RFC 00/10] add support for fwnode in i2c mux system and sfp
> >> https://lore.kernel.org/lkml/[email protected]/t/
> >>
> >> references a PCIe driver:
> >> [2]
> >> https://github.com/clementleger/linux/blob/fwnode_support/drivers/mfd/lan966x_pci_mfd.c
> >>
> >> So there is a PCIe driver that works.
> >>
> >> However, the RFC patch series was proposing adding fwnode support to
> >> the driver. My first
> >> surface reading (just part of that one email, not the entire series or
> >> the replies yet),
> >> notes:
> >>
> >> ... However, when
> >> plugged in a PCIe slot (on a x86), there is no device-tree support
> >> and
> >> the peripherals that are present must be described in some other way.
> >>
> >> I am assuming that the peripherals are what you mentioned above as
> >> '"platform"
> >> devices'. This is where my current confusion lies. Are the "platform"
> >> devices accessed via the PCI bus or is there some other electrical
> >> connection
> >> between the host system and the PCIe card?
> >
> > Hi Frank,
> >
> > The platform devices exposed by this PCIe card are available via some
> > BAR using PCI memory mapped areas, so it's totally standard PCI stuff.
> >
> >>
> >> If the "platform" devices are accessed via the PCI bus, then I would
> >> expect them
> >> to be described by PCI configuration header registers. Are the PCI
> >> configuration
> >> registers to describe the "platform" devices not present?
> >
> > I'm not sure to understand what you mean here. PCI configuration headers
> > only provides some basic registers allowing to identify the PCI device
> > (vendor/product) and some memory areas that are exposed (BAR). They do
> > not provides the "list" of peripherals that are exposed by the devices,
> > only some BARs that can be mapped and that allows to access.
>
> Yes, "identify the PCI device (vendor/product) and some memory areas".
> The driver for the (vendor/product) 'knows' what peripherals are exposed
> by the device and where within the BAR to find the registers for each
> of the devices.
>
> A normal PCI driver would contain this information. If I understand the
> proposal of this patch series, of_pci_make_dev_node() adds a node to
> the devicetree, when invoked via a PCI quirk for certain specific
> vendor/product cards. This node must exist for the flattened device
> tree (FDT) overlay for the card to be loaded. The driver for the card
> will get the overlay FDT from the card and load it into the kernel.
> The driver will use the information that then exists in the devicetree
> describing the card, instead of using information from the PCI configuration
> headers from the card.

How would all the sub devices be defined by the PCI config space other
than a VID/PID implies *everything*. That's the same as the pre-DT
world where the ARM machine ID number (from RMK's registry) implied
everything. These days, we can have an entire SoC exposed behind a PCI
BAR which I think is pretty much Clement's usecase. Putting an SoC
behind a PCI BAR is no more discoverable than a "normal" SoC.

>
> The intent is to be able to re-use devicetree based drivers instead of
> having the driver be a native PCI driver.

Not instead of. There's the PCI driver for the FPGA or SoC bus with
multiple unrelated devices behind it. The PCI driver is just a bus
driver much like we have for various custom SoC bus drivers.

> This goes against historical Linux practice. The idea of taking a driver
> from another environment (eg Windows, HP Unix, Sun Unix, IBM Unix, etc)
> and adding a shim layer to translate between Linux and the other
> environment has been rejected. Ironically, in this case, the other
> environment is Linux (more specifically the Linux OF implementation).

I don't see how your example relates to this in any way whatsoever.
We're talking about different discovery mechanisms, not different
driver models/environments.

> Even thought the other environment is Linux, this is still adding a
> shim layer to translate between that other environment and the native
> Linux PCI environment for which the driver would normally be written.
>
> In other words, this is not acceptable. Normal alternatives would be
> something like
> (1) add the PCI awareness to the existing drivers,

The downstream devices don't have their own PCI config space. That
won't work. PCI drivers expect a device with PCI config space. Devices
to drivers are always 1:1, so we couldn't share the config space among
multiple drivers or something. For devices which are not discoverable
like these are, our choices are DT, ACPI or s/w nodes (aka
platform_data 2.0).

> (2) split the devicetree aware and PCI aware portions of the driver
> to common code that would be invoked from separate devicetree and PCI
> drivers,

That only makes sense for something that is a single driver. Not the
case here. For the FPGA, the devices are not known up front.

> (3) write entirely separate devicetree and PCI drivers, or

For the same reason as 1, that simply won't work.

> (4) some other creative solution.
>
> Am I mis-interpretting or misunderstanding anything crucial here?

Yes...

We now have 3 different use cases all needing the same thing. The
3rd[1] is the recent test infrastructure change to have test devices
added. They all have non-discoverable devices downstream of a PCI
device. We need a solution here.

Rob

[1] https://lore.kernel.org/lkml/[email protected]/

2023-03-07 08:00:55

by Stefan Roese

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 3/7/23 01:52, Rob Herring wrote:
> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
>>
>> On 3/6/23 02:35, [email protected] wrote:
>>> Le 2023-03-04 00:42, Frank Rowand a écrit :
>>>> On 2/27/23 04:31, Clément Léger wrote:
>>>>> Le Mon, 27 Feb 2023 00:51:29 -0600,
>>>>> Frank Rowand <[email protected]> a écrit :
>>>>>
>>>>>> On 1/19/23 21:02, Lizhi Hou wrote:
>>>>>>> This patch series introduces OF overlay support for PCI devices
>>>>>>> which
>>>>>>> primarily addresses two use cases. First, it provides a data driven
>>>>>>> method
>>>>>>> to describe hardware peripherals that are present in a PCI endpoint
>>>>>>> and
>>>>>>> hence can be accessed by the PCI host. Second, it allows reuse of a
>>>>>>> OF
>>>>>>> compatible driver -- often used in SoC platforms -- in a PCI host
>>>>>>> based
>>>>>>> system.
>>>>>>>
>>>>>>> There are 2 series devices rely on this patch:
>>>>>>>
>>>>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>>>>>> 2) Microchip LAN9662 Ethernet Controller
>>>>>>>
>>>>>>> Please see:
>>>>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>>>>
>>>>>>
>>>>>>
>>>>>>> Normally, the PCI core discovers PCI devices and their BARs using
>>>>>>> the
>>>>>>> PCI enumeration process. However, the process does not provide a way
>>>>>>> to
>>>>>>> discover the hardware peripherals that are present in a PCI device,
>>>>>>> and
>>>>>>> which can be accessed through the PCI BARs. Also, the enumeration
>>>>>>> process
>>>>>>
>>>>>> I'm confused. The PCI Configuration Header Registers should describe
>>>>>> the
>>>>>> hardware on the PCI card.
>>>>>>
>>>>>> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
>>>>>> themselves, so I would like to analyze that case separately), does
>>>>>> the
>>>>>> second device, "Microchip LAN9662 Ethernet Controller" properly
>>>>>> implement
>>>>>> the PCI Configuration Header Registers? What additional information
>>>>>> is
>>>>>> needed that is not provided in those registers?
>>>>>
>>>>> Hi Frank,
>>>>>
>>>>> I guess Lizhi wanted to say that it does not provide a way to describe
>>>>> all the "platform" devices that are exposed by this PCI device. Which
>>>>> is of course the whole point of the work we are doing right now. But
>>>>> all the BARs are correctly described by the LAN9662 PCI card.
>>>>>
>>>>> Clément
>>>>
>>>> I remain confused.
>>>>
>>>> [RFC 00/10] add support for fwnode in i2c mux system and sfp
>>>> https://lore.kernel.org/lkml/[email protected]/t/
>>>>
>>>> references a PCIe driver:
>>>> [2]
>>>> https://github.com/clementleger/linux/blob/fwnode_support/drivers/mfd/lan966x_pci_mfd.c
>>>>
>>>> So there is a PCIe driver that works.
>>>>
>>>> However, the RFC patch series was proposing adding fwnode support to
>>>> the driver. My first
>>>> surface reading (just part of that one email, not the entire series or
>>>> the replies yet),
>>>> notes:
>>>>
>>>> ... However, when
>>>> plugged in a PCIe slot (on a x86), there is no device-tree support
>>>> and
>>>> the peripherals that are present must be described in some other way.
>>>>
>>>> I am assuming that the peripherals are what you mentioned above as
>>>> '"platform"
>>>> devices'. This is where my current confusion lies. Are the "platform"
>>>> devices accessed via the PCI bus or is there some other electrical
>>>> connection
>>>> between the host system and the PCIe card?
>>>
>>> Hi Frank,
>>>
>>> The platform devices exposed by this PCIe card are available via some
>>> BAR using PCI memory mapped areas, so it's totally standard PCI stuff.
>>>
>>>>
>>>> If the "platform" devices are accessed via the PCI bus, then I would
>>>> expect them
>>>> to be described by PCI configuration header registers. Are the PCI
>>>> configuration
>>>> registers to describe the "platform" devices not present?
>>>
>>> I'm not sure to understand what you mean here. PCI configuration headers
>>> only provides some basic registers allowing to identify the PCI device
>>> (vendor/product) and some memory areas that are exposed (BAR). They do
>>> not provides the "list" of peripherals that are exposed by the devices,
>>> only some BARs that can be mapped and that allows to access.
>>
>> Yes, "identify the PCI device (vendor/product) and some memory areas".
>> The driver for the (vendor/product) 'knows' what peripherals are exposed
>> by the device and where within the BAR to find the registers for each
>> of the devices.
>>
>> A normal PCI driver would contain this information. If I understand the
>> proposal of this patch series, of_pci_make_dev_node() adds a node to
>> the devicetree, when invoked via a PCI quirk for certain specific
>> vendor/product cards. This node must exist for the flattened device
>> tree (FDT) overlay for the card to be loaded. The driver for the card
>> will get the overlay FDT from the card and load it into the kernel.
>> The driver will use the information that then exists in the devicetree
>> describing the card, instead of using information from the PCI configuration
>> headers from the card.
>
> How would all the sub devices be defined by the PCI config space other
> than a VID/PID implies *everything*. That's the same as the pre-DT
> world where the ARM machine ID number (from RMK's registry) implied
> everything. These days, we can have an entire SoC exposed behind a PCI
> BAR which I think is pretty much Clement's usecase. Putting an SoC
> behind a PCI BAR is no more discoverable than a "normal" SoC.
>
>>
>> The intent is to be able to re-use devicetree based drivers instead of
>> having the driver be a native PCI driver.
>
> Not instead of. There's the PCI driver for the FPGA or SoC bus with
> multiple unrelated devices behind it. The PCI driver is just a bus
> driver much like we have for various custom SoC bus drivers.
>
>> This goes against historical Linux practice. The idea of taking a driver
>> from another environment (eg Windows, HP Unix, Sun Unix, IBM Unix, etc)
>> and adding a shim layer to translate between Linux and the other
>> environment has been rejected. Ironically, in this case, the other
>> environment is Linux (more specifically the Linux OF implementation).
>
> I don't see how your example relates to this in any way whatsoever.
> We're talking about different discovery mechanisms, not different
> driver models/environments.
>
>> Even thought the other environment is Linux, this is still adding a
>> shim layer to translate between that other environment and the native
>> Linux PCI environment for which the driver would normally be written.
>>
>> In other words, this is not acceptable. Normal alternatives would be
>> something like
>> (1) add the PCI awareness to the existing drivers,
>
> The downstream devices don't have their own PCI config space. That
> won't work. PCI drivers expect a device with PCI config space. Devices
> to drivers are always 1:1, so we couldn't share the config space among
> multiple drivers or something. For devices which are not discoverable
> like these are, our choices are DT, ACPI or s/w nodes (aka
> platform_data 2.0).
>
>> (2) split the devicetree aware and PCI aware portions of the driver
>> to common code that would be invoked from separate devicetree and PCI
>> drivers,
>
> That only makes sense for something that is a single driver. Not the
> case here. For the FPGA, the devices are not known up front.
>
>> (3) write entirely separate devicetree and PCI drivers, or
>
> For the same reason as 1, that simply won't work.
>
>> (4) some other creative solution.
>>
>> Am I mis-interpretting or misunderstanding anything crucial here?
>
> Yes...
>
> We now have 3 different use cases all needing the same thing. The
> 3rd[1] is the recent test infrastructure change to have test devices
> added. They all have non-discoverable devices downstream of a PCI
> device. We need a solution here.

Thanks Rob for this explanation. I would like to second that, as I've
been looking for a "solution" for exact this situation for a few years
now in multiple system configurations. The setup mostly being identical:
An FPGA connected via PCIe to the host CPU, embedded in the FPGA misc
devices like NS16550 UART, GPIO controller, etc which often have
existing drivers in the Kernel. Using this dynamic device tree node
approach for the PCIe EP with the possibility to describe the FPGA-
internal devices via device tree seems to be the most elegant solution
IMHO.

Thanks,
Stefan

> Rob
>
> [1] https://lore.kernel.org/lkml/[email protected]/

2023-03-07 08:44:40

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Le Mon, 6 Mar 2023 18:52:42 -0600,
Rob Herring <[email protected]> a écrit :

> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
> >
> > On 3/6/23 02:35, [email protected] wrote:
> > > Le 2023-03-04 00:42, Frank Rowand a écrit :
> > >> On 2/27/23 04:31, Clément Léger wrote:
> > >>> Le Mon, 27 Feb 2023 00:51:29 -0600,
> > >>> Frank Rowand <[email protected]> a écrit :
> > >>>
> > >>>> On 1/19/23 21:02, Lizhi Hou wrote:
> > >>>>> This patch series introduces OF overlay support for PCI devices
> > >>>>> which
> > >>>>> primarily addresses two use cases. First, it provides a data driven
> > >>>>> method
> > >>>>> to describe hardware peripherals that are present in a PCI endpoint
> > >>>>> and
> > >>>>> hence can be accessed by the PCI host. Second, it allows reuse of a
> > >>>>> OF
> > >>>>> compatible driver -- often used in SoC platforms -- in a PCI host
> > >>>>> based
> > >>>>> system.
> > >>>>>
> > >>>>> There are 2 series devices rely on this patch:
> > >>>>>
> > >>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
> > >>>>> 2) Microchip LAN9662 Ethernet Controller
> > >>>>>
> > >>>>> Please see:
> > >>>>> https://lore.kernel.org/lkml/[email protected]/
> > >>>>>
> > >>>>
> > >>>>
> > >>>>> Normally, the PCI core discovers PCI devices and their BARs using
> > >>>>> the
> > >>>>> PCI enumeration process. However, the process does not provide a way
> > >>>>> to
> > >>>>> discover the hardware peripherals that are present in a PCI device,
> > >>>>> and
> > >>>>> which can be accessed through the PCI BARs. Also, the enumeration
> > >>>>> process
> > >>>>
> > >>>> I'm confused. The PCI Configuration Header Registers should describe
> > >>>> the
> > >>>> hardware on the PCI card.
> > >>>>
> > >>>> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
> > >>>> themselves, so I would like to analyze that case separately), does
> > >>>> the
> > >>>> second device, "Microchip LAN9662 Ethernet Controller" properly
> > >>>> implement
> > >>>> the PCI Configuration Header Registers? What additional information
> > >>>> is
> > >>>> needed that is not provided in those registers?
> > >>>
> > >>> Hi Frank,
> > >>>
> > >>> I guess Lizhi wanted to say that it does not provide a way to describe
> > >>> all the "platform" devices that are exposed by this PCI device. Which
> > >>> is of course the whole point of the work we are doing right now. But
> > >>> all the BARs are correctly described by the LAN9662 PCI card.
> > >>>
> > >>> Clément
> > >>
> > >> I remain confused.
> > >>
> > >> [RFC 00/10] add support for fwnode in i2c mux system and sfp
> > >> https://lore.kernel.org/lkml/[email protected]/t/
> > >>
> > >> references a PCIe driver:
> > >> [2]
> > >> https://github.com/clementleger/linux/blob/fwnode_support/drivers/mfd/lan966x_pci_mfd.c
> > >>
> > >> So there is a PCIe driver that works.
> > >>
> > >> However, the RFC patch series was proposing adding fwnode support to
> > >> the driver. My first
> > >> surface reading (just part of that one email, not the entire series or
> > >> the replies yet),
> > >> notes:
> > >>
> > >> ... However, when
> > >> plugged in a PCIe slot (on a x86), there is no device-tree support
> > >> and
> > >> the peripherals that are present must be described in some other way.
> > >>
> > >> I am assuming that the peripherals are what you mentioned above as
> > >> '"platform"
> > >> devices'. This is where my current confusion lies. Are the "platform"
> > >> devices accessed via the PCI bus or is there some other electrical
> > >> connection
> > >> between the host system and the PCIe card?
> > >
> > > Hi Frank,
> > >
> > > The platform devices exposed by this PCIe card are available via some
> > > BAR using PCI memory mapped areas, so it's totally standard PCI stuff.
> > >
> > >>
> > >> If the "platform" devices are accessed via the PCI bus, then I would
> > >> expect them
> > >> to be described by PCI configuration header registers. Are the PCI
> > >> configuration
> > >> registers to describe the "platform" devices not present?
> > >
> > > I'm not sure to understand what you mean here. PCI configuration headers
> > > only provides some basic registers allowing to identify the PCI device
> > > (vendor/product) and some memory areas that are exposed (BAR). They do
> > > not provides the "list" of peripherals that are exposed by the devices,
> > > only some BARs that can be mapped and that allows to access.
> >
> > Yes, "identify the PCI device (vendor/product) and some memory areas".
> > The driver for the (vendor/product) 'knows' what peripherals are exposed
> > by the device and where within the BAR to find the registers for each
> > of the devices.
> >
> > A normal PCI driver would contain this information. If I understand the
> > proposal of this patch series, of_pci_make_dev_node() adds a node to
> > the devicetree, when invoked via a PCI quirk for certain specific
> > vendor/product cards. This node must exist for the flattened device
> > tree (FDT) overlay for the card to be loaded. The driver for the card
> > will get the overlay FDT from the card and load it into the kernel.
> > The driver will use the information that then exists in the devicetree
> > describing the card, instead of using information from the PCI configuration
> > headers from the card.
>
> How would all the sub devices be defined by the PCI config space other
> than a VID/PID implies *everything*. That's the same as the pre-DT
> world where the ARM machine ID number (from RMK's registry) implied
> everything. These days, we can have an entire SoC exposed behind a PCI
> BAR which I think is pretty much Clement's usecase. Putting an SoC
> behind a PCI BAR is no more discoverable than a "normal" SoC.

Thanks Rob for re-explaining all of that, I thought the cover letter
at [1] explained that This is *exactly* my usecase. the lan9662 SoC can
either be used to run Linux and uses a device-tree description, or can
be configured as a PCIe endpoint card and plugged on, a PCI port, in
which case all the SoC IO memories are exposed through BAR 0 & 1.

>
> >
> > The intent is to be able to re-use devicetree based drivers instead of
> > having the driver be a native PCI driver.
>
> Not instead of. There's the PCI driver for the FPGA or SoC bus with
> multiple unrelated devices behind it. The PCI driver is just a bus
> driver much like we have for various custom SoC bus drivers.
>
> > This goes against historical Linux practice. The idea of taking a driver
> > from another environment (eg Windows, HP Unix, Sun Unix, IBM Unix, etc)
> > and adding a shim layer to translate between Linux and the other
> > environment has been rejected. Ironically, in this case, the other
> > environment is Linux (more specifically the Linux OF implementation).
>
> I don't see how your example relates to this in any way whatsoever.
> We're talking about different discovery mechanisms, not different
> driver models/environments.
>
> > Even thought the other environment is Linux, this is still adding a
> > shim layer to translate between that other environment and the native
> > Linux PCI environment for which the driver would normally be written.

Since there is an entire SoC described behind that PCI card, we need to
link all the devcies together. So it's not as simple as saying "I want
a driver for each device to be probed", we also need to describe the
whole hierarchy & links between the devices. PCI "itself" does not
describe how to define that, only a way to access the memory and
identify the PCI device.

> >
> > In other words, this is not acceptable. Normal alternatives would be
> > something like
> > (1) add the PCI awareness to the existing drivers,
>
> The downstream devices don't have their own PCI config space. That
> won't work. PCI drivers expect a device with PCI config space. Devices
> to drivers are always 1:1, so we couldn't share the config space among
> multiple drivers or something. For devices which are not discoverable
> like these are, our choices are DT, ACPI or s/w nodes (aka
> platform_data 2.0).

Exactly, and even though it would be possible to share the the config
space, it would mean that each driver would need to be modified to
support PCI and all the OF layer that allows to link the devcuie
together and configure the device would need to be modified in some way
to allows passing arguments, that would be going back to paltform_data
stuff.

>
> > (2) split the devicetree aware and PCI aware portions of the driver
> > to common code that would be invoked from separate devicetree and PCI
> > drivers,
>
> That only makes sense for something that is a single driver. Not the
> case here. For the FPGA, the devices are not known up front.
>
> > (3) write entirely separate devicetree and PCI drivers, or
>
> For the same reason as 1, that simply won't work.
>
> > (4) some other creative solution.
> >
> > Am I mis-interpretting or misunderstanding anything crucial here?
>
> Yes...
>
> We now have 3 different use cases all needing the same thing. The
> 3rd[1] is the recent test infrastructure change to have test devices
> added. They all have non-discoverable devices downstream of a PCI
> device. We need a solution here.

We have been going back and forth for about more than a year now and we
tested several things (software nodes, ACPI, MFD cells,
device-tree overlays). The most elegant solution (at least from the
ones we though of) has proven to be the device-tree overlays with
dynamic PCI of nodes for various reasons:

- Minimal amount of code modified
- Reusability of the existing SoC device-tree
- Already available bindings (no need to define a new standard for
system description)
- Works on all platforms (x86, ARM, ARM64, etc)
- Ease of use for the end-user

No other solution was providing so much pros (see [1] for the history).
Of course there are some cons such as the *not so clear* status about OF
overlays statiblity when loaded/unloaded but we are clearly working
toward a better support.

I even think that a common driver to handle much of these use cases
could exists and allow to load an overlay based on the PCI VID/PID and
apply some ranges remapping depending on it, allowing to reduce the
amount of specific complex drivers for handling these usecases.

Clément

[1]
https://lore.kernel.org/lkml/[email protected]/
--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2023-03-08 07:29:16

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 3/7/23 02:47, Clément Léger wrote:
> Le Mon, 6 Mar 2023 18:52:42 -0600,
> Rob Herring <[email protected]> a écrit :
>
>> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
>>>
>>> On 3/6/23 02:35, [email protected] wrote:
>>>> Le 2023-03-04 00:42, Frank Rowand a écrit :
>>>>> On 2/27/23 04:31, Clément Léger wrote:
>>>>>> Le Mon, 27 Feb 2023 00:51:29 -0600,
>>>>>> Frank Rowand <[email protected]> a écrit :
>>>>>>
>>>>>>> On 1/19/23 21:02, Lizhi Hou wrote:
>>>>>>>> This patch series introduces OF overlay support for PCI devices
>>>>>>>> which
>>>>>>>> primarily addresses two use cases. First, it provides a data driven
>>>>>>>> method
>>>>>>>> to describe hardware peripherals that are present in a PCI endpoint
>>>>>>>> and
>>>>>>>> hence can be accessed by the PCI host. Second, it allows reuse of a
>>>>>>>> OF
>>>>>>>> compatible driver -- often used in SoC platforms -- in a PCI host
>>>>>>>> based
>>>>>>>> system.
>>>>>>>>
>>>>>>>> There are 2 series devices rely on this patch:
>>>>>>>>
>>>>>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>>>>>>> 2) Microchip LAN9662 Ethernet Controller
>>>>>>>>
>>>>>>>> Please see:

First off, if I had paid gone back and finished reading through this link:

>>>>>>>> https://lore.kernel.org/lkml/[email protected]/

I would have found out that the Microchip LAN9662 Ethernet Controller is
_not_ just an ethernet controller (as I was assuming), but is instead also
contains ARM64 cores. (Actually I am sure that I had noted that an SoC
was involved, but lost track of that info when trying to wrap my head
around this patch series.)

Even this information does not explain where the ethernet controller is
located and how it is accessed. My most likely guess without looking at
chip specification is that the ethernet controller interfaces to the
processor bus and/or an IOMMU is probably involved. What is very unclear
to me is the interconnection between the card PCI connector (which will be
plugged into the host system where we are running Linux for this patch
series). This will feed into my misunderstanding below.

For future versions of this patch series, please include the info that the
LAN9662 Ethernet Controller is an SoC, as was noted in the above link.

>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Normally, the PCI core discovers PCI devices and their BARs using
>>>>>>>> the
>>>>>>>> PCI enumeration process. However, the process does not provide a way
>>>>>>>> to
>>>>>>>> discover the hardware peripherals that are present in a PCI device,
>>>>>>>> and
>>>>>>>> which can be accessed through the PCI BARs. Also, the enumeration
>>>>>>>> process
>>>>>>>
>>>>>>> I'm confused. The PCI Configuration Header Registers should describe
>>>>>>> the
>>>>>>> hardware on the PCI card.
>>>>>>>
>>>>>>> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
>>>>>>> themselves, so I would like to analyze that case separately), does
>>>>>>> the
>>>>>>> second device, "Microchip LAN9662 Ethernet Controller" properly
>>>>>>> implement
>>>>>>> the PCI Configuration Header Registers? What additional information
>>>>>>> is
>>>>>>> needed that is not provided in those registers?
>>>>>>
>>>>>> Hi Frank,
>>>>>>
>>>>>> I guess Lizhi wanted to say that it does not provide a way to describe
>>>>>> all the "platform" devices that are exposed by this PCI device. Which
>>>>>> is of course the whole point of the work we are doing right now. But
>>>>>> all the BARs are correctly described by the LAN9662 PCI card.
>>>>>>
>>>>>> Clément
>>>>>
>>>>> I remain confused.
>>>>>
>>>>> [RFC 00/10] add support for fwnode in i2c mux system and sfp
>>>>> https://lore.kernel.org/lkml/[email protected]/t/
>>>>>
>>>>> references a PCIe driver:
>>>>> [2]
>>>>> https://github.com/clementleger/linux/blob/fwnode_support/drivers/mfd/lan966x_pci_mfd.c
>>>>>
>>>>> So there is a PCIe driver that works.
>>>>>
>>>>> However, the RFC patch series was proposing adding fwnode support to
>>>>> the driver. My first
>>>>> surface reading (just part of that one email, not the entire series or
>>>>> the replies yet),
>>>>> notes:
>>>>>
>>>>> ... However, when
>>>>> plugged in a PCIe slot (on a x86), there is no device-tree support
>>>>> and
>>>>> the peripherals that are present must be described in some other way.
>>>>>
>>>>> I am assuming that the peripherals are what you mentioned above as
>>>>> '"platform"
>>>>> devices'. This is where my current confusion lies. Are the "platform"
>>>>> devices accessed via the PCI bus or is there some other electrical
>>>>> connection
>>>>> between the host system and the PCIe card?
>>>>>>>> Hi Frank,
>>>>

From Clément:

>>>> The platform devices exposed by this PCIe card are available via some
>>>> BAR using PCI memory mapped areas, so it's totally standard PCI stuff.
>>>>
>>>>>

I had been assuming that each device that is visible via the PCI connector
had a set of PCI configuration header registers that
provided the information required to use that device.

The above paragraph reinforced my understanding, saying "so it's totally
standard PCI stuff". (Apparently my understanding was wrong, see further
below...)

I had earlier asked this:

>>>>> If the "platform" devices are accessed via the PCI bus, then I would
>>>>> expect them
>>>>> to be described by PCI configuration header registers. Are the PCI
>>>>> configuration
>>>>> registers to describe the "platform" devices not present?

Clément replied:

>>>>
>>>> I'm not sure to understand what you mean here. PCI configuration headers
>>>> only provides some basic registers allowing to identify the PCI device
>>>> (vendor/product) and some memory areas that are exposed (BAR). They do
>>>> not provides the "list" of peripherals that are exposed by the devices,
>>>> only some BARs that can be mapped and that allows to access.
>>>

To which I replied:

>>> Yes, "identify the PCI device (vendor/product) and some memory areas".
>>> The driver for the (vendor/product) 'knows' what peripherals are exposed
>>> by the device and where within the BAR to find the registers for each
>>> of the devices.
>>>
>>> A normal PCI driver would contain this information. If I understand the
>>> proposal of this patch series, of_pci_make_dev_node() adds a node to
>>> the devicetree, when invoked via a PCI quirk for certain specific
>>> vendor/product cards. This node must exist for the flattened device
>>> tree (FDT) overlay for the card to be loaded. The driver for the card
>>> will get the overlay FDT from the card and load it into the kernel.
>>> The driver will use the information that then exists in the devicetree
>>> describing the card, instead of using information from the PCI configuration
>>> headers from the card.
>>

Clément had further replied (this seems to have gotten lost in Rob's reply to me:

In the case of the lan9662 cnetwork controller, BAR 0 and 1 exposes
multiples devices that are located at some subranges of this BAR. For
instance (not accurate), we have the I2C controller located at BAR 0
+ offset 0X1000, then the flexcom controller exposed in BAR 0 at offset
0x20000, etc. This list of peripheral is not exposed at all by the PCI
configuration headers (since it is not the purpose of course). All of
these peripherals have already existing platform drivers which can then
be reused thanks to the PCI device-tree overlay series.

At this point my understanding (or misunderstanding as we will see below)
remains intact, with the slight addition that the devices on the PCI card
are multi-function. I've never worked with multi-function devices on a
PCI card so I don't know how they are supposed to be described, but this
new information seems reasonable.

The (vendor/product) would provide the information to find a driver
that understands that this PCI device contains both the I2C controller
and the flexcom controller.

The phrase "This list of peripheral is not exposed at all by the PCI
configuration headers" is confusing to me. A major function of the
configuration headers it to list exactly what devices are available on
the card, as identified by (vendor/product) in each configuration
header. The (vendor/product) "exposes" the multi-function device,
also known as "the list of peripheral".


Rob's "re-explaining":

>> How would all the sub devices be defined by the PCI config space other
>> than a VID/PID implies *everything*. That's the same as the pre-DT
>> world where the ARM machine ID number (from RMK's registry) implied
>> everything. These days, we can have an entire SoC exposed behind a PCI
>> BAR which I think is pretty much Clement's usecase. Putting an SoC
>> behind a PCI BAR is no more discoverable than a "normal" SoC.

That is a PCI device problem, not a non-discoverable device on the
system non-discoverable busses. PCI devices are supposed to be
discoverable on the PCI bus.

This series claimed to be about (1) a method to "describe hardware devices
that are present in a PCI endpoint" and (2) "allows reuse of a OF
compatible driver".

(1) is normally implemented directly in a PCI driver

(2) is using a shim to make a single driver able to access a device that
is either directly on the system bus (the driver "native" environment)
or located on a PCI card (in other words on the other side of the PCI
bus interface on the PCI card).

"Putting an SoC behind a PCI BAR" is not different to me than any
other device on a PCI card. And I do not _even_ know what it means
to put an SoC behind a PCI BAR (there are many ways that could be
implemented), which is why I was asking Clément questions, trying
to understand the problem space.

PCI devices are more than a BAR, there is a set of PCI Configuration
Header Registers to provide the requisite information to locate the
proper driver for the device and provides the dynamic information
specific to the device in this card (eg address ranges, interrupt
info, MSI, MSI-X, etc). The driver is expected to contain all
further knowledge about how to control the device.


Clément replies:

> Thanks Rob for re-explaining all of that, I thought the cover letter
> at [1] explained that This is *exactly* my usecase. the lan9662 SoC can
> either be used to run Linux and uses a device-tree description, or can

This is a clue for me:

> be configured as a PCIe endpoint card and plugged on, a PCI port, in
> which case all the SoC IO memories are exposed through BAR 0 & 1.

I had been assuming that each SoC supplied device that is visible via
the PCI connector had a set of PCI Configuration Header Registers that
provided the information required to use that device.

I think this is where my misunderstanding jumps out.

Please correct and explain as necessary anything in the block the
I write here:

The clue above appears to say that one portion of "SoC IO memories" are
exposed through BAR 0 and a second portion of "SoC IO memories" are
exposed through BAR 1.

This would be consistent with Rob's description saying the entire
SoC is exposed by a single Configuration Header Register set.

Caveat: I've never worked with an IOMMU on ARM64 (only on other very
different architectures) so I have no clue as to what the specific
ARM64 IOMMU architecture is or any related nuances.

I don't know what "SoC IO memories" are. But given the context of a
driver accessing addresses, I am guessing these are addresses in the
context of the processor bus. The precise description probably does
not matter if I can simply assume they are addresses the driver will
use, and the addresses are properly mapped into the addresses listed
in the BARs.

Frank's assumption: the relevant devices, such as the ethernet controller
are located directly on the SoC processor bus (is the proper term "memory
bus"? - addresses that are accessed via SoC load and store instructions).

So are the entire set of SoC processor bus addresses mapped via the BARs?
Or even a significant subset?

Further Frank's assumptions: Some sort of firmware is running on the SoC
that is on the PCI card that does things like control clocks, do power
management, set up the interrupts from devices to be visible via the PCI
connector, as described by the PCI Configuration Header Registers (possibly
including the MSI and/or MSI-X registers), any IOMMU setup if needed, etc.
The host system that the PCI card is plugged into is insulated from that
set of activities by the PCI interface. <----- This assumption may turn
out to be wrong, as I ask more questions further below.


>
>>
>>>
>>> The intent is to be able to re-use devicetree based drivers instead of
>>> having the driver be a native PCI driver.
>>
>> Not instead of. There's the PCI driver for the FPGA or SoC bus with
>> multiple unrelated devices behind it. The PCI driver is just a bus
>> driver much like we have for various custom SoC bus drivers.
>>
>>> This goes against historical Linux practice. The idea of taking a driver
>>> from another environment (eg Windows, HP Unix, Sun Unix, IBM Unix, etc)
>>> and adding a shim layer to translate between Linux and the other
>>> environment has been rejected. Ironically, in this case, the other
>>> environment is Linux (more specifically the Linux OF implementation).
>>
>> I don't see how your example relates to this in any way whatsoever.
>> We're talking about different discovery mechanisms, not different
>> driver models/environments.
>>
>>> Even thought the other environment is Linux, this is still adding a
>>> shim layer to translate between that other environment and the native
>>> Linux PCI environment for which the driver would normally be written.
>

Clément replies:

> Since there is an entire SoC described behind that PCI card, we need to
> link all the devcies together. So it's not as simple as saying "I want
> a driver for each device to be probed", we also need to describe the
> whole hierarchy & links between the devices. PCI "itself" does not
> describe how to define that, only a way to access the memory and
> identify the PCI device.

This makes me guess that the assumption I made above in this email
is not correct. This seems to be saying that the host system needs
to understand related devices in the card's SoC, and possibly control
them. An example of this could be that the PCI Configuration Header
Register do _not_ describe the interrupt information for e.g. the
ethernet controller, and maybe the host system needs to understand
and/or program an interrupt controller in the card's SoC. Is this
conceptually correct (that is maybe not this specific hardware, but
a similar concept)?


>
>>>
>>> In other words, this is not acceptable. Normal alternatives would be
>>> something like
>>> (1) add the PCI awareness to the existing drivers,
>>

Rob said:

>> The downstream devices don't have their own PCI config space. That
>> won't work. PCI drivers expect a device with PCI config space. Devices
>> to drivers are always 1:1, so we couldn't share the config space among

A PCI device can be a multifunction device. Whether the host OS supports
this via a single driver or multiple drivers is OS dependent.

>> multiple drivers or something. For devices which are not discoverable
>> like these are, our choices are DT, ACPI or s/w nodes (aka
>> platform_data 2.0).

The devices _are_ discoverable. The PCI Configuration Header Register
(vendor/product) says what the devices are.

>

Clément replies:
> Exactly, and even though it would be possible to share the the config
> space, it would mean that each driver would need to be modified to
> support PCI and all the OF layer that allows to link the devcuie
> together and configure the device would need to be modified in some way
> to allows passing arguments, that would be going back to paltform_data
> stuff.

To me this implies that (as an example, using the ethernet controller)
the host system will be configuring the resources and/or infrastructure
on the SoC that is used by the ethernet controller.

Do I understand that correctly?

My previous expectation is the the ethernet controller driver on the
host system would only be touching registers mapped to the ethernet
controller driver to do things like send commands, read status results,
specify buffer addresses that would map back to flow through the PCI
connector, etc.

>
>>
>>> (2) split the devicetree aware and PCI aware portions of the driver
>>> to common code that would be invoked from separate devicetree and PCI
>>> drivers,
>>
>> That only makes sense for something that is a single driver. Not the
>> case here. For the FPGA, the devices are not known up front.
>>
>>> (3) write entirely separate devicetree and PCI drivers, or
>>
>> For the same reason as 1, that simply won't work.
>>
>>> (4) some other creative solution.
>>>
>>> Am I mis-interpretting or misunderstanding anything crucial here?
>>
>> Yes...
>>
>> We now have 3 different use cases all needing the same thing. The
>> 3rd[1] is the recent test infrastructure change to have test devices
>> added. They all have non-discoverable devices downstream of a PCI
>> device. We need a solution here.
>
> We have been going back and forth for about more than a year now and we
> tested several things (software nodes, ACPI, MFD cells,
> device-tree overlays). The most elegant solution (at least from the
> ones we though of) has proven to be the device-tree overlays with
> dynamic PCI of nodes for various reasons:

I can't really respond to this section and below because I still do
not understand the specific problem space of the SoC on the PCI card.

Hopefully your response to this email will move me further along
the path to properly understand the PCI card implementation.

-Frank

>
> - Minimal amount of code modified
> - Reusability of the existing SoC device-tree
> - Already available bindings (no need to define a new standard for
> system description)
> - Works on all platforms (x86, ARM, ARM64, etc)
> - Ease of use for the end-user
>
> No other solution was providing so much pros (see [1] for the history).
> Of course there are some cons such as the *not so clear* status about OF
> overlays statiblity when loaded/unloaded but we are clearly working
> toward a better support.
>
> I even think that a common driver to handle much of these use cases
> could exists and allow to load an overlay based on the PCI VID/PID and
> apply some ranges remapping depending on it, allowing to reduce the
> amount of specific complex drivers for handling these usecases.
>
> Clément
>
> [1]
> https://lore.kernel.org/lkml/[email protected]/


2023-03-08 07:31:59

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 3/6/23 18:52, Rob Herring wrote:
> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
>>

< snip >

Hi Rob,

I am in no position to comment intelligently on your comments until I
understand the SoC on PCI card model I am asking to be described in
this subthread.

I'll try to remember to get back to this email once my understanding
is more complete.

-Frank


2023-03-08 07:39:16

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 3/7/23 01:54, Stefan Roese wrote:
> On 3/7/23 01:52, Rob Herring wrote:
>> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
>>>
>>> On 3/6/23 02:35, [email protected] wrote:
>>>> Le 2023-03-04 00:42, Frank Rowand a écrit :
>>>>> On 2/27/23 04:31, Clément Léger wrote:
>>>>>> Le Mon, 27 Feb 2023 00:51:29 -0600,
>>>>>> Frank Rowand <[email protected]> a écrit :
>>>>>>
>>>>>>> On 1/19/23 21:02, Lizhi Hou wrote:
>>>>>>>> This patch series introduces OF overlay support for PCI devices
>>>>>>>> which
>>>>>>>> primarily addresses two use cases. First, it provides a data driven
>>>>>>>> method
>>>>>>>> to describe hardware peripherals that are present in a PCI endpoint
>>>>>>>> and
>>>>>>>> hence can be accessed by the PCI host. Second, it allows reuse of a
>>>>>>>> OF
>>>>>>>> compatible driver -- often used in SoC platforms -- in a PCI host
>>>>>>>> based
>>>>>>>> system.
>>>>>>>>
>>>>>>>> There are 2 series devices rely on this patch:
>>>>>>>>
>>>>>>>>    1) Xilinx Alveo Accelerator cards (FPGA based device)
>>>>>>>>    2) Microchip LAN9662 Ethernet Controller
>>>>>>>>
>>>>>>>>       Please see:
>>>>>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Normally, the PCI core discovers PCI devices and their BARs using
>>>>>>>> the
>>>>>>>> PCI enumeration process. However, the process does not provide a way
>>>>>>>> to
>>>>>>>> discover the hardware peripherals that are present in a PCI device,
>>>>>>>> and
>>>>>>>> which can be accessed through the PCI BARs. Also, the enumeration
>>>>>>>> process
>>>>>>>
>>>>>>> I'm confused.  The PCI Configuration Header Registers should describe
>>>>>>> the
>>>>>>> hardware on the PCI card.
>>>>>>>
>>>>>>> Ignoring case 1 above _for the moment_ (FPGA devices are a world unto
>>>>>>> themselves, so I would like to analyze that case separately), does
>>>>>>> the
>>>>>>> second device, "Microchip LAN9662 Ethernet Controller" properly
>>>>>>> implement
>>>>>>> the PCI Configuration Header Registers?  What additional information
>>>>>>> is
>>>>>>> needed that is not provided in those registers?
>>>>>>
>>>>>> Hi Frank,
>>>>>>
>>>>>> I guess Lizhi wanted to say that it does not provide a way to describe
>>>>>> all the "platform" devices that are exposed by this PCI device. Which
>>>>>> is of course the whole point of the work we are doing right now. But
>>>>>> all the BARs are correctly described by the LAN9662 PCI card.
>>>>>>
>>>>>> Clément
>>>>>
>>>>> I remain confused.
>>>>>
>>>>> [RFC 00/10] add support for fwnode in i2c mux system and sfp
>>>>> https://lore.kernel.org/lkml/[email protected]/t/
>>>>>
>>>>>    references a PCIe driver:
>>>>>    [2]
>>>>> https://github.com/clementleger/linux/blob/fwnode_support/drivers/mfd/lan966x_pci_mfd.c
>>>>>
>>>>> So there is a PCIe driver that works.
>>>>>
>>>>> However, the RFC patch series was proposing adding fwnode support to
>>>>> the driver.  My first
>>>>> surface reading (just part of that one email, not the entire series or
>>>>> the replies yet),
>>>>> notes:
>>>>>
>>>>>    ... However, when
>>>>>    plugged in a PCIe slot (on a x86), there is no device-tree support
>>>>> and
>>>>>    the peripherals that are present must be described in some other way.
>>>>>
>>>>> I am assuming that the peripherals are what you mentioned above as
>>>>> '"platform"
>>>>> devices'.  This is where my current confusion lies.  Are the "platform"
>>>>> devices accessed via the PCI bus or is there some other electrical
>>>>> connection
>>>>> between the host system and the PCIe card?
>>>>
>>>> Hi Frank,
>>>>
>>>> The platform devices exposed by this PCIe card are available via some
>>>> BAR using PCI memory mapped areas, so it's totally standard PCI stuff.
>>>>
>>>>>
>>>>> If the "platform" devices are accessed via the PCI bus, then I would
>>>>> expect them
>>>>> to be described by PCI configuration header registers.  Are the PCI
>>>>> configuration
>>>>> registers to describe the "platform" devices not present?
>>>>
>>>> I'm not sure to understand what you mean here. PCI configuration headers
>>>> only provides some basic registers allowing to identify the PCI device
>>>> (vendor/product) and some memory areas that are exposed (BAR). They do
>>>> not provides the "list" of peripherals that are exposed by the devices,
>>>> only some BARs that can be mapped and that allows to access.
>>>
>>> Yes, "identify the PCI device (vendor/product) and some memory areas".
>>> The driver for the (vendor/product) 'knows' what peripherals are exposed
>>> by the device and where within the BAR to find the registers for each
>>> of the devices.
>>>
>>> A normal PCI driver would contain this information.  If I understand the
>>> proposal of this patch series, of_pci_make_dev_node() adds a node to
>>> the devicetree, when invoked via a PCI quirk for certain specific
>>> vendor/product cards.  This node must exist for the flattened device
>>> tree (FDT) overlay for the card to be loaded.  The driver for the card
>>> will get the overlay FDT from the card and load it into the kernel.
>>> The driver will use the information that then exists in the devicetree
>>> describing the card, instead of using information from the PCI configuration
>>> headers from the card.
>>
>> How would all the sub devices be defined by the PCI config space other
>> than a VID/PID implies *everything*. That's the same as the pre-DT
>> world where the ARM machine ID number (from RMK's registry) implied
>> everything. These days, we can have an entire SoC exposed behind a PCI
>> BAR which I think is pretty much Clement's usecase. Putting an SoC
>> behind a PCI BAR is no more discoverable than a "normal" SoC.
>>
>>>
>>> The intent is to be able to re-use devicetree based drivers instead of
>>> having the driver be a native PCI driver.
>>
>> Not instead of. There's the PCI driver for the FPGA or SoC bus with
>> multiple unrelated devices behind it. The PCI driver is just a bus
>> driver much like we have for various custom SoC bus drivers.
>>
>>> This goes against historical Linux practice.  The idea of taking a driver
>>> from another environment (eg Windows, HP Unix, Sun Unix, IBM Unix, etc)
>>> and adding a shim layer to translate between Linux and the other
>>> environment has been rejected.  Ironically, in this case, the other
>>> environment is Linux (more specifically the Linux OF implementation).
>>
>> I don't see how your example relates to this in any way whatsoever.
>> We're talking about different discovery mechanisms, not different
>> driver models/environments.
>>
>>> Even thought the other environment is Linux, this is still adding a
>>> shim layer to translate between that other environment and the native
>>> Linux PCI environment for which the driver would normally be written.
>>>
>>> In other words, this is not acceptable.  Normal alternatives would be
>>> something like
>>> (1) add the PCI awareness to the existing drivers,
>>
>> The downstream devices don't have their own PCI config space. That
>> won't work. PCI drivers expect a device with PCI config space. Devices
>> to drivers are always 1:1, so we couldn't share the config space among
>> multiple drivers or something. For devices which are not discoverable
>> like these are, our choices are DT, ACPI or s/w nodes (aka
>> platform_data 2.0).
>>
>>> (2) split the devicetree aware and PCI aware portions of the driver
>>> to common code that would be invoked from separate devicetree and PCI
>>> drivers,
>>
>> That only makes sense for something that is a single driver. Not the
>> case here. For the FPGA, the devices are not known up front.
>>
>>> (3) write entirely separate devicetree and PCI drivers, or
>>
>> For the same reason as 1, that simply won't work.
>>
>>> (4) some other creative solution.
>>>
>>> Am I mis-interpretting or misunderstanding anything crucial here?
>>
>> Yes...
>>
>> We now have 3 different use cases all needing the same thing. The
>> 3rd[1] is the recent test infrastructure change to have test devices
>> added. They all have non-discoverable devices downstream of a PCI
>> device. We need a solution here.
>
> Thanks Rob for this explanation. I would like to second that, as I've
> been looking for a "solution" for exact this situation for a few years
> now in multiple system configurations. The setup mostly being identical:

Please don't hijack this subthread.

I am really struggling to get a correct understanding of the issues
specifically related to the example of the Microchip LAN9662 Ethernet
Controller on a PCI card. I want this subthread to remain focused
on that, as much as possible.

The below info will be useful in the bigger picture discussion, and
the main thread discussion, but I want to keep this subthread focused
on the one specific example item of hardware.

Thanks (and thanks for all your useful participation over many years),

Frank

> An FPGA connected via PCIe to the host CPU, embedded in the FPGA misc
> devices like NS16550 UART, GPIO controller, etc which often have
> existing drivers in the Kernel. Using this dynamic device tree node
> approach for the PCIe EP with the possibility to describe the FPGA-
> internal devices via device tree seems to be the most elegant solution
> IMHO.
>
> Thanks,
> Stefan
>
>> Rob
>>
>> [1] https://lore.kernel.org/lkml/[email protected]/


2023-03-09 05:52:53

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 1/19/23 21:02, Lizhi Hou wrote:
> This patch series introduces OF overlay support for PCI devices which
> primarily addresses two use cases. First, it provides a data driven method
> to describe hardware peripherals that are present in a PCI endpoint and
> hence can be accessed by the PCI host. Second, it allows reuse of a OF
> compatible driver -- often used in SoC platforms -- in a PCI host based
> system.
>
> There are 2 series devices rely on this patch:
>
> 1) Xilinx Alveo Accelerator cards (FPGA based device)
> 2) Microchip LAN9662 Ethernet Controller

Can someone please provide me a link to both:

- a high level specification sheet
- a detailed data sheet/programming manual

-Frank


>
> Please see: https://lore.kernel.org/lkml/[email protected]/
>
> Normally, the PCI core discovers PCI devices and their BARs using the
> PCI enumeration process. However, the process does not provide a way to
> discover the hardware peripherals that are present in a PCI device, and
> which can be accessed through the PCI BARs. Also, the enumeration process
> does not provide a way to associate MSI-X vectors of a PCI device with the
> hardware peripherals that are present in the device. PCI device drivers
> often use header files to describe the hardware peripherals and their
> resources as there is no standard data driven way to do so. This patch
> series proposes to use flattened device tree blob to describe the
> peripherals in a data driven way. Based on previous discussion, using
> device tree overlay is the best way to unflatten the blob and populate
> platform devices. To use device tree overlay, there are three obvious
> problems that need to be resolved.
>
> First, we need to create a base tree for non-DT system such as x86_64. A
> patch series has been submitted for this:
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
>
> Second, a device tree node corresponding to the PCI endpoint is required
> for overlaying the flattened device tree blob for that PCI endpoint.
> Because PCI is a self-discoverable bus, a device tree node is usually not
> created for PCI devices. This series adds support to generate a device
> tree node for a PCI device which advertises itself using PCI quirks
> infrastructure.
>
> Third, we need to generate device tree nodes for PCI bridges since a child
> PCI endpoint may choose to have a device tree node created.
>
> This patch series is made up of three patches.
>
> The first patch is adding OF interface to create or destroy OF node
> dynamically.
>
> The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
> When the option is turned on, the kernel will generate device tree nodes
> for all PCI bridges unconditionally. The patch also shows how to use the
> PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
> tree node for a device. Specifically, the patch generates a device tree
> node for Xilinx Alveo U50 PCIe accelerator device. The generated device
> tree nodes do not have any property.
>
> The third patch adds basic properties ('reg', 'compatible' and
> 'device_type') to the dynamically generated device tree nodes. More
> properties can be added in the future.
>
> Here is the example of device tree nodes generated within the ARM64 QEMU.
> # lspci -t
> -[0000:00]-+-00.0
> +-01.0-[01]--
> +-01.1-[02]----00.0
> +-01.2-[03]----00.0
> +-01.3-[04]----00.0
> +-01.4-[05]----00.0
> +-01.5-[06]--
> +-01.6-[07]--
> +-01.7-[08]--
> +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
> | \-00.1
> +-02.1-[0c]--
> \-03.0-[0d-0e]----00.0-[0e]----01.0
>
> # tree /sys/firmware/devicetree/base/pcie\@10000000
> /sys/firmware/devicetree/base/pcie@10000000
> |-- #address-cells
> |-- #interrupt-cells
> |-- #size-cells
> |-- bus-range
> |-- compatible
> |-- device_type
> |-- dma-coherent
> |-- interrupt-map
> |-- interrupt-map-mask
> |-- linux,pci-domain
> |-- msi-parent
> |-- name
> |-- pci@1,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,1
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,2
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,3
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,4
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,5
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,6
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@1,7
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@2,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- pci@0,0
> | | |-- #address-cells
> | | |-- #size-cells
> | | |-- compatible
> | | |-- device_type
> | | |-- pci@0,0
> | | | |-- #address-cells
> | | | |-- #size-cells
> | | | |-- compatible
> | | | |-- dev@0,0
> | | | | |-- compatible
> | | | | `-- reg
> | | | |-- dev@0,1
> | | | | |-- compatible
> | | | | `-- reg
> | | | |-- device_type
> | | | |-- ranges
> | | | `-- reg
> | | |-- ranges
> | | `-- reg
> | |-- ranges
> | `-- reg
> |-- pci@2,1
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- ranges
> | `-- reg
> |-- pci@3,0
> | |-- #address-cells
> | |-- #size-cells
> | |-- compatible
> | |-- device_type
> | |-- pci@0,0
> | | |-- #address-cells
> | | |-- #size-cells
> | | |-- compatible
> | | |-- device_type
> | | |-- ranges
> | | `-- reg
> | |-- ranges
> | `-- reg
> |-- ranges
> `-- reg
>
> Changes since v6:
> - Removed single line wrapper functions
> - Added Signed-off-by Clément Léger <[email protected]>
>
> Changes since v5:
> - Fixed code review comments
> - Fixed incorrect 'ranges' and 'reg' properties and verified address
> translation.
>
> Changes since RFC v4:
> - Fixed code review comments
>
> Changes since RFC v3:
> - Split the Xilinx Alveo U50 PCI quirk to a separate patch
> - Minor changes in commit description and code comment
>
> Changes since RFC v2:
> - Merged patch 3 with patch 2
> - Added OF interfaces of_changeset_add_prop_* and use them to create
> properties.
> - Added '#address-cells', '#size-cells' and 'ranges' properties.
>
> Changes since RFC v1:
> - Added one patch to create basic properties.
> - To move DT related code out of PCI subsystem, replaced of_node_alloc()
> with of_create_node()/of_destroy_node()
>
> Lizhi Hou (3):
> of: dynamic: Add interfaces for creating device node dynamically
> PCI: Create device tree node for selected devices
> PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
>
> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
> drivers/pci/Kconfig | 12 ++
> drivers/pci/Makefile | 1 +
> drivers/pci/bus.c | 2 +
> drivers/pci/msi/irqdomain.c | 6 +-
> drivers/pci/of.c | 71 ++++++++++++
> drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
> drivers/pci/pci-driver.c | 3 +-
> drivers/pci/pci.h | 19 ++++
> drivers/pci/quirks.c | 11 ++
> drivers/pci/remove.c | 1 +
> include/linux/of.h | 24 ++++
> 12 files changed, 556 insertions(+), 3 deletions(-)
> create mode 100644 drivers/pci/of_property.c
>


2023-03-09 05:56:29

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 3/8/23 23:52, Frank Rowand wrote:
> On 1/19/23 21:02, Lizhi Hou wrote:
>> This patch series introduces OF overlay support for PCI devices which
>> primarily addresses two use cases. First, it provides a data driven method
>> to describe hardware peripherals that are present in a PCI endpoint and
>> hence can be accessed by the PCI host. Second, it allows reuse of a OF
>> compatible driver -- often used in SoC platforms -- in a PCI host based
>> system.
>>
>> There are 2 series devices rely on this patch:
>>
>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>> 2) Microchip LAN9662 Ethernet Controller
>

Sorry, hit send too quickly...

> Can someone please provide me a link to both:
>
> - a high level specification sheet
> - a detailed data sheet/programming manual

for the lan9662 PCIe card

-Frank

>
> -Frank
>
>
>>
>> Please see: https://lore.kernel.org/lkml/[email protected]/
>>
>> Normally, the PCI core discovers PCI devices and their BARs using the
>> PCI enumeration process. However, the process does not provide a way to
>> discover the hardware peripherals that are present in a PCI device, and
>> which can be accessed through the PCI BARs. Also, the enumeration process
>> does not provide a way to associate MSI-X vectors of a PCI device with the
>> hardware peripherals that are present in the device. PCI device drivers
>> often use header files to describe the hardware peripherals and their
>> resources as there is no standard data driven way to do so. This patch
>> series proposes to use flattened device tree blob to describe the
>> peripherals in a data driven way. Based on previous discussion, using
>> device tree overlay is the best way to unflatten the blob and populate
>> platform devices. To use device tree overlay, there are three obvious
>> problems that need to be resolved.
>>
>> First, we need to create a base tree for non-DT system such as x86_64. A
>> patch series has been submitted for this:
>> https://lore.kernel.org/lkml/[email protected]/
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Second, a device tree node corresponding to the PCI endpoint is required
>> for overlaying the flattened device tree blob for that PCI endpoint.
>> Because PCI is a self-discoverable bus, a device tree node is usually not
>> created for PCI devices. This series adds support to generate a device
>> tree node for a PCI device which advertises itself using PCI quirks
>> infrastructure.
>>
>> Third, we need to generate device tree nodes for PCI bridges since a child
>> PCI endpoint may choose to have a device tree node created.
>>
>> This patch series is made up of three patches.
>>
>> The first patch is adding OF interface to create or destroy OF node
>> dynamically.
>>
>> The second patch introduces a kernel option, CONFIG_DYNAMIC_PCI_OF_NODEX.
>> When the option is turned on, the kernel will generate device tree nodes
>> for all PCI bridges unconditionally. The patch also shows how to use the
>> PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device
>> tree node for a device. Specifically, the patch generates a device tree
>> node for Xilinx Alveo U50 PCIe accelerator device. The generated device
>> tree nodes do not have any property.
>>
>> The third patch adds basic properties ('reg', 'compatible' and
>> 'device_type') to the dynamically generated device tree nodes. More
>> properties can be added in the future.
>>
>> Here is the example of device tree nodes generated within the ARM64 QEMU.
>> # lspci -t
>> -[0000:00]-+-00.0
>> +-01.0-[01]--
>> +-01.1-[02]----00.0
>> +-01.2-[03]----00.0
>> +-01.3-[04]----00.0
>> +-01.4-[05]----00.0
>> +-01.5-[06]--
>> +-01.6-[07]--
>> +-01.7-[08]--
>> +-02.0-[09-0b]----00.0-[0a-0b]----00.0-[0b]--+-00.0
>> | \-00.1
>> +-02.1-[0c]--
>> \-03.0-[0d-0e]----00.0-[0e]----01.0
>>
>> # tree /sys/firmware/devicetree/base/pcie\@10000000
>> /sys/firmware/devicetree/base/pcie@10000000
>> |-- #address-cells
>> |-- #interrupt-cells
>> |-- #size-cells
>> |-- bus-range
>> |-- compatible
>> |-- device_type
>> |-- dma-coherent
>> |-- interrupt-map
>> |-- interrupt-map-mask
>> |-- linux,pci-domain
>> |-- msi-parent
>> |-- name
>> |-- pci@1,0
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,1
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,2
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,3
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,4
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,5
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,6
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@1,7
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@2,0
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- pci@0,0
>> | | |-- #address-cells
>> | | |-- #size-cells
>> | | |-- compatible
>> | | |-- device_type
>> | | |-- pci@0,0
>> | | | |-- #address-cells
>> | | | |-- #size-cells
>> | | | |-- compatible
>> | | | |-- dev@0,0
>> | | | | |-- compatible
>> | | | | `-- reg
>> | | | |-- dev@0,1
>> | | | | |-- compatible
>> | | | | `-- reg
>> | | | |-- device_type
>> | | | |-- ranges
>> | | | `-- reg
>> | | |-- ranges
>> | | `-- reg
>> | |-- ranges
>> | `-- reg
>> |-- pci@2,1
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- ranges
>> | `-- reg
>> |-- pci@3,0
>> | |-- #address-cells
>> | |-- #size-cells
>> | |-- compatible
>> | |-- device_type
>> | |-- pci@0,0
>> | | |-- #address-cells
>> | | |-- #size-cells
>> | | |-- compatible
>> | | |-- device_type
>> | | |-- ranges
>> | | `-- reg
>> | |-- ranges
>> | `-- reg
>> |-- ranges
>> `-- reg
>>
>> Changes since v6:
>> - Removed single line wrapper functions
>> - Added Signed-off-by Clément Léger <[email protected]>
>>
>> Changes since v5:
>> - Fixed code review comments
>> - Fixed incorrect 'ranges' and 'reg' properties and verified address
>> translation.
>>
>> Changes since RFC v4:
>> - Fixed code review comments
>>
>> Changes since RFC v3:
>> - Split the Xilinx Alveo U50 PCI quirk to a separate patch
>> - Minor changes in commit description and code comment
>>
>> Changes since RFC v2:
>> - Merged patch 3 with patch 2
>> - Added OF interfaces of_changeset_add_prop_* and use them to create
>> properties.
>> - Added '#address-cells', '#size-cells' and 'ranges' properties.
>>
>> Changes since RFC v1:
>> - Added one patch to create basic properties.
>> - To move DT related code out of PCI subsystem, replaced of_node_alloc()
>> with of_create_node()/of_destroy_node()
>>
>> Lizhi Hou (3):
>> of: dynamic: Add interfaces for creating device node dynamically
>> PCI: Create device tree node for selected devices
>> PCI: Add PCI quirks to generate device tree node for Xilinx Alveo U50
>>
>> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++
>> drivers/pci/Kconfig | 12 ++
>> drivers/pci/Makefile | 1 +
>> drivers/pci/bus.c | 2 +
>> drivers/pci/msi/irqdomain.c | 6 +-
>> drivers/pci/of.c | 71 ++++++++++++
>> drivers/pci/of_property.c | 212 ++++++++++++++++++++++++++++++++++++
>> drivers/pci/pci-driver.c | 3 +-
>> drivers/pci/pci.h | 19 ++++
>> drivers/pci/quirks.c | 11 ++
>> drivers/pci/remove.c | 1 +
>> include/linux/of.h | 24 ++++
>> 12 files changed, 556 insertions(+), 3 deletions(-)
>> create mode 100644 drivers/pci/of_property.c
>>
>


2023-03-09 08:43:15

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Le Wed, 8 Mar 2023 01:31:52 -0600,
Frank Rowand <[email protected]> a écrit :

> On 3/6/23 18:52, Rob Herring wrote:
> > On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
> >>
>
> < snip >
>
> Hi Rob,
>
> I am in no position to comment intelligently on your comments until I
> understand the SoC on PCI card model I am asking to be described in
> this subthread.

Hi Frank,

Rather than answering all of the assumptions that were made in the upper
thread (that are probably doing a bit too much of inference), I will
re-explain that from scratch.

Our usecase involves the lan966x SoCs. These SoCs are mainly targeting
networking application and offers multiple SFP and RGMII interfaces.
This Soc can be used in two exclusive modes (at least for the intended
usage):

SoC mode:
The device runs Linux by itself, on ARM64 cores included in the
SoC. This use-case of the lan966x is currently almost upstreamed,
using a traditional Device Tree representation of the lan996x HW
blocks [1] A number of drivers for the different IPs of the SoC have
already been merged in upstream Linux (see
arch/arm/boot/dts/lan966x.dtsi)

PCI mode:
The lan966x SoC is configured as a PCIe endpoint (PCI card),
connected to a separate platform that acts as the PCIe root complex.
In this case, all the IO memories that are exposed by the devices
embedded on this SoC are exposed through PCI BARs 0 & 1 and the ARM64
cores of the SoC are not used. Since this is a PCIe card, it can be
plugged on any platform, of any architecture supporting PCIe.

This work only focus on the *PCI mode* usage. In this mode, we have the
following prerequisites:
- Should work on all architectures (x86, ARM64, etc)
- Should be self-contained in the driver
- Should be able to reuse all existing platform drivers

In PCI mode, the card runs a firmware (not that it matters at all by
the way) which configure the card in PCI mode at boot time. In this
mode, it exposes a single PCI physical function associated with
vendor/product 0x1055/0x9660. This is not a multi-function PCI device !
This means that all the IO memories (peripheral memories, device
memories, registers, whatever you call them) are accessible using
standard readl()/writel() on the BARs that have been remapped. For
instance (not accurate), in the BAR 0, we will have this kind of memory
map:

BAR0
0x0 ┌───────────┐
│ │
├───────────┤
│ Clock │
│ controller│
├───────────┤
│ │
├───────────┤
│ I2C │
│ controller│
├───────────┤
│ │
├───────────┤
│ MDIO │
│ Controller│
├───────────┤
│ │
├───────────┤
│ Switch │
│ Controller│
├───────────┤
│ │
│ ... │


It also exposes either a single interrupt via the legacy interrupt
(which can then be demuxed by reading the SoC internal interrupt
controller registers), or multiple interrupts using MSI interrupts.

As stated before, all these peripherals are already supported in SoC
mode and thus, there are aleready existing platform drivers for each of
them. For more information about the devices that are exposed please
see link [1] which is the device-tree overlay used to describe the
lan9662 card.

In order to use the ethernet switch, we must configure everything that
lies around this ethernet controller, here are a few amongst all of
them:
- MDIO bus
- I2C controller for SFP modules access
- Clock controller
- Ethernet controller
- Syscon

Since all the platform drivers already exist for these devices, we
want to reuse them. Multiple solutions were thought of (fwnode, mfd,
ACPI, device-tree) and eventually ruled out for some of them and efforts
were made to try to tackle that (using fwnode [2], device-tree [3])

One way to do so is to use a device-tree overlay description that is
loaded dynamically on the PCI device OF node. This can be done using the
various device-tree series series that have been proposed (included
this one). On systems that do not provide a device-tree of_root, create
an empty of_root node (see [4]). Then during PCI enumeration, create
device-tree node matching the PCI tree that was enumerated (See [5]).
This is needed since the PCI card can be plugged on whatever port the
user wants and thus it can not be statically described using a fixed
"target-path" property in the overlay.

Finally, to glue everything together, we add a PCI driver for the
VID/PID of the PCI card (See [6]). This driver is responsible of adding
the "ranges" property in the device-tree PCI node to remap the child
nodes "reg" property to the PCI memory map. This is needed because the
PCI memory addresses differ between platform, enumeration order and so
on.Finally, the driver will load the device-tree overlay (See [1]) to
the PCI device-tree node. Eventually, a call to
of_platform_default_populate() will probe the nodes and platform
drivers.

I hope this will help you understanding what is going on here. In the
meantime, I'm also trying to obtain public documentation about the
lan966x SoC.

[1]
https://github.com/clementleger/linux/blob/bf9b4ef803d86c4ae59a4ca195a4152b0d5c3cea/drivers/mfd/lan966x_pci.dts
[2]
https://lore.kernel.org/netdev/[email protected]/T/
[3]
https://lore.kernel.org/lkml/[email protected]/
[4]
https://lore.kernel.org/lkml/[email protected]/
[5]
https://lore.kernel.org/lkml/[email protected]/
[6]
https://github.com/clementleger/linux/blob/bf9b4ef803d86c4ae59a4ca195a4152b0d5c3cea/drivers/mfd/lan966x_pci_of.c

--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2023-03-21 08:46:22

by Christian Gmeiner

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Hi all

Am Do., 9. März 2023 um 09:52 Uhr schrieb Clément Léger
<[email protected]>:
>
> Le Wed, 8 Mar 2023 01:31:52 -0600,
> Frank Rowand <[email protected]> a écrit :
>
> > On 3/6/23 18:52, Rob Herring wrote:
> > > On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
> > >>
> >
> > < snip >
> >
> > Hi Rob,
> >
> > I am in no position to comment intelligently on your comments until I
> > understand the SoC on PCI card model I am asking to be described in
> > this subthread.
>
> Hi Frank,
>
> Rather than answering all of the assumptions that were made in the upper
> thread (that are probably doing a bit too much of inference), I will
> re-explain that from scratch.
>
> Our usecase involves the lan966x SoCs. These SoCs are mainly targeting
> networking application and offers multiple SFP and RGMII interfaces.
> This Soc can be used in two exclusive modes (at least for the intended
> usage):
>
> SoC mode:
> The device runs Linux by itself, on ARM64 cores included in the
> SoC. This use-case of the lan966x is currently almost upstreamed,
> using a traditional Device Tree representation of the lan996x HW
> blocks [1] A number of drivers for the different IPs of the SoC have
> already been merged in upstream Linux (see
> arch/arm/boot/dts/lan966x.dtsi)
>
> PCI mode:
> The lan966x SoC is configured as a PCIe endpoint (PCI card),
> connected to a separate platform that acts as the PCIe root complex.
> In this case, all the IO memories that are exposed by the devices
> embedded on this SoC are exposed through PCI BARs 0 & 1 and the ARM64
> cores of the SoC are not used. Since this is a PCIe card, it can be
> plugged on any platform, of any architecture supporting PCIe.
>
> This work only focus on the *PCI mode* usage. In this mode, we have the
> following prerequisites:
> - Should work on all architectures (x86, ARM64, etc)
> - Should be self-contained in the driver
> - Should be able to reuse all existing platform drivers
>
> In PCI mode, the card runs a firmware (not that it matters at all by
> the way) which configure the card in PCI mode at boot time. In this
> mode, it exposes a single PCI physical function associated with
> vendor/product 0x1055/0x9660. This is not a multi-function PCI device !
> This means that all the IO memories (peripheral memories, device
> memories, registers, whatever you call them) are accessible using
> standard readl()/writel() on the BARs that have been remapped. For
> instance (not accurate), in the BAR 0, we will have this kind of memory
> map:
>
> BAR0
> 0x0 ┌───────────┐
> │ │
> ├───────────┤
> │ Clock │
> │ controller│
> ├───────────┤
> │ │
> ├───────────┤
> │ I2C │
> │ controller│
> ├───────────┤
> │ │
> ├───────────┤
> │ MDIO │
> │ Controller│
> ├───────────┤
> │ │
> ├───────────┤
> │ Switch │
> │ Controller│
> ├───────────┤
> │ │
> │ ... │
>
>
> It also exposes either a single interrupt via the legacy interrupt
> (which can then be demuxed by reading the SoC internal interrupt
> controller registers), or multiple interrupts using MSI interrupts.
>
> As stated before, all these peripherals are already supported in SoC
> mode and thus, there are aleready existing platform drivers for each of
> them. For more information about the devices that are exposed please
> see link [1] which is the device-tree overlay used to describe the
> lan9662 card.
>
> In order to use the ethernet switch, we must configure everything that
> lies around this ethernet controller, here are a few amongst all of
> them:
> - MDIO bus
> - I2C controller for SFP modules access
> - Clock controller
> - Ethernet controller
> - Syscon
>
> Since all the platform drivers already exist for these devices, we
> want to reuse them. Multiple solutions were thought of (fwnode, mfd,
> ACPI, device-tree) and eventually ruled out for some of them and efforts
> were made to try to tackle that (using fwnode [2], device-tree [3])
>
> One way to do so is to use a device-tree overlay description that is
> loaded dynamically on the PCI device OF node. This can be done using the
> various device-tree series series that have been proposed (included
> this one). On systems that do not provide a device-tree of_root, create
> an empty of_root node (see [4]). Then during PCI enumeration, create
> device-tree node matching the PCI tree that was enumerated (See [5]).
> This is needed since the PCI card can be plugged on whatever port the
> user wants and thus it can not be statically described using a fixed
> "target-path" property in the overlay.
>
> Finally, to glue everything together, we add a PCI driver for the
> VID/PID of the PCI card (See [6]). This driver is responsible of adding
> the "ranges" property in the device-tree PCI node to remap the child
> nodes "reg" property to the PCI memory map. This is needed because the
> PCI memory addresses differ between platform, enumeration order and so
> on.Finally, the driver will load the device-tree overlay (See [1]) to
> the PCI device-tree node. Eventually, a call to
> of_platform_default_populate() will probe the nodes and platform
> drivers.
>
> I hope this will help you understanding what is going on here. In the
> meantime, I'm also trying to obtain public documentation about the
> lan966x SoC.
>
> [1]
> https://github.com/clementleger/linux/blob/bf9b4ef803d86c4ae59a4ca195a4152b0d5c3cea/drivers/mfd/lan966x_pci.dts
> [2]
> https://lore.kernel.org/netdev/[email protected]/T/
> [3]
> https://lore.kernel.org/lkml/[email protected]/
> [4]
> https://lore.kernel.org/lkml/[email protected]/
> [5]
> https://lore.kernel.org/lkml/[email protected]/
> [6]
> https://github.com/clementleger/linux/blob/bf9b4ef803d86c4ae59a4ca195a4152b0d5c3cea/drivers/mfd/lan966x_pci_of.c
>
> --
> Clément Léger,
> Embedded Linux and Kernel engineer at Bootlin
> https://bootlin.com

What is missing to move on with this patch set?

--
greets
--
Christian Gmeiner, MSc

https://christian-gmeiner.info/privacypolicy

2023-03-23 22:47:56

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH V7 1/3] of: dynamic: Add interfaces for creating device node dynamically

On Thu, Jan 19, 2023 at 9:02 PM Lizhi Hou <[email protected]> wrote:
>
> of_create_node() creates device node dynamically. The parent device node
> and full name are required for creating the node. It optionally creates
> an OF changeset and attaches the newly created node to the changeset. The
> device node pointer and the changeset pointer can be used to add
> properties to the device node and apply the node to the base tree.
>
> of_destroy_node() frees the device node created by of_create_node(). If
> an OF changeset was also created for this node, it will destroy the
> changeset before freeing the device node.
>
> Expand of_changeset APIs to handle specific types of properties.
> of_changeset_add_prop_string()
> of_changeset_add_prop_string_array()
> of_changeset_add_prop_u32_array()
>
> Signed-off-by: Lizhi Hou <[email protected]>

Your Sob should be last because you sent this patch. The order of Sob
is roughly the order of possession of the patch.

> Signed-off-by: Sonal Santan <[email protected]>
> Signed-off-by: Max Zhen <[email protected]>

So Sonal and Max modified this patch?

> Reviewed-by: Brian Xu <[email protected]>
> Signed-off-by: Clément Léger <[email protected]>

Why does this have Clément's Sob?

> ---
> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/of.h | 24 ++++++
> 2 files changed, 221 insertions(+)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index cd3821a6444f..4e211a1d039f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -461,6 +461,71 @@ struct device_node *__of_node_dup(const struct device_node *np,
> return NULL;
> }
>
> +/**
> + * of_create_node - Dynamically create a device node

For consistency, I think this should be of_changeset_create_node().

> + *
> + * @parent: Pointer to parent device node
> + * @full_name: Node full name
> + * @cset: Pointer to returning changeset
> + *
> + * Return: Pointer to the created device node or NULL in case of an error.
> + */
> +struct device_node *of_create_node(struct device_node *parent,
> + const char *full_name,
> + struct of_changeset **cset)
> +{
> + struct of_changeset *ocs;
> + struct device_node *np;
> + int ret;
> +
> + np = __of_node_dup(NULL, full_name);
> + if (!np)
> + return NULL;
> + np->parent = parent;
> +
> + if (!cset)
> + return np;
> +
> + ocs = kmalloc(sizeof(*ocs), GFP_KERNEL);
> + if (!ocs) {
> + of_node_put(np);
> + return NULL;
> + }
> +
> + of_changeset_init(ocs);
> + ret = of_changeset_attach_node(ocs, np);
> + if (ret) {
> + of_changeset_destroy(ocs);
> + of_node_put(np);
> + kfree(ocs);
> + return NULL;
> + }
> +
> + np->data = ocs;
> + *cset = ocs;
> +
> + return np;
> +}
> +EXPORT_SYMBOL(of_create_node);
> +
> +/**
> + * of_destroy_node - Destroy a dynamically created device node
> + *
> + * @np: Pointer to dynamically created device node
> + *
> + */
> +void of_destroy_node(struct device_node *np)
> +{
> + struct of_changeset *ocs;
> +
> + if (np->data) {
> + ocs = (struct of_changeset *)np->data;
> + of_changeset_destroy(ocs);
> + }
> + of_node_put(np);

A sequence like this would be broken:

np = of_create_node()
of_node_get(np)
of_destroy_node(np)

The put here won't free the node because it still has a ref, but we
just freed the changeset. For this to work correctly, we would need
the release function to handle np->data instead. However, all users of
data aren't a changeset.

I'm failing to remember why we're storing the changeset in 'data', but
there doesn't seem to be a reason now so I think that can just be
dropped. Then if you want to free the node, you'd just do an
of_node_put(). (And maybe after the node is attached you do a put too,
because the attach does a get. Not completely sure.)

A unittest for all these functions would be helpful.

Rob

2023-03-24 02:15:39

by Lizhi Hou

[permalink] [raw]
Subject: Re: [PATCH V7 1/3] of: dynamic: Add interfaces for creating device node dynamically


On 3/23/23 15:40, Rob Herring wrote:
> On Thu, Jan 19, 2023 at 9:02 PM Lizhi Hou <[email protected]> wrote:
>> of_create_node() creates device node dynamically. The parent device node
>> and full name are required for creating the node. It optionally creates
>> an OF changeset and attaches the newly created node to the changeset. The
>> device node pointer and the changeset pointer can be used to add
>> properties to the device node and apply the node to the base tree.
>>
>> of_destroy_node() frees the device node created by of_create_node(). If
>> an OF changeset was also created for this node, it will destroy the
>> changeset before freeing the device node.
>>
>> Expand of_changeset APIs to handle specific types of properties.
>> of_changeset_add_prop_string()
>> of_changeset_add_prop_string_array()
>> of_changeset_add_prop_u32_array()
>>
>> Signed-off-by: Lizhi Hou <[email protected]>
> Your Sob should be last because you sent this patch. The order of Sob
> is roughly the order of possession of the patch.
Got it.
>
>> Signed-off-by: Sonal Santan <[email protected]>
>> Signed-off-by: Max Zhen <[email protected]>
> So Sonal and Max modified this patch?
They did not directly modify the code. And we discussed the design
together.  They also reviewed the patch before I sent it out. Please let
me know if other keyword should be used in this case.
>
>> Reviewed-by: Brian Xu <[email protected]>
>> Signed-off-by: Clément Léger <[email protected]>
> Why does this have Clément's Sob?
I referenced Clément 's code and used one portion in my first patch
series. And I re-implemented it later to address the code review
comments/requests.
>
>> ---
>> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++++++++++++
>> include/linux/of.h | 24 ++++++
>> 2 files changed, 221 insertions(+)
>>
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index cd3821a6444f..4e211a1d039f 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -461,6 +461,71 @@ struct device_node *__of_node_dup(const struct device_node *np,
>> return NULL;
>> }
>>
>> +/**
>> + * of_create_node - Dynamically create a device node
> For consistency, I think this should be of_changeset_create_node().
Sure.
>
>> + *
>> + * @parent: Pointer to parent device node
>> + * @full_name: Node full name
>> + * @cset: Pointer to returning changeset
>> + *
>> + * Return: Pointer to the created device node or NULL in case of an error.
>> + */
>> +struct device_node *of_create_node(struct device_node *parent,
>> + const char *full_name,
>> + struct of_changeset **cset)
>> +{
>> + struct of_changeset *ocs;
>> + struct device_node *np;
>> + int ret;
>> +
>> + np = __of_node_dup(NULL, full_name);
>> + if (!np)
>> + return NULL;
>> + np->parent = parent;
>> +
>> + if (!cset)
>> + return np;
>> +
>> + ocs = kmalloc(sizeof(*ocs), GFP_KERNEL);
>> + if (!ocs) {
>> + of_node_put(np);
>> + return NULL;
>> + }
>> +
>> + of_changeset_init(ocs);
>> + ret = of_changeset_attach_node(ocs, np);
>> + if (ret) {
>> + of_changeset_destroy(ocs);
>> + of_node_put(np);
>> + kfree(ocs);
>> + return NULL;
>> + }
>> +
>> + np->data = ocs;
>> + *cset = ocs;
>> +
>> + return np;
>> +}
>> +EXPORT_SYMBOL(of_create_node);
>> +
>> +/**
>> + * of_destroy_node - Destroy a dynamically created device node
>> + *
>> + * @np: Pointer to dynamically created device node
>> + *
>> + */
>> +void of_destroy_node(struct device_node *np)
>> +{
>> + struct of_changeset *ocs;
>> +
>> + if (np->data) {
>> + ocs = (struct of_changeset *)np->data;
>> + of_changeset_destroy(ocs);
>> + }
>> + of_node_put(np);
> A sequence like this would be broken:
>
> np = of_create_node()
> of_node_get(np)
> of_destroy_node(np)
>
> The put here won't free the node because it still has a ref, but we
> just freed the changeset. For this to work correctly, we would need
> the release function to handle np->data instead. However, all users of
> data aren't a changeset.
>
> I'm failing to remember why we're storing the changeset in 'data', but
> there doesn't seem to be a reason now so I think that can just be
> dropped. Then if you want to free the node, you'd just do an
> of_node_put(). (And maybe after the node is attached you do a put too,
> because the attach does a get. Not completely sure.)

The question is how to save changeset and free it later. I used global
link list to track the changeset been created.

Storing the changeset in 'data' can avoid using the global link list.

To use of_node_put() to free both node and changeset, I think we can

  1) add a new flag, then in of_node_release() we can know np->data is
changeset by checking the flag.

  2) When creating node, allocate extra memory for changeset and set
np->data to a global function of_free_dynamic_node().

      In of_node_release(), check if np->data == of_free_dynamic_node,
call of_free_dynamic_node(np).

      in of_free_dynamic_node(), free changeset by
of_changeset_destroy(np+1)

Does this make sense to you? If yes, 1) or 2) sounds better?

>
> A unittest for all these functions would be helpful.

Ok, I will create unittest for the new added functions.


Thanks,

Lizhi

>
> Rob

2023-03-24 14:18:27

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH V7 1/3] of: dynamic: Add interfaces for creating device node dynamically

On Thu, Mar 23, 2023 at 9:12 PM Lizhi Hou <[email protected]> wrote:
>
>
> On 3/23/23 15:40, Rob Herring wrote:
> > On Thu, Jan 19, 2023 at 9:02 PM Lizhi Hou <[email protected]> wrote:
> >> of_create_node() creates device node dynamically. The parent device node
> >> and full name are required for creating the node. It optionally creates
> >> an OF changeset and attaches the newly created node to the changeset. The
> >> device node pointer and the changeset pointer can be used to add
> >> properties to the device node and apply the node to the base tree.
> >>
> >> of_destroy_node() frees the device node created by of_create_node(). If
> >> an OF changeset was also created for this node, it will destroy the
> >> changeset before freeing the device node.
> >>
> >> Expand of_changeset APIs to handle specific types of properties.
> >> of_changeset_add_prop_string()
> >> of_changeset_add_prop_string_array()
> >> of_changeset_add_prop_u32_array()
> >>
> >> Signed-off-by: Lizhi Hou <[email protected]>
> > Your Sob should be last because you sent this patch. The order of Sob
> > is roughly the order of possession of the patch.
> Got it.
> >
> >> Signed-off-by: Sonal Santan <[email protected]>
> >> Signed-off-by: Max Zhen <[email protected]>
> > So Sonal and Max modified this patch?
> They did not directly modify the code. And we discussed the design
> together. They also reviewed the patch before I sent it out. Please let
> me know if other keyword should be used in this case.

Reviewed-by or nothing. Some feel that only reviews on public lists
should get that tag and internal, private reviews don't matter.

> >
> >> Reviewed-by: Brian Xu <[email protected]>
> >> Signed-off-by: Clément Léger <[email protected]>
> > Why does this have Clément's Sob?
> I referenced Clément 's code and used one portion in my first patch
> series. And I re-implemented it later to address the code review
> comments/requests.

Then it goes first or you can use the 'Co-developed-by' tag.

> >
> >> ---
> >> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++++++++++++
> >> include/linux/of.h | 24 ++++++
> >> 2 files changed, 221 insertions(+)
> >>
> >> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> >> index cd3821a6444f..4e211a1d039f 100644
> >> --- a/drivers/of/dynamic.c
> >> +++ b/drivers/of/dynamic.c
> >> @@ -461,6 +461,71 @@ struct device_node *__of_node_dup(const struct device_node *np,
> >> return NULL;
> >> }
> >>
> >> +/**
> >> + * of_create_node - Dynamically create a device node
> > For consistency, I think this should be of_changeset_create_node().
> Sure.
> >
> >> + *
> >> + * @parent: Pointer to parent device node
> >> + * @full_name: Node full name
> >> + * @cset: Pointer to returning changeset
> >> + *
> >> + * Return: Pointer to the created device node or NULL in case of an error.
> >> + */
> >> +struct device_node *of_create_node(struct device_node *parent,
> >> + const char *full_name,
> >> + struct of_changeset **cset)
> >> +{
> >> + struct of_changeset *ocs;
> >> + struct device_node *np;
> >> + int ret;
> >> +
> >> + np = __of_node_dup(NULL, full_name);
> >> + if (!np)
> >> + return NULL;
> >> + np->parent = parent;
> >> +
> >> + if (!cset)
> >> + return np;
> >> +
> >> + ocs = kmalloc(sizeof(*ocs), GFP_KERNEL);
> >> + if (!ocs) {
> >> + of_node_put(np);
> >> + return NULL;
> >> + }
> >> +
> >> + of_changeset_init(ocs);
> >> + ret = of_changeset_attach_node(ocs, np);
> >> + if (ret) {
> >> + of_changeset_destroy(ocs);
> >> + of_node_put(np);
> >> + kfree(ocs);
> >> + return NULL;
> >> + }
> >> +
> >> + np->data = ocs;
> >> + *cset = ocs;
> >> +
> >> + return np;
> >> +}
> >> +EXPORT_SYMBOL(of_create_node);
> >> +
> >> +/**
> >> + * of_destroy_node - Destroy a dynamically created device node
> >> + *
> >> + * @np: Pointer to dynamically created device node
> >> + *
> >> + */
> >> +void of_destroy_node(struct device_node *np)
> >> +{
> >> + struct of_changeset *ocs;
> >> +
> >> + if (np->data) {
> >> + ocs = (struct of_changeset *)np->data;
> >> + of_changeset_destroy(ocs);
> >> + }
> >> + of_node_put(np);
> > A sequence like this would be broken:
> >
> > np = of_create_node()
> > of_node_get(np)
> > of_destroy_node(np)
> >
> > The put here won't free the node because it still has a ref, but we
> > just freed the changeset. For this to work correctly, we would need
> > the release function to handle np->data instead. However, all users of
> > data aren't a changeset.
> >
> > I'm failing to remember why we're storing the changeset in 'data', but
> > there doesn't seem to be a reason now so I think that can just be
> > dropped. Then if you want to free the node, you'd just do an
> > of_node_put(). (And maybe after the node is attached you do a put too,
> > because the attach does a get. Not completely sure.)
>
> The question is how to save changeset and free it later. I used global
> link list to track the changeset been created.
>
> Storing the changeset in 'data' can avoid using the global link list.
>
> To use of_node_put() to free both node and changeset, I think we can
>
> 1) add a new flag, then in of_node_release() we can know np->data is
> changeset by checking the flag.
>
> 2) When creating node, allocate extra memory for changeset and set
> np->data to a global function of_free_dynamic_node().
>
> In of_node_release(), check if np->data == of_free_dynamic_node,
> call of_free_dynamic_node(np).
>
> in of_free_dynamic_node(), free changeset by
> of_changeset_destroy(np+1)
>
> Does this make sense to you? If yes, 1) or 2) sounds better?

Neither works. Changesets and nodes are not 1:1 in general though they
are in your use. So you can use the data ptr, but the caller has to
decide that, not the DT core code.

Rob

2023-03-24 21:37:21

by Lizhi Hou

[permalink] [raw]
Subject: Re: [PATCH V7 1/3] of: dynamic: Add interfaces for creating device node dynamically


On 3/24/23 07:14, Rob Herring wrote:
> On Thu, Mar 23, 2023 at 9:12 PM Lizhi Hou <[email protected]> wrote:
>>
>> On 3/23/23 15:40, Rob Herring wrote:
>>> On Thu, Jan 19, 2023 at 9:02 PM Lizhi Hou <[email protected]> wrote:
>>>> of_create_node() creates device node dynamically. The parent device node
>>>> and full name are required for creating the node. It optionally creates
>>>> an OF changeset and attaches the newly created node to the changeset. The
>>>> device node pointer and the changeset pointer can be used to add
>>>> properties to the device node and apply the node to the base tree.
>>>>
>>>> of_destroy_node() frees the device node created by of_create_node(). If
>>>> an OF changeset was also created for this node, it will destroy the
>>>> changeset before freeing the device node.
>>>>
>>>> Expand of_changeset APIs to handle specific types of properties.
>>>> of_changeset_add_prop_string()
>>>> of_changeset_add_prop_string_array()
>>>> of_changeset_add_prop_u32_array()
>>>>
>>>> Signed-off-by: Lizhi Hou <[email protected]>
>>> Your Sob should be last because you sent this patch. The order of Sob
>>> is roughly the order of possession of the patch.
>> Got it.
>>>> Signed-off-by: Sonal Santan <[email protected]>
>>>> Signed-off-by: Max Zhen <[email protected]>
>>> So Sonal and Max modified this patch?
>> They did not directly modify the code. And we discussed the design
>> together. They also reviewed the patch before I sent it out. Please let
>> me know if other keyword should be used in this case.
> Reviewed-by or nothing. Some feel that only reviews on public lists
> should get that tag and internal, private reviews don't matter.
>
>>>> Reviewed-by: Brian Xu <[email protected]>
>>>> Signed-off-by: Clément Léger <[email protected]>
>>> Why does this have Clément's Sob?
>> I referenced Clément 's code and used one portion in my first patch
>> series. And I re-implemented it later to address the code review
>> comments/requests.
> Then it goes first or you can use the 'Co-developed-by' tag.
>
>>>> ---
>>>> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++++++++++++
>>>> include/linux/of.h | 24 ++++++
>>>> 2 files changed, 221 insertions(+)
>>>>
>>>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>>>> index cd3821a6444f..4e211a1d039f 100644
>>>> --- a/drivers/of/dynamic.c
>>>> +++ b/drivers/of/dynamic.c
>>>> @@ -461,6 +461,71 @@ struct device_node *__of_node_dup(const struct device_node *np,
>>>> return NULL;
>>>> }
>>>>
>>>> +/**
>>>> + * of_create_node - Dynamically create a device node
>>> For consistency, I think this should be of_changeset_create_node().
>> Sure.
>>>> + *
>>>> + * @parent: Pointer to parent device node
>>>> + * @full_name: Node full name
>>>> + * @cset: Pointer to returning changeset
>>>> + *
>>>> + * Return: Pointer to the created device node or NULL in case of an error.
>>>> + */
>>>> +struct device_node *of_create_node(struct device_node *parent,
>>>> + const char *full_name,
>>>> + struct of_changeset **cset)
>>>> +{
>>>> + struct of_changeset *ocs;
>>>> + struct device_node *np;
>>>> + int ret;
>>>> +
>>>> + np = __of_node_dup(NULL, full_name);
>>>> + if (!np)
>>>> + return NULL;
>>>> + np->parent = parent;
>>>> +
>>>> + if (!cset)
>>>> + return np;
>>>> +
>>>> + ocs = kmalloc(sizeof(*ocs), GFP_KERNEL);
>>>> + if (!ocs) {
>>>> + of_node_put(np);
>>>> + return NULL;
>>>> + }
>>>> +
>>>> + of_changeset_init(ocs);
>>>> + ret = of_changeset_attach_node(ocs, np);
>>>> + if (ret) {
>>>> + of_changeset_destroy(ocs);
>>>> + of_node_put(np);
>>>> + kfree(ocs);
>>>> + return NULL;
>>>> + }
>>>> +
>>>> + np->data = ocs;
>>>> + *cset = ocs;
>>>> +
>>>> + return np;
>>>> +}
>>>> +EXPORT_SYMBOL(of_create_node);
>>>> +
>>>> +/**
>>>> + * of_destroy_node - Destroy a dynamically created device node
>>>> + *
>>>> + * @np: Pointer to dynamically created device node
>>>> + *
>>>> + */
>>>> +void of_destroy_node(struct device_node *np)
>>>> +{
>>>> + struct of_changeset *ocs;
>>>> +
>>>> + if (np->data) {
>>>> + ocs = (struct of_changeset *)np->data;
>>>> + of_changeset_destroy(ocs);
>>>> + }
>>>> + of_node_put(np);
>>> A sequence like this would be broken:
>>>
>>> np = of_create_node()
>>> of_node_get(np)
>>> of_destroy_node(np)
>>>
>>> The put here won't free the node because it still has a ref, but we
>>> just freed the changeset. For this to work correctly, we would need
>>> the release function to handle np->data instead. However, all users of
>>> data aren't a changeset.
>>>
>>> I'm failing to remember why we're storing the changeset in 'data', but
>>> there doesn't seem to be a reason now so I think that can just be
>>> dropped. Then if you want to free the node, you'd just do an
>>> of_node_put(). (And maybe after the node is attached you do a put too,
>>> because the attach does a get. Not completely sure.)
>> The question is how to save changeset and free it later. I used global
>> link list to track the changeset been created.
>>
>> Storing the changeset in 'data' can avoid using the global link list.
>>
>> To use of_node_put() to free both node and changeset, I think we can
>>
>> 1) add a new flag, then in of_node_release() we can know np->data is
>> changeset by checking the flag.
>>
>> 2) When creating node, allocate extra memory for changeset and set
>> np->data to a global function of_free_dynamic_node().
>>
>> In of_node_release(), check if np->data == of_free_dynamic_node,
>> call of_free_dynamic_node(np).
>>
>> in of_free_dynamic_node(), free changeset by
>> of_changeset_destroy(np+1)
>>
>> Does this make sense to you? If yes, 1) or 2) sounds better?
> Neither works. Changesets and nodes are not 1:1 in general though they
> are in your use. So you can use the data ptr, but the caller has to
> decide that, not the DT core code.

Ok. In of_pci_make_dev_node(), I can do

     ocs = kmalloc(*ocs);

     of_changeset_init(ocs);

     np = of_changeset_create_node(ocs, name);

     np->data = ocs;

Then in of_pci_remove_node(), I can do

     if (!np || !of_node_check_flag(np, OF_DYNAMIC)) return;

    of_changeset_destroy(np->data);

    of_node_put(np);


Does this sound reasonable?


Thanks,

Lizhi

>
> Rob

2023-03-27 03:07:45

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 3/21/23 03:44, Christian Gmeiner wrote:
> Hi all
>
> Am Do., 9. März 2023 um 09:52 Uhr schrieb Clément Léger
> <[email protected]>:
>>
>> Le Wed, 8 Mar 2023 01:31:52 -0600,
>> Frank Rowand <[email protected]> a écrit :
>>
>>> On 3/6/23 18:52, Rob Herring wrote:
>>>> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
>>>>>
>>>
>>> < snip >
>>>
>>> Hi Rob,
>>>
>>> I am in no position to comment intelligently on your comments until I
>>> understand the SoC on PCI card model I am asking to be described in
>>> this subthread.
>>
>> Hi Frank,
>>
>> Rather than answering all of the assumptions that were made in the upper
>> thread (that are probably doing a bit too much of inference), I will
>> re-explain that from scratch.
>>
>> Our usecase involves the lan966x SoCs. These SoCs are mainly targeting
>> networking application and offers multiple SFP and RGMII interfaces.
>> This Soc can be used in two exclusive modes (at least for the intended
>> usage):
>>
>> SoC mode:
>> The device runs Linux by itself, on ARM64 cores included in the
>> SoC. This use-case of the lan966x is currently almost upstreamed,
>> using a traditional Device Tree representation of the lan996x HW
>> blocks [1] A number of drivers for the different IPs of the SoC have
>> already been merged in upstream Linux (see
>> arch/arm/boot/dts/lan966x.dtsi)
>>
>> PCI mode:
>> The lan966x SoC is configured as a PCIe endpoint (PCI card),
>> connected to a separate platform that acts as the PCIe root complex.
>> In this case, all the IO memories that are exposed by the devices
>> embedded on this SoC are exposed through PCI BARs 0 & 1 and the ARM64
>> cores of the SoC are not used. Since this is a PCIe card, it can be
>> plugged on any platform, of any architecture supporting PCIe.
>>
>> This work only focus on the *PCI mode* usage. In this mode, we have the
>> following prerequisites:
>> - Should work on all architectures (x86, ARM64, etc)
>> - Should be self-contained in the driver
>> - Should be able to reuse all existing platform drivers
>>
>> In PCI mode, the card runs a firmware (not that it matters at all by
>> the way) which configure the card in PCI mode at boot time. In this
>> mode, it exposes a single PCI physical function associated with
>> vendor/product 0x1055/0x9660. This is not a multi-function PCI device !
>> This means that all the IO memories (peripheral memories, device
>> memories, registers, whatever you call them) are accessible using
>> standard readl()/writel() on the BARs that have been remapped. For
>> instance (not accurate), in the BAR 0, we will have this kind of memory
>> map:
>>
>> BAR0
>> 0x0 ┌───────────┐
>> │ │
>> ├───────────┤
>> │ Clock │
>> │ controller│
>> ├───────────┤
>> │ │
>> ├───────────┤
>> │ I2C │
>> │ controller│
>> ├───────────┤
>> │ │
>> ├───────────┤
>> │ MDIO │
>> │ Controller│
>> ├───────────┤
>> │ │
>> ├───────────┤
>> │ Switch │
>> │ Controller│
>> ├───────────┤
>> │ │
>> │ ... │
>>
>>
>> It also exposes either a single interrupt via the legacy interrupt
>> (which can then be demuxed by reading the SoC internal interrupt
>> controller registers), or multiple interrupts using MSI interrupts.
>>
>> As stated before, all these peripherals are already supported in SoC
>> mode and thus, there are aleready existing platform drivers for each of
>> them. For more information about the devices that are exposed please
>> see link [1] which is the device-tree overlay used to describe the
>> lan9662 card.
>>
>> In order to use the ethernet switch, we must configure everything that
>> lies around this ethernet controller, here are a few amongst all of
>> them:
>> - MDIO bus
>> - I2C controller for SFP modules access
>> - Clock controller
>> - Ethernet controller
>> - Syscon
>>
>> Since all the platform drivers already exist for these devices, we
>> want to reuse them. Multiple solutions were thought of (fwnode, mfd,
>> ACPI, device-tree) and eventually ruled out for some of them and efforts
>> were made to try to tackle that (using fwnode [2], device-tree [3])
>>
>> One way to do so is to use a device-tree overlay description that is
>> loaded dynamically on the PCI device OF node. This can be done using the
>> various device-tree series series that have been proposed (included
>> this one). On systems that do not provide a device-tree of_root, create
>> an empty of_root node (see [4]). Then during PCI enumeration, create
>> device-tree node matching the PCI tree that was enumerated (See [5]).
>> This is needed since the PCI card can be plugged on whatever port the
>> user wants and thus it can not be statically described using a fixed
>> "target-path" property in the overlay.
>>
>> Finally, to glue everything together, we add a PCI driver for the
>> VID/PID of the PCI card (See [6]). This driver is responsible of adding
>> the "ranges" property in the device-tree PCI node to remap the child
>> nodes "reg" property to the PCI memory map. This is needed because the
>> PCI memory addresses differ between platform, enumeration order and so
>> on.Finally, the driver will load the device-tree overlay (See [1]) to
>> the PCI device-tree node. Eventually, a call to
>> of_platform_default_populate() will probe the nodes and platform
>> drivers.
>>
>> I hope this will help you understanding what is going on here. In the
>> meantime, I'm also trying to obtain public documentation about the
>> lan966x SoC.
>>
>> [1]
>> https://github.com/clementleger/linux/blob/bf9b4ef803d86c4ae59a4ca195a4152b0d5c3cea/drivers/mfd/lan966x_pci.dts
>> [2]
>> https://lore.kernel.org/netdev/[email protected]/T/
>> [3]
>> https://lore.kernel.org/lkml/[email protected]/
>> [4]
>> https://lore.kernel.org/lkml/[email protected]/
>> [5]
>> https://lore.kernel.org/lkml/[email protected]/
>> [6]
>> https://github.com/clementleger/linux/blob/bf9b4ef803d86c4ae59a4ca195a4152b0d5c3cea/drivers/mfd/lan966x_pci_of.c
>>
>> --
>> Clément Léger,
>> Embedded Linux and Kernel engineer at Bootlin
>> https://bootlin.com
>
> What is missing to move on with this patch set?

I need to evaluate what Clément Léger wrote in the email you replied to. I had overlooked this
reply to my questions. From a quick scan, it looks like he _probably_ provided the context I
was looking for to understand the architecture of the proposal. But it is late Sunday night,
so I won't get to this tonight.

-Frank

>

2023-03-29 17:05:21

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On 3/9/23 02:45, Clément Léger wrote:
> Le Wed, 8 Mar 2023 01:31:52 -0600,
> Frank Rowand <[email protected]> a écrit :
>
>> On 3/6/23 18:52, Rob Herring wrote:
>>> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
>>>>
>>
>> < snip >
>>
>> Hi Rob,
>>
>> I am in no position to comment intelligently on your comments until I
>> understand the SoC on PCI card model I am asking to be described in
>> this subthread.
>
> Hi Frank,
>
> Rather than answering all of the assumptions that were made in the upper
> thread (that are probably doing a bit too much of inference), I will
> re-explain that from scratch.

Thanks! The below answers a lot of my questions.
>
> Our usecase involves the lan966x SoCs. These SoCs are mainly targeting
> networking application and offers multiple SFP and RGMII interfaces.
> This Soc can be used in two exclusive modes (at least for the intended
> usage):
>
> SoC mode:
> The device runs Linux by itself, on ARM64 cores included in the
> SoC. This use-case of the lan966x is currently almost upstreamed,
> using a traditional Device Tree representation of the lan996x HW
> blocks [1] A number of drivers for the different IPs of the SoC have
> already been merged in upstream Linux (see
> arch/arm/boot/dts/lan966x.dtsi)
>
> PCI mode:
> The lan966x SoC is configured as a PCIe endpoint (PCI card),
> connected to a separate platform that acts as the PCIe root complex.
> In this case, all the IO memories that are exposed by the devices
> embedded on this SoC are exposed through PCI BARs 0 & 1 and the ARM64
> cores of the SoC are not used. Since this is a PCIe card, it can be
> plugged on any platform, of any architecture supporting PCIe.
>
> This work only focus on the *PCI mode* usage. In this mode, we have the
> following prerequisites:
> - Should work on all architectures (x86, ARM64, etc)
> - Should be self-contained in the driver

> - Should be able to reuse all existing platform drivers

I've said before (in different words) that using an existing platform
driver for hardware on a PCI card requires shims, which have been
strongly rejected by the Linux kernel.

>
> In PCI mode, the card runs a firmware (not that it matters at all by
> the way) which configure the card in PCI mode at boot time. In this
> mode, it exposes a single PCI physical function associated with
> vendor/product 0x1055/0x9660. This is not a multi-function PCI device !
> This means that all the IO memories (peripheral memories, device
> memories, registers, whatever you call them) are accessible using
> standard readl()/writel() on the BARs that have been remapped. For
> instance (not accurate), in the BAR 0, we will have this kind of memory
> map:
>
> BAR0
> 0x0 ┌───────────┐
> │ │
> ├───────────┤
> │ Clock │
> │ controller│
> ├───────────┤
> │ │
> ├───────────┤
> │ I2C │
> │ controller│
> ├───────────┤
> │ │
> ├───────────┤
> │ MDIO │
> │ Controller│
> ├───────────┤
> │ │
> ├───────────┤
> │ Switch │
> │ Controller│
> ├───────────┤
> │ │
> │ ... │
>
>
> It also exposes either a single interrupt via the legacy interrupt
> (which can then be demuxed by reading the SoC internal interrupt
> controller registers), or multiple interrupts using MSI interrupts.
>
> As stated before, all these peripherals are already supported in SoC
> mode and thus, there are aleready existing platform drivers for each of
> them. For more information about the devices that are exposed please
> see link [1] which is the device-tree overlay used to describe the
> lan9662 card.
>
> In order to use the ethernet switch, we must configure everything that
> lies around this ethernet controller, here are a few amongst all of
> them:
> - MDIO bus
> - I2C controller for SFP modules access
> - Clock controller
> - Ethernet controller
> - Syscon
>
> Since all the platform drivers already exist for these devices, we
> want to reuse them. Multiple solutions were thought of (fwnode, mfd,
> ACPI, device-tree) and eventually ruled out for some of them and efforts
> were made to try to tackle that (using fwnode [2], device-tree [3])
>

> One way to do so is to use a device-tree overlay description that is
> loaded dynamically on the PCI device OF node. This can be done using the
> various device-tree series series that have been proposed (included
> this one). On systems that do not provide a device-tree of_root, create
> an empty of_root node (see [4]). Then during PCI enumeration, create
> device-tree node matching the PCI tree that was enumerated (See [5]).
> This is needed since the PCI card can be plugged on whatever port the
> user wants and thus it can not be statically described using a fixed
> "target-path" property in the overlay.

I understand that this is a use case and a desire to implement a
solution for the use case. But this is a very non-standard model.
The proposal exposes a bunch of hardware beyond the pci interface
in a non-pci method.

No, just no. Respect the pci interface boundary and do not drag
devicetree into an effort to pierce and straddle that boundary
(by adding information about the card, beyond the PCI controller,
into the system devicetree). Information about dynamically
discoverable hardware does not belong in the devicetree.

-Frank

>
> Finally, to glue everything together, we add a PCI driver for the
> VID/PID of the PCI card (See [6]). This driver is responsible of adding
> the "ranges" property in the device-tree PCI node to remap the child
> nodes "reg" property to the PCI memory map. This is needed because the
> PCI memory addresses differ between platform, enumeration order and so
> on.Finally, the driver will load the device-tree overlay (See [1]) to
> the PCI device-tree node. Eventually, a call to
> of_platform_default_populate() will probe the nodes and platform
> drivers.
>
> I hope this will help you understanding what is going on here. In the
> meantime, I'm also trying to obtain public documentation about the
> lan966x SoC.
>
> [1]
> https://github.com/clementleger/linux/blob/bf9b4ef803d86c4ae59a4ca195a4152b0d5c3cea/drivers/mfd/lan966x_pci.dts
> [2]
> https://lore.kernel.org/netdev/[email protected]/T/
> [3]
> https://lore.kernel.org/lkml/[email protected]/
> [4]
> https://lore.kernel.org/lkml/[email protected]/
> [5]
> https://lore.kernel.org/lkml/[email protected]/
> [6]
> https://github.com/clementleger/linux/blob/bf9b4ef803d86c4ae59a4ca195a4152b0d5c3cea/drivers/mfd/lan966x_pci_of.c
>

2023-03-30 15:23:39

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

On Wed, Mar 29, 2023 at 11:51 AM Frank Rowand <[email protected]> wrote:
>
> On 3/9/23 02:45, Clément Léger wrote:
> > Le Wed, 8 Mar 2023 01:31:52 -0600,
> > Frank Rowand <[email protected]> a écrit :
> >
> >> On 3/6/23 18:52, Rob Herring wrote:
> >>> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
> >>>>
> >>
> >> < snip >
> >>
> >> Hi Rob,
> >>
> >> I am in no position to comment intelligently on your comments until I
> >> understand the SoC on PCI card model I am asking to be described in
> >> this subthread.
> >
> > Hi Frank,
> >
> > Rather than answering all of the assumptions that were made in the upper
> > thread (that are probably doing a bit too much of inference), I will
> > re-explain that from scratch.
>
> Thanks! The below answers a lot of my questions.
> >
> > Our usecase involves the lan966x SoCs. These SoCs are mainly targeting
> > networking application and offers multiple SFP and RGMII interfaces.
> > This Soc can be used in two exclusive modes (at least for the intended
> > usage):
> >
> > SoC mode:
> > The device runs Linux by itself, on ARM64 cores included in the
> > SoC. This use-case of the lan966x is currently almost upstreamed,
> > using a traditional Device Tree representation of the lan996x HW
> > blocks [1] A number of drivers for the different IPs of the SoC have
> > already been merged in upstream Linux (see
> > arch/arm/boot/dts/lan966x.dtsi)
> >
> > PCI mode:
> > The lan966x SoC is configured as a PCIe endpoint (PCI card),
> > connected to a separate platform that acts as the PCIe root complex.
> > In this case, all the IO memories that are exposed by the devices
> > embedded on this SoC are exposed through PCI BARs 0 & 1 and the ARM64
> > cores of the SoC are not used. Since this is a PCIe card, it can be
> > plugged on any platform, of any architecture supporting PCIe.
> >
> > This work only focus on the *PCI mode* usage. In this mode, we have the
> > following prerequisites:
> > - Should work on all architectures (x86, ARM64, etc)
> > - Should be self-contained in the driver
>
> > - Should be able to reuse all existing platform drivers
>
> I've said before (in different words) that using an existing platform
> driver for hardware on a PCI card requires shims, which have been
> strongly rejected by the Linux kernel.

Do you have an example of what you are saying has been rejected
because I have no clue what you are referring to?

The kernel already has a way to divide up a PCI device into multiple
non-PCI sub-drivers. It's auxiliary_bus. Is that not a "shim"? So why
not use that? From the docs:

* A key requirement for utilizing the auxiliary bus is that there is no
* dependency on a physical bus, device, register accesses or regmap support.
* These individual devices split from the core cannot live on the platform bus
* as they are not physical devices that are controlled by DT/ACPI. The same
* argument applies for not using MFD in this scenario as MFD relies on
* individual function devices being physical devices.


In the usecases here, they are physical devices because it's the same
devices when Linux is running on the SoC.

> >
> > In PCI mode, the card runs a firmware (not that it matters at all by
> > the way) which configure the card in PCI mode at boot time. In this
> > mode, it exposes a single PCI physical function associated with
> > vendor/product 0x1055/0x9660. This is not a multi-function PCI device !
> > This means that all the IO memories (peripheral memories, device
> > memories, registers, whatever you call them) are accessible using
> > standard readl()/writel() on the BARs that have been remapped. For
> > instance (not accurate), in the BAR 0, we will have this kind of memory
> > map:
> >
> > BAR0
> > 0x0 ┌───────────┐
> > │ │
> > ├───────────┤
> > │ Clock │
> > │ controller│
> > ├───────────┤
> > │ │
> > ├───────────┤
> > │ I2C │
> > │ controller│
> > ├───────────┤
> > │ │
> > ├───────────┤
> > │ MDIO │
> > │ Controller│
> > ├───────────┤
> > │ │
> > ├───────────┤
> > │ Switch │
> > │ Controller│
> > ├───────────┤
> > │ │
> > │ ... │
> >
> >
> > It also exposes either a single interrupt via the legacy interrupt
> > (which can then be demuxed by reading the SoC internal interrupt
> > controller registers), or multiple interrupts using MSI interrupts.
> >
> > As stated before, all these peripherals are already supported in SoC
> > mode and thus, there are aleready existing platform drivers for each of
> > them. For more information about the devices that are exposed please
> > see link [1] which is the device-tree overlay used to describe the
> > lan9662 card.
> >
> > In order to use the ethernet switch, we must configure everything that
> > lies around this ethernet controller, here are a few amongst all of
> > them:
> > - MDIO bus
> > - I2C controller for SFP modules access
> > - Clock controller
> > - Ethernet controller
> > - Syscon
> >
> > Since all the platform drivers already exist for these devices, we
> > want to reuse them. Multiple solutions were thought of (fwnode, mfd,
> > ACPI, device-tree) and eventually ruled out for some of them and efforts
> > were made to try to tackle that (using fwnode [2], device-tree [3])
> >
>
> > One way to do so is to use a device-tree overlay description that is
> > loaded dynamically on the PCI device OF node. This can be done using the
> > various device-tree series series that have been proposed (included
> > this one). On systems that do not provide a device-tree of_root, create
> > an empty of_root node (see [4]). Then during PCI enumeration, create
> > device-tree node matching the PCI tree that was enumerated (See [5]).
> > This is needed since the PCI card can be plugged on whatever port the
> > user wants and thus it can not be statically described using a fixed
> > "target-path" property in the overlay.
>
> I understand that this is a use case and a desire to implement a
> solution for the use case. But this is a very non-standard model.
> The proposal exposes a bunch of hardware beyond the pci interface
> in a non-pci method.

It is not the proposal that exposes a bunch of hardware. This device
exposes a bunch of hardware. As you say, it is *beyond the PCI
interface*, so it has zero to do with PCI.

We already support non-discoverable h/w behind a PCI bus. It's called
ISA. There's powerpc DT files in the tree with ISA devices.

> No, just no. Respect the pci interface boundary and do not drag
> devicetree into an effort to pierce and straddle that boundary
> (by adding information about the card, beyond the PCI controller,
> into the system devicetree). Information about dynamically
> discoverable hardware does not belong in the devicetree.

What is discoverable? Nothing more than a VID/PID.

Your suggestion is simply use the VID/PID(s) and then the PCI driver
for the card will have all the details that implies. There's a name
for that: board files. Just like we had a single machine ID per board
registered with RMK and the kernel had to contain all the
configuration details for each machine ID. It's not just 1 card here.
This is a chip and I imagine what's used or not used or how the
downstream peripherals are configured all depend on the customer and
their specific designs.

If DT is not used here, then swnodes (DT bindings embedded in the
kernel or platform_data 2.0) will be. It's exactly the same structure
in the kernel. It's still going to be non-PCI drivers for all the sub
devices. The change in the drivers will not be making them PCI
drivers, but simply converting DT APIs to fwnode APIs which is
pointless churn IMO.

Finally, let's not forget the FPGA usecase. We already support DT
overlays for FPGAs. The fact that the FPGA sits behind a PCI interface
is pretty much irrelevant to the problem. There is simply no way the
kernel is going to contain information about what's within the FPGA
image. If not DT, how should we communicate the FPGA configuration to
the kernel?

Rob

2023-03-31 22:13:15

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH V7 0/3] Generate device tree node for pci devices

Hi Rob,

This is an off the cuff response, so anything that requires me to do research
(eg because it involves stuff that I am not educated about) will be an
incomplete answer, but I don't want to delay the discussion too much in
the meantime.

On 3/30/23 10:19, Rob Herring wrote:
> On Wed, Mar 29, 2023 at 11:51 AM Frank Rowand <[email protected]> wrote:
>>
>> On 3/9/23 02:45, Clément Léger wrote:
>>> Le Wed, 8 Mar 2023 01:31:52 -0600,
>>> Frank Rowand <[email protected]> a écrit :
>>>
>>>> On 3/6/23 18:52, Rob Herring wrote:
>>>>> On Mon, Mar 6, 2023 at 3:24 PM Frank Rowand <[email protected]> wrote:
>>>>>>
>>>>
>>>> < snip >
>>>>
>>>> Hi Rob,
>>>>
>>>> I am in no position to comment intelligently on your comments until I
>>>> understand the SoC on PCI card model I am asking to be described in
>>>> this subthread.
>>>
>>> Hi Frank,
>>>
>>> Rather than answering all of the assumptions that were made in the upper
>>> thread (that are probably doing a bit too much of inference), I will
>>> re-explain that from scratch.
>>
>> Thanks! The below answers a lot of my questions.
>>>
>>> Our usecase involves the lan966x SoCs. These SoCs are mainly targeting
>>> networking application and offers multiple SFP and RGMII interfaces.
>>> This Soc can be used in two exclusive modes (at least for the intended
>>> usage):
>>>
>>> SoC mode:
>>> The device runs Linux by itself, on ARM64 cores included in the
>>> SoC. This use-case of the lan966x is currently almost upstreamed,
>>> using a traditional Device Tree representation of the lan996x HW
>>> blocks [1] A number of drivers for the different IPs of the SoC have
>>> already been merged in upstream Linux (see
>>> arch/arm/boot/dts/lan966x.dtsi)
>>>
>>> PCI mode:
>>> The lan966x SoC is configured as a PCIe endpoint (PCI card),
>>> connected to a separate platform that acts as the PCIe root complex.
>>> In this case, all the IO memories that are exposed by the devices
>>> embedded on this SoC are exposed through PCI BARs 0 & 1 and the ARM64
>>> cores of the SoC are not used. Since this is a PCIe card, it can be
>>> plugged on any platform, of any architecture supporting PCIe.
>>>
>>> This work only focus on the *PCI mode* usage. In this mode, we have the
>>> following prerequisites:
>>> - Should work on all architectures (x86, ARM64, etc)
>>> - Should be self-contained in the driver
>>
>>> - Should be able to reuse all existing platform drivers
>>
>> I've said before (in different words) that using an existing platform
>> driver for hardware on a PCI card requires shims, which have been
>> strongly rejected by the Linux kernel.
>
> Do you have an example of what you are saying has been rejected
> because I have no clue what you are referring to?

Ancient history. Using windows drivers, proprietary unix drivers, etc
in the Linux kernel. Is that enough to make clear what I'm talking about,
or do I need to provide specific pointers?

>
> The kernel already has a way to divide up a PCI device into multiple
> non-PCI sub-drivers. It's auxiliary_bus. Is that not a "shim"? So why
> not use that? From the docs:

I don't know auxiliary_bus, so my response here is iffy. But from what
you describe here, auxiliary_bus sounds like a potential fit.

>
> * A key requirement for utilizing the auxiliary bus is that there is no
> * dependency on a physical bus, device, register accesses or regmap support.
> * These individual devices split from the core cannot live on the platform bus
> * as they are not physical devices that are controlled by DT/ACPI. The same
> * argument applies for not using MFD in this scenario as MFD relies on
> * individual function devices being physical devices.
>
>
> In the usecases here, they are physical devices because it's the same
> devices when Linux is running on the SoC.
>
>>>
>>> In PCI mode, the card runs a firmware (not that it matters at all by
>>> the way) which configure the card in PCI mode at boot time. In this
>>> mode, it exposes a single PCI physical function associated with
>>> vendor/product 0x1055/0x9660. This is not a multi-function PCI device !
>>> This means that all the IO memories (peripheral memories, device
>>> memories, registers, whatever you call them) are accessible using
>>> standard readl()/writel() on the BARs that have been remapped. For
>>> instance (not accurate), in the BAR 0, we will have this kind of memory
>>> map:
>>>
>>> BAR0
>>> 0x0 ┌───────────┐
>>> │ │
>>> ├───────────┤
>>> │ Clock │
>>> │ controller│
>>> ├───────────┤
>>> │ │
>>> ├───────────┤
>>> │ I2C │
>>> │ controller│
>>> ├───────────┤
>>> │ │
>>> ├───────────┤
>>> │ MDIO │
>>> │ Controller│
>>> ├───────────┤
>>> │ │
>>> ├───────────┤
>>> │ Switch │
>>> │ Controller│
>>> ├───────────┤
>>> │ │
>>> │ ... │
>>>
>>>
>>> It also exposes either a single interrupt via the legacy interrupt
>>> (which can then be demuxed by reading the SoC internal interrupt
>>> controller registers), or multiple interrupts using MSI interrupts.
>>>
>>> As stated before, all these peripherals are already supported in SoC
>>> mode and thus, there are aleready existing platform drivers for each of
>>> them. For more information about the devices that are exposed please
>>> see link [1] which is the device-tree overlay used to describe the
>>> lan9662 card.
>>>
>>> In order to use the ethernet switch, we must configure everything that
>>> lies around this ethernet controller, here are a few amongst all of
>>> them:
>>> - MDIO bus
>>> - I2C controller for SFP modules access
>>> - Clock controller
>>> - Ethernet controller
>>> - Syscon
>>>
>>> Since all the platform drivers already exist for these devices, we
>>> want to reuse them. Multiple solutions were thought of (fwnode, mfd,
>>> ACPI, device-tree) and eventually ruled out for some of them and efforts
>>> were made to try to tackle that (using fwnode [2], device-tree [3])
>>>
>>
>>> One way to do so is to use a device-tree overlay description that is
>>> loaded dynamically on the PCI device OF node. This can be done using the
>>> various device-tree series series that have been proposed (included
>>> this one). On systems that do not provide a device-tree of_root, create
>>> an empty of_root node (see [4]). Then during PCI enumeration, create
>>> device-tree node matching the PCI tree that was enumerated (See [5]).
>>> This is needed since the PCI card can be plugged on whatever port the
>>> user wants and thus it can not be statically described using a fixed
>>> "target-path" property in the overlay.
>>
>> I understand that this is a use case and a desire to implement a
>> solution for the use case. But this is a very non-standard model.
>> The proposal exposes a bunch of hardware beyond the pci interface
>> in a non-pci method.
>
> It is not the proposal that exposes a bunch of hardware. This device
> exposes a bunch of hardware. As you say, it is *beyond the PCI
> interface*, so it has zero to do with PCI.
>
> We already support non-discoverable h/w behind a PCI bus. It's called
> ISA. There's powerpc DT files in the tree with ISA devices.

Thanks for the specific example.

Off the top of my head, I suspect ISA is a very different case. Wasn't
ISA an example of a legacy bus that was expected to disappear? (A long
time since I have thought about ISA.) I'm not sure whether the powerpc
DT files with ISA devices is a good argument here or not - I would have
to look at them

>
>> No, just no. Respect the pci interface boundary and do not drag
>> devicetree into an effort to pierce and straddle that boundary
>> (by adding information about the card, beyond the PCI controller,
>> into the system devicetree). Information about dynamically
>> discoverable hardware does not belong in the devicetree.
>
> What is discoverable? Nothing more than a VID/PID.
>
> Your suggestion is simply use the VID/PID(s) and then the PCI driver
> for the card will have all the details that implies. There's a name
> for that: board files. Just like we had a single machine ID per board

No, IMHO it is most definitely not analogous to board files.

Board files were a mix of configuration information and description
of hardware that was non-discoverable. The hardware present could
vary quite a bit between different boards within a family. PCI
device drivers have been able to contain the information needed to
access the PCI device, and to handle variations between versions
of the board during the board's evolving life.

> registered with RMK and the kernel had to contain all the
> configuration details for each machine ID. It's not just 1 card here.
> This is a chip and I imagine what's used or not used or how the
> downstream peripherals are configured all depend on the customer and
> their specific designs.

You are saying that this is a universal board, that will be used
different ways by different customers?

This is one of the things I've been trying to get explained (eg
why I've asked for a high level data sheet). The use case as
presented in this patch series if for a specific PCI card that
has an SoC on it that can provide network functionality. My
guess would be that the card has one of more physical network
connectors that make the board useful in a larger system. The
SoC potentially has a lot of other functionality that could be
used if the proper physical connectors exist, but I don't know
because the information has not been provided. It has been
asserted that other logic blocks on the SoC need to be controlled
by other existing platform drivers - but is that because they
are needed to best use the network functionality (eg, as a
theoretical example, is some power management control desired
so the board doesn't consume excess power when the network is
idle).

>
> If DT is not used here, then swnodes (DT bindings embedded in the
> kernel or platform_data 2.0) will be. It's exactly the same structure
> in the kernel. It's still going to be non-PCI drivers for all the sub
> devices. The change in the drivers will not be making them PCI
> drivers, but simply converting DT APIs to fwnode APIs which is
> pointless churn IMO.

Why can't there be PCI drivers for devices on the PCI board?

Why add all the complexity of doing an impedance conversion to
make a device on a PCI card look like it is on the system bus
(in other words, the driver uses the DT description and APIs
as if the device was located on a system bus instead of being
on the other side of the PCI socket)?

>
> Finally, let's not forget the FPGA usecase. We already support DT
> overlays for FPGAs. The fact that the FPGA sits behind a PCI interface
> is pretty much irrelevant to the problem. There is simply no way the

I consider the PCI interface to be a big deal. Whether for an FPGA
or for an SoC on a PCI card.

The PCI socket (or cable) insulates all hardware on the PCI card from
the host system. The PCI controller "driver" is aware of details of
the host system (eg will have to arrange for system interrupts used
to implement the PCI interrupts), but PCI drivers do not manage
those details - they deal with PCI concepts (eg PCI interrupts).

Everything I've said above may be a little cloudy and incomplete. That
could reflect lack of knowledge on my part and/or incomplete information
about the need and the use case. I've clearly been struggling with
trying to understand the proposal, the architecture, and the mental
model involved. It has been asserted that I don't need to dig into
the previous rejected alternative approaches, but I suspect that not
having done so is contributing to my confusion.

> kernel is going to contain information about what's within the FPGA
> image. If not DT, how should we communicate the FPGA configuration to
> the kernel?

If the FPGA creates and exposes non-discoverable devices on the system
busses (not behind the PCI socket boundary) then DT seems reasonable.
If overlays, then preferably static overlays that can be applied by
the bootloader before the kernel boots. (Yes, I understand how restrictive
and unpopular that is for the FPGA community, but that also does not
negate the fact that overlay support has never been properly completed
for generic use.)

I don't want to re-debate the whole FPGA discussion. For this patch
series I have been trying to narrow the discussion to the SoC on a
PCI board issue. I'm trying to either find a solution for the
SoC on a PCI board or not accept non-solutions for that.

>
> Rob

2023-04-05 02:03:44

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH V7 1/3] of: dynamic: Add interfaces for creating device node dynamically

On Fri, Mar 24, 2023 at 4:26 PM Lizhi Hou <[email protected]> wrote:
>
>
> On 3/24/23 07:14, Rob Herring wrote:
> > On Thu, Mar 23, 2023 at 9:12 PM Lizhi Hou <[email protected]> wrote:
> >>
> >> On 3/23/23 15:40, Rob Herring wrote:
> >>> On Thu, Jan 19, 2023 at 9:02 PM Lizhi Hou <[email protected]> wrote:
> >>>> of_create_node() creates device node dynamically. The parent device node
> >>>> and full name are required for creating the node. It optionally creates
> >>>> an OF changeset and attaches the newly created node to the changeset. The
> >>>> device node pointer and the changeset pointer can be used to add
> >>>> properties to the device node and apply the node to the base tree.
> >>>>
> >>>> of_destroy_node() frees the device node created by of_create_node(). If
> >>>> an OF changeset was also created for this node, it will destroy the
> >>>> changeset before freeing the device node.
> >>>>
> >>>> Expand of_changeset APIs to handle specific types of properties.
> >>>> of_changeset_add_prop_string()
> >>>> of_changeset_add_prop_string_array()
> >>>> of_changeset_add_prop_u32_array()
> >>>>
> >>>> Signed-off-by: Lizhi Hou <[email protected]>
> >>> Your Sob should be last because you sent this patch. The order of Sob
> >>> is roughly the order of possession of the patch.
> >> Got it.
> >>>> Signed-off-by: Sonal Santan <[email protected]>
> >>>> Signed-off-by: Max Zhen <[email protected]>
> >>> So Sonal and Max modified this patch?
> >> They did not directly modify the code. And we discussed the design
> >> together. They also reviewed the patch before I sent it out. Please let
> >> me know if other keyword should be used in this case.
> > Reviewed-by or nothing. Some feel that only reviews on public lists
> > should get that tag and internal, private reviews don't matter.
> >
> >>>> Reviewed-by: Brian Xu <[email protected]>
> >>>> Signed-off-by: Clément Léger <[email protected]>
> >>> Why does this have Clément's Sob?
> >> I referenced Clément 's code and used one portion in my first patch
> >> series. And I re-implemented it later to address the code review
> >> comments/requests.
> > Then it goes first or you can use the 'Co-developed-by' tag.
> >
> >>>> ---
> >>>> drivers/of/dynamic.c | 197 +++++++++++++++++++++++++++++++++++++++++++
> >>>> include/linux/of.h | 24 ++++++
> >>>> 2 files changed, 221 insertions(+)
> >>>>
> >>>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> >>>> index cd3821a6444f..4e211a1d039f 100644
> >>>> --- a/drivers/of/dynamic.c
> >>>> +++ b/drivers/of/dynamic.c
> >>>> @@ -461,6 +461,71 @@ struct device_node *__of_node_dup(const struct device_node *np,
> >>>> return NULL;
> >>>> }
> >>>>
> >>>> +/**
> >>>> + * of_create_node - Dynamically create a device node
> >>> For consistency, I think this should be of_changeset_create_node().
> >> Sure.
> >>>> + *
> >>>> + * @parent: Pointer to parent device node
> >>>> + * @full_name: Node full name
> >>>> + * @cset: Pointer to returning changeset
> >>>> + *
> >>>> + * Return: Pointer to the created device node or NULL in case of an error.
> >>>> + */
> >>>> +struct device_node *of_create_node(struct device_node *parent,
> >>>> + const char *full_name,
> >>>> + struct of_changeset **cset)
> >>>> +{
> >>>> + struct of_changeset *ocs;
> >>>> + struct device_node *np;
> >>>> + int ret;
> >>>> +
> >>>> + np = __of_node_dup(NULL, full_name);
> >>>> + if (!np)
> >>>> + return NULL;
> >>>> + np->parent = parent;
> >>>> +
> >>>> + if (!cset)
> >>>> + return np;
> >>>> +
> >>>> + ocs = kmalloc(sizeof(*ocs), GFP_KERNEL);
> >>>> + if (!ocs) {
> >>>> + of_node_put(np);
> >>>> + return NULL;
> >>>> + }
> >>>> +
> >>>> + of_changeset_init(ocs);
> >>>> + ret = of_changeset_attach_node(ocs, np);
> >>>> + if (ret) {
> >>>> + of_changeset_destroy(ocs);
> >>>> + of_node_put(np);
> >>>> + kfree(ocs);
> >>>> + return NULL;
> >>>> + }
> >>>> +
> >>>> + np->data = ocs;
> >>>> + *cset = ocs;
> >>>> +
> >>>> + return np;
> >>>> +}
> >>>> +EXPORT_SYMBOL(of_create_node);
> >>>> +
> >>>> +/**
> >>>> + * of_destroy_node - Destroy a dynamically created device node
> >>>> + *
> >>>> + * @np: Pointer to dynamically created device node
> >>>> + *
> >>>> + */
> >>>> +void of_destroy_node(struct device_node *np)
> >>>> +{
> >>>> + struct of_changeset *ocs;
> >>>> +
> >>>> + if (np->data) {
> >>>> + ocs = (struct of_changeset *)np->data;
> >>>> + of_changeset_destroy(ocs);
> >>>> + }
> >>>> + of_node_put(np);
> >>> A sequence like this would be broken:
> >>>
> >>> np = of_create_node()
> >>> of_node_get(np)
> >>> of_destroy_node(np)
> >>>
> >>> The put here won't free the node because it still has a ref, but we
> >>> just freed the changeset. For this to work correctly, we would need
> >>> the release function to handle np->data instead. However, all users of
> >>> data aren't a changeset.
> >>>
> >>> I'm failing to remember why we're storing the changeset in 'data', but
> >>> there doesn't seem to be a reason now so I think that can just be
> >>> dropped. Then if you want to free the node, you'd just do an
> >>> of_node_put(). (And maybe after the node is attached you do a put too,
> >>> because the attach does a get. Not completely sure.)
> >> The question is how to save changeset and free it later. I used global
> >> link list to track the changeset been created.
> >>
> >> Storing the changeset in 'data' can avoid using the global link list.
> >>
> >> To use of_node_put() to free both node and changeset, I think we can
> >>
> >> 1) add a new flag, then in of_node_release() we can know np->data is
> >> changeset by checking the flag.
> >>
> >> 2) When creating node, allocate extra memory for changeset and set
> >> np->data to a global function of_free_dynamic_node().
> >>
> >> In of_node_release(), check if np->data == of_free_dynamic_node,
> >> call of_free_dynamic_node(np).
> >>
> >> in of_free_dynamic_node(), free changeset by
> >> of_changeset_destroy(np+1)
> >>
> >> Does this make sense to you? If yes, 1) or 2) sounds better?
> > Neither works. Changesets and nodes are not 1:1 in general though they
> > are in your use. So you can use the data ptr, but the caller has to
> > decide that, not the DT core code.
>
> Ok. In of_pci_make_dev_node(), I can do
>
> ocs = kmalloc(*ocs);
>
> of_changeset_init(ocs);
>
> np = of_changeset_create_node(ocs, name);
>
> np->data = ocs;
>
> Then in of_pci_remove_node(), I can do
>
> if (!np || !of_node_check_flag(np, OF_DYNAMIC)) return;
>
> of_changeset_destroy(np->data);
>
> of_node_put(np);
>
>
> Does this sound reasonable?

Yes, I think that should work.

Rob